Posts

Showing posts from May 1, 2024

[Day 121] Uncovering the full reason behind multicollinearity + Frequent itemset mining lecture

Hello :) Today is Day 121! A quick summary of today: finally uncovered the full story behind multicollinearity; covered lecture 2 from Stanford's CS246: Mining Massive Datasets (frequent itemset mining).

Whenever I read books or blogs, or ask an LLM, what happens when there is multicollinearity, the same generic answer is given: multicollinearity causes the estimated parameters to be unreliable. I always wanted to ask why, and had a ton of follow-up questions. Well, today I finally 'got my hands dirty' and got into the depths of it. Not only does it feel awesome to uncover the truth, but I am also seeing some of the math concepts that I learned about (eigenvalues/eigenvectors, SVD, matrix rank, determinant, and condition number) actually being used.

I wrote up my notes, which I shared on r/learnmachinelearning to get opinions and feedback. I also put it all in a Colab/Kaggle notebook. Here is the gist.

Building up to multicollinearity

Eigenvalues and Eigenvectors: w