50+ days of Machine Learning

Showing posts from May 1, 2024

[Day 121] Uncovering the full reason behind multicollinearity + Frequent itemset mining lecture

5/01/2024 10:14:00 pm

Hello :) Today is Day 121! A quick summary of today: I finally uncovered the full story behind multicollinearity, and I covered lecture 2 of Stanford's CS246: Mining Massive Datasets, on frequent itemset mining.

Whenever I read books or blogs, or ask an LLM, what happens when there is multicollinearity, the same generic answer comes back: multicollinearity causes the estimated parameters to be unreliable. I always wanted to ask why, and had a ton of follow-up questions. Well, today I finally 'got my hands dirty' and got into the depths of it. Not only does it feel awesome to uncover the truth, but I am also seeing some of the math concepts I learned about (eigenvalues/eigenvectors, SVD, matrix rank, determinant and condition number) actually being used.

I wrote up my notes and shared them on r/learnmachinelearning to get opinions and feedback. I also put it all in a Colab/Kaggle notebook. Here is the gist.

Building up to multicollinearity

Eigenvalues and Eigenve...
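As a rough illustration of why the "unreliable parameters" answer holds, here is a minimal sketch. It is not the code from the notebook: the synthetic data, the noise scale of 1e-3, and the helper name `summarize` are all assumptions made for this example. It relies on the standard OLS result that, with homoscedastic noise of variance sigma^2, the covariance of the estimated coefficients is sigma^2 (X^T X)^{-1}, so a near-zero eigenvalue of X^T X blows up the coefficient variance.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the post's dataset).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2_independent = rng.normal(size=n)             # unrelated to x1
x2_collinear = x1 + 1e-3 * rng.normal(size=n)   # almost a copy of x1


def summarize(a, b):
    """Hypothetical helper: diagnostics for the design matrix X = [a, b]."""
    X = np.column_stack([a, b])
    gram = X.T @ X                               # X^T X, symmetric
    eigvals = np.linalg.eigvalsh(gram)           # sorted ascending
    # Diagonal of (X^T X)^{-1}: coefficient variances up to the factor sigma^2.
    var_factor = np.diag(np.linalg.inv(gram))
    return {
        "min_eig": eigvals[0],
        "cond": np.linalg.cond(X),               # ratio of largest to smallest singular value
        "max_var_factor": var_factor.max(),
    }


independent = summarize(x1, x2_independent)
collinear = summarize(x1, x2_collinear)

for name, stats in [("independent", independent), ("collinear", collinear)]:
    print(name, {k: float(v) for k, v in stats.items()})
```

With nearly collinear columns, the smallest eigenvalue of X^T X shrinks toward zero, the condition number grows, and the variance factor on the coefficients grows with it, which is one way to read "unreliable".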