Hello :) Today is Day 182! A quick summary of today: learning about IV, WoE, and finding a best model for an imbalanced insurance fraud imbalanced dataset The time has come to start thinking about the project for MLOps zoomcamp. I was looking around for some interesting dataset related to PD (probability of default) or LGD (loss given default) or EAD (exposure at default), and I found this notebook. Warning - it is fairly long. But inside I saw something that interested me - it talked about WoE and IV. It says that they are good estimators for evaluating features for fraud and similar classification tasks. This website's definition was the most clear. Weight of Evidence (WoE) It is a technique used in credit scoring and predictive modeling to assess the predictive power of independent variables relative to a dependent variable. Originating from the credit risk world, WoE measures the separation between "good" and "bad" customers. Here, "bad" custom...
https://ivanstudyblog.github.io/ Hello :) After 220 days posting here, I am moving my blog to https://ivanstudyblog.github.io/ Please head onto there for the latest days The UI there is a bit more flexible and customisable so I will continue my learning journey there. That is all for today :) See you in the new blog.
모험을 시작하기 전에 기초 지식을 복습하고자 했다. 오늘 SPSS 대안 프로그램을 찾아보려고 해서 JASP에 대해 알게 되었다. 유용한 프로그램인 것 같아서 선회귀와 기술통계를 내려고 했는데 재미있었다. 그런데 JASP에 대해 더 알기 전에 '기초 통계 지식을 좀 복습을 하고자 하면 좋을 것 같아'란 생각을 들었다. 다행히, Coursera에서 Stanford University의 Guenther Walther 교수님께서 진행된 Introduction to Statistics 무료 강좌가 있다. 좀 부족한 부분은 다양한 검정통계 하는 거고 (F test, t-test, chi-square 등) 이제 JASP 아니면 다른 통계 프로그램 사용하게 되어도 이런 부분을 좀 더 자세히 집중하여 공부하면 된다. 특히 homoscedasticity 및 heteroscedasticity 개념을 기억에 남았다. Homoscedasticity (선): Definition: In a homoscedastic dataset, the variance of the errors (residuals) is constant across all levels of the independent variable(s). In simpler terms, the spread of the residuals is the same throughout the range of predictor values. Heteroscedasticity (악): Definition: Heteroscedasticity occurs when the variance of the errors is not constant across all levels of the independent variable(s). In other words, the spread of residuals changes as the values of the independe...