Posts

Showing posts from July 1, 2024

[Day 182] Learning about feature selection in fraud detection and finding a classifier model with low recall

Hello :) Today is Day 182!

A quick summary of today: learning about IV and WoE, and finding the best model for an imbalanced insurance fraud dataset.

The time has come to start thinking about the project for MLOps zoomcamp. I was looking around for an interesting dataset related to PD (probability of default), LGD (loss given default), or EAD (exposure at default), and I found this notebook. Warning - it is fairly long. But inside I saw something that interested me - it talked about WoE and IV. It says that they are good estimators for evaluating features for fraud and similar classification tasks. This website's definition was the clearest.

Weight of Evidence (WoE)

It is a technique used in credit scoring and predictive modeling to assess the predictive power of independent variables relative to a dependent variable. Originating from the credit risk world, WoE measures the separation between "good" and "bad" customers. Here, "bad" custom