Hello :) Today is Day 182! A quick summary of today: learning about IV and WoE, and finding the best model for an imbalanced insurance fraud dataset. The time has come to start thinking about the project for MLOps zoomcamp. I was looking around for an interesting dataset related to PD (probability of default), LGD (loss given default), or EAD (exposure at default), and I found this notebook. Warning: it is fairly long. But inside I saw something that interested me: it talked about WoE and IV. It says that they are good measures for evaluating features for fraud and similar classification tasks. This website's definition was the clearest. Weight of Evidence (WoE): a technique used in credit scoring and predictive modeling to assess the predictive power of independent variables relative to a dependent variable. Originating from the credit risk world, WoE measures the separation between "good" and "bad" customers. Here, "bad" custom...
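To make WoE and IV concrete, here is a minimal sketch using the commonly quoted definitions: for each category, WoE = ln(share of goods / share of bads), and IV = sum over categories of (share of goods - share of bads) * WoE. The function name `woe_iv`, the toy data, and the 1 = bad / 0 = good encoding are assumptions for illustration, not code from the notebook mentioned above.

```python
import math
from collections import Counter


def woe_iv(values, targets):
    """Return ({category: WoE}, IV) for one categorical feature.

    Assumption: targets use 1 for "bad" (e.g. fraud) and 0 for "good".
    """
    good = Counter(v for v, t in zip(values, targets) if t == 0)
    bad = Counter(v for v, t in zip(values, targets) if t == 1)
    total_good = sum(good.values())
    total_bad = sum(bad.values())

    woe = {}
    iv = 0.0
    for category in sorted(set(values)):
        if good[category] == 0 or bad[category] == 0:
            # Assumption: no smoothing in this sketch, so a category with
            # zero goods or zero bads is rejected instead of taking log(0).
            raise ValueError(f"category {category!r} has an empty class")
        dist_good = good[category] / total_good  # share of all goods
        dist_bad = bad[category] / total_bad     # share of all bads
        woe[category] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[category]
    return woe, iv


# Toy feature: "A" has 3 goods and 1 bad, "B" has 1 good and 3 bads.
values = ["A"] * 4 + ["B"] * 4
targets = [0, 0, 0, 1, 0, 1, 1, 1]
woe, iv = woe_iv(values, targets)
print(woe, iv)
```

With this sign convention, a positive WoE means the category holds relatively more goods than bads; some references flip the ratio, so it is worth checking which one a given source uses.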
https://ivanstudyblog.github.io/ Hello :) After 220 days of posting here, I am moving my blog to https://ivanstudyblog.github.io/ (please head over there for the latest days). The UI there is a bit more flexible and customisable, so I will continue my learning journey there. That is all for today :) See you in the new blog.
Hello :) Today is Day 185! A quick summary of today: creating a flow for model training, and creating a flow for uploading data to GCS. Well, I decided to go with Prefect as my orchestrator. I have a little experience using it and I liked it, so I went with it. All the code from today is on my repo. As seen from the pic, I ran jobs many times and have many successes and failures. The first flow (as they are called in Prefect) I created is for model training. This is an example visualisation of the flow from one of the runs. It contains 4 tasks: (1) read data (not from GCS yet; I need to make it read from GCS), (2) split data, (3) get the best params for the model from MLflow, (4) train the model and produce reports on it. The reports are quite nice as we can use markdown and Python f-strings together to make them: model report, MLflow info, and deploy instructions. Instead of going to MLflow to get info, I tried to make it a bit easier and clearer how to get a model, and also to show basic performan...
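As an illustration of the markdown plus f-string idea, here is a minimal sketch of building a report string in plain Python. The function name `build_model_report`, the section layout, the run id, and the metric values are all hypothetical placeholders, not the actual reports or results from the repo.

```python
def build_model_report(model_name, run_id, metrics):
    """Return a markdown report string for one training run."""
    # One markdown table row per metric, formatted to 3 decimals.
    metric_rows = "\n".join(
        f"| {name} | {value:.3f} |" for name, value in metrics.items()
    )
    return (
        f"# Model report: {model_name}\n\n"
        f"## MLflow info\n\n"
        f"- run id: `{run_id}`\n\n"
        f"## Metrics\n\n"
        f"| metric | value |\n"
        f"|---|---|\n"
        f"{metric_rows}\n"
    )


# Placeholder inputs only; these are not real results.
report = build_model_report(
    "example-model", "hypothetical-run-id", {"f1": 0.5, "recall": 0.25}
)
print(report)
```

Because the report is just a string, the task that trains the model can build it and pass it to whatever displays markdown for the run.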