[Day 204] Transaction data EDA + MLflow & MinIO Docker setup
Hello :)
Today is Day 204!
A quick summary of today:
- some EDA on the KB AI competition data
- setting up MLflow and MinIO
First, I did some basic cleaning and EDA on my part of the data for the Kookmin Bank (KB) project.
These are the variables assigned to me: trans_date_trans_time, cc_num, merchant, category, amt, first, last, gender, street, city, state
No missing/null values.
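A minimal sketch of how that check and the log transform for the box plot could be done with pandas. The rows here are made-up toy data, and the label column name `is_fraud` is an assumption (it is not one of my assigned variables listed above), so treat this as an illustration rather than the exact notebook code.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real transaction table (all values are made up).
# `is_fraud` is an assumed name for the label column.
df = pd.DataFrame(
    {
        "amt": [4.97, 107.23, 220.11, 843.91, 1.06, 977.01],
        "category": ["misc_net", "grocery_pos", "entertainment",
                     "shopping_net", "misc_pos", "shopping_pos"],
        "is_fraud": [0, 0, 0, 1, 0, 1],
    }
)

# Count missing values per column, then overall.
null_counts = df.isna().sum()
total_missing = int(null_counts.sum())
print(null_counts)

# Log-transform the amount so the heavy right tail does not flatten the plot.
# np.log assumes strictly positive amounts; a zero amount would give -inf.
df["log_amt"] = np.log(df["amt"])

# The numbers a box plot would draw: quartiles of log(amount) per class.
summary = df.groupby("is_fraud")["log_amt"].describe()
print(summary[["count", "25%", "50%", "75%"]])
```

With matplotlib installed, `df.boxplot(column="log_amt", by="is_fraud")` would draw the plot from the same frame.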
Box plot of log(amount) in Not Fraud vs Fraud
Distribution of Not Fraud vs Fraud
Other graphs
Interesting ~ the state of Delaware has only Fraud transactions (of course this is not real data, but interesting nonetheless)
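One way to spot a state like that without reading it off a graph is to compute the fraud rate per state and keep the states where it equals 1. This is only a sketch on made-up toy rows, and `is_fraud` is again an assumed name for the label column.

```python
import pandas as pd

# Toy rows (made up) with the `state` column and an assumed `is_fraud` label.
df = pd.DataFrame(
    {
        "state": ["NY", "NY", "TX", "TX", "DE", "DE"],
        "is_fraud": [0, 1, 0, 0, 1, 1],
    }
)

# Per-state transaction count, fraud count, and fraud rate.
by_state = df.groupby("state")["is_fraud"].agg(
    n_transactions="count",
    n_fraud="sum",
    fraud_rate="mean",
)

# States where every single transaction is labelled as fraud.
only_fraud_states = by_state[by_state["fraud_rate"] == 1.0].index.tolist()

print(by_state.sort_values("fraud_rate", ascending=False))
print(only_fraud_states)
```

Keeping `n_transactions` next to the rate matters here: a state with only a handful of rows can hit a 100% fraud rate very easily.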
On another note ~ MLflow and MinIO
I found this website (in Korean but can be translated) that provides an easy "plug-n-play" Dockerfile and docker-compose services code for setting up MLflow with a backend store and MinIO as the artifact store. I have never used MinIO before, but from a quick online search (before using it) it seemed to have a UI similar to any cloud provider's storage, and it exposes an S3-compatible API, so it seems that it is scalable too.
Following the guide, I set up the Dockerfile and the services for:
- mlflow-backend-store
- mlflow-artifact-store
- mlflow-server
I ran it and it works fine. This was the first time I saw the MinIO UI.
It is empty for now, but it reminds me of AWS S3 and GCS.
From the setup, there is already a created bucket as well.
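To actually log something from a script into this stack, the client needs to know where the tracking server and the MinIO endpoint are. Below is a rough sketch of that. The environment variable names are the ones MLflow's S3 artifact client reads, but the ports, the credentials, and the experiment name are placeholders I made up for illustration, so they need to be swapped for whatever the docker-compose file really uses.

```python
import os


def minio_client_env(
    endpoint="http://localhost:9000",  # assumed MinIO API port
    access_key="minio",                # placeholder credential
    secret_key="miniostorage",         # placeholder credential
):
    """Build the env vars that point MLflow's S3 artifact client at MinIO."""
    return {
        "MLFLOW_S3_ENDPOINT_URL": endpoint,
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    }


def log_smoke_test_run(tracking_uri="http://localhost:5001"):  # assumed port
    """Log one dummy run so something shows up in both UIs."""
    # Imported here so the helper above still works without mlflow installed.
    import mlflow

    os.environ.update(minio_client_env())
    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment("kb-fraud-smoke-test")  # hypothetical name

    with mlflow.start_run():
        mlflow.log_param("model", "baseline")
        mlflow.log_metric("dummy_metric", 0.5)  # dummy value, not a result
        # A small text artifact, which should land in the MinIO bucket.
        mlflow.log_text("hello from the client", "notes.txt")


if __name__ == "__main__":
    log_smoke_test_run()
```

If the run shows up in the MLflow UI and `notes.txt` appears inside the bucket in the MinIO UI, both stores are wired up correctly.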
One more thing, related to Kafka. For some reason, the setup I had on this project was using a schema-registry image that was ~1.5GB and took up a lot of my Docker space, and it stopped working, so I replaced the services in my docker-compose file with the ones from my transaction-stream-data-pipeline project. At the moment my Docker containers are:
That is all for today!
See you tomorrow :)