[Day 165] Starting to use mlflow for my research's model tracking + homework 4 of the MLOps zoomcamp
Hello :)
Today is Day 165!
A quick summary of today:- started using mlflow to track experiments for my research in the lab
Firstly, about experiment tracking with mlflow
I got a feeling that I will run lots of models from now on with different parameters, and hey, I have been learning and using this cool library called mlflow in my studies - this would be a great opportunity to use it in a real scenario.
So I set it up and just started running models. The training is a bit slow because my GPU is not that strong, but I found ways to get free GPU hours. I also want to find a way to host a simple sqlite db to use for the experiment trackings/artifact store.
My initial thoughts on sharing code that I will use for my research - I will definitely do it. I do not like that many papers do not share it for one reason or another, but for now I will keep it closer to my chest while I am still replicating base models.
Secondly, about finishing module 4 and its homework
I think the main new thing I got was about batch vs streaming model deployment. The former involves processing data in large, pre-defined chunks at specific intervals. The model is applied to a batch of data all at once, and the results are produced collectively after processing the entire batch. The latter, involves processing data in real-time as it arrives. The model is applied to each data point (or small batch) as soon as it is available, and the results are produced immediately. There was a nice 1 hour video on streaming deployment using AWS, but I just watched that one for now after yesterday's fiasco with the incurred bill.
The homework was about converting a jupyter notebook to a script and making it executable by taking argparser params: something like python starter.py --year 2024 --month 3 which runs the taxi duration prediction model on March 2024 NYC taxi data.
Then setting up an environment with pipenv, and creating a docker image and Dockerfile
Which then allowed me to run the script through docker-run.Tomorrow, I am going up to Seoul for a 'pseudocon'. The poster is in Korean but the even tldr is ... AI. Tomorrow I will share about the sessions/tutorials.
That is all for today!
See you tomorrow :)