[Day 165] Starting to use mlflow for my research's model tracking + homework 4 of the MLOps zoomcamp

 Hello :)
Today is Day 165!


A quick summary of today:
  • started using mlflow to track experiments for my research in the lab
  • finished module 4 of the MLOps zoomcamp and its homework


Firstly, about experiment tracking with mlflow

I have a feeling that I will be running lots of models with different parameters from now on, and hey, I have been learning and using this cool library called mlflow in my studies - this is a great opportunity to use it in a real scenario.

So I set it up and just started running models. Training is a bit slow because my GPU is not that strong, but I found ways to get free GPU hours. I also want to find a way to host a simple sqlite db to use as the experiment tracking backend/artifact store.
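For reference, here is a minimal sketch of the setup I am using (the experiment name, parameters, and metric values below are placeholders, not my actual research values):

    import mlflow

    # Store runs in a local sqlite file; artifacts land in ./mlruns by default.
    mlflow.set_tracking_uri("sqlite:///mlflow.db")
    mlflow.set_experiment("my-research-experiment")  # placeholder name

    with mlflow.start_run():
        mlflow.log_param("learning_rate", 1e-3)  # placeholder hyperparameters
        mlflow.log_param("batch_size", 32)
        # ... training loop would go here ...
        mlflow.log_metric("val_loss", 0.42)  # placeholder metric
        # mlflow.log_artifact("checkpoint.pt")  # would log a saved file as an artifact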

As for sharing the code I will use for my research - I will definitely do it. I do not like that many papers do not share theirs for one reason or another, but for now I will keep mine close to my chest while I am still replicating base models.

Secondly, about finishing module 4 and its homework

I think the main new thing I learned was the difference between batch and streaming model deployment. The former processes data in large, pre-defined chunks at specific intervals: the model is applied to a whole batch at once, and the results are produced collectively after the entire batch is processed. The latter processes data in real time as it arrives: the model is applied to each data point (or small batch) as soon as it is available, and the result is produced immediately. There was a nice one-hour video on streaming deployment using AWS, but I only watched it for now, after yesterday's fiasco with the bill I incurred.
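A rough sketch of the contrast (the toy model and record names are just illustrative placeholders, nothing from the course itself):

    # A toy stand-in for a trained model: predict() returns a number per record.
    class ToyModel:
        def predict(self, record):
            return len(str(record))

    def run_batch(model, records):
        # Batch deployment: score a pre-collected chunk on a schedule;
        # results come out collectively once the whole batch is processed.
        return [model.predict(r) for r in records]

    def on_new_event(model, event):
        # Streaming deployment: score each data point as soon as it arrives;
        # the result is produced immediately.
        return model.predict(event)

    model = ToyModel()
    print(run_batch(model, ["trip-1", "trip-2", "trip-3"]))  # runs at an interval
    print(on_new_event(model, "trip-4"))                     # runs per event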

The homework was about converting a jupyter notebook to a script and making it executable with argparse parameters: something like python starter.py --year 2024 --month 3, which runs the taxi duration prediction model on the March 2024 NYC taxi data.
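The rough shape of the script looks like this (the model file name and feature columns are my placeholders rather than the exact homework code, though the data URL pattern is the real public NYC TLC one):

    import argparse
    import pickle

    import pandas as pd

    def main():
        parser = argparse.ArgumentParser(description="Predict taxi trip durations for one month")
        parser.add_argument("--year", type=int, required=True)
        parser.add_argument("--month", type=int, required=True)
        args = parser.parse_args()

        # The public NYC TLC trip files follow this URL pattern.
        url = (
            "https://d37ci6vzurychx.cloudfront.net/trip-data/"
            f"yellow_tripdata_{args.year:04d}-{args.month:02d}.parquet"
        )
        df = pd.read_parquet(url)

        # 'model.bin' (a pickled vectorizer + model pair) and these feature
        # columns are placeholders, not necessarily the exact homework artifacts.
        with open("model.bin", "rb") as f_in:
            dv, model = pickle.load(f_in)

        records = df[["PULocationID", "DOLocationID"]].astype(str).to_dict(orient="records")
        preds = model.predict(dv.transform(records))
        print(f"mean predicted duration: {preds.mean():.2f} minutes")

    if __name__ == "__main__":
        main()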

Then I set up an environment with pipenv and wrote a Dockerfile to build a Docker image, which let me run the script with docker run.
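A minimal Dockerfile sketch along these lines (the base image and file names are my assumptions, not the exact homework setup):

    FROM python:3.10-slim

    RUN pip install pipenv

    COPY Pipfile Pipfile.lock ./
    # --system installs into the image's global python instead of a virtualenv;
    # --deploy fails the build if Pipfile.lock is out of date.
    RUN pipenv install --system --deploy

    COPY starter.py model.bin ./

    ENTRYPOINT ["python", "starter.py"]

Then something like docker build -t duration-predictor . followed by docker run duration-predictor --year 2024 --month 3 runs the whole thing inside the container.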
 

Tomorrow, I am going up to Seoul for a 'pseudocon'. The poster is in Korean, but the event tl;dr is ... AI. I will share about the sessions/tutorials tomorrow.


That is all for today!

See you tomorrow :)
