[Day 185] Using Prefect as my orchestrator for my MLOps project

 Hello :)
Today is Day 185!


A quick summary of today:
  • creating a flow for model training
  • creating a flow for uploading data to GCS


Well, I decided to go with Prefect as my orchestrator. I have a little experience with it and I liked it, so I went with it.

All the code from today is on my repo.

As seen from the pic, I ran the jobs many times, with plenty of successes and failures.

The first flow (as workflows are called in Prefect) I created is for model training.

This is an example visualisation of the flow from one of the runs:

It contains 4 tasks:

  • read data (not from GCS, I need to make it read from GCS)
  • split data
  • get best params for the model from MLflow
  • train the model, and produce reports on it
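
To make the structure concrete, here is a minimal sketch of how four tasks like these can be wired into one Prefect flow. It is not the code from my repo: the task bodies are toy placeholders, the parameter values are made up, and the `try/except` fallback is only there so the sketch also runs as plain Python when Prefect is not installed.

```python
import random

try:
    from prefect import flow, task
except ImportError:
    # Fallback (assumption for this sketch only): no-op decorators so the
    # example still runs as plain Python without Prefect installed.
    def _passthrough(fn=None, **_kwargs):
        return fn if fn is not None else (lambda f: f)

    flow = task = _passthrough


@task
def read_data() -> list:
    """Stand-in for reading the dataset: returns toy (feature, target) rows."""
    return [(float(x), 2.0 * x + 1.0) for x in range(10)]


@task
def split_data(rows: list, test_fraction: float = 0.2, seed: int = 42):
    """Shuffle the rows and split them into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


@task
def get_best_params() -> dict:
    """Placeholder for looking up the best run's params in MLflow."""
    return {"learning_rate": 0.1}  # hypothetical values


@task
def train_and_report(train: list, test: list, params: dict) -> dict:
    """Placeholder training step that returns basic stats for a report."""
    return {"n_train": len(train), "n_test": len(test), "params": params}


@flow(name="model-training", log_prints=True)
def training_flow() -> dict:
    # Each call below shows up as one task run in the flow visualisation.
    rows = read_data()
    train, test = split_data(rows)
    params = get_best_params()
    stats = train_and_report(train, test, params)
    print(stats)
    return stats
```

Calling `training_flow()` runs the four tasks in order; with Prefect installed, each one appears as a separate box in the run graph.
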
The reports are quite nice, as we can combine Markdown and Python f-strings to build them:
Model report:
MLflow info:
Deploy instructions:
Instead of having to go to MLflow for this info, I tried to make it easier and clearer how to get a model, and to show basic performance stats as well.
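
As a rough illustration of the Markdown plus f-string idea, here is a small sketch. The function name, the section layout, and the `model` artifact path are assumptions made for the example, not the exact report from my flow.

```python
def build_model_report(model_name: str, run_id: str, metrics: dict) -> str:
    """Render a Markdown report from plain Python values using f-strings."""
    # One Markdown table row per metric, rounded to three decimals.
    rows = "\n".join(f"| {name} | {value:.3f} |" for name, value in metrics.items())
    return f"""# Model report: {model_name}

## Performance
| Metric | Value |
|:--|--:|
{rows}

## MLflow info
- Run ID: `{run_id}`

## Deploy instructions
Load the model with `mlflow.pyfunc.load_model("runs:/{run_id}/model")`.
"""


if __name__ == "__main__":
    # Made-up values, only to show the shape of the output.
    print(build_model_report("demo-model", "abc123", {"rmse": 1.23456, "r2": 0.9}))
```

Inside a flow, a string like this could be passed to `create_markdown_artifact` from `prefect.artifacts` (for example with `key="model-report"`) so that it shows up in the Prefect UI.
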

I am using Prefect's latest, recently released version, 3.x. To deploy a flow (that is, to make it available to be run from the UI) there is a .serve method. I spent some time figuring it out, but it finally works fine.
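
For reference, serving a flow can look roughly like the sketch below. The flow body, the deployment name, and the `n_estimators` parameter are hypothetical, and the `try/except` fallback only exists so the flow function can be tried without Prefect installed (serving itself does need Prefect).

```python
try:
    from prefect import flow
except ImportError:
    # Fallback (assumption for this sketch only): a no-op decorator so the
    # flow function can still be called as plain Python.
    def flow(fn=None, **_kwargs):
        return fn if fn is not None else (lambda f: f)


@flow(log_prints=True)
def train_model_flow(n_estimators: int = 100) -> str:
    """Toy flow body; the real one would train the model."""
    message = f"training with n_estimators={n_estimators}"
    print(message)
    return message


def main() -> None:
    # .serve() creates a deployment for this flow and then blocks, waiting
    # for runs triggered from the UI (or a schedule) until the process stops.
    train_model_flow.serve(
        name="model-training-deployment",  # hypothetical deployment name
        parameters={"n_estimators": 200},  # default parameters for UI runs
    )
```

Calling `main()` from a script keeps the process alive; while it is running, the deployment can be started from the UI with the default or custom parameters.
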

Next, I created a flow to upload the data for the model to GCS. 
First I needed to create a Prefect 'block' (similar to stored env variables) in the UI for my GCP credentials, and another block to connect to my GCS bucket. After that, the flow itself is quite short.
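
A minimal sketch of such an upload flow, assuming a `GcsBucket` block has already been saved in the UI. The block name `my-gcs-bucket`, the `data` prefix, and the helper `destination_path` are made up for the example; the `try/except` fallback is only there so the path helper runs without Prefect installed.

```python
from pathlib import Path

try:
    from prefect import flow
except ImportError:
    # Fallback (assumption for this sketch only): a no-op decorator.
    def flow(fn=None, **_kwargs):
        return fn if fn is not None else (lambda f: f)


def destination_path(local_path: str, prefix: str = "data") -> str:
    """Build the object name inside the bucket, e.g. data/train.csv."""
    return f"{prefix.strip('/')}/{Path(local_path).name}"


@flow(log_prints=True)
def upload_to_gcs(local_path: str, bucket_block: str = "my-gcs-bucket") -> str:
    # Imported here so the helper above works without prefect-gcp installed.
    from prefect_gcp.cloud_storage import GcsBucket

    # Load the bucket block saved in the UI; it carries the GCP credentials block.
    bucket = GcsBucket.load(bucket_block)  # hypothetical block name
    to_path = destination_path(local_path)
    bucket.upload_from_path(from_path=local_path, to_path=to_path)
    print(f"uploaded {local_path} to {to_path}")
    return to_path
```

Running `upload_to_gcs("path/to/file.csv")` as a normal Python call is enough for a one-off upload, without creating a deployment.
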
Here I faced an issue. The .serve() deployment method I am using comes with Prefect 3.x. However, uploading to GCS relies on another library, prefect-gcp, which uses an older version of Prefect (2.19.7), so I cannot deploy this workflow. I guess that is fine for now, because this data does not need to be re-uploaded to GCS multiple times. Nevertheless, I mentioned it in Prefect's community Discord, so hopefully it gets picked up and fixed.

Today, I also spent some time figuring out how to run the Prefect UI from Docker, but ultimately I failed. Maybe I will tackle it again tomorrow.


That is all for today!

See you tomorrow :) 
