[Day 185] Using prefect as my orchestrator for my MLOps project
Hello :)
Today is Day 185!
A quick summary of today:- creating a flow for model training
- creating a flow for uploading data to GCS
Well, I decided to go with Prefect as my orchestrator. I have little experience using it and I liked it, so went with it.
All the code from today is on my repo.As seen from the pic, I ran jobs many times and have many successes and failures.
The first flow (as they are called in prefect) I created is for model training.
This is an example visualisation of the flow from one of the runs:
It contains 4 tasks:- read data (not from GCS, I need to make it read from GCS)
- split data
- get best params for the model from mlflow
- train the model, and produce reports on it
The reports are quite nice as we can use markdown and python f-strings together to make them:
Model report:
Instead of going to mlflow to get info, I tried to make it a bit easier and clearer on how to get a model, and also showing basic performance stats.
I am using Prefect's latest (and recently) released version 3.x. So to deploy a flow - make it available to be ran from the UI there is a .serve method, and I spent some time figuring that out, but finally it works fine.
Next, I created a flow to upload the data for the model to GCS.
I am using Prefect's latest (and recently) released version 3.x. So to deploy a flow - make it available to be ran from the UI there is a .serve method, and I spent some time figuring that out, but finally it works fine.
Next, I created a flow to upload the data for the model to GCS.
First I needed to create a prefect 'block' (like env variables) for my GCP credentials in the UI, and a block to connect to my GCS bucket. Then, creating the flow is quite short:
Here I faced an issue. The flow deployment (.serve()) method comes with Prefect 3.x. However, to upload to GCS, another library is used - prefect-gcp which uses an older version of Prefect (2.19.7) and I cannot deploy this workflow, but I guess it is fine for now because this data does not need to be re-uploaded multiple times to GCS. Nevertheless, I mentioned it in Prefect's community discord so hopefully it gets picked up and fixed.
Today, I spent some time figuring out how to run the prefect UI from docker, but ultimately I failed. Maybe I will tackle it again tomorrow.
That is all for today!
See you tomorrow :)