[Day 183] Failing to install Kubeflow, and setting up mlflow on GCP

 Hello :)
Today is Day 183!


Today is actually past the half-way point in this blog journey. 182 days to go!


A quick summary of today:
  • setting up mlflow for experiment tracking in GCP


Today I started looking at tools I can use for my MLOps zoomcamp project. I used Mage for the data engineering project so now for an orchestrator I want to try another tool.

My first idea (which I had on my mind for a while) is Kubeflow. Tldr - it is so hard to set up locally and I could not find a guide that clearly explains how :( 

I had this 1hr video in my 'to watch' for a bit and it goes over building an ML pipeline in kubeflow however it does not go over how to install from kubeflow from 0 to what is needed in the video. Then I started looking in kubeflow's documentation - I tried to set it up on GCP but I could not and actually was a bit adamant to use it because it explicitly said that the free credits on GCP do not cover kubeflow hosting costs. So ~ I started looking into local deployment. There were 3 different posts (1 of which from IBM) that were from 2023 and all set it up differently from nothing to kubeflow, but ... none of them work :( I was losing hope as I was investing so much time and I could not, not even use it, just install and set up kubeflow...
At some points I got to the part where I have a kubeflow cluster running and can go to localhost:8080 but I could not run a pipeline due to some error. 

I decided to step back a bit, and set up something more basic - mlflow. 
I did it locally, but for this project I wanted to use GCP. 

I created a Compute Engine 
Created a storage bucket for mlflow artifacts
Using SSH I went into the VM and installed mlflow and set up a sqlite db. I decided to use sqlite because:
  1. I will not have that many experiments to keep
  2. creating an SQL db on GCP seems expensive to keep active
Then I tried running a simple experiment locally, but I was getting a weird error from mlflow: Anonymous credentials cannot be refreshed

it seemed related to my GCP credentials so I thought that I needed to add my creds.json file to the VM so that it can access them - did that but it was not the issue.

Turns out, in the jupyter notebook I needed to explicitly point to the creds.json file 
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'my-creds.json'
and with that my issue was fixed and now mlflow hosted on GCP worked. 

Though I failed to do anything with kubeflow today, I will not give up. I used Mage for orchestration in the data eng project, so now I will try to use another tool like Prefect (if not Kuberflow). But still, nothing is certain, so I will not be surprised if I write in tomorrow's post that I am using Mage for this project too. Mage is pretty nice ^^

I uploaded the parts of my project to this GitHub repo.


That is all for today!

See you tomorrow :)

Popular posts from this blog

[Day 198] Transactions Data Streaming Pipeline Porject [v1 completed]

[미리 공부] 기초 통계 복습 (Day 1는 1월2일)

[Day 61] Stanford CS224N (NLP with DL): Machine translation, seq2seq + a side CDCGAN mini project