[Day 153] First steps into orchestration and ML pipelines (module 3 from MLOps zoomcamp)

Hello :)
Today is Day 153!


A quick summary of today:


It is my 26th birthday today, so my study time was pretty limited; I studied during my 2-hour bus ride home from Seoul.


Below is the full outline.

I only got to complete 3.1 (Data preparation), and below are some pics/notes I took.

So before this, in the MLOps zoomcamp we covered MLflow, experiment tracking and model management. I guess orchestration is the next step, which automates the whole data prep, training and testing process (note: I am not sure what else it covers so far).

The 2024 cohort of the camp uses mage.ai as a free-to-use platform, so today's and the next steps will be done on it.

First, it was set up using Docker:

git clone https://github.com/mage-ai/mlops.git

cd mlops

./scripts/start.sh

And a local mage.ai web app was started.


In part 3.1 we created a data preparation pipeline, and here is the final version:
It is not hard to use and fairly easy to set up. Each block is a piece of code. For example, the ingest block is:

It downloads taxi data. Then, when we create the next block right after it, Mage seems to automatically chain the previous block's output to the current block's input. Below is the second 'prepare' block:
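To make the chaining idea concrete for myself, here is a tiny plain-Python sketch. It is my own illustration, not the actual block code from the course and not Mage's API: the function names `ingest`, `prepare` and `run_pipeline`, the toy rows, and the idea of computing a trip duration in the prepare step are all assumptions.

```python
from datetime import datetime

# Hypothetical stand-ins for pipeline blocks. The names and the toy rows are
# made up for illustration; the real ingest block downloads taxi data.
def ingest():
    """First block: return raw trip records (hard-coded here, not downloaded)."""
    return [
        {"pickup": "2024-01-01 10:00:00", "dropoff": "2024-01-01 10:15:00"},
        {"pickup": "2024-01-01 11:00:00", "dropoff": "2024-01-01 11:40:00"},
    ]

def prepare(rows):
    """Second block: receives the previous block's output as its input."""
    fmt = "%Y-%m-%d %H:%M:%S"
    prepared = []
    for row in rows:
        start = datetime.strptime(row["pickup"], fmt)
        end = datetime.strptime(row["dropoff"], fmt)
        minutes = (end - start).total_seconds() / 60
        # Assumed feature: trip duration in minutes.
        prepared.append({**row, "duration_min": minutes})
    return prepared

def run_pipeline(blocks):
    """Run blocks in order, feeding each block's output into the next one."""
    output = blocks[0]()
    for block in blocks[1:]:
        output = block(output)
    return output

result = run_pipeline([ingest, prepare])
print([row["duration_min"] for row in result])
```

The point is only that each block is a function and the orchestrator passes return values along, so the blocks never need to know about each other.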

And the final 'build' block is where the dataset is split and the data is vectorized (using util functions).
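The util functions themselves are not shown here, so as a rough sketch of what "split and vectorize" can mean, here is a standard-library-only version. Everything in it is an assumption for illustration (the helper names `split_rows`, `fit_vocabulary` and `transform`, the toy `zone`/`distance` features, the 25% validation fraction). The one-hot scheme is similar in spirit to scikit-learn's `DictVectorizer`, but it is a simplified stand-in, not the course's code.

```python
import random

def split_rows(rows, val_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into train and validation lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def fit_vocabulary(dicts):
    """Learn one column per "key=value" pair for strings and one per numeric key."""
    columns = set()
    for d in dicts:
        for key, value in d.items():
            columns.add(f"{key}={value}" if isinstance(value, str) else key)
    return {name: i for i, name in enumerate(sorted(columns))}

def transform(dicts, vocabulary):
    """Turn feature dicts into fixed-length numeric rows; unseen categories are ignored."""
    matrix = []
    for d in dicts:
        row = [0.0] * len(vocabulary)
        for key, value in d.items():
            if isinstance(value, str):
                name, cell = f"{key}={value}", 1.0
            else:
                name, cell = key, float(value)
            if name in vocabulary:
                row[vocabulary[name]] = cell
        matrix.append(row)
    return matrix

# Toy feature dicts (hypothetical, not real taxi data).
rows = [
    {"zone": "A", "distance": 1.5},
    {"zone": "B", "distance": 3.0},
    {"zone": "A", "distance": 0.7},
    {"zone": "C", "distance": 2.2},
]
train, val = split_rows(rows)
vocab = fit_vocabulary(train)      # fit on the training rows only
X_train = transform(train, vocab)
X_val = transform(val, vocab)      # reuse the same columns for validation
```

Fitting the vocabulary on the training rows only and reusing it for validation keeps the two matrices in the same column layout, which is the part I would expect any real vectorizing helper to get right as well.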

I will try to complete the rest of the module in the coming week ^^


That is all for today!

See you tomorrow :)
