Posts

Showing posts from June 9, 2024

[Day 160] Simple data engineering pipeline with Prefect, and... MLOps with mage.ai (tons of problems)

Hello :) Today is Day 160! A quick summary of today:

- simple data engineering pipeline with Prefect
- tons of trouble learning about orchestration with mage.ai

After yesterday's journey with Prefect, the YouTube algorithm recommended me another Prefect tutorial, this time on creating data pipelines. So I decided to give it a go.

What is data engineering?

- data scientists can do data engineering, but mostly in cases where the two jobs cannot be, or do not need to be, separate
- data engineers build databases, build lots of data pipelines, and manage infrastructure (they also care about cost and security)

What are data pipelines?

- ETL (ELT) / batch pipelines that move data from A to B
  - sources and destinations: databases, APIs, files
- streaming pipelines: as data comes in, we consume it and send it wherever it needs to go
  - sources: message queues, polled data

The main GitHub repo used is here. After some basic setup, when we run 'pipeline' in the terminal, which runs the main.py file: In prefect w