Showing posts from August 8, 2024

[Day 220] Chapter 2 The Data Engineering Lifecycle

Hello :) Today is Day 220!

A quick summary of today:
read chapter 2 of the book 'Fundamentals of Data Engineering'
transferred more posts onto the new blog UI

What Is the Data Engineering Lifecycle?

The data engineering lifecycle begins with getting data from source systems and storing it. Next, we transform the data and then proceed to our central goal: serving data to analysts, data scientists, ML engineers, and others. In reality, storage occurs throughout the lifecycle as data flows from beginning to end.

There are 5 stages:
Generation
Storage
Ingestion
Transformation
Serving data

Generation: Source Systems

Sources produce data consumed by downstream systems. Examples include human-generated spreadsheets, IoT sensors, and web and mobile applications. Each source has its own volume and cadence of data generation. A data engineer should know how the source generates data, including any relevant quirks or nuances. Data engineers also need to understand the limits of the source systems they