Showing posts from June 30, 2024

[Day 181] Lending Club data engineering project - Done

Hello :) Today is Day 181!

A quick summary of today:
- completed and documented my data engineering project

Everything is on my GitHub repo, but below I will provide an overview.

A diagram overview of the tech used:

- Raw Lending Club data from Kaggle
- Mage is used to orchestrate an end-to-end process including:
  - extracting data using Kaggle's API and loading it to Google Cloud Storage (used as a data lake)
  - creating tables in BigQuery (used as a data warehouse)
  - running dbt transformation jobs
- Terraform is used to manage and provision the infrastructure needed for the data pipeline on Google Cloud Platform
- dbt is used to transform the data into dimension tables, add data tests, and create data documentation
- Looker is used to create a visualisation dashboard

For the dbt documentation, I was using the dbt Cloud IDE for development, but to deploy the docs I needed to get their files, so the easiest way was to sync and run dbt locally. Setting up dbt to sync with local files was not hard, and this gave