[Day 180] From Kaggle to BigQuery dimension tables - an end2end pipeline
Hello :) Today is Day 180!

A quick summary of today:

- finished data modelling in dbt
- set up PROD in dbt
- set up automatic dbt job runs in Mage
- created an end-to-end pipeline

All code from today is on my GitHub repo.

1. Settling on a data model in dbt

I went over a few different designs today, but I ended up with the one above. Because all my data comes from a single source, to avoid redundancy I decided to make dim_loans the main table, with dim_borrower holding just the information about the borrower and dim_date just the loan issue date. I also added data description fields and some tests (rough sketches of what these could look like are included below).

The pics below are taken from the dbt-generated documentation:

dim_borrower
dim_date
dim_loans (image is truncated as there are many fields)

2. Setting up a PROD environment for dbt

After I finally settled on a data modelling architecture, I created a PROD environment to run all the models in a job. Not seen in the pic, but there is an 'API trigger' button which...
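The actual model SQL is in the repo, but to give a rough idea of what a dimension model like dim_borrower might look like in dbt, here is a minimal sketch. The staging model stg_loans and the column names below are illustrative assumptions, not necessarily what the project uses:

```sql
-- models/marts/dim_borrower.sql
-- Hedged sketch: stg_loans and the column names are assumptions for
-- illustration, not confirmed by the post or the repo.

with source as (

    select * from {{ ref('stg_loans') }}

),

borrowers as (

    -- keep only borrower-level attributes; one row per borrower
    select distinct
        member_id           as borrower_id,
        emp_title,
        emp_length,
        home_ownership,
        annual_inc          as annual_income,
        verification_status
    from source

)

select * from borrowers
```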
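On the tests: in dbt, column descriptions and generic tests (unique, not_null) normally live in a schema.yml file, which is most likely what the documentation screenshots above reflect. Staying in SQL, dbt also supports singular tests, queries that should return zero rows. Here's a hedged sketch of one that checks every loan points at a known borrower (the table and column names are the assumed ones from the sketch above):

```sql
-- tests/assert_loans_have_known_borrower.sql
-- Singular dbt test: the test passes when this query returns zero rows.
-- dim_loans, dim_borrower and borrower_id are assumed names, not
-- confirmed by the post.

select
    l.loan_id
from {{ ref('dim_loans') }} as l
left join {{ ref('dim_borrower') }} as b
    on l.borrower_id = b.borrower_id
where b.borrower_id is null
```

Running `dbt test` would then flag any loan whose borrower_id has no match in dim_borrower.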