[Day 179] Using Docker, Makefile, and starting Data modelling for my Lending club project

 Hello :)
Today is Day 179!


A quick summary of today:
  • continued working on my Lending club data engineering project


Today I added a few cool features, and I learned plenty.

1. Introduced Docker to the project

Here is the Dockerfile I created

Today I learned more about how files are laid out inside a Docker image: what to copy where, and where things live. At first I was not sure where things go, which directory my env vars should point to, or where I should copy files. Then I found that Docker Desktop lets me browse the files of a running container, and that is how I figured out what lives where.

The bash script referenced is here:

And my docker-compose.yml (before I added the volumes, the code I was writing in Mage was not persisting, so now I know what happens without volumes):

2. Added a Makefile

I also made a Makefile (my first time using one). I saw in the Data Engineering Zoomcamp project advice that adding a Makefile is a good idea, and it is also good for reproducibility.

These are the options that can be executed:

(this make 'interface' looks so nice)

3. Began thinking about and designing my dimensional data modelling strategy

I created the below using Lucidchart

I do not have natural unique identifiers in my data, so I had to go with surrogate keys. Initially I had a loan dimension table as well, but I felt that I could not build a surrogate key for it from columns that were unique enough.
At the moment the data lineage looks like the above. 
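
To illustrate the surrogate key idea, here is a minimal Python sketch of one common approach: hash a few columns together into a single key, then count how many rows end up sharing a key. The column names (grade, term, purpose), the sample rows, and both helper functions are hypothetical and are not taken from the project. The sketch only shows why a column set that is not unique enough makes a poor key.

```python
import hashlib


def surrogate_key(row, columns, sep="|", null_token="<null>"):
    """Build a surrogate key by hashing the chosen column values together."""
    parts = []
    for col in columns:
        value = row.get(col)
        # Missing values get a fixed token so they still hash consistently.
        parts.append(null_token if value is None else str(value))
    return hashlib.md5(sep.join(parts).encode("utf-8")).hexdigest()


def duplicate_key_count(rows, columns):
    """Return how many rows share a surrogate key with an earlier row."""
    keys = [surrogate_key(row, columns) for row in rows]
    return len(keys) - len(set(keys))


# Hypothetical sample rows, only for illustration.
sample_rows = [
    {"grade": "A", "term": "36 months", "purpose": "car"},
    {"grade": "A", "term": "36 months", "purpose": "house"},
    {"grade": "B", "term": "60 months", "purpose": "car"},
]

# Too few columns: two different loans collapse into the same key.
print(duplicate_key_count(sample_rows, ["grade", "term"]))

# Adding a column makes the keys unique for this sample.
print(duplicate_key_count(sample_rows, ["grade", "term", "purpose"]))
```

If the duplicate count stays above zero for every reasonable column combination, that table probably should not get its own dimension, which matches the decision about the loan dimension above.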
I am adding data documentation:

I have descriptions for other columns too. 

I will look to add some tests as well. 


That is all for today!

See you tomorrow :) 
