Posts

Showing posts from June 22, 2024

[Day 173] Terraform, GCP, virtual machines, data pipelines

 Hello :) Today is Day 173!

A quick summary of today: I learned more about Terraform and how to set up a GCP VM and connect to it locally, and used Mage for some data engineering pipelines with GCP.

Last videos from Module 1: Terraform variables, GCP setup

Turns out there is a bit more Terraform in the Data Eng Zoomcamp, and today I covered it. After learning how to connect to GCP using Terraform and create a storage bucket, the first thing today was creating a BigQuery dataset. I added the dataset resource to main.tf, so terraform apply now creates a demo_dataset as well.

Then I learned about variables in Terraform: create a variables.tf file, define a variable there, and then use it directly in main.tf.

Great intro to Terraform - being able to define infrastructure as code, create resources, and destroy resources.

The next part was an instruction on setting up GCP (cloud VM + SSH access). First was creating an SSH key locally. And add it to the metadata in GCP'
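
The variables workflow described in this post can be sketched roughly as follows. This is a minimal illustration, not the post's actual files: the variable names, descriptions, and default values are assumptions (only demo_dataset, variables.tf, and main.tf come from the post), and it assumes the google provider is already configured with a project and credentials elsewhere in main.tf.

```hcl
# variables.tf (hypothetical variable names and defaults, for illustration only)
variable "location" {
  description = "Location for the BigQuery dataset"
  default     = "US"
}

variable "bq_dataset_name" {
  description = "Name of the BigQuery dataset"
  default     = "demo_dataset"
}

# main.tf (assumes a configured google provider block above this resource)
resource "google_bigquery_dataset" "demo_dataset" {
  # Values are read from variables.tf via the var. prefix
  dataset_id = var.bq_dataset_name
  location   = var.location
}
```

With files like these, terraform apply would create the dataset and terraform destroy would remove it again; changing a default in variables.tf changes the resource without touching main.tf.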

[Day 172] Learning about terraform + adding more data to the Glaswegian audio dataset

 Hello :) Today is Day 172!

A quick summary of today: I preprocessed and loaded more audio into the Glaswegian audio dataset on Hugging Face, and learned about Terraform from the Data Eng Zoomcamp.

As for the Glaswegian dataset

Preprocessing took a bit longer because this time I was cutting multiple audio clips longer than 5 minutes and matching them to the transcriptions that my Scottish collaborator had written down. I actually did only 4 out of the 6, so I have 2 more to do from Limmy - a famous Scottish comedian. The total audio time at the moment is 63 minutes. I started fine-tuning whisper-small again, but it turns out the Colab subscription I got (the cheapest one) includes a limited amount of compute units. So right now I am fine-tuning it on the free limited amount. It says it takes about 4 hours... but hopefully it finishes before the free TPU hours run out.

As for Terraform

Continuing yesterday's videos. The final part of Module 1 from DataTalksClub's data engineering zoomcamp is an intro to Terraform. Fortunatel