[Day 172] Learning about Terraform + adding more data to the Glaswegian audio dataset

 Hello :)
Today is Day 172!


A quick summary of today:
  • preprocessed and loaded more audio into the Glaswegian audio dataset on Hugging Face
  • learned about Terraform from Data Eng Zoomcamp


As for the Glaswegian dataset

Preprocessing took a bit longer this time because I was cutting several audio clips longer than 5 minutes into segments and matching them to the transcriptions my Scottish collaborator had written down. I only got through 4 of the 6, so I still have 2 more to do, both from Limmy, a famous Scottish comedian.
The dataset currently totals 63 minutes of audio.
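Roughly, the cutting step boils down to converting each transcript timestamp from seconds to a sample index and slicing the long recording. Here is a minimal sketch of that idea, assuming the transcript is a list of (start_seconds, end_seconds, text) tuples and the audio is already decoded into a list of samples (for example at 16 kHz, the rate Whisper expects). The names cut_clips, total_minutes and Clip are hypothetical, made up for illustration; this is not my actual preprocessing code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical transcript format: (start_seconds, end_seconds, text) for each utterance.
Segment = Tuple[float, float, str]


@dataclass
class Clip:
    samples: List[float]
    text: str


def cut_clips(audio: Sequence[float], sample_rate: int, segments: List[Segment]) -> List[Clip]:
    """Slice one long recording into short clips aligned with its transcript."""
    clips = []
    for start, end, text in segments:
        if end <= start:
            raise ValueError(f"segment ends before it starts: {start}-{end}")
        # Convert seconds to sample indices, clamping the end to the recording length.
        lo = round(start * sample_rate)
        hi = min(round(end * sample_rate), len(audio))
        clips.append(Clip(samples=list(audio[lo:hi]), text=text.strip()))
    return clips


def total_minutes(clips: List[Clip], sample_rate: int) -> float:
    """Total duration of all clips, in minutes."""
    return sum(len(c.samples) for c in clips) / sample_rate / 60
```

In practice the decoded audio would come from a library such as librosa or soundfile, and each Clip would become one row (audio + transcription) in the Hugging Face dataset; total_minutes is how a running total like the 63 minutes above could be tracked.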

I started fine-tuning whisper-small again, but it turns out the Colab subscription I got (the cheapest one) only includes a limited number of compute units. So for now I am fine-tuning it within the free usage limits. It says it will take about 4 hours, so hopefully it finishes before the free TPU hours run out.

As for Terraform

I continued with yesterday's videos. The final part of Module 1 of DataTalksClub's Data Engineering Zoomcamp is an intro to Terraform. Fortunately, I had come across Terraform before, at the Microsoft Azure hackathon, but back then I had no idea what I was running; I was just running it. Today it became a little bit clearer.

First, I created a service account in GCP and gave it access to the resources Terraform would manage.

Then, under "Manage keys", I created a key for the service account and downloaded it as a JSON file.

Next, I created a main.tf file that uses the Google provider and ran terraform init to set Terraform up.

With that, Terraform was initialised and ready to use my service account.

After that, I added a Cloud Storage bucket resource to main.tf and ran terraform plan.

For this to work, I first had to set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of my service account key; after that, the plan ran fine.
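Since the missing environment variable was what tripped me up, here is a small pre-flight check that could be run before terraform plan. It is only a sketch, assuming the key is the JSON file downloaded from "Manage keys": it checks that GOOGLE_APPLICATION_CREDENTIALS is set, that it points to an existing file, and that the file contains "type": "service_account", as GCP service account key files do. check_gcp_credentials is a hypothetical helper name.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def check_gcp_credentials(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key path from GOOGLE_APPLICATION_CREDENTIALS, or raise if it looks unusable."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    key_file = Path(path).expanduser()
    if not key_file.is_file():
        raise RuntimeError(f"key file not found: {key_file}")
    with key_file.open() as f:
        key = json.load(f)
    # Service account key files downloaded from GCP contain "type": "service_account".
    if key.get("type") != "service_account":
        raise RuntimeError(f"{key_file} does not look like a service account key")
    return str(key_file)
```

Calling check_gcp_credentials() with no arguments reads the real environment, so it fails fast with a readable message instead of a provider error halfway through terraform plan.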

Then I ran terraform apply to create the bucket, and the new bucket showed up in Cloud Storage in the GCP console.

We can destroy the created bucket with terraform destroy.
The main.tf file is in my repo.
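To recap the workflow in one place, here is a rough Python sketch that runs the same four commands (terraform init, plan, apply, destroy) from a chosen folder, with GOOGLE_APPLICATION_CREDENTIALS exported only for that process. The run_terraform_step function, the folder and the key path are hypothetical; typing the commands in a shell does exactly the same thing.

```python
import os
import shutil
import subprocess
from typing import List

# The four Terraform commands used in this post, in the order they were run.
WORKFLOW = {
    "init": ["terraform", "init"],
    "plan": ["terraform", "plan"],
    "apply": ["terraform", "apply"],
    "destroy": ["terraform", "destroy"],
}


def run_terraform_step(step: str, workdir: str, key_path: str, dry_run: bool = False) -> List[str]:
    """Run one Terraform step in workdir with the service account key exported."""
    if step not in WORKFLOW:
        raise ValueError(f"unknown step {step!r}; expected one of {list(WORKFLOW)}")
    cmd = list(WORKFLOW[step])
    if dry_run:
        # Only report what would be run; nothing is executed.
        return cmd
    if shutil.which("terraform") is None:
        raise RuntimeError("terraform CLI not found on PATH")
    # Same effect as exporting GOOGLE_APPLICATION_CREDENTIALS in the shell first.
    env = {**os.environ, "GOOGLE_APPLICATION_CREDENTIALS": key_path}
    # stdin is inherited, so apply and destroy still ask for "yes" interactively.
    subprocess.run(cmd, cwd=workdir, env=env, check=True)
    return cmd
```

For example, run_terraform_step("plan", "./terraform", "/path/to/key.json") would run the plan step in that folder, while passing dry_run=True only returns the command list without touching GCP.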


That is all for today!

See you tomorrow :)
