[Day 210] 118 minutes of Glaswegian accent audio clips

 Hello :)
Today is Day 210!


A quick summary of today:
  • final round of audio clip preprocessing to reach our target dataset size


The final dataset for the Glaswegian voice assistant AI is done (link to HuggingFace). Today I preprocessed the final audio from 2 of Limmy's YouTube videos (Limmy accidentally kills the city and The writer of Saw called Limmy a ...). 

Just an update on how the process goes now ~ 

Since our transcription AI is pretty good (according to my Glaswegian-speaking project partner), we pass the full raw audio to our fine-tuned Whisper model hosted on HuggingFace Spaces. The transcript then goes into a docs file, where I first check it for obvious mistakes and flag anything odd that I cannot resolve by re-listening to the audio. While listening, I split the transcript into sensible (small) bits, like:

(this is the start from Limmy accidentally kills the city)
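The splitting itself is done by hand while listening to the audio, but a rough first pass could be sketched like this (the function name and word limit here are illustrative, not part of the actual workflow):

```python
# Illustrative sketch: cut a raw transcript into short segments on
# sentence-ending punctuation, then break long sentences into chunks.
# The real splitting is done manually while listening to the audio.
import re

def rough_segments(transcript, max_words=12):
    """Split on . ! ? then break long sentences into <= max_words chunks."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", transcript)
                 if s.strip()]
    segments = []
    for sentence in sentences:
        words = sentence.split()
        for i in range(0, len(words), max_words):
            segments.append(" ".join(words[i:i + max_words]))
    return segments
```

A pass like this only gives candidate cut points; the final clip boundaries still need a human ear, since natural pauses in speech rarely line up with punctuation.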

Then, using an audio tool, I cut the full audio into clips according to the split text. I match each clip name with its transcript in Excel, then use Python to get each clip's length and sampling rate. Finally, I add static info (gender, age, class, location, speaker id) to each row and export a CSV, which I upload to the HuggingFace dataset along with the cut audio clips.
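The Python part of that step can be sketched with just the standard library. The file names, metadata values, and column names below are placeholders, not the actual dataset schema:

```python
# Minimal sketch of the clip-metadata step: read duration and sampling
# rate from each WAV clip, attach static speaker info, and write the
# CSV that gets uploaded alongside the clips. All values illustrative.
import csv
import wave

def clip_info(path):
    """Return (duration in seconds, sampling rate) for a WAV clip."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return duration, rate

# Static speaker info attached to every row (example values).
STATIC = {"gender": "male", "age": "40s", "location": "Glasgow", "speaker_id": 0}

def build_rows(clips):
    """clips: iterable of (clip_path, transcript) -> list of CSV-ready dicts."""
    rows = []
    for path, transcript in clips:
        duration, rate = clip_info(path)
        rows.append({"file": path, "transcript": transcript,
                     "duration_s": round(duration, 2),
                     "sampling_rate": rate, **STATIC})
    return rows

def write_csv(rows, out_path):
    """Dump the rows to CSV, one line per audio clip."""
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

For real-world clips in other formats (mp3 etc.) a library like soundfile or librosa would be needed instead of the stdlib `wave` module.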

Next steps are to train a final Whisper model, and then a Text-To-Speech model, using this final dataset. Maybe tomorrow, while we are out at breakfast, I will leave Whisper to fine-tune. 


That is all for today!

See you tomorrow :)
