[Day 210] 118 minutes of Glaswegian accent audio clips

 Hello :)
Today is Day 210!


A quick summary of today:
  • final round of audio clip preprocessing to reach our target dataset size


The final dataset for the Glaswegian voice assistant AI is done (link to HuggingFace). Today I preprocessed the final audio from 2 of Limmy's YouTube videos (Limmy accidentally kills the city and The writer of Saw called Limmy a ...). 

Just an update on how the process goes now ~ 

Since our transcription AI is pretty good (according to my Glaswegian-speaking project partner), we pass the full raw audio to our fine-tuned Whisper model hosted on HuggingFace Spaces. The transcript then goes into a docs file, where I first check it for obvious mistakes and flag anything odd that I cannot resolve by re-listening to the audio. While listening, I split the transcript into sensible (small) bits, like:

(this is the start from Limmy accidentally kills the city)
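The splitting itself is done by hand while listening to the audio, but a rough first pass could be sketched like this (the function name and word limit here are illustrative, not part of the actual workflow):

```python
# Illustrative sketch: cut a raw transcript into short segments on
# sentence-ending punctuation, then break long sentences into chunks.
# The real splitting is done manually while listening to the audio.
import re

def rough_segments(transcript, max_words=12):
    """Split on . ! ? then break long sentences into <= max_words chunks."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", transcript)
                 if s.strip()]
    segments = []
    for sentence in sentences:
        words = sentence.split()
        for i in range(0, len(words), max_words):
            segments.append(" ".join(words[i:i + max_words]))
    return segments
```

A pass like this only gives candidate cut points; the final clip boundaries still need a human ear, since natural pauses in speech rarely line up with punctuation.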

Then, using an audio tool, I cut the full audio into clips according to the split text. I match each clip name with its transcript in Excel, then use Python to get each clip's length and sampling rate. Finally, I add static info (gender, age, class, location, speaker id) to each row and export a CSV, which I upload to the HuggingFace dataset along with the cut audio clips.
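The Python part of that step can be sketched with just the standard library. The file names, metadata values, and column names below are placeholders, not the actual dataset schema:

```python
# Minimal sketch of the clip-metadata step: read duration and sampling
# rate from each WAV clip, attach static speaker info, and write the
# CSV that gets uploaded alongside the clips. All values illustrative.
import csv
import wave

def clip_info(path):
    """Return (duration in seconds, sampling rate) for a WAV clip."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return duration, rate

# Static speaker info attached to every row (example values).
STATIC = {"gender": "male", "age": "40s", "location": "Glasgow", "speaker_id": 0}

def build_rows(clips):
    """clips: iterable of (clip_path, transcript) -> list of CSV-ready dicts."""
    rows = []
    for path, transcript in clips:
        duration, rate = clip_info(path)
        rows.append({"file": path, "transcript": transcript,
                     "duration_s": round(duration, 2),
                     "sampling_rate": rate, **STATIC})
    return rows

def write_csv(rows, out_path):
    """Dump the rows to CSV, one line per audio clip."""
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

For real-world clips in other formats (mp3 etc.) a library like soundfile or librosa would be needed instead of the stdlib `wave` module.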

Next steps are to train a final Whisper model, and then a Text-To-Speech model, using this final dataset. Maybe tomorrow, while we are out at breakfast, I will leave Whisper to fine-tune. 


That is all for today!

See you tomorrow :)
