Posts

Showing posts from June 7, 2024

[Day 158] 50 minutes of audio in the Scottish dataset + exploring Mixture Density Networks in GNNs

Hello :) Today is Day 158!

A quick summary of today:
- created a better dataset preprocessing 'pipeline' for new audio files
- read a bit about Mixture Density Networks and their application

Firstly, about the Scottish dataset

The latest dataset has ~50 minutes' worth of Glaswegian (Scottish) accent clips. Amazing ^^ [ huggingface link ]

I also fine-tuned Microsoft's SpeechT5 on this latest data, but the outputs still sound a bit robotic. I need to play around more with the trainer setup.

As for the 'pipeline' ~

It starts with renaming the audio files (so that we have some kind of tracking); I rename them by adding the preprocess date.

The second step is a bit longer, so I will just paste the call to the function that executes it:

create_audio_metadata_csv(transcriptions_csv, filenames_df, audio_files_path, output_csv)

It takes a transcription CSV that has 1 column with transcriptions, a 2nd CSV with file names, the path to the audio files, and where to output the CSV which contains file_name, trans