
Showing posts from July 31, 2024

[Day 212] Final Glaswegian TTS model

Hello :) Today is Day 212! A quick summary of today: creating a simple Glaswegian assistant app.

After fine-tuning Whisper to get the final version of the Glaswegian ASR model, the next task was a final fine-tuning run on the T5 speech model to get the final version of the Glaswegian TTS model. Well, I did that today.

Here is a link to the model on HuggingFace, and here are its training results:

Now that we have the final 2-hour dataset, I was hoping for better results. Before, the generated audio (while it had a bit of an accent) sounded robotic.

The first thing I had to do was fix the HuggingFace space where the previous version of the Glaswegian TTS model was running. The issue was related to voice embeddings, and after a quick fix it was up again, so I loaded the latest glaswegian_tts model.

Well, now it *does* sound better. There are still cases where it sounds robotic, but there is a definite improvement over the previous version. That previous version was trained on around 30 minutes of audio, compared to