[Day 212] Final Glaswegian TTS model

 Hello :)
Today is Day 212!


A quick summary of today:
  • final fine-tuning of the Glaswegian TTS model
  • creating a simple Glaswegian assistant app


After fine-tuning Whisper to get the final version of the Glaswegian ASR model, the next task was a final fine-tuning of the SpeechT5 model to get the final version of the Glaswegian TTS model. Well, I did that today.

Here is a link to the model on HuggingFace, and its training results:

Now that we have the final 2-hour dataset, I was hoping for better results. Before, the generated audio (while it had a bit of the accent) sounded robotic. The first thing I had to do was fix the HuggingFace space where the previous version of the Glaswegian TTS was running. The issue was related to voice embeddings, and after a quick fix ~

It was up again, and I loaded the latest glaswegian_tts model. Well, now it *does* sound better. There are still cases where it sounds robotic, but there is definitely improvement over the previous version, which was trained on around 30 minutes of audio compared to the 2 hours now.
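For reference, this is roughly how audio gets generated from the fine-tuned model. It is only a minimal sketch: the repo id and the reference-clip path are placeholders, and the x-vector part is my assumption of what the "voice embeddings" fix involved (SpeechT5 conditions generation on a 512-dim speaker embedding), not the exact code running in the space.

```python
import torch
import torch.nn.functional as F
import soundfile as sf
import librosa
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from speechbrain.pretrained import EncoderClassifier

# Repo id is a placeholder -- use the actual glaswegian_tts repo linked above.
processor = SpeechT5Processor.from_pretrained("username/glaswegian_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("username/glaswegian_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# SpeechT5 needs a 512-dim speaker x-vector; compute one from a short
# 16 kHz reference clip of the target speaker (path is hypothetical).
encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb")
wav, _ = librosa.load("reference_clip.wav", sr=16000)
with torch.no_grad():
    emb = encoder.encode_batch(torch.tensor(wav).unsqueeze(0))   # (1, 1, 512)
    speaker_embeddings = F.normalize(emb, dim=2).squeeze(0)      # (1, 512)

# Text -> mel spectrogram -> waveform via the HiFi-GAN vocoder.
inputs = processor(text="How's it gaun?", return_tensors="pt")
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("output.wav", speech.numpy(), samplerate=16000)
```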


Next - create a full assistant app

Gradio and HF spaces make it very easy -> here.
Audio input -> transcribed using glaswegian_asr -> sent to gpt2 -> the answer from gpt2 is turned into speech and returned to the user

At the start I used gpt2, but as it's not that good, I switched to gpt-3.5-turbo (a rough sketch of the pipeline is below).
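Here is roughly what the Gradio app looks like under the hood. The repo ids (username/glaswegian_asr, username/glaswegian_tts) and the precomputed speaker-embedding file are placeholders, so treat this as a sketch of the ASR -> LLM -> TTS flow rather than the exact code in the space.

```python
import gradio as gr
import torch
from openai import OpenAI
from transformers import pipeline, SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

# Repo ids and the speaker-embedding file are placeholders.
asr = pipeline("automatic-speech-recognition", model="username/glaswegian_asr")
tts_processor = SpeechT5Processor.from_pretrained("username/glaswegian_tts")
tts_model = SpeechT5ForTextToSpeech.from_pretrained("username/glaswegian_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
speaker_embeddings = torch.load("speaker_embeddings.pt")  # precomputed (1, 512) x-vector

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def assistant(audio_path):
    # 1. Speech -> text with the fine-tuned Whisper (glaswegian_asr) model.
    text = asr(audio_path)["text"]
    # 2. Text -> answer with gpt-3.5-turbo.
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": text}],
    ).choices[0].message.content
    # 3. Answer -> speech with the fine-tuned glaswegian_tts model.
    inputs = tts_processor(text=reply, return_tensors="pt")
    speech = tts_model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
    return (16000, speech.numpy()), reply

demo = gr.Interface(
    fn=assistant,
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs=[gr.Audio(label="Spoken reply"), gr.Textbox(label="Reply text")],
)

if __name__ == "__main__":
    demo.launch()
```

Returning both the audio and the text of the reply makes it easier to check whether a bad answer came from the ASR step or from the model itself.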


That is all for today!

See you tomorrow :)
