
Content

A few days ago I made a quick update to my TextAudioTool on GitHub. In v0.2 I added a config file that lets you tweak some of the setup for speech-to-text and text-to-speech in VAM:

- Speech rate for offline voices - for some voices the default rate was too fast, and a few people asked about this. The value TTS_LOCAL_SPEECH_RATE is the speed of the voice in words per minute. Until now the default was 200; I've set it to 170, but it can be customized from the file

- Speech time limit - initially I had this hardcoded to 15 seconds. I figured that would be enough for a chat message, and it also promotes shorter exchanges, since the AI can focus only on so much text. It can now be changed via STT_TIME_LIMIT to a higher or lower value as preferred.

- Voice recognition language - with this it's now technically possible to chat in foreign languages in Alive, given that you have AI models that are language-specific. I haven't explored that space much; I've only seen Chinese models, but there are probably many more. Until now the voice recognition was forced to English. In the config you can change STT_LANGUAGE to a different language. You can check the note in the config or go here for the available languages and how good whisper (the AI model used for speech recognition) is at each of them. I only tested Japanese briefly in the demo I did last month

- Voice recognition quality - in v0.1 I used the smallest and fastest AI model for speech recognition. It's super fast but it's not the most accurate. There are 4 bigger models that are better at recognizing words, though they're considerably slower. I added that as an option too in the config, as STT_MODEL, for people who want better speech recognition. If you change that value, the next time you run TextAudioTool it will take a bit to load because it will download the new model. The larger ones are GBs in size (and again, very slow). See the config sketch below
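For reference, here is roughly what the new options look like in config.py. This is a minimal sketch based on the descriptions above, not a copy of the real file: the exact layout, comments, and accepted value strings (for example whether STT_LANGUAGE takes "english" or a language code, and the exact model names) may differ, so check the notes in the actual config.

```python
# config.py -- sketch of the v0.2 options, assuming plain Python variables.
# Check the comments in the real file for the exact accepted values.

# Text-to-speech: words-per-minute rate for offline/local voices.
# Was hardcoded to 200 in v0.1; the new default is 170.
TTS_LOCAL_SPEECH_RATE = 170

# Speech-to-text: maximum recording time per message, in seconds.
# Was hardcoded to 15 in v0.1.
STT_TIME_LIMIT = 15

# Speech-to-text: language for voice recognition.
# Was forced to English before; whisper supports many other languages.
STT_LANGUAGE = "english"

# Speech-to-text: which whisper model to use.
# v0.1 used the smallest/fastest one; the larger models are more accurate
# but slower, and the new model is downloaded (GBs for the biggest ones)
# the next time the tool runs after changing this value.
STT_MODEL = "tiny"
```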


For people out of the loop, TextAudioTool is a tool I built a while ago to let VAM do speech-to-text and text-to-speech. I shared it as a free 3rd party tool rather than force it into Alive, to help others who might need it for different purposes, and for transparency too, because it's code that runs outside of VAM. When running in the background it can be used by Alive to speak chat messages, and you can speak back to it instead of typing in the chat.

To update from v0.1 to v0.2 you can delete the old SPQR.TextAudioTool folder and replace it with the one from the latest zip. Or, more quickly, to skip the script installing everything again, you can just replace SPQR.TextAudioTool.script.py and config.py and leave everything else as it is.
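If you prefer to script that quick-update path, a small sketch like the one below copies just the two changed files over the old install. The folder paths are illustrative examples only; adjust them to wherever your extracted zip and existing install actually live.

```python
# update_tat.py -- illustrative sketch of the "quick" v0.1 -> v0.2 update:
# copy only the two changed files from the extracted v0.2 zip over the
# existing install, leaving everything else (downloaded models, etc.) intact.
import shutil
from pathlib import Path

new_version = Path(r"C:\Downloads\SPQR.TextAudioTool")  # example: extracted from the 0.2 zip
old_install = Path(r"C:\VAM\SPQR.TextAudioTool")        # example: existing v0.1 folder

for name in ("SPQR.TextAudioTool.script.py", "config.py"):
    shutil.copy2(new_version / name, old_install / name)
    print(f"replaced {name}")
```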

MORE INFO & INSTRUCTIONS
TextAudioTool on GitHub

FILES
SPQR.TextAudioTool.0.2.zip


Comments

Saint66

Wow, that was lightning fast, the speed adjustment is another great addition!

SPQRAeternum

Ah no, that's just for speech recognition. If you speak in French, it transcribes French words. It has nothing to do with ElevenLabs TTS, I haven't looked at that yet. That part is directly in Alive; it will have a setting in the UI if it's possible to change that stuff

Anonymous

Hey this is awesome, keep up the amazing work! I've got everything working (ooba, TextAudioTool, elevenlabs etc.) however my person's mouth/lips aren't moving? Any ideas? edit: Nevermind! I just had to delete and add a new person 🤣

SPQRAeternum

Thank you! Yeah, the audio comes from a person only if a person is selected. If there isn't one, it comes as general audio, like narration. That's more of a quick & general thing for now to force audio to work in any circumstance, but it should change in the future so that chat stuff will be linked to specific person atoms. Right now there's a disconnect between actual atoms and the character system I'm trying to build for them, they're not linked yet

Smumblepooch

Hi! I've been going through and getting everything set up, oob model with good interaction, elevenlabs API, everything is green in my services when in VAM with Alive running, decently powerful PC so running both isn't an issue. I've been going through your guides but I'm missing something. How do I get my 'android' to have a conversation using my local oobabooga instance instead of being a virtual assistant? It identifies as the android identity, not my glib ooba-based one. Also in general - wow. I'm pretty excited to see this evolve, have my character auto-react to parts of the conversation, have them generate SD images via API based on the LLM output, etc. But is what I'm trying to accomplish currently possible, conversing with my own LLM via the android/avatar?

SPQRAeternum

Hi, stuff set in the OB web UI, like characters, won't work with Alive. The OB API connects to the LLM directly; that's the main functionality. Characters are just frontend stuff done by the web UI, and Alive is like an alternative web UI in a way: it's not a continuation of the OB web UI and it's not linked to it. In v52, just released, that characters part should be more intuitive and easy to use. The idea is to add a description of the character in settings, and an example chat if possible. The chat AI in Alive, with OB enabled, will use whatever AI model you have selected in OB. "Virtual assistant" in Alive is another layer that lets the AI do certain things or answer in very specific ways, like setting up rules. You can say "turn around", for example. "Virtual assistant" is the term I used for the component that handles those kinds of actions. If you mean that the AI says stuff like "I'm a virtual assistant AI", that depends on the AI model and what you write in the description of the character. Models like llama2 will default towards an AI assistant personality if there isn't a clear description that sets a different one