
Content

A few days ago I made a quick update to my TextAudioTool on GitHub. In v0.2 I added a config file that lets you tweak some of the setup for speech-to-text and text-to-speech in VAM:

- Speech rate for offline voices - for some voices the default rate was too fast, and a few people asked about this. The value TTS_LOCAL_SPEECH_RATE is the speed of the voice in words per minute. Until now the default was 200; I've set it to 170, but it can be customized from the file

- Speech time limit - initially I had this hardcoded to 15 seconds. I figured that would be enough for a chat message, and it also promotes shorter exchanges, since the AI can focus only on so much text. It can now be changed via STT_TIME_LIMIT to a higher or lower value as preferred.

- Voice recognition language - with this it's now technically possible to chat in foreign languages in Alive, given that you have AI models that are language-specific. I haven't explored that space much; I've only seen Chinese models, but there are probably many more. Until now the voice recognition was forced to English. In the config you can change STT_LANGUAGE to a different language. You can check the note in the config or go here for the available languages and how good whisper (the AI model used for speech recognition) is at each of them. I only tested Japanese briefly in the demo I did last month

- Voice recognition quality - in v0.1 I used the smallest and fastest AI model for speech recognition. It's super fast but it's not the most accurate. There are 4 bigger models that are better at recognizing words, though they're considerably slower. I added that as an option too in the config, as STT_MODEL, for people who want better speech recognition. If you change that value, the next time you run TextAudioTool it will take a bit to load because it will download the new model. The larger ones are GBs in size (and again, very slow). See the config sketch below
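For reference, here is roughly what the new options look like in config.py. This is a minimal sketch based on the descriptions above, not a copy of the real file: the exact layout, comments, and accepted value strings (for example whether STT_LANGUAGE takes "english" or a language code, and the exact model names) may differ, so check the notes in the actual config.

```python
# config.py -- sketch of the v0.2 options, assuming plain Python variables.
# Check the comments in the real file for the exact accepted values.

# Text-to-speech: words-per-minute rate for offline/local voices.
# Was hardcoded to 200 in v0.1; the new default is 170.
TTS_LOCAL_SPEECH_RATE = 170

# Speech-to-text: maximum recording time per message, in seconds.
# Was hardcoded to 15 in v0.1.
STT_TIME_LIMIT = 15

# Speech-to-text: language for voice recognition.
# Was forced to English before; whisper supports many other languages.
STT_LANGUAGE = "english"

# Speech-to-text: which whisper model to use.
# v0.1 used the smallest/fastest one; the larger models are more accurate
# but slower, and the new model is downloaded (GBs for the biggest ones)
# the next time the tool runs after changing this value.
STT_MODEL = "tiny"
```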


For people out of the loop, TextAudioTool is a tool I built a while ago to let VAM do speech-to-text and text-to-speech. I shared it as a free 3rd party tool rather than force it into Alive, to help others who might need it for different purposes, and for transparency too, because it's code that runs outside of VAM. When running in the background it can be used by Alive to speak chat messages, and you can speak back to it instead of typing in the chat.

To update from v0.1 to v0.2 you can delete the old SPQR.TextAudioTool folder and replace it with the one from the latest zip. Or, more quickly, to skip the script installing everything again, you can just replace SPQR.TextAudioTool.script.py and config.py and leave everything else as it is.
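If you prefer to script that quick-update path, a small sketch like the one below copies just the two changed files over the old install. The folder paths are illustrative examples only; adjust them to wherever your extracted zip and existing install actually live.

```python
# update_tat.py -- illustrative sketch of the "quick" v0.1 -> v0.2 update:
# copy only the two changed files from the extracted v0.2 zip over the
# existing install, leaving everything else (downloaded models, etc.) intact.
import shutil
from pathlib import Path

new_version = Path(r"C:\Downloads\SPQR.TextAudioTool")  # example: extracted from the 0.2 zip
old_install = Path(r"C:\VAM\SPQR.TextAudioTool")        # example: existing v0.1 folder

for name in ("SPQR.TextAudioTool.script.py", "config.py"):
    shutil.copy2(new_version / name, old_install / name)
    print(f"replaced {name}")
```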

MORE INFO & INSTRUCTIONS
TextAudioTool on GitHub

FILES
SPQR.TextAudioTool.0.2.zip


Comments

Saint66

Wow, that was lightning fast, the speed adjustment is another great addition!

SPQRAeternum

Ah no, that's just for speech recognition. If you speak in French, it transcribes French words. It has nothing to do with ElevenLabs TTS, I haven't looked at that yet. That part is directly in Alive; it will have a setting in the UI if it's possible to change that stuff

Anonymous

Hey this is awesome, keep up the amazing work! I've got everything working (ooba, TextAudioTool, elevenlabs etc.) however my person's mouth/lips aren't moving? Any ideas? edit: Nevermind! I just had to delete and add a new person 🤣

SPQRAeternum

Thank you! Yeah, the audio comes from a person only if a person is selected. If there isn't one, it comes as general audio, like narration. That's more of a quick & general thing for now to force audio to work in any circumstance, but it should change in the future so that chat stuff will be linked to specific person atoms. Right now there's a disconnect between actual atoms and the character system I'm trying to build for them, they're not linked yet

Smumblepooch

Hi! I've been going through and getting everything set up, oob model with good interaction, elevenlabs API, everything is green in my services when in VAM with Alive running, decently powerful PC so running both isn't an issue. I've been going through your guides but I'm missing something. How do I get my 'android' to have a conversation using my local oobabooga instance instead of being a virtual assistant? It identifies as the android identity, not my glib ooba-based one. Also in general - wow. I'm pretty excited to see this evolve, have my character auto-react to parts of the conversation, have them generate SD images via API based on the LLM output, etc. But is what I'm trying to accomplish currently possible, conversing with my own LLM via the android/avatar?

SPQRAeternum

Hi, stuff set in the OB web UI, like characters, won't work with Alive. The OB API connects to the LLM directly; that's the main functionality. Characters are just frontend stuff done by the web UI, and Alive is like an alternative web UI in a way: it's not a continuation of the OB web UI and it's not linked to it. In v52, just released, that characters part should be more intuitive and easy to use. The idea is to add a description of the character in settings, and an example chat if possible. The chat AI in Alive, with OB enabled, will use whatever AI model you have selected in OB. "Virtual assistant" in Alive is another layer that lets the AI do certain things or answer in very specific ways, like setting up rules. You can say "turn around", for example. "Virtual assistant" is the term I used for the component that handles those kinds of actions. If you mean that the AI says stuff like "I'm a virtual assistant AI", that depends on the AI model and what you write in the description of the character. Models like llama2 will default towards an AI assistant personality if there isn't a clear description that sets a different one