
RT-Voice - Run-time text-to-speech solution

Discussion in 'Assets and Asset Store' started by Stefan-Laubenberger, Jul 10, 2015.

  1. m4a44

    m4a44

    Joined:
    Mar 13, 2013
    Posts:
    45
    Hey, are there any known limitations to how many speeches can be said in a session? I haven't seen anything in any documentation.

    Right now, using Azure as the provider, we are finding that after playing around 150 speeches, RT-Voice just stops creating audio (the generated audio file is invalid) until we stop and restart play mode.

    We're also using the Global Caching, and a modified version of the Paralanguage script (if that helps).
     
  2. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    We're not aware of any problems - I also did a quick test and generated 300 speeches with Azure and it worked.

    There are many possibilities why this could happen, like running out of disk space, too many connections at once, invalid speeches (broken SSML tags), Azure settings etc.
    To investigate the issue, please provide more details on how exactly you are using our asset.
    It's also important to know the Unity and RT-Voice version.


    Cheers
    Stefan
     
  3. m4a44

    m4a44

    Joined:
    Mar 13, 2013
    Posts:
    45
    Ok, I will set up a test project then to see if I can narrow it down further.

    It takes around 16 minutes before failing (the vast majority of that from speech), and it's through Unity Visual Scripting (over around 14 speech nodes).
    Disk space is fine, only 1 connection at a time, the text was properly set up (the speeches worked fine outside of the flow), and there aren't many Azure settings to change (changing the sample rate didn't fix it).
    Happened across 2 separate PCs at around the same line of speech (it is fairly consistent).

    Using Unity version 2021.2.11f1 and RT-Voice version 2022.1.2
     
    Last edited: Jun 9, 2022
  4. m4a44

    m4a44

    Joined:
    Mar 13, 2013
    Posts:
    45
    So, I removed some fluff and changed all the speech lines to Lorem Ipsum, and I am seeing that after 15 minutes of consecutive speech it fails (it seems to be a time-based error, not a count-based one).
    About 3 UVS nodes, 28 speech clips with about 5 voices (got that number from the Global Cache, which is weird since it should be trying to load a clip that has the same text and voice from the cache, no?).

    Considering our game is narrative heavy, it looks like we managed to hit that seemingly arbitrary limit pretty easily...
     

    Attached Files:

  5. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hello again

    We found a solution to the problem!
    Please send us your invoice via email and we will give you the update.


    So long,
    Stefan
     
    m4a44 likes this.
  6. vamosfa

    vamosfa

    Joined:
    May 15, 2016
    Posts:
    59
    Hi! I just installed the asset and tried to follow the first video tutorial,
    but after writing just the 2 lines of code I get more than 20 errors in the project. Worse, even though I commented out the RT-Voice code and recompiled, the errors remain in the Console, so I can't continue working on my project. I read all the docs but I don't know how to solve it.

    Photos:


    upload_2022-6-14_21-59-17.png
     
  7. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    Unfortunately, this video is really old and we should have replaced it by now, but we're really busy with all the stuff that's currently going on. I apologize for the time it consumed!
    However, please import the "Demos.unitypackage" and take a look at "SimpleRTVoiceExample.cs" or you could try the example from chapter 5.2.7 of the documentation:
    https://www.crosstales.com/media/data/assets/rtvoice/RTVoice-doc.pdf

    I hope this helps you further.


    Cheers
    Stefan
     
  8. vamosfa

    vamosfa

    Joined:
    May 15, 2016
    Posts:
    59
    Hi, if this is the case I strongly suggest deleting the video, as it can cause problems.
    Ok, I imported the Demos package even though it was not mentioned in the docs, but the 21 errors still persist in my project. It is very strange: I deleted the script from the video and didn't write any other code against the asset, but I still have all these 21 errors related to it. This has never happened to me before.
    Should I delete the full asset and import it again?
     
  9. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Please try that, but as it seems, your version of RT-Voice is quite old. Can you please make sure you are using 2022.1.1?
     
  10. vamosfa

    vamosfa

    Joined:
    May 15, 2016
    Posts:
    59
    I deleted the full asset and imported it again, and the errors disappeared.
    So now, since reading SimpleRTVoiceExample.cs is not enough, do you recommend that I follow this tutorial? Is it up to date?:

     
  11. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hello again

    Unfortunately, it's also not totally accurate - we will produce new videos for all our assets in August 2022.
    The main change is that you have to use "Speaker.Instance.xy" , e.g. Speaker.Instance.Speak.
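    For example, in a minimal script (the surrounding MonoBehaviour is just illustrative scaffolding, not taken from the videos):

    ```csharp
    using UnityEngine;
    using Crosstales.RTVoice;

    public class SpeakOnStart : MonoBehaviour
    {
        void Start()
        {
            // Current API: calls go through the Speaker singleton instance.
            Speaker.Instance.Speak("Hello from RT-Voice!");
        }
    }
    ```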
    If you have any specific questions, don't hesitate to ask.


    Cheers
    Stefan
     
    vamosfa likes this.
  12. PixelShenanigans

    PixelShenanigans

    Joined:
    Aug 19, 2015
    Posts:
    4
    Hi Stefan, I've just started using RTVoice on a project that is going to be using text to speech, and control a character's mouth (Synty Polygon Kids) - I've registered the OnSpeakCurrentPhoneme event and am seeing the phoneme values but am unsure what they represent, since they are mostly unprintable characters (e.g. '\a', ASCII 7, is the first phoneme returned for "hello"). Do you have any documentation on what phoneme set you are returning? I am going to be mapping these to mouth textures that represent the A AEI BMP F L O R SH_K TH U phonemes.
     
    Last edited: Jun 20, 2022
  13. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    First of all, are you planning to use only Windows? Because this is the only platform that supports phonemes in RT-Voice at all.
    To move the mouth on all platforms, we would recommend using SALSA:
    https://assetstore.unity.com/packages/slug/148442?aid=1011lNGT


    So long,
    Stefan
     
  14. PixelShenanigans

    PixelShenanigans

    Joined:
    Aug 19, 2015
    Posts:
    4
    Thanks, Stefan. Yes, on Windows. I am seeing the phoneme events but don't know how to interpret the values (I need to map these to Synty textures representing the A AEI BMP F L O R SH_K TH U phonemes). For example, speaking "hello" fires the following events:

    OnSpeakCurrentPhoneme: 7
    OnSpeakCurrentWord: hello
    OnSpeakCurrentPhoneme: 26
    OnSpeakCurrentPhoneme: 21
    OnSpeakCurrentPhoneme: 31
    OnSpeakCurrentPhoneme: 35

    How do I interpret these values and map them to the A AEI BMP F L O R SH_K TH U phonemes?

    I have tried RTVoice/SALSA and it works ok but it is limited to analyzing the amplitude of the speech, so it's kind of faking the phenomes - I'd like to make use of the data coming out of RTVoice to make the mouth movements look more realistic.
     
  15. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hello again

    Please take a look at the official mapping from Microsoft: https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms717239(v=vs.85)

    We will return the SYM-code from this mapping-table in the next RTV release.
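    The event-to-texture lookup could then be sketched roughly like this (illustrative only: the real phoneme IDs must be taken from the SAPI table linked above, and the mapping to the Synty mouth shapes is an assumption to be tuned):

    ```csharp
    using System.Collections.Generic;
    using UnityEngine;

    public class VisemeLookup : MonoBehaviour
    {
        // Placeholder table: fill in the real IDs from Microsoft's SAPI phoneme table.
        private static readonly Dictionary<int, string> phonemeToShape = new Dictionary<int, string>
        {
            // { sapiPhonemeId, "A" / "AEI" / "BMP" / "F" / "L" / "O" / "R" / "SH_K" / "TH" / "U" }
        };

        public string ShapeFor(int phonemeId)
        {
            // Fall back to a closed mouth when a phoneme isn't mapped (assumption).
            return phonemeToShape.TryGetValue(phonemeId, out var shape) ? shape : "BMP";
        }
    }
    ```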


    All the best,
    Stefan
     
  16. masai2k

    masai2k

    Joined:
    Mar 2, 2013
    Posts:
    45
    Hi Stefan,
    I need to use voices from Google TTS and Azure TTS. This is my code to change the provider and select the right voice from this provider:


    Code (CSharp):
    1. if (service == "azure")
    2.         {
    3.             Speaker.Instance.CustomProvider = customVoiceProviderAzure;
    4.         }
    5.         else if (service == "google")
    6.         {
    7.             Speaker.Instance.CustomProvider = customVoiceProviderGoogle;
    8.         }
    9. Debug.Log("service name:" + Speaker.Instance.CustomProvider);
    10.  
    11. Debug.Log("Real voice from name: " + Speaker.Instance.VoiceForName("it-IT-DiegoNeural"));
    the first debug confirms that I changed the provider, but the second one can't find the voice and return this warning:

    No voice for name 'it-IT-DiegoNeural' found! Speaking with the default voice

    Where am I wrong??

    Thanks
    massimo
     
  17. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi Massimo

    The name of the voice is different for Azure and Google ('it-IT-DiegoNeural' is for Azure, the name for Google is something else).
    Furthermore, you have to wait for the callback "OnVoicesReady" before accessing any voices of a provider.
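    A minimal sketch of that order of operations (the exact OnVoicesReady delegate signature may differ between versions - please check the Speaker API of your release):

    ```csharp
    using UnityEngine;
    using Crosstales.RTVoice;

    public class AzureVoicePicker : MonoBehaviour
    {
        void OnEnable()
        {
            // Resolve voices only after the provider reports them as loaded.
            Speaker.Instance.OnVoicesReady += voicesReady;
        }

        void OnDisable()
        {
            Speaker.Instance.OnVoicesReady -= voicesReady;
        }

        private void voicesReady()
        {
            // VoiceForName can now see the provider's voice list.
            Debug.Log(Speaker.Instance.VoiceForName("it-IT-DiegoNeural"));
        }
    }
    ```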

    I hope this helps you further.


    Cheers
    Stefan
     
  18. masai2k

    masai2k

    Joined:
    Mar 2, 2013
    Posts:
    45
    Thank you Stefan, I added the callback and now everything is perfect!
     
    Stefan-Laubenberger likes this.
  19. Ward101

    Ward101

    Joined:
    Mar 22, 2016
    Posts:
    52
    Hi! Newbie question. Working with RT-Voice on Windows (and with Windows voices), Spanish and English.
    How can I emphasize a question? Let's say I write a question, but the voice speaks it as a normal sentence (no emphasis). Do I have to use any special tag? I just need a hint on where to look ...
     
  20. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    The solution is SSML, you find more details in the SSML.txt:
    upload_2022-8-5_17-59-30.png

    It works with tags, like this:
    This is <emphasis level="strong">stronger</emphasis> than the rest.
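    In code, the tagged text is passed like any other speech text (a sketch; whether a given voice honors the tag depends on the provider):

    ```csharp
    using UnityEngine;
    using Crosstales.RTVoice;

    public class EmphasisDemo : MonoBehaviour
    {
        void Start()
        {
            // SSML tags are embedded directly in the spoken string.
            Speaker.Instance.Speak("This is <emphasis level=\"strong\">stronger</emphasis> than the rest.");
        }
    }
    ```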

    I hope this helps you further.


    Cheers
    Stefan
     
  21. Ward101

    Ward101

    Joined:
    Mar 22, 2016
    Posts:
    52
    Thanks! Will check it later.

    Eduardo
     
  22. XyrisKenn

    XyrisKenn

    Joined:
    Dec 8, 2015
    Posts:
    92
    To optimize CPU usage I'd like to stop and start emotive SALSA processing depending on whether the player is speaking to an avatar or not.
    To stop facial animation, would calling the SALSA -> TurnOfAll() method be the one to use?
    To restart facial animation and lipsync, would SALSA -> Initialize() be the way to go?
    Thank you kindly!
    The new Azure improvements look awesome.
     
  23. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    It looks correct, but to be sure, please ask the SALSA guys :)


    Cheers
    Stefan
     
  24. XyrisKenn

    XyrisKenn

    Joined:
    Dec 8, 2015
    Posts:
    92
    OMG, my apology Stefan! Thanks for your reply. :eek:
     
    Stefan-Laubenberger likes this.
  25. Justin_Terry

    Justin_Terry

    Joined:
    Oct 14, 2021
    Posts:
    5
    Hello Stefan,

    I'm currently working on an app for Android and iOS and having issues with RTVoice. The issue affects iOS only, on Android everything seems to be working fine.

    The problem is that the OnSpeakComplete callback is not called when the Speaker reaches the end of the string. However, if I call Silence() then the callback is called. I saw that there was someone else with this issue a while back and since it wasn't clear what version got the fix, I updated my plugin version to 2022.2.0 (originally was using 2021.3.3) but the issue is still there for me.

    I've also tried to work around this issue by just checking Speaker.Instance.isSpeaking, but it also stays set to true after the Speaker reaches the end of the string.

    Do you have any ideas why this is happening only on iOS and what I can do to resolve the issue?

    Thanks in advance!
     
  26. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    Unfortunately, I'm on holiday without a Mac, so we have to try it together ;)
    Can you please uncomment line 12 of "RTVoiceIOSBridge.mm" from "Assets\Plugins\crosstales\RTVoice\Libraries\iOS" and build again?
    You should see the following logs in Xcode:
    • didStartSpeechUtterance (after starting a speech)
    • didFinishSpeechUtterance (after finishing the speech)
    • didCancelSpeechUtterance (if you cancel the speech)
    Do you see those messages?


    So long,
    Stefan
     
  27. Justin_Terry

    Justin_Terry

    Joined:
    Oct 14, 2021
    Posts:
    5
    I do see them in Xcode.
    I am also seeing willSpeakRangeOfSpeechString and "SendMessage: RTVoice not found" repeatedly until it finishes speaking.
     
  28. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hmm, this is strange... Is the RTVoice-prefab in the scene and the gameobject called "RTVoice"?
     
  29. Justin_Terry

    Justin_Terry

    Joined:
    Oct 14, 2021
    Posts:
    5
    There is an RTVoice prefab and it is called "RTVoice".
     

    Attached Files:

    Last edited: Aug 25, 2022
  30. stfunity

    stfunity

    Joined:
    Sep 9, 2018
    Posts:
    64
    Question: if I am working with Mozilla/Coqui TTS on Windows and I've downloaded eSpeak 64-bit, is it possible to link to it in Unity?
     
  31. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    The message "SendMessage: RTVoice not found" implies that Unity can't find the "RTVoice" gameobject.
    The strange thing is that we haven't changed anything in the SendMessage part in years and it has always worked under iOS...
    What's your Unity version?
    Furthermore, please try this: remove "RTVoice" from the scene (it will be added automatically) and try again. Does the message still persist?
     
  32. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    I assume you mean in a WebGL-build?
    To access voices from the browser, you will need an additional plugin called "WebGL Speech Synthesis":
    https://assetstore.unity.com/packages/slug/81861?aid=1011lNGT

    Then use our integration in the "3rd party"-folder.

    Does that help you further?
     
  33. Justin_Terry

    Justin_Terry

    Joined:
    Oct 14, 2021
    Posts:
    5
    If I remove the RTVoice prefab, the app crashes when Speaker.Instance.Speak() is called.

    We're using 2020.3.14
     
  34. maswa

    maswa

    Joined:
    Jul 19, 2021
    Posts:
    21
    Hello @Stefan-Laubenberger, it's my first time using this asset!

    I'm struggling to add voices on Windows - it gives me errors. Then I tried to use MaryTTS: I clicked on the add button, but nothing happened (I'm only trying to use this one). Also, where can I see how to add voices with MaryTTS? In the documentation I only see the link to the page!
    Thanks for the response.

    upload_2022-8-27_16-40-36.png
     
  35. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    This is really perplexing to me and I have to test it in September since I'm currently traveling without a Mac.
     
  36. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    Have you extracted the MaryTTS-package from the "Extras"-folder?
    You will need to install your MaryTTS-server (see the documentation in the package) or send us your desired username and invoice via email to get an account for our test-server.


    Cheers
    Stefan
     
  37. Anpu23

    Anpu23

    Joined:
    Sep 4, 2017
    Posts:
    16
    Hey, thank you for a great asset. I love it. I've been using it for visual chatbots with SALSA (think animated ragdolls with input/output via AIML files). The only problem I have is that the SALSA add-on doesn't offer the ability to choose a voice; I just want to create a public string for the voice name and pass it into their add-on. But I need to rewrite this line to recognize the "voice" variable:

    public void Speak(string speakString)
    {
        uid = Speaker.Instance.SpeakNative(speakString, Speaker.Instance.VoiceForGender(Crosstales.RTVoice.Model.Enum.Gender.FEMALE, "en-GB"), 1.0f);
    }

    so rather than finding a voice by gender and "en-GB", it just refers to a voice by name (I'm expecting to tie in MaryTTS for more voice options). Help, and thanks!
     
  38. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    Thank you for using our asset!

    Unfortunately, I may not fully understand your question - do you just want to use the name of the voice?
    That would look like this:
    Code (CSharp):
    public void Speak(string speakString)
    {
        uid = Speaker.Instance.Speak(speakString, Speaker.Instance.VoiceForName("dfki-poppy"));
    }
    It is important that you use the "Speak"-method and not "SpeakNative" - otherwise SALSA won't be able to do the lipsync.

    I hope this helps you further.


    Cheers
    Stefan
     
  39. Anpu23

    Anpu23

    Joined:
    Sep 4, 2017
    Posts:
    16
    Thank you that's exactly what I needed. I truly appreciate it.
    Anpu
     
    Stefan-Laubenberger likes this.
  40. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    If you like our asset/support, please consider leaving a review in the store.
    All the best!
     
  41. Bdelcast

    Bdelcast

    Joined:
    Jul 11, 2014
    Posts:
    23
    Question. Would this be a viable/good solution for a potential Console game? IE xbox - ps - switch? Are there ways to embed the speech engine in the package?
     
  42. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    RT-Voice works natively on Xbox, but afaik PS4/PS5 and the Switch have no installed TTS engine, which means no offline support from our asset.
    Nevertheless, you could still use one of the various supported online solutions, like MaryTTS, AWS Polly, Azure and Google Speech. The only downside is that the devices have to be connected to the Internet to generate the speeches.
    Or you could use RT-Voice to pre-generate all the audio clips inside the Unity Editor and deliver them directly with the game.
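    The pre-generation route could look roughly like this (a sketch only - the exact Generate signature and output handling differ between RT-Voice versions, so treat the call below as a placeholder):

    ```csharp
    using UnityEngine;
    using Crosstales.RTVoice;

    public class PreGenerateNarration : MonoBehaviour
    {
        public string[] narrationLines;

        [ContextMenu("Generate clips")]
        private void GenerateAll()
        {
            foreach (string line in narrationLines)
            {
                // Placeholder call: Generate writes an audio file that can later
                // be imported and shipped as a regular AudioClip.
                Speaker.Instance.Generate(line);
            }
        }
    }
    ```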
    I hope this helps you further.

    Cheers
    Stefan
     
  43. masai2k

    masai2k

    Joined:
    Mar 2, 2013
    Posts:
    45
    Hi Stefan,
    I need to convert a project from desktop to WebGL. On the desktop all is OK, but when I build the project for WebGL I receive a message:
    'Generate' is not supported under WebGL!
    Then I discovered that in VoiceProviderGoogle.cs I can't call the Generate method from WebGL to create the wav from Google TTS.
    But I read that RT-Voice is compatible with WebGL... so how can I create the audio from the text without using the Generate method??

    Thanks
    Massimo
     
  44. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi Massimo

    You can still call the "Speak" method, but "Generate" isn't supported. The main reason is that WebGL doesn't allow saving files on your machine without additional assets like "WebGL Native File Browser":
    https://assetstore.unity.com/packages/slug/41902?aid=1011lNGT

    However, we will try to find a solution for the future, like a second "Generate"-method that returns the bytes of the audio content and you can do with it whatever you like.
    Would that be helpful?


    Cheers
    Stefan
     
  45. masai2k

    masai2k

    Joined:
    Mar 2, 2013
    Posts:
    45
    Thank you Stefan. I didn't know about WebGL Native File Browser's existence. I'll try to modify the generate method to support it.

    Massimo
     
    Stefan-Laubenberger likes this.
  46. masai2k

    masai2k

    Joined:
    Mar 2, 2013
    Posts:
    45
    Hi Stefan, I tried to use the Speak method, sending the audio clip directly to the AudioSource of the Amplitude component. In editor mode all is OK and Amplitude with SALSA works perfectly. BUT when I publish for WebGL I can hear the generated audio but the avatar's lips don't move! I suppose the problem is that I send the audio clip as streaming audio, and the Amplitude component doesn't seem to work with streaming audio.
    In other threads I read that RT-Voice is compatible with SALSA AND Amplitude, but I don't understand what parameters I have to modify to obtain results.

    Thanks
    Massimo
     
  47. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Which provider are you using under WebGL? Because you have to use one that actually generates audio clips, like MaryTTS.
    You can also take a look at our Amplitude-integration under "Demos".
     
  48. masai2k

    masai2k

    Joined:
    Mar 2, 2013
    Posts:
    45
    I use Google TTS. It generates audioclip because I hear the audio generated, but nothing happens in Amplitude.
     
  49. Sogutng

    Sogutng

    Joined:
    Mar 3, 2022
    Posts:
    6
    Hi, I am using the Azure demo of RT-Voice and I have entered the API key, endpoint and URL, but no voice is generated when I select the voices. The warning says "'VoiceProviderAzure' needs .NET 4.6 or newer to work!", but I have installed the .NET developer pack and selected .NET Framework as the API compatibility level. How can I fix that? I'm using Unity 2021.3.9f, thanks.
     
  50. ippdev

    ippdev

    Joined:
    Feb 7, 2010
    Posts:
    3,850
    Hi Stefan. The SALSA package does not say anything about accessing OS X native audio. It seems RT-Voice on Mac does not come out of an AudioSource. I have my own script that works fine if I can get the sound from an AudioSource. What is the trick for that on the Mac desktop?