RT-Voice - Run-time text-to-speech solution

Discussion in 'Assets and Asset Store' started by Stefan-Laubenberger, Jul 10, 2015.

  1. pavilium

    pavilium

    Joined:
    Oct 5, 2016
    Posts:
    1
  2. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    The TTS from macOS generates audio files which normally end up in Unity. Have you already tried our SALSA demo?


    Cheers
    Stefan
     
  3. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
  4. ippdev

    ippdev

    Joined:
    Feb 7, 2010
    Posts:
    3,850
    Not interested in Salsa. I have my own solution that is tight and purpose focused. I do not need an entire package catering to a dozen others when I can do it in three functions and about 30 lines of code. What I do need is to find out how to get the sound out from RT Voice. I can turn off every AudioSource on the Mac and RT Voice still comes out. I need to route the voice out to an AudioSource to drive a suite of components. What am I missing here?
     
  5. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    The Speak-method has an AudioSource as second parameter, please try that.
    https://www.crosstales.com/media/da...ass_crosstales_1_1_r_t_voice_1_1_speaker.html
     
  6. stfunity

    stfunity

    Joined:
    Sep 9, 2018
    Posts:
    64
    Hey so I'm chiming in on ippdev's question since I'm the other dev on that project.

    Here's the code I'm running. I explicitly make a Wrapper with the AudioSource I'm trying to use no matter what platform it's on, and my speak command also explicitly passes the same AudioSource. We get readings just fine on Windows, but on Mac the audio escapes the grasp of the same AudioSource, in the same scene.

    Is there anything you can spot that's wrong with this Speak method I call, or any overloads I'm supplying?

    Code (CSharp):
    public void Speak(string _ttsOutput) {
        ttsOutput = _ttsOutput;
        approximateLength = Speaker.Instance.ApproximateSpeechLength(_ttsOutput, speechRate);
        currentWrapper = new Crosstales.RTVoice.Model.Wrapper(_ttsOutput, SpeakerVoice, speechRate, speechPitch, speechVolume, speechAudioSource, false, "rtvoiceoutput", false);
        if (!string.IsNullOrEmpty(uid)) {
            Speaker.Instance.Silence(uid);
            isSpeaking = false;
        }
        if (useNative) {
            uid = Speaker.Instance.Speak(_ttsOutput, speechAudioSource, SpeakerVoice, false, speechRate);
        } else {
            uid = Speaker.Instance.Speak(_ttsOutput, speechAudioSource, SpeakerVoice, false, speechRate, speechPitch, speechVolume);
        }
    }
    Here's how that hover looks on the Speaker.Instance.Speak call from within Visual Studio 2022
    [Image: upload_2022-10-21_10-36-39.png]
     
  7. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    Sorry for the delayed answer, but I was away a few days without any Internet.
    However, can you please run our demo "01-Speech" under macOS, let a voice speak and see the "Spectrum":
    [Image: upload_2022-10-25_21-21-46.png]

    Do you see it react to the speech?

    The main difference between Windows and macOS is the underlying audio format, which is "AIFF" for Mac and "WAV" for Windows.
    Could that cause the issue in your lipsync-code?


    Cheers
    Stefan
     
  8. ippdev

    ippdev

    Joined:
    Feb 7, 2010
    Posts:
    3,850
    It isn't the lip sync code. It is that the sound from RTV is not routed through any in-scene AudioSource. I can't pull the audio from the native stream on Mac, which I presume is where it is routed, and hence cannot sample any bytes. What exactly is going on on the Mac vs. PC, where it can be easily accessed from the assigned AudioSource? It is the same code, but it is being routed elsewhere.
     
  9. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Unfortunately, I currently have no Mac nearby, so it would be very kind of you if you could try it with our demo "01-Speech" and tell me if you see the "Spectrum"-window reacting to the speech. In case you don't see anything there, it would imply that there is indeed no audio routed to Unity. This test would work best in an empty project just to make sure no settings from your current project interfere with our asset.
    Thank you very much!
     
  10. stfunity

    stfunity

    Joined:
    Sep 9, 2018
    Posts:
    64
    Okay so we've tested further with your Demo, the Spectrum shows up.

    Went and double-checked your code vs. ours regarding useNative or not. The methods are of the same sort: Native doesn't pass a specific audio source, non-Native passes an audio source. We've verified we're using the version of the command that passes the audio source to Speaker.Instance.Speak.
    [Image: upload_2022-10-25_19-48-28.png]

    What we're getting is doubled audio now. I changed the middle bool to "true" for speakImmediately in order to get the audio to pass through - so while we're getting the sound amplitude moving our Avatar's mouth - the audio clip is being doubled and glitching over top of itself. If you mute every audio source in the scene, you still get the system audio of the original speech.

    Changing the pitch on our audio source works, but we can still hear the other layer, unaffected, and can't mute or bypass it.

    To be clear, we can get sound out of an audio source, but can't get rid of the Native Audio. Any suggestions?

    Almost got it...
     
  11. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hmm, that's strange. Is the same doubled audio happening in our demo?
    If not, is it possible that you call "Speak" twice?
    Btw, you are creating a "Wrapper" but you don't use it. Maybe get rid of it or pass it instead of the parameters?
     
  12. Boliver0482

    Boliver0482

    Joined:
    Oct 19, 2019
    Posts:
    45
    Hello. I've had a couple of users on Huawei M3 tablets running Android. These devices don't seem to have any TTS engine installed by default. In this instance the RT-Voice prefab seems to fail and Speaker.Instance.OnVoicesReady is never triggered. The easy fix is for them to install the Google TTS engine manually, but I would like to capture this scenario and show an appropriate message in the app.

    Looking in the documentation, I can't see a way to receive any error state from the prefab if it fails to initialise. Is there some way to do so? If not, I guess it's best to wait maybe 10 seconds and, if the OnVoicesReady event hasn't triggered, handle it myself?

    Just a Huawei thing it seems...

    Many thanks,

    Bob.
     
  13. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi Bob

    You could use the property "Speaker.Instance.AndroidEngine" to determine the current TTS-engine (and if there is any installed).

    I hope this helps you further.


    Cheers
    Stefan
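
    The timeout idea from the question above could be sketched roughly as follows. This is not an official RT-Voice sample. It assumes "Speaker.Instance.OnVoicesReady" is a parameterless event and that "Speaker.Instance.AndroidEngine" can be read as a string; the class name "VoiceReadyWatchdog" and the field "TimeoutSeconds" are hypothetical.

    ```csharp
    using System.Collections;
    using UnityEngine;
    using Crosstales.RTVoice;

    // Sketch: logs a warning if RT-Voice never reports its voices as ready.
    // Assumption: OnVoicesReady is a parameterless event on Speaker.Instance.
    public class VoiceReadyWatchdog : MonoBehaviour
    {
        // Hypothetical setting; 10 seconds is the value suggested in the question.
        public float TimeoutSeconds = 10f;

        private bool voicesReady;

        private void OnEnable()
        {
            Speaker.Instance.OnVoicesReady += onVoicesReady;
        }

        private void OnDisable()
        {
            if (Speaker.Instance != null)
                Speaker.Instance.OnVoicesReady -= onVoicesReady;
        }

        private IEnumerator Start()
        {
            float deadline = Time.realtimeSinceStartup + TimeoutSeconds;

            // Poll once per frame until the callback fired or the deadline passed.
            while (!voicesReady && Time.realtimeSinceStartup < deadline)
                yield return null;

            if (!voicesReady)
            {
                // Replace this log with the in-app message for the user.
                Debug.LogWarning("No TTS voices became available. Current Android engine: '" + Speaker.Instance.AndroidEngine + "'");
            }
        }

        private void onVoicesReady()
        {
            voicesReady = true;
        }
    }
    ```

    If the voices are already loaded before this component subscribes, the callback would be missed and the warning would fire wrongly, so the component probably belongs in the first scene next to the RT-Voice prefab.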
     
    Boliver0482 likes this.
  14. stfunity

    stfunity

    Joined:
    Sep 9, 2018
    Posts:
    64
    Definitely not calling it twice. I can call it 0 times, once through Native because I called it once through script, or call it once through script per your directions and get the doubled audio regardless.

    These settings, using a straight wrapper pass to the Speak Method, result in doubled audio:

    Code (CSharp):
    public void Speak(string _ttsOutput) {
        ttsOutput = _ttsOutput;
        approximateLength = Speaker.Instance.ApproximateSpeechLength(_ttsOutput, speechRate);
        //currentWrapper = new Crosstales.RTVoice.Model.Wrapper(_ttsOutput, SpeakerVoice, speechRate, speechPitch, speechVolume, speechAudioSource, false, "rtvoiceoutput", false);
        currentWrapper = new Crosstales.RTVoice.Model.Wrapper(_ttsOutput, SpeakerVoice, speechRate, speechPitch, speechVolume, speechAudioSource, true, null, false);
        if (!string.IsNullOrEmpty(uid)) {
            Speaker.Instance.Silence(uid);
            isSpeaking = false;
        }
        if (useNative) {
            //uid = Speaker.Instance.SpeakNative(_ttsOutput, SpeakerVoice, speechRate, speechPitch, speechVolume);
            uid = Speaker.Instance.SpeakNative(currentWrapper);
        } else {
            //uid = Speaker.Instance.Speak(_ttsOutput, speechAudioSource, SpeakerVoice, false, speechRate, speechPitch, speechVolume);
            uid = Speaker.Instance.Speak(currentWrapper);
        }
    }
    If I change speakImmediately in the wrapper to "false", then I only get Native Audio, can't control it or use it for lipsync. If I set it "true" as above, I get the audio I want, stepped on by the Native Audio I don't want.
     
  15. stfunity

    stfunity

    Joined:
    Sep 9, 2018
    Posts:
    64
    For the record, I debugged when *I* call to speak since I'm running a coroutine, I am definitely only calling for speech once for each set of Text:

    [Image: upload_2022-10-26_11-13-51.png]
     
  16. stfunity

    stfunity

    Joined:
    Sep 9, 2018
    Posts:
    64
    Here's the full chain of events where I send text once to the above method, and RTVoice generates audio, then does speak start, and says there's things with the cache and the dictionary going on. Highlighted the red error but also thought the NOT cached message in the second entry was possibly a lead.

    [Image: upload_2022-10-26_12-26-51.png]

    I delimit my SpeechCoroutines by whether RTVoice is no longer isSpeaking, so while I get the pacing I expect between spoken blocks of text, yea there's some duplicate read or clip or cache thing happening with the dictionary on Mac.

    Mac Studio M1 Max // Unity 2022.1.20f1
     
  17. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    What's the version of our asset?
     
  18. XyrisKenn

    XyrisKenn

    Joined:
    Dec 8, 2015
    Posts:
    92
    When building for Windows Universal VB Project, an error appears twice:

    Code (CSharp):
    Assets\Plugins\crosstales\Common\Scripts\Util\FileHelper.cs(196,75): error CS0234: The type or namespace name 'FB' does not exist in the namespace 'Crosstales' (are you missing an assembly reference?)
    (The other error is the same except for the position, 196,30.)

    I'm investigating now but your advice would be most welcome.
    RT-Voice PRO 2022.2.0
    Unity 2021.3 URP
     
  19. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    It seems you have also used our FileBrowser PRO in the project - is it possible you removed it?
    Also please make sure you are using the latest version of our assets (via "Update" in the asset configuration window).


    Cheers
    Stefan
     
  20. XyrisKenn

    XyrisKenn

    Joined:
    Dec 8, 2015
    Posts:
    92
    Ah, I'll check. This is a new project built from two other projects with the commercial assets imported 'fresh'; one used an unrelated project (DropBoxSync). I haven't purchased FileBrowser Pro.
    I'll update RTVoice now (and look at your FileBrowser product).

    [Edit] FileHelper is included in a fresh RT-Voice PRO import.
    [Edit] Importing an outdated unitypackage (Vimeo SDK) might have caused problems with file updates.
     
    Last edited: Oct 29, 2022
  21. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hmm, then there is most likely a compile define called "CT_FB". Please remove it.
     
    XyrisKenn likes this.
  22. XyrisKenn

    XyrisKenn

    Joined:
    Dec 8, 2015
    Posts:
    92
    There are two other CT_ compile defines. These errors appeared when I tried to make a Universal Windows build.
    A warning that I'm missing VB packages for WinUni is also present.

    For WinUni the compiler switches to IL2CPP; the regular Unity build is in Mono and there are no errors.
    With WinUni I'm just trying to package my build into an exe so users find it easier to use.
     

  23. stfunity

    stfunity

    Joined:
    Sep 9, 2018
    Posts:
    64
    Hey Stefan, forgive the delay - Version is the August 15th Release, 2022.2.0 that has the doubled audio on the M1 Mac Studio // Unity 2022.1.20f1
     
  24. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Are you using the latest version 2022.2.0 of RT-Voice?
     
    XyrisKenn likes this.
  25. XyrisKenn

    XyrisKenn

    Joined:
    Dec 8, 2015
    Posts:
    92
    Yes, 2022.2.0, imported 'fresh' for the new project.
    This is my first VB build though, my experience is in iOS.
     
  26. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    We found the problem and we will release a fix in the next 24h to the store.
     
    XyrisKenn likes this.
  27. XyrisKenn

    XyrisKenn

    Joined:
    Dec 8, 2015
    Posts:
    92
    Awesome! Thank you and Cheers :)
     
  28. ippdev

    ippdev

    Joined:
    Feb 7, 2010
    Posts:
    3,850
    Wondering if you made any progress on the double-voicing issue on the Mac M1 Studio with the latest release? Yer usually on the money with replies, and this one is holding up a public release of our application on OS X.
     
  29. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    We did some extensive tests on macOS 12.6 with Unity 2021.3.10 and can't confirm that there is any "double-speak".
    You can try it in a new project by testing it via the class "SimpleRTVoiceExample.cs" or our demo "00-Simple_Example". If you need more proof, we made a video to which I can give you access - just send me an email.


    Cheers
    Stefan
     
    stfunity likes this.
  30. kelly_unity282

    kelly_unity282

    Joined:
    Nov 10, 2022
    Posts:
    5
    Hello RT-Voice Team,

    I am currently working on a project in which we use RT-Voice to create speech for our characters. We are building our project for Android, iOS and possibly WebGL. We have male and female characters, and we have Chinese speech too. We set the voice names using VoiceForName, as VoiceForGender does not work on Android.

    For example, we are setting our voice names like this:
    Code (CSharp):
    for (int i = 0; i < genders.Length; i++)
    {
        if (genders[i] == Gender.MALE)
        {
            voiceNamesForAndroid[i] = "en-gb-x-gbb-local";
            voiceNamesForIOS[i] = "Daniel";
        }
        else
        {
            voiceNamesForAndroid[i] = "en-gb-x-gba-local";
            voiceNamesForIOS[i] = "Martha";
        }
    }

    // Unity Editor
    #if UNITY_EDITOR
    voice = Speaker.Instance.VoiceForGender(genders[currentIndex], lang);
    #endif
    // Android / iOS
    #if UNITY_ANDROID && !UNITY_EDITOR
    Speaker.Instance.AndroidEngine = "com.google.android.tts";
    voice = Speaker.Instance.VoiceForName(voiceNamesForAndroid[currentIndex]);
    #endif
    #if UNITY_IPHONE && !UNITY_EDITOR
    voice = Speaker.Instance.VoiceForName(voiceNamesForIOS[currentIndex]);
    #endif
    In one case we set the voice to be a male voice. However, different Android phones give different results. Some of them work, but some cannot find the correct voice name and return the default voice, which is always a female voice. One of them is a Samsung phone where no male voice is found.

    I know it may be due to different voice names in different versions of Android. I have tried some other voice providers, but some need another plug-in and some do not support all platforms or do not support Chinese. Is there a solution that lets us ship a set of voices with the app instead of looking up the voices on the device?

    Thank you very much!

    Kelly
     
    Last edited: Nov 10, 2022
  31. kelly_unity282

    kelly_unity282

    Joined:
    Nov 10, 2022
    Posts:
    5
    Also, there was a big lag the first time we hit a character and it spoke. Any suggestions?

    Thank you so much!

    Kelly
     
  32. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi Kelly

    Android TTS is quite weak compared to other platforms, especially since many vendors are using other engines than "Google". The native API also doesn't provide the gender, so we also don't know it... It's a bit of a mess and most voices are female only.
    You could try to enforce the engine to be from Google, but if the user won't/can't install it (e.g. on some Chinese brands), you may be out of luck.
    The only "real" solution would be to ditch the native TTS and use one of the supported services like AWS Polly, Azure or Google - they all deliver the gender information and have a large variety of supported voices and the quality is also superb.

    About the lag: this is a common issue on some (mostly older) devices and can be mitigated by using our prefab "VoiceInitializer" in the first scene.

    I hope this helps you further.


    Cheers
    Stefan
     
  33. kelly_unity282

    kelly_unity282

    Joined:
    Nov 10, 2022
    Posts:
    5
    Thank you very much Stefan!
     
    Stefan-Laubenberger likes this.
  34. mb13admin

    mb13admin

    Joined:
    May 28, 2017
    Posts:
    22
    Hi Stefan-Laubenberger,
    I think there's a bug with RT Voice on iOS with the native Arabic language (ar-SA).
    It occurs on our iPhone 12 (iOS 15.6) and iPhone 13 (iOS 16.1).
    It does NOT occur on Android.
    It does NOT occur on iOS with other languages; only Arabic is spoken incorrectly.

    Our test word: الشمس (English: the Sun), correct pronunciation: 'alshams'. On iOS, RT Voice speaks it as "Ah" instead; on Android it speaks "alshams" correctly.

    We debugged and it seems that RT Voice retrieves the iOS native voice-over language package correctly: culture = ar-SA, voice = Maged, gender = MALE, which are the same as in the device settings and your documentation.

    Could you please have a look at this issue?
    We are wondering whether RT Voice converts the right-to-left Arabic word when it speaks.
    However, Hebrew, which is also right-to-left, is still spoken correctly.
     
    Last edited: Nov 17, 2022
  35. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    RT-Voice does nothing with the input string (for any language) and simply delegates it to the native TTS.
    Therefore, I'm pretty sure it's a bug in the iOS TTS itself, especially since Hebrew works.

    Can you please try the voice in the iOS settings? If it works nevertheless, maybe a simple "string reverse" under iOS would do the trick?


    Cheers
    Stefan
     
  36. mb13admin

    mb13admin

    Joined:
    May 28, 2017
    Posts:
    22
    Yes, we DID test the word in the iOS settings and it is spoken correctly. Only through Unity is it spoken wrongly.
    We will try to reverse the string before passing it into the SDK.
    Thank you
     
  37. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Please let me know if reversing the string solved the issue. If that's the case, we will implement a "special case" for Arabic under iOS.
    Thank you!
     
  38. mb13admin

    mb13admin

    Joined:
    May 28, 2017
    Posts:
    22
    Hi, we found the culprit; the plugin is working normally. In our code, we reshaped Arabic words via an ArabicSupport script, which caused the issue.
    Thanks again for your quick response though. Have a nice day :)
     
    Stefan-Laubenberger likes this.
  39. jackyetz

    jackyetz

    Joined:
    Dec 28, 2022
    Posts:
    11
    Code (CSharp):
    using UnityEngine;
    using Crosstales.RTVoice;

    public class test : MonoBehaviour
    {
        AudioSource SourceA;

        void Start()
        {
            SourceA = GetComponent<AudioSource>();
            Speaker.Instance.Speak(mytext, SourceA,
                Speaker.Instance.VoiceForCulture("zh"));
        }
    }
    where mytext is a string consisting of English and Chinese words. However, running it logs the warning "No voices for culture 'zh' found! Speaking with the default voice!", and only the English words are spoken.
     
  40. jackyetz

    jackyetz

    Joined:
    Dec 28, 2022
    Posts:
    11
    I'm using Unity 2021.3.16.
     
  41. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    You have to wait for the "OnVoicesReady"-callback (as in the example in chapter 5.2.7 of the documentation) and make sure you have a Chinese voice installed on your system.


    Cheers
    Stefan
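
    A rough sketch of what waiting for the callback could look like for the script in the question. It is not the example from the documentation. It assumes "OnVoicesReady" is a parameterless event on "Speaker.Instance"; the class name "SpeakWhenReady" and the placeholder text are hypothetical, and the "Speak" call is the same three-argument form used in the question.

    ```csharp
    using UnityEngine;
    using Crosstales.RTVoice;

    // Sketch: speaks only after RT-Voice has reported that its voices are loaded.
    public class SpeakWhenReady : MonoBehaviour
    {
        // Placeholder text mixing Chinese and English words.
        public string Text = "你好, hello";

        private AudioSource source;

        private void Start()
        {
            source = GetComponent<AudioSource>();

            // Assumption: OnVoicesReady takes no arguments.
            Speaker.Instance.OnVoicesReady += speakChinese;
        }

        private void OnDestroy()
        {
            if (Speaker.Instance != null)
                Speaker.Instance.OnVoicesReady -= speakChinese;
        }

        private void speakChinese()
        {
            // Falls back to the default voice if no "zh" voice is installed,
            // as the warning quoted in the question indicates.
            Speaker.Instance.Speak(Text, source, Speaker.Instance.VoiceForCulture("zh"));
        }
    }
    ```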
     
  42. alexis78963_unity

    alexis78963_unity

    Joined:
    May 9, 2019
    Posts:
    14
    Hello!

    Thank you for your great asset, we really enjoy using it and it saves us quite some work.
    One question: we're using AWS Polly less and less and this new voice provider more and more: https://beta.elevenlabs.io/
    The voice quality is just a LOT better.
    Do you have any plans to provide an integration for this tool?
     
  43. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    We will check it out, but I can't promise anything.
    But you can always extend our "BaseCustomVoiceProvider.cs" (with an example in "VoiceProviderExample.cs") and implement any solution out there.


    Cheers
    Stefan
     
  44. XyrisKenn

    XyrisKenn

    Joined:
    Dec 8, 2015
    Posts:
    92
    Hi! Might you have advice for using RT-Voice with Watson?
     
  45. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    I'm not sure if Watson still provides any Unity integration. This is an older post by our friends from CrazyMinnow:
    http://crazyminnowstudio.com/posts/ibm-watson-and-salsa-lipsync/

    However, we don't have any plans to write an integration for Watson.
    If you are a programmer, you could create your own custom provider, see VoiceProviderExample for more.


    Cheers
    Stefan
     
    XyrisKenn likes this.
  46. jackyetz

    jackyetz

    Joined:
    Dec 28, 2022
    Posts:
    11
    How to speak several sentences one by one?
    There are several sentences, and the speech runs in an asynchronous task. Sadly, only the last sentence is spoken; the earlier ones are not. await cannot be used with Speaker.Instance.Speak. I put the pseudo code below.

    BTW, the three sentences cannot simply be concatenated, because I have to insert a silent interval (await Task.Delay()) between them, and the interval length depends on the context.

    Code (CSharp):
    private async Task WaitRemoteClip()
    {
        string[] sents = { "This is the first sent.", "This is the second sent.", "This is the third sent." };
        foreach (string sent in sents)
        {
            // Speak returns immediately, so all three calls are issued back to back.
            Speaker.Instance.Speak(sent, audioRemote, Speaker.Instance.VoiceForCulture("zh"), true);
        }
    }
     
    Last edited: Feb 22, 2023
  47. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hi

    Have you tried our "Sequencer" under "Extras"?
    If you want to create your own solution, use the callback "OnSpeakComplete" as trigger to move to the next sentence.


    Cheers
    Stefan
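
    For a hand-rolled alternative to the "Sequencer", the callback approach might look something like the sketch below. It is only an illustration under assumptions: that "OnSpeakComplete" passes a "Wrapper" (from "Crosstales.RTVoice.Model") whose "Uid" matches the string returned by "Speak", and that the callback arrives on the main thread. The class name "SentenceQueue" and the field "PauseSeconds" are hypothetical; a per-sentence pause could be stored alongside each sentence instead.

    ```csharp
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;
    using Crosstales.RTVoice;
    using Crosstales.RTVoice.Model;

    // Sketch: speaks queued sentences one after another with a pause in between.
    public class SentenceQueue : MonoBehaviour
    {
        public AudioSource Source;

        // Hypothetical fixed pause between two sentences.
        public float PauseSeconds = 0.5f;

        private readonly Queue<string> pending = new Queue<string>();
        private string currentUid;

        private void OnEnable()
        {
            Speaker.Instance.OnSpeakComplete += onSpeakComplete;
        }

        private void OnDisable()
        {
            if (Speaker.Instance != null)
                Speaker.Instance.OnSpeakComplete -= onSpeakComplete;
        }

        // Adds sentences and starts speaking if nothing is in progress.
        public void SpeakAll(IEnumerable<string> sentences)
        {
            foreach (string sentence in sentences)
                pending.Enqueue(sentence);

            if (currentUid == null)
                speakNext();
        }

        private void speakNext()
        {
            if (pending.Count == 0)
            {
                currentUid = null;
                return;
            }

            // Same four-argument call as in the question.
            currentUid = Speaker.Instance.Speak(pending.Dequeue(), Source, Speaker.Instance.VoiceForCulture("zh"), true);
        }

        // Assumption: the Wrapper exposes the uid of the finished speech as "Uid".
        private void onSpeakComplete(Wrapper wrapper)
        {
            if (wrapper.Uid != currentUid)
                return;

            StartCoroutine(pauseThenNext());
        }

        private IEnumerator pauseThenNext()
        {
            yield return new WaitForSeconds(PauseSeconds);
            speakNext();
        }
    }
    ```

    Usage would be a single call such as SpeakAll(new[] { "This is the first sent.", "This is the second sent." }) on the component, with the AudioSource assigned in the Inspector.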
     
  48. jackyetz

    jackyetz

    Joined:
    Dec 28, 2022
    Posts:
    11
    Thanks, Stefan. Your reply was fast. Is there any sample code using the callback method? That would help guide me a lot.
     
  49. Stefan-Laubenberger

    Stefan-Laubenberger

    Joined:
    May 25, 2014
    Posts:
    1,976
    Hello again

    Please see chapter 5.2.7 of the documentation.


    Cheers
    Stefan
     
  50. XyrisKenn

    XyrisKenn

    Joined:
    Dec 8, 2015
    Posts:
    92
    No worries. I switched to Azure and am trying that out. Many choices for online speech audio providers in RT-Voice [awesome].