easy speech synthesis on a Mac

Discussion in 'macOS' started by JoeStrout, Mar 29, 2018.

  1. JoeStrout

    Joined: Jan 14, 2011 | Posts: 9,859

    I needed speech synthesis for a recent project. I started out using Watson's text-to-speech service, but in less than a week I hit the limit of their free tier (10,000 characters). Since I'm on a Mac, I decided to try Apple's speech instead, and I love it. The voice quality is at least as good, if not better; the performance is great, and it's free.

    Here's the code:

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Events;

    public class AppleSpeechSynth : MonoBehaviour {

        // Name of an installed macOS voice, as listed by: say -v '?'
        public string voice = "Samantha";
        // Audio device number passed to say -a; list devices with: say -a '?'
        public int outputChannel = 48;

        public UnityEvent onStartedSpeaking;
        public UnityEvent onStoppedSpeaking;

        System.Diagnostics.Process speechProcess;
        bool wasSpeaking;

        void Update() {
            // "Speaking" means the say process exists and has not exited yet.
            bool isSpeaking = (speechProcess != null && !speechProcess.HasExited);
            if (isSpeaking != wasSpeaking) {
                if (isSpeaking) onStartedSpeaking?.Invoke();
                else onStoppedSpeaking?.Invoke();
                wasSpeaking = isSpeaking;
            }
        }

        public void Speak(string text) {
            // Double quotes in the text would end the quoted argument early, so swap them for commas.
            // The voice name is quoted too, so voices with a space in their name still work.
            string cmdArgs = string.Format("-a {2} -v \"{0}\" \"{1}\"", voice, text.Replace("\"", ","), outputChannel);
            speechProcess = System.Diagnostics.Process.Start("/usr/bin/say", cmdArgs);
        }

    }
    Just call the Speak method, and bask in the sultry (or manly, as you prefer) sounds of speech.

    Note that I needed the outputChannel parameter in order to redirect the output (through SoundFlower) to QuickTime when recording this demo video. That was a PITA, because then I couldn't hear it while recording... but anyway, if you have any trouble hearing the speech, run say -a '?' on the command line, and check that the output channel you have selected is the correct number for "Built-in Output".
     
  2. unity_49fCeZpEbCk40A

    Joined: Jul 12, 2019 | Posts: 1

    Hi Joe, I want to use this script in my Unity project. How can I do that? Should I attach it to my main camera or to a game object?
     
  3. JoeStrout

    It's just a MonoBehaviour. Attach it to whatever you want.
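
    A minimal sketch of what "attach it and call Speak" could look like, assuming the AppleSpeechSynth component from the first post is on the same GameObject. The Greeter class and its greeting field are hypothetical names used only for illustration; they are not part of the original script.

```csharp
using UnityEngine;

// Hypothetical example component; only AppleSpeechSynth and Speak come from the thread.
[RequireComponent(typeof(AppleSpeechSynth))]
public class Greeter : MonoBehaviour {

    public string greeting = "Hello from Unity";

    void Start() {
        // Find the synthesizer on this same GameObject and speak once at startup.
        AppleSpeechSynth synth = GetComponent<AppleSpeechSynth>();
        synth.Speak(greeting);
    }
}
```

    Which GameObject hosts the two components should not matter, as long as both are on the same one so that GetComponent can find the synthesizer.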
     
  4. estherifitae

    Joined: Apr 29, 2020 | Posts: 1

    Hello! May I ask how you went about using Watson's service? I was trying out their speech-to-text service, but I got errors when compiling the script (an error shown in the Inspector section).
     
  5. dfarjoun

    Joined: Aug 6, 2017 | Posts: 42

    Hi,
    This is VERY interesting!!!!
    How do you implement voice recognition on the Mac?
    Can you also use Apple's services for recognition?
     
  6. JoeStrout

    Sorry, I have no idea about that.
     
  7. bricefr

    Joined: May 3, 2015 | Posts: 61

    I tried everything I could; it's not possible using the native macOS binaries. Also note that for the TTS feature (the say command) there are some legal restrictions...
     
  8. djackson_unity

    Joined: Feb 8, 2021 | Posts: 2

    Hey! I'm a beginning coder and I need some text-to-speech in my (Mac-based) project. Can you explain what the two UnityEvents:
    Code (CSharp):
        public UnityEvent onStartedSpeaking;
        public UnityEvent onStoppedSpeaking;
    do in the script? Do I need to use them in some way, or just call the Speak method?

    Thank you so much for this code, btw, it's exactly what I needed.
     
  9. JoeStrout

    Those simply provide events that other code can subscribe to if it needs to know when speech starts or stops. If you don't need them, you can ignore them and just call Speak.

    If you're not familiar with Unity events and all the cool ways they let you decouple your code, check out this tutorial (old but still applies today).
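
    As a rough sketch of how other code might use those two events, the component below subscribes to them in code and logs when speech starts and stops. SpeechLogger and its handler names are hypothetical; only AppleSpeechSynth, onStartedSpeaking, and onStoppedSpeaking come from the script in the first post. It assumes both components are on the same GameObject.

```csharp
using UnityEngine;

// Hypothetical listener component, for illustration only.
[RequireComponent(typeof(AppleSpeechSynth))]
public class SpeechLogger : MonoBehaviour {

    AppleSpeechSynth synth;

    void Awake() {
        synth = GetComponent<AppleSpeechSynth>();
    }

    void OnEnable() {
        // Subscribe in code; the same hookup can be made in the Inspector instead.
        synth.onStartedSpeaking.AddListener(HandleStarted);
        synth.onStoppedSpeaking.AddListener(HandleStopped);
    }

    void OnDisable() {
        // Unsubscribe so the handlers are not called while this component is disabled.
        synth.onStartedSpeaking.RemoveListener(HandleStarted);
        synth.onStoppedSpeaking.RemoveListener(HandleStopped);
    }

    void HandleStarted() { Debug.Log("Speech started"); }
    void HandleStopped() { Debug.Log("Speech stopped"); }
}
```

    The point of the decoupling is that AppleSpeechSynth never needs to know this logger exists; anything that cares about speech timing can subscribe the same way.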
     
  10. djackson_unity

    Thanks! I'm not familiar with Unity events so I appreciate the link to the tutorial.
     
  11. paulshaquille

    Joined: Mar 11, 2020 | Posts: 2

    Can someone explain what this code is doing exactly?
    I need to implement voice recognition in my (iOS) game, and I feel like this actually works, but I don't know what it's doing. I've placed it into my game, but I also don't know if it's working. Am I supposed to have downloaded something else, or should this work no matter what?
     
  12. JoeStrout

    It's just invoking the /usr/bin/say command (a built-in command-line app on macOS) via the shell. This is speech synthesis, not voice recognition. I wouldn't expect it to work on iOS.
     
  13. paulshaquille

    Thank you. This definitely helped me save some time
     
  14. stfunity

    Joined: Sep 9, 2018 | Posts: 65

    Great stuff, thank you. Missed hearing Fred's voice
     
  15. stfunity

    I like this command argument for say, and I wanted to make a second one so I could poll the local macOS and get a list of all available SpeechSynthesis voices.

    I found a command that runs like this in a bash terminal and tried to work it into the same format as your say command above, but I couldn't figure it out all the way because I wasn't sure if I needed a path declaration like you had at first.

    This is the command I want to run as a bash/terminal command from Unity C#:

    ls /System/Library/Speech/Voices | sed 's/.SpeechVoice$//'


    I tried setting up

    Code (CSharp):
        public void ListAvailable() {
            string cmdArgs = "ls /System/Library/Speech/Voices | sed 's/.SpeechVoice$//'";
            speechProcess = System.Diagnostics.Process.Start(cmdArgs);
        }
    I wasn't sure if I -needed- to add the first part of the other command format from your original Speak function, where it says "/usr/bin/say", or if I could just pass one string as a command argument.

    I also don't know how to capture the console's response to that. I know it would return text, but I am no expert on talking to bash indirectly.
    ----------------------------------------------------------------------------------

    Meanwhile I have another question:
    Is it possible to access some kind of phoneme or viseme stream on the Mac side of SpeechSynthesis, or in another library, for timing mouth poses on an avatar? Obviously the Speak function engages synthesis. I'm not sure if there's some kind of timing system exposed; I can see where people have set the speed parameter for speech on the Mac side, so I guess there's something back there.

    Discussion threads on this issue are surprisingly scant given the amount of time all of these systems have been coexisting. Thanks so much for any expertise you can offer.

    @ippdev and I are trying to crack this nut so we can get universal speech support
     
  16. JoeStrout

    Hmm, you're trying to execute a compound command, where the output of one command is piped into another. I'm not certain that System.Diagnostics.Process.Start can do that.

    An alternative would be to use just the "ls" command (which is actually "/bin/ls") as the first argument to Process.Start, with "/System/Library/Speech/Voices" as the cmdArgs (second argument).

    But then you will need to process the returned text. That is a little tricky, but it is doable: Process.Start returns a Process object, which has a StandardOutput stream you can read from (provided the process was started with RedirectStandardOutput set to true and UseShellExecute set to false). See these answers for some examples. Once your code is reading the results of the ls command, you can search it yourself for SpeechVoice entries.
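
    A minimal sketch of reading a command's output this way, using only System.Diagnostics. The ShellCapture class and its method names are made up for illustration, and the voices folder is the path quoted in the question above. The last method shows one way to run the piped command as well: hand the whole command line to /bin/sh, since the pipe is set up by a shell, not by Process.Start.

```csharp
using System.Diagnostics;

// Hypothetical helper class, for illustration only.
public static class ShellCapture {

    // Runs an executable, waits for it to finish, and returns everything it wrote to stdout.
    public static string RunAndCapture(string fileName, string arguments) {
        var startInfo = new ProcessStartInfo(fileName, arguments) {
            UseShellExecute = false,        // must be false to redirect streams
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo)) {
            // Read before WaitForExit so a full pipe buffer cannot stall the child process.
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }

    // Plain directory listing, one entry per line.
    public static string ListVoicesWithLs() {
        return RunAndCapture("/bin/ls", "/System/Library/Speech/Voices");
    }

    // The piped command from the question, passed to /bin/sh so the shell builds the pipe.
    public static string ListVoicesWithPipe() {
        return RunAndCapture("/bin/sh",
            "-c \"ls /System/Library/Speech/Voices | sed 's/.SpeechVoice$//'\"");
    }
}
```

    Both calls block until the command exits, so inside Unity they would briefly stall whichever thread calls them.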

    Or, better yet: why are we going to all this work to run `ls` in a shell to get a list of files the hard way? C# has built-in methods to get files in a directory. Just use one of those instead.
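
    A sketch of that approach, assuming the voices live in the folder quoted in the question and that each entry is named like "Fred.SpeechVoice". VoiceLister and its methods are hypothetical names; Directory.GetFileSystemEntries is used so that entries are found whether they are plain files or bundle directories.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class VoiceLister {

    const string Suffix = ".SpeechVoice";

    // Strips a trailing ".SpeechVoice" from an entry name; other names are returned unchanged.
    public static string StripSuffix(string entryName) {
        return entryName.EndsWith(Suffix, StringComparison.Ordinal)
            ? entryName.Substring(0, entryName.Length - Suffix.Length)
            : entryName;
    }

    // Returns the sorted voice names found in the folder, or an empty list if the folder is missing.
    public static List<string> ListVoices(string voicesFolder = "/System/Library/Speech/Voices") {
        var names = new List<string>();
        if (!Directory.Exists(voicesFolder)) return names;
        // GetFileSystemEntries returns both files and subdirectories.
        foreach (string path in Directory.GetFileSystemEntries(voicesFolder)) {
            names.Add(StripSuffix(Path.GetFileName(path)));
        }
        names.Sort(StringComparer.Ordinal);
        return names;
    }
}
```

    No process is launched and no output has to be parsed, which is the main reason to prefer this over running ls.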
     
  17. Matloob

    Matloob

    Joined:
    May 4, 2018
    Posts:
    1
    Hi, is there any way to save the audio generated by the speech synthesis? I'm trying to visualize the generated audio.
     
  18. JoeStrout

    Yes. Type "man say" in Terminal for details; the -o option writes the speech to an audio file instead of playing it.
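
    A minimal sketch of saving speech to a file from C#, assuming the -o option of say as described in its man page. SpeechToFile, BuildArgs, and Render are hypothetical names, and the .aiff path in the usage note is only an example. Quotes are handled the same way as in the script from the first post.

```csharp
using System.Diagnostics;

// Hypothetical helper class, for illustration only.
public static class SpeechToFile {

    // Builds the argument string for: say -v "<voice>" -o "<outputPath>" "<text>"
    public static string BuildArgs(string voice, string outputPath, string text) {
        // Double quotes are removed or replaced so they cannot end a quoted argument early.
        return string.Format("-v \"{0}\" -o \"{1}\" \"{2}\"",
            voice.Replace("\"", ""), outputPath.Replace("\"", ""), text.Replace("\"", ","));
    }

    // Renders the text to an audio file and blocks until say has finished writing it.
    public static bool Render(string voice, string outputPath, string text) {
        using (Process process = Process.Start("/usr/bin/say", BuildArgs(voice, outputPath, text))) {
            process.WaitForExit();
            return process.ExitCode == 0;
        }
    }
}
```

    For example, SpeechToFile.Render("Samantha", "/tmp/hello.aiff", "Hello there") should leave an audio file at that path, which other code can then load and analyze.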