Google Cloud Speech to Text in Unity

Discussion in 'AR/VR (XR) Discussion' started by NTNU_VR, Dec 6, 2018.

  1. NTNU_VR


    Dec 6, 2018

    We are currently working on a multiplayer VR project in Unity 2018.2.16f1. One of the features we wanted to try out was voice recognition, and we quickly found that we needed to use the Google Cloud Speech-to-Text product.

    After looking at the Asset Store, it did not seem like anyone had made what we were specifically looking for. We need two things: voice recognition with streaming, and support for the Norwegian language. Because of this we decided to write a plugin ourselves. Getting the plugin into Unity was a hassle because of all the .dll files that produced errors, but we were able to solve most of them. However, we then hit a problem that left us really stuck: after resolving the plugin issues, we tried to play the application in the editor, but it just freezes. According to the Google Cloud console a request is made, so it seems to work at least partway. Does anyone here have experience with this, or an idea why it is happening?
  2. madleen_unity


    Unity Technologies

    Aug 28, 2018
    Thank you for posting!

    From reading the forum post, I don't yet have an idea of what is going wrong here.

    So here are a few questions:
    Is a firewall blocking the incoming response?
    You say that Google Cloud receives the request, so I assume you just do not get a response back? Which Google service are you calling? What does your code that listens for the response look like?

    Can you also verify whether the response is actually sent back to your app and just not received, or whether it already fails somewhere in the Google API?

    If you could add as much information as possible to this post (sample code, etc.), I will hopefully be able to help!

    Many Thanks
  3. NTNU_VR


    Dec 6, 2018
    Below is our code for Google Cloud Speech-to-Text; it's a modified version of Google's sample code. I do not think the response is sent back, as the Google Cloud Platform dashboard registers the request as a failed attempt, most likely because Unity freezes and we are forced to shut it down.

    The firewall should not be an issue, since the code works outside of Unity. All I'm trying to do in Unity is run the plugin and print out the result, which should be a boolean (whether the user said a certain word).

    Code (CSharp):
    using System;
    using Google.Cloud.Speech.V1;
    using System.Diagnostics;
    using System.Threading;
    using System.Threading.Tasks;

    namespace VR_VoiceRecognition
    {
        public class VoiceRecognition
        {
            public bool StartSpeechRecognition()
            {
                bool test = StreamingMicRecognizeAsync(20, "fantastisk").Result;
                return test;
            }

            static async Task<bool> StreamingMicRecognizeAsync(int inputTime, string inputWord)
            {
                bool speechSuccess = false;
                Stopwatch timer = new Stopwatch();
                Task delay = Task.Delay(TimeSpan.FromSeconds(1));

                if (NAudio.Wave.WaveIn.DeviceCount < 1)
                {
                    Console.WriteLine("No microphone!");
                    return false;
                }

                var speech = SpeechClient.Create();
                var streamingCall = speech.StreamingRecognize();
                // Write the initial request with the config.
                await streamingCall.WriteAsync(
                    new StreamingRecognizeRequest()
                    {
                        StreamingConfig = new StreamingRecognitionConfig()
                        {
                            Config = new RecognitionConfig()
                            {
                                Encoding =
                                RecognitionConfig.Types.AudioEncoding.Linear16,
                                SampleRateHertz = 16000,
                                LanguageCode = "nb",
                            },
                            InterimResults = true,
                        }
                    });

                // Compare speech with the input word; finish if they match and speechSuccess becomes true.
                Task compareSpeech = Task.Run(async () =>
                {
                    while (await streamingCall.ResponseStream.MoveNext(
                        default(CancellationToken)))
                    {
                        foreach (var result in streamingCall.ResponseStream
                            .Current.Results)
                        {
                            foreach (var alternative in result.Alternatives)
                            {
                                if (alternative.Transcript.Replace(" ", String.Empty).Equals(inputWord, StringComparison.InvariantCultureIgnoreCase))
                                {
                                    speechSuccess = true;
                                    return;
                                }
                            }
                        }
                    }
                });

                // Read from the microphone and stream to the API.
                object writeLock = new object();
                bool writeMore = true;
                var waveIn = new NAudio.Wave.WaveInEvent();
                waveIn.DeviceNumber = 0;
                waveIn.WaveFormat = new NAudio.Wave.WaveFormat(16000, 1);
                waveIn.DataAvailable +=
                    (object sender, NAudio.Wave.WaveInEventArgs args) =>
                    {
                        lock (writeLock)
                        {
                            if (!writeMore) return;
                            streamingCall.WriteAsync(
                                new StreamingRecognizeRequest()
                                {
                                    AudioContent = Google.Protobuf.ByteString
                                        .CopyFrom(args.Buffer, 0, args.BytesRecorded)
                                }).Wait();
                        }
                    };

                waveIn.StartRecording();
                timer.Start();
                //Console.WriteLine("Speak now.");

                // Wait as long as no match has been found and the elapsed recording time is below inputTime.
                while (!speechSuccess && timer.Elapsed.TotalSeconds <= inputTime)
                {
                    await delay;
                }

                // Stop recording and shut down.
                waveIn.StopRecording();
                timer.Stop();
                lock (writeLock) writeMore = false;
                await streamingCall.WriteCompleteAsync();
                await compareSpeech;

                //Console.WriteLine("Finished.");
                return speechSuccess;
            }
        }
    }
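
    For reference, the Unity-side script that triggers the freeze might look roughly like the following sketch (the Test class name, the call site, and the log message are my assumptions, not code from the thread):

    ```csharp
    using UnityEngine;
    using VR_VoiceRecognition;

    public class Test : MonoBehaviour
    {
        void Start()
        {
            var recognizer = new VoiceRecognition();
            // StartSpeechRecognition() internally blocks on .Result, so this
            // call stalls Unity's main thread until the async task finishes,
            // which is what makes the editor appear to freeze.
            bool heard = recognizer.StartSpeechRecognition();
            Debug.Log("Word recognized: " + heard);
        }
    }
    ```
    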
  4. madleen_unity


    Unity Technologies

    Aug 28, 2018
    Sorry for the delayed response, it has been quite hectic here!

    The Unity Editor runs on the main thread, and blocking on a C# Task's result blocks that thread, which is why the Editor freezes in this situation. This is why we encourage people to use coroutines, as they can spread work across multiple frames.

    In your code:

    Code (CSharp):
    public bool StartSpeechRecognition()
    {
        bool test = StreamingMicRecognizeAsync(20, "fantastisk").Result;
        return test;
    }

    Because you are waiting for the task's result in this function, the editor and game become unresponsive until the Task finishes and gives you the result. To stop that happening, instead of returning the result of the task in StartSpeechRecognition(), you can return the Task object itself.

    For example:

    Code (CSharp):
    public Task<bool> StartSpeechRecognition()
    {
        Task<bool> myTask = StreamingMicRecognizeAsync(20, "fantastisk");
        return myTask;
    }

    Then, from the Unity side of the code (in Test.cs), you can start it with Task.Run(() => StartSpeechRecognition()) to get the Task object, and in a coroutine check every frame whether the task has completed yet. This seems to stop the editor from freezing while the task is running.

    However, please be aware that the task also needs to be cleaned up/exited manually once it is no longer needed.
    In Unity, if you want to use threaded operations or C# Tasks, it is advised to keep track of the threads and ensure they are disposed of correctly when the application shuts down.

    I hope this will help!
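
    The approach described above can be sketched as a small MonoBehaviour (a sketch only: the Test class name, the log messages, and the assumption that StartSpeechRecognition() now returns Task<bool> are mine, not code from the thread):

    ```csharp
    using System.Collections;
    using System.Threading.Tasks;
    using UnityEngine;
    using VR_VoiceRecognition;

    public class Test : MonoBehaviour
    {
        private Task<bool> recognitionTask;

        void Start()
        {
            var recognizer = new VoiceRecognition();
            // Task.Run with a Func<Task<bool>> unwraps the inner task, so
            // recognitionTask is a Task<bool> running off the main thread.
            recognitionTask = Task.Run(() => recognizer.StartSpeechRecognition());
            StartCoroutine(WaitForRecognition());
        }

        private IEnumerator WaitForRecognition()
        {
            // Poll once per frame instead of blocking with .Result,
            // so the editor and game stay responsive.
            while (!recognitionTask.IsCompleted)
                yield return null;

            if (recognitionTask.IsFaulted)
                Debug.LogError(recognitionTask.Exception);
            else
                Debug.Log("Word recognized: " + recognitionTask.Result);
        }
    }
    ```

    Reading .Result here is safe because the coroutine only reaches it after IsCompleted is true, so it no longer blocks the main thread.
    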
  5. NTNU_VR


    Dec 6, 2018
    Thank you for the help! We will continue to work with this and see if we can find some sort of solution :)
  6. asotelo94


    Jan 26, 2013
    Where can I find the SDK for this? How do I get the Google.Cloud.Speech.V1 namespace?
  7. NTNU_VR


    Dec 6, 2018

    Google has a guide that you can follow to set up this service: