Audio Speech recognition with "external services" in Unity

Discussion in 'Audio & Video' started by StormPooper, Apr 17, 2020.

  1. StormPooper


    Mar 17, 2019
    Hi, I'm a master's degree student in computer science, and I'm working on a speech recognition application in Unity for my thesis.
    The application is still a work in progress: the ideal goal would be to implement a simple game, but the core is a sort of generic "module" (by "module" I mean a collection of objects and scripts that can be dropped into a scene so that a programmer can use speech recognition in their application). At the moment, I have a simple scene with a few objects that perform specific actions when an "order" is recognized (for example "turn on the light" or "crate, make a jump").
    Part of the thesis will focus on comparing different speech recognition services, and here is my problem: so far I have only been able to use the Unity default library (which, by the way, should only work on Windows, if I'm correct) and IBM Watson, so now I'm looking for other services/APIs/libraries/SDKs to use in my project.
    I asked the professor who is helping me with the thesis, and he suggested these possibilities: the Google API, Alexa (I have an Echo Dot at home, so I could use it for testing), Siri (I don't have any Apple devices, so I'm not sure how I could use it) and Cortana.
    The problem is that I can't figure out how to integrate those services into my Unity project (I checked a lot of guides and documentation pages, but many of them are outdated).
    In conclusion: if someone could give me some hints on how to use those services in my project, I would be really grateful (I would also be willing to try other services if you know of something that isn't on the list in this post).
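    For illustration, the "order" handling described in this post can be kept independent of whichever recognition service produces the text. The sketch below is only an assumption about how such a module might look: `CommandRegistry`, `normalize`, and the substring matching rule are hypothetical names and choices, not part of Unity or of any speech SDK. It is written in Python to stay self-contained; in Unity the same idea would be ported to a C# script.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace.

    This lets 'Crate, make a jump!' match the registered phrase.
    """
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(cleaned.split())


class CommandRegistry:
    """Maps spoken phrases to handlers (hypothetical helper, not a Unity API)."""

    def __init__(self):
        self._handlers = {}

    def register(self, phrase, handler):
        # Store the normalized phrase so matching ignores case and punctuation.
        self._handlers[normalize(phrase)] = handler

    def dispatch(self, transcript):
        """Run the first handler whose phrase occurs in the transcript.

        Returns the matched (normalized) phrase, or None if nothing matched.
        """
        spoken = normalize(transcript)
        for phrase, handler in self._handlers.items():
            if phrase in spoken:
                handler()
                return phrase
        return None


if __name__ == "__main__":
    registry = CommandRegistry()
    registry.register("turn on the light", lambda: print("light on"))
    registry.register("crate, make a jump", lambda: print("crate jumps"))
    registry.dispatch("Please turn on the light.")
    registry.dispatch("Crate, make a jump!")
```

    Because the registry only sees plain text, each service under comparison (the Unity default library, IBM Watson, or any other) would only need to hand its transcript to `dispatch`.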
  2. mgear


    Aug 3, 2010
    The Asset Store has some ready-made plugins; for other services you need to create the integration yourself (unless you can find one elsewhere).

    Making your own simple system might not be too complicated:
    make a generic audio recorder in Unity,
    save the audio to WAV/MP3,
    send the audio file to a 3rd party speech-to-text system (online, or some external command-line tool),
    read the received results back into Unity.
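    As a rough sketch of the "save the audio to WAV" step above: recorded samples usually arrive as floats in the range -1.0 to 1.0, and they need to be converted to 16-bit PCM before most services will accept them. The function name `floats_to_wav_bytes` and the default sample rate of 16000 Hz are assumptions made for this example. It is written in Python with only the standard library, and the same conversion would be ported to C# inside Unity. The upload step is deliberately left out, since each service has its own API.

```python
import io
import struct
import wave


def floats_to_wav_bytes(samples, sample_rate=16000, channels=1):
    """Encode float samples in [-1.0, 1.0] as a 16-bit PCM WAV file in memory."""
    pcm = bytearray()
    for sample in samples:
        # Clamp out-of-range values so they do not overflow the 16-bit range.
        clamped = max(-1.0, min(1.0, sample))
        pcm += struct.pack("<h", int(clamped * 32767))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return buffer.getvalue()


if __name__ == "__main__":
    # One second of silence, written to disk as a valid WAV file.
    wav_bytes = floats_to_wav_bytes([0.0] * 16000)
    with open("recording.wav", "wb") as f:
        f.write(wav_bytes)
```

    The resulting bytes can be written to a file or sent directly as a request body, depending on what the chosen speech-to-text service expects.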
  3. r618


    Jan 19, 2009
    You can use the respective existing .NET SDKs (if the platform provides one) directly in Unity, if you know how.
    You can interact with all available GCP or Azure services directly from Unity. For example, I used GCP's STT and Dialogflow's agents/intents, and you can interact with these in a streamed fashion (meaning you get partial results on the fly while the person is still speaking).
    Good luck!
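    To illustrate what consuming streamed results might look like, here is a minimal sketch. It assumes the stream has already been reduced to `(text, is_final)` pairs; that shape, and the names `consume_stream`, `on_partial`, and `on_final`, are hypothetical simplifications for this example and not the actual GCP or Azure client API. Python is used to keep the example self-contained.

```python
def consume_stream(results, on_partial, on_final):
    """Route streamed recognition results to callbacks.

    `results` is any iterable of (text, is_final) pairs. Partial results
    are meant for live feedback; only final ones should trigger actions.
    Returns the list of final utterances in the order they arrived.
    """
    finals = []
    for text, is_final in results:
        if is_final:
            finals.append(text)
            on_final(text)
        else:
            on_partial(text)
    return finals


if __name__ == "__main__":
    # Simulated stream: the hypothesis grows while the person is speaking.
    fake_stream = [
        ("turn", False),
        ("turn on the", False),
        ("turn on the light", True),
    ]
    consume_stream(
        fake_stream,
        on_partial=lambda t: print("partial:", t),
        on_final=lambda t: print("final:", t),
    )
```

    Acting only on final results avoids firing a command on a partial hypothesis that the service later revises.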
  4. FelixRos


    Oct 15, 2018
    I have been looking too. Right now I think this is the best option for Mac:
    It uses Chrome's speech API.
    My limitation right now is that it doesn't distinguish very well between longer sentences.

    I want to find a more sophisticated option though. Let me know if you find something that uses either Cortana, Echo or something similar.

    I'll let you know if I find a clever solution out there.
  5. BlandBlueberry


    Apr 30, 2020
    Thank you so much, your site has nice content.
  6. acesetm


    Oct 21, 2020