Audio Speech recognition with "external services" in Unity

Discussion in 'Audio & Video' started by StormPooper, Apr 17, 2020.

Thread Status:
Not open for further replies.
  1. StormPooper

    Joined: Mar 17, 2019
    Posts: 2
    Hi, I'm a master's degree student in computer science, and I'm working on a speech recognition application in Unity for my thesis.
    The application is still a work in progress: the ideal goal is to implement a simple game, but the core is a generic "module" (by "module" I mean a collection of objects and scripts that can be dropped into a scene so that a programmer can use speech recognition in their application). At the moment I have a simple scene with a few objects that perform specific actions when an "order" is recognized (for example "turn on the light" or "crate, make a jump").
    Part of the thesis will focus on comparing different speech recognition services, and this is where my problem lies: so far I have only been able to use Unity's default library (which, if I'm correct, only works on Windows) and IBM Watson, so I'm now looking for other services/APIs/libraries/SDKs to use in my project.
    I asked the professor who is helping me with the thesis, and he mentioned these possibilities: Google API, Alexa (I have an Echo Dot at home, so I could use it for testing), Siri (I don't have any Apple device, so I'm not sure how I could use it) and Cortana.
    The problem is that I can't work out how to integrate those services into my Unity project (I checked a lot of guides and documentation pages, but many of them are outdated).
    In conclusion: if someone could give me some hints on how to use those services in my project, I would be really grateful (I'm also willing to try other services if you know of any that I didn't include in the list above).
     
  2. mgear

    Joined: Aug 3, 2010
    Posts: 9,408
    The Asset Store has some ready-made plugins; for other services you need to build the integration yourself (unless you can find one elsewhere).

    Making your own simple system might not be too complicated:
    make a generic audio recorder in Unity,
    save the audio to WAV/MP3,
    send the audio file to a third-party speech-to-text system (online, or an external command-line tool),
    read the received results back into Unity.
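
    As a rough illustration of the "save the audio to WAV" step above, here is a minimal sketch of encoding raw float samples as a 16-bit PCM WAV file in memory. It is not code from this thread. It is written in Python for brevity, and a Unity version would do the same work in C# on the float samples returned by AudioClip.GetData, which lie in the range -1.0 to 1.0. The function name floats_to_wav_bytes and the 16 kHz mono defaults are assumptions for the example; check which sample rate and format your chosen speech-to-text service expects.

```python
import io
import struct
import wave


def floats_to_wav_bytes(samples, sample_rate=16000, channels=1):
    """Encode float samples in [-1.0, 1.0] as a 16-bit PCM WAV file in memory.

    Hypothetical helper for illustration; the defaults (16 kHz, mono) are
    assumptions, not requirements of any particular service.
    """
    pcm = bytearray()
    for s in samples:
        # Clamp out-of-range values so they cannot overflow 16 bits.
        s = max(-1.0, min(1.0, s))
        # Scale to the signed 16-bit range and pack as little-endian.
        pcm += struct.pack("<h", int(s * 32767))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    # The wave module writes the RIFF header and patches the sizes on close.
    return buffer.getvalue()
```

    The returned bytes can be written to a .wav file or used as the body of an upload request to whichever service you are comparing; the upload itself is service-specific and is not shown here.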
     
  3. r618

    Joined: Jan 19, 2009
    Posts: 1,305
    You can use any of the respective existing .NET SDKs directly in Unity (if one is provided for your platform), if you know how.
    You can interact with all available GCP or Azure services directly from Unity. For example, I used GCP's STT and DialogFlow's agents/intents, and you can interact with these in a streamed fashion (meaning you get partial results on the fly while the person is still speaking).
    Good luck!
     
  4. FelixRos

    Joined: Oct 15, 2018
    Posts: 1
    I have been looking too. Right now I think this is the best option for Mac: https://assetstore.unity.com/packages/tools/audio/webgl-speech-detection-81076
    It uses Chrome's speech API.
    My limitation right now is that it doesn't distinguish very well between longer sentences.

    I want to find a more sophisticated option though. Let me know if you find something that uses Cortana, Echo or something similar.

    I'll let you know if I find a clever solution out there.
     
  5. BlandBlueberry

    Joined: Apr 30, 2020
    Posts: 1
    Thank you so much, your site has nice content.
     