Speech-Recognition for Android-Games

Discussion in 'General Discussion' started by AdmiralFick, Mar 15, 2022.

  1. AdmiralFick
    Joined: Apr 30, 2020 | Posts: 18

    Hi

    I want to make a game with speech control: when the user says "left", the player moves to the left, and so on.

    I've been fiddling around with Picovoice, and it's really cool but expensive as hell, totally out of my budget.

    Is there any alternative?
     
  2. EternalAmbiguity
    Joined: Dec 27, 2014 | Posts: 3,144

    There are tons of alternatives: Vosk, PocketSphinx, Mozilla DeepSpeech...

    None are very good, but yes, there are alternatives.

    There's also Google Speech if you have the cash and the user will always be online (there's an AWS version too; I don't know its name).
     
    Last edited: Mar 17, 2022
  3. AdmiralFick

    Hey, thanks for the reply :)

    I definitely need one that is free and as fast as possible, since I want speech control for my player. I once tried a cloud-based AWS service from Python, and it took maybe 3-5 seconds until the result came back, which would be unacceptable for a game.

    I simply need my Android game to recognize when the player says "left" or "right".
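    With only a couple of fixed command words, the game side just has to map the recognizer's output to actions. A minimal sketch in Python, assuming the engine returns a JSON string with a "text" field (the shape Vosk's Python examples print); the COMMANDS table and the command names are hypothetical, and in Unity the same logic would live in a C# script:

```python
import json
from typing import Optional

# Hypothetical mapping from spoken keywords to game commands.
COMMANDS = {
    "left": "MOVE_LEFT",
    "right": "MOVE_RIGHT",
}


def command_from_result(result_json: str) -> Optional[str]:
    """Return the command for the first keyword in a recognizer result, or None.

    Assumes the result is a JSON string with a "text" field; adapt the
    parsing if your speech engine reports results differently.
    """
    text = json.loads(result_json).get("text", "")
    # Compare whole words, so e.g. "alright" does not trigger MOVE_RIGHT.
    for word in text.lower().split():
        if word in COMMANDS:
            return COMMANDS[word]
    return None


if __name__ == "__main__":
    print(command_from_result('{"text": "go left"}'))  # MOVE_LEFT
    print(command_from_result('{"text": "jump"}'))     # None
```

    Matching whole words rather than substrings is a deliberate choice: a misheard "alright" or "write" stays harmless instead of moving the player.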
     
  4. EternalAmbiguity

    Try Vosk and DeepSpeech.
     
  5. AdmiralFick

    Thanks a lot! :)

    I just checked some out. I got Vosk running, but man, it's inaccurate as hell: it didn't recognize a single word correctly. I haven't gotten DeepSpeech or PocketSphinx running yet, but I'm on it :)

    Thanks so far
     
  6. EternalAmbiguity

    Which model are you using? Vosk has several. Also, I've found Vosk gives better results when I convert the audio to 16-bit integer samples rather than floats.
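    Unity's AudioClip.GetData fills a float array with samples between -1.0 and 1.0, while Vosk's Python examples read 16-bit mono PCM. The conversion is just clamp and scale; here is a minimal Python sketch of that arithmetic (the function name is made up, and scaling by 32767 is one common convention that keeps +1.0 and -1.0 symmetric):

```python
import struct
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    ints = [
        # Clamp first so out-of-range samples saturate instead of overflowing.
        int(round(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    ]
    # "<" forces little-endian byte order and standard 2-byte shorts ("h").
    return struct.pack("<%dh" % len(ints), *ints)


if __name__ == "__main__":
    print(float_to_pcm16([0.0, 1.0, -1.0]))  # b'\x00\x00\xff\x7f\x01\x80'
```

    The resulting bytes are what a recognizer expecting 16-bit PCM would consume; in a Unity script the same clamp-and-scale would be done in C# on the array from GetData.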
     
  7. AdmiralFick

    I tried vosk-model-small-en-us-0.15 and vosk-model-small-de-0.15 (for German).

    Neither manages to recognize a single word correctly.
     
  8. EternalAmbiguity

    Try danzuu; I think there's another one of a similar size. They're larger, but unless you're willing to train a Kaldi model yourself, there's not much else you can do.
     
  9. AidenSamuel
    Joined: Mar 18, 2022 | Posts: 1

    Voice recognition isn't used in games for several reasons.
    The most important is that processing voice commands takes a long time, and anything that slows down the game's rhythm can make it less fun.
    Jeff also said it's suitable for RTS games and some sports games. However, if you want to use this technology in a fast-paced game like a first-person shooter, you'll need a lot of high-end hardware and sophisticated algorithms to understand what the user is doing.
     
  10. Ryiah
    Joined: Oct 11, 2012 | Posts: 20,137

    What's your definition of a long time? In my own tests, Windows Speech Recognition takes around one second at most.

    Skyrim's Dragonborn Speaks Naturally mod left me with the exact opposite impression.
     
  11. AdmiralFick

    Yes, but I need it :) If Picovoice weren't so expensive, I wouldn't have a problem :/
     
  12. AdmiralFick

    Do you have a link for danzuu? I can only find this: https://github.com/daanzu/kaldi-active-grammar
     
  13. EternalAmbiguity

    Sorry, I spelled it wrong; that's the right one. But the full list of models is here:

    https://alphacephei.com/vosk/models

    You might try a couple of different ones (it looks like vosk-model-en-us-0.22 is the latest) to see which gives you the best results. Try other speech recognition packages entirely as well; it's easy to fall down a rabbit hole with just one framework.
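    To keep that comparison fair, one approach is to record the same handful of test clips, run every candidate model over them, and count how often the expected command word appears in each transcript. A minimal Python sketch of just the scoring step, assuming you have already collected (expected word, transcript) pairs per model; the function names are hypothetical and running the recognizers is left out:

```python
from typing import Dict, List, Tuple

# One model's results: (expected_word, transcript) pairs, one per test clip.
Results = List[Tuple[str, str]]


def command_accuracy(results: Results) -> float:
    """Fraction of clips whose transcript contains the expected word."""
    if not results:
        return 0.0
    hits = sum(
        1 for expected, text in results
        if expected in text.lower().split()  # whole-word match
    )
    return hits / len(results)


def rank_models(per_model: Dict[str, Results]) -> List[Tuple[str, float]]:
    """Return (model_name, accuracy) pairs sorted from best to worst."""
    scores = [(name, command_accuracy(res)) for name, res in per_model.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

    Scoring only the words the game actually listens for, rather than full transcripts, matches the use case here: a model that garbles everything except "left" and "right" is still good enough.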
     
  14. neoshaman
    Joined: Feb 11, 2011 | Posts: 6,469

    Inputs are defined by:
    - breadth (direct access to item options)
    - latency (time to response)
    - discreteness (can it hold signal)
    - other stuff not relevant here

    -> Voice is discrete, with high breadth but high latency (direct access to many items, but it takes time to take effect)
    -> Buttons are continuous, with low breadth but low latency (a fixed number of options, instant signal)

    The question is not whether an input is bad, but where it applies. Obviously, using voice for directional movement without adjusting the gameplay is a no-go if the game is like Ninja Gaiden (narrow actions, low latency, continuous movement). But for an inventory (latency irrelevant, a large number of items, discrete selection), it's probably the right input, versus pausing the game, scrolling through a list, confirming the selection, and unpausing the game.
     
  15. neginfinity
    Joined: Jan 27, 2013 | Posts: 13,325

    Doom 3VR used speech recognition for... weapon switching.

    It wasn't very reliable, but didn't require expensive hardware.

    If you have a C++ programmer available, you could try finding a functional open-source speech-to-text library and wrapping it as a native C++ plugin for Unity. This is not exactly trivial, but it should be doable in theory.

    For example, there's Julius, which is said to support Android.
     
  16. AdmiralFick

    This sounds good, but how would I make a plugin? Is there a tutorial on this?

    Edit: OK, I just googled :D Found one, I'll try that :)