[Open Source] whisper.unity - free speech to text running on your machine

Discussion in 'Assets and Asset Store' started by Macoron, Apr 12, 2023.

  1. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    whisper.unity


    Several months ago OpenAI released a powerful automatic speech recognition (ASR) model called Whisper. The code and weights are under the MIT license. I took another open-source implementation called whisper.cpp and ported it to Unity.

    Main features:
    • Multilingual, supports around 60 languages
    • Can translate speech from one language to another, for example German audio to English text
    • Runs faster than real time: on my Mac it transcribes 11 seconds of audio in 220 ms
    • Runs on the user's local machine without an Internet connection
    • Free and open source, can be used in commercial projects
    Feel free to use it in your projects:

    https://github.com/Macoron/whisper.unity
     
  2. Gord10

    Gord10

    Joined:
    Mar 27, 2013
    Posts:
    142
    I implemented this into my new project. It works great, thanks for this!

    I couldn't get it to work with IL2CPP, though, so I had to use Mono (Unity 2022.3.0).
     
  3. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    Great, nice to hear that you used it in your project. On which platform did you have the problem with IL2CPP? It should be supported.
     
  4. Gord10

    Gord10

    Joined:
    Mar 27, 2013
    Posts:
    142
    I get the following errors in the player (64-bit Windows build).

     
  5. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    It was a bug introduced by recent changes, thanks for reporting it! I've just updated the GitHub repository, so IL2CPP builds should compile correctly again.
     
  6. Gord10

    Gord10

    Joined:
    Mar 27, 2013
    Posts:
    142
    Great! Windows, Mac (tested only on Apple Silicon) and Linux IL2CPP builds work perfectly now. Thanks for the fix.
     
  7. Warfighter789

    Warfighter789

    Joined:
    Apr 23, 2018
    Posts:
    4
    Hi there, is this asset compatible with VR devices such as Meta Quest 2? I attempted to integrate it into my project but encountered an error. Thanks in advance.

    Error Unity NotSupportedException: IL2CPP doesn't allow marshaling delegates that reference instance methods to native code. The method we're trying to marshal is: Whisper.Native.whisper_progress_callback::Invoke.
     
  8. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    Check the messages above. This error should be fixed by the recent update.

    By the way, I haven't tested on Oculus Quest 2, but I'm really interested to see how fast it runs. Please report back.

    Edit: Make sure you use the very latest version, which includes this update: https://github.com/Macoron/whisper.unity/pull/41
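
    For context on the NotSupportedException quoted above: IL2CPP refuses to marshal a delegate that points at an instance method, and the usual Unity-side pattern is to route the native callback through a static method marked with `[MonoPInvokeCallback]`. The following is only a sketch of that general pattern. The delegate signature, class name, and event below are hypothetical and are not taken from whisper.unity, and nothing here claims to be what the linked pull request actually changed.

    ```csharp
    using System;
    using System.Runtime.InteropServices;
    using AOT; // MonoPInvokeCallbackAttribute ships with Unity

    public static class ProgressBridge
    {
        // Hypothetical native callback signature, for illustration only.
        [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
        public delegate void ProgressCallback(IntPtr userData, int progress);

        // Managed listeners subscribe here instead of being passed to native code.
        public static event Action<int> ProgressChanged;

        // Static method: IL2CPP can marshal this, unlike an instance method.
        [MonoPInvokeCallback(typeof(ProgressCallback))]
        private static void OnProgress(IntPtr userData, int progress)
        {
            ProgressChanged?.Invoke(progress);
        }

        // Held in a static field so the delegate is not garbage collected
        // while native code may still call it.
        public static readonly ProgressCallback Callback = OnProgress;
    }
    ```

    Instance objects then subscribe to `ProgressBridge.ProgressChanged`, and only `ProgressBridge.Callback` is ever handed to the native side.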
     
    Last edited: Jul 6, 2023
  9. jlmarc33

    jlmarc33

    Joined:
    Apr 20, 2014
    Posts:
    3
    Hi,
    Unfortunately, I have an initialization error with Unity 2022.3 LTS concerning libwhisper.dll (DllNotFoundException: libwhisper assembly)

    [Attached image: Whisper_U2022_LTS.JPG]

    Any advice on getting it to work with this version would be welcome.
     
  10. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    Hard to say what the problem might be here. Could you please create an issue here?
    https://github.com/Macoron/whisper.unity/issues

    Wild guess: can you try installing IL2CPP build support?
     
  11. Warfighter789

    Warfighter789

    Joined:
    Apr 23, 2018
    Posts:
    4
    The latest update fixed the issue, thank you! The speed is quite fast. I noticed that the Speech to Voice isn't as accurate anymore, is that normal?
     
  12. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    What do you mean by Speech to Voice not being accurate anymore? Are you getting bad transcription results?

    If that's the case, what language do you use?
     
  13. Warfighter789

    Warfighter789

    Joined:
    Apr 23, 2018
    Posts:
    4
    Yeah, my transcription results are having problems. I've noticed that it's not picking up my voice as accurately as before. Sometimes when I say something, it comes out differently in the transcript. I'm using English.
     
  14. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    Well, you can try an older release. The latest master uses whisper.cpp 1.4.2, which may behave differently from 1.2.2. I also noticed some changes, but I'm not sure whether they are for better or worse.
    https://github.com/Macoron/whisper.unity/releases/tag/1.1.1

    If you are using English, I highly recommend switching to the `whisper.tiny.en` or `whisper.base.en` models. They are much better at English transcription. I personally use `whisper.small.en`, but it might be too heavy for Quest.
     
  15. Warfighter789

    Warfighter789

    Joined:
    Apr 23, 2018
    Posts:
    4
    I will try it out, thank you so much for your help.
     
  16. jlmarc33

    jlmarc33

    Joined:
    Apr 20, 2014
    Posts:
    3
    I tested Whisper.unity successfully on my Windows 11 laptop PC without any issues.
    I used Unity 2021.3.9 and the latest 2022.3.4 LTS.
    So, my initialization problem with Unity 2022.3.0 seems to be related only to my specific desktop PC configuration... (Windows 10 with security restrictions).
     
  17. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    The error shows that Unity failed to load libwhisper.dll. This usually happens because some of its dependencies are missing. I'm not an expert in this, but the only dependencies I can think of are IL2CPP build support or the Visual C++ runtime package.

    Sadly there is no proper way to debug it, but in some rare cases you can check the Unity Editor log file and see whether it contains error messages that don't show up in the Editor console window.
     
  18. Bullybolton

    Bullybolton

    Joined:
    Apr 13, 2022
    Posts:
    2
    @Warfighter789 what did you do to test on Quest 2? I've just built the microphone sample scene for Quest and it was very slow.
     
  19. Spellbook

    Spellbook

    Joined:
    May 21, 2015
    Posts:
    29
    We're working on an audio-only AI-driven MMO and this answered our #2 "must solve" problem: A device-local speech recognition system that doesn't depend on deployment platform or external services.

    This is something I've worked towards for years and it has effectively been impossible unless you're Google, Apple or Amazon... I don't think people quite realize how revolutionary this stuff is yet.

    Just some notes: I tried the larger speech models (120 MB, 500 MB) and did not see a significant improvement in recognition, but did get much slower processing times. The "tiny" model works well enough for production and is the fastest. We're using the English-only model but offering an option to load the multi-language model with translation to English.

    Thank you for putting this together and releasing it free.
     
  20. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    Agree, I believe people will soon start using locally running models (STT, TTS, LLM, etc) and we will see a lot of new applications for games that were impossible several years ago.

    I personally use "small.en" for my projects. I find it more robust and it doesn't hallucinate often. On a modern PC it should be close to real time. I wouldn't recommend using "tiny" for multi-language, but it depends on your use case.
     
  21. Spellbook

    Spellbook

    Joined:
    May 21, 2015
    Posts:
    29
    One issue I've run into is that transcribing a short audio clip returns 0 segments. With push-to-talk, someone might quickly say "Yes" and the clip is only 1 or 2 seconds long.

    The WhisperWrapper line "var n = WhisperNative.whisper_full_n_segments(_whisperCtx);" returns 0, finding no segments.

    I assume this is probably a limitation of the Whisper internals? I wanted to ask before I artificially append a few seconds to the end of audio clips as a hack solution.
     
  22. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    Yeah, I checked the whisper.cpp source code. It looks like they just don't process anything shorter than 1 second, and it isn't really stable for audio shorter than 2 seconds.

    Adding silence at the beginning and end should fix that.
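
    A minimal sketch of the silence padding suggested above, written as plain C# with no Unity or whisper.unity dependency. The class and method names are made up for illustration, and the amount of padding a short clip needs is an assumption to tune by experiment.

    ```csharp
    using System;

    public static class AudioPadding
    {
        // Returns a new buffer with padSeconds of silence (zeros) before and
        // after the input. Samples are assumed to be interleaved PCM floats,
        // so the pad length is scaled by the channel count.
        public static float[] PadWithSilence(float[] samples, int sampleRate,
                                             int channels, float padSeconds)
        {
            if (samples == null) throw new ArgumentNullException(nameof(samples));
            if (sampleRate <= 0 || channels <= 0 || padSeconds < 0f)
                throw new ArgumentOutOfRangeException();

            int padSamples = (int)Math.Round(padSeconds * sampleRate) * channels;

            // A freshly allocated float[] is zero-filled, which is silence.
            var padded = new float[samples.Length + 2 * padSamples];
            Array.Copy(samples, 0, padded, padSamples, samples.Length);
            return padded;
        }
    }
    ```

    In Unity the input could come from `AudioClip.GetData`, with `clip.frequency` and `clip.channels` as the rate and channel count. For example, padding a 1-second clip with 1 second on each side gives a 3-second buffer, which is above the 2-second range described above.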
     
  23. Utopien

    Utopien

    Joined:
    Feb 15, 2016
    Posts:
    46
    Excuse my English, I'm French. Sorry if this is a beginner question, but I can't find how to translate text from one language to another, except of course with the translateToEnglish bool. I want to translate all speech, whatever the language, into French. Any help would be highly appreciated.


    Thanks for this great package!
     
    Last edited: Jul 16, 2023
  24. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    Find the Whisper Manager in your scene and locate its "Language" field. Enter the "fr" language code and make sure that "Translate To English" is disabled. Now speech in any language will be translated to French text.

    Keep in mind that this doesn't work as well as English translation, and you will probably need a bigger model than "tiny". With smaller models the output will probably be just gibberish.

    [Attached image: Screenshot 2023-07-17 at 09.55.41.png]
     
  25. Utopien

    Utopien

    Joined:
    Feb 15, 2016
    Posts:
    46
    Thanks so much! I thought the language setting was for the spoken language, not the text. I feel so stupid ^^ It doesn't sound like gibberish; I'm just an expert at getting stuck on stupid things ^^ I'll look for a bigger model than Whisper/ggml-tiny.bin, or do the translation from English to X with another package. Thanks again.
     
  26. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    Quick update: whisper.unity has been updated to version 1.2.0! The biggest changes are prompting and streaming support.
    For more information, check the release notes in the GitHub repository.
     
  27. Sammyueru1

    Sammyueru1

    Joined:
    Jul 20, 2020
    Posts:
    45
    This is amazing I've always wanted to see something like this
     
  28. Strategos

    Strategos

    Joined:
    Aug 24, 2012
    Posts:
    255
    Hey, this is working great for me in the Editor, but when I do an Android build it dies like this:


    09-11 23:43:30.505 1781 2132 I Unity : Trying to load Whisper model from buffer...
    09-11 23:43:30.553 1781 1817 E Unity : DllNotFoundException: __Internal assembly:<unknown assembly> type:<unknown type> member:(null)
    09-11 23:43:30.553 1781 1817 E Unity : at (wrapper managed-to-native) Whisper.Native.WhisperNative.whisper_init_from_buffer(intptr,uintptr)
    09-11 23:43:30.553 1781 1817 E Unity : at Whisper.WhisperWrapper.InitFromBuffer (System.Byte[] buffer) [0x00054] in <82e321693d1448d4ae1fba9fa7e11c76>:0
    09-11 23:43:30.553 1781 1817 E Unity : at Whisper.WhisperWrapper+<>c__DisplayClass27_0.<InitFromBufferAsync>b__0 () [0x00000] in <82e321693d1448d4ae1fba9fa7e11c76>:0
    09-11 23:43:30.553 1781 1817 E Unity : at System.Threading.Tasks.Task`1[TResult].InnerInvoke () [0x0000f] in <0bfb382d99114c52bcae2561abca6423>:0
    09-11 23:43:30.553 1781 1817 E Unity : at System.Threading.Tasks.Task.Execute () [0x00000] in <0bfb382d99114c52bcae2561abca6423>:0
    09-11 23:43:30.553 1781 1817 E Unity : --- End of stack trace from previous location where exception was thrown ---
    09-11 23:43:30.553 1781 1817 E Unity :
    09-11 23:43:30.553 1781 1817 E Unity : at Whisper.WhisperWrapper.InitFromBufferAsync (System.Byte[] buffer) [0x0007d] in <82e321693d1448d4ae1fba9fa7e11c76>:0
    09-11 23:43:30.553 1781 1817 E Unity : at Whisper.WhisperWrapper.InitFromFileAsync (System.String modelPath) [0x000c1] in <82e321693d1448d4ae1fba9fa7e11c76>:0
    09-11 23:43:30.553 1781 1817 E Unity : at Whisper.WhisperManager.InitModel () [0x000bd]
    09-11 23:43:35.290 1781 1817 E Unity : Whisper model isn't loaded! Init Whisper model first!
     
  29. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    Which device are you building for? Which version of Unity? Please also check that your Player Settings use the IL2CPP scripting backend and that you are building for the ARM64 architecture.
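
    The two Player Settings mentioned above can also be applied from an editor script. This is a minimal sketch, assuming a Unity version where `PlayerSettings.SetScriptingBackend(BuildTargetGroup, ...)` is still available (newer versions prefer the `NamedBuildTarget` overload). The class name and menu path are made up for illustration, and the file has to live in an `Editor` folder.

    ```csharp
    using UnityEditor;

    public static class AndroidWhisperBuildSettings
    {
        // Hypothetical menu entry; run it once before building for Android.
        [MenuItem("Tools/Apply Android IL2CPP ARM64 Settings")]
        public static void Apply()
        {
            // Switch the Android scripting backend from Mono to IL2CPP.
            PlayerSettings.SetScriptingBackend(
                BuildTargetGroup.Android, ScriptingImplementation.IL2CPP);

            // Build only for 64-bit ARM devices.
            PlayerSettings.Android.targetArchitectures = AndroidArchitecture.ARM64;
        }
    }
    ```

    The same values can be set by hand under Player Settings > Other Settings > Configuration for the Android platform.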
     
  30. Strategos

    Strategos

    Joined:
    Aug 24, 2012
    Posts:
    255
    Thanks I will check these things and report back.
     
  31. epl-matt

    epl-matt

    Joined:
    Sep 9, 2021
    Posts:
    13
    Is there a way to add custom words? There are some words that I need to use, but it never recognizes them.
     
  32. epl-matt

    epl-matt

    Joined:
    Sep 9, 2021
    Posts:
    13
    Also, @Macoron, I tried installing the package from the Package Manager and kept getting an error about the OnRecordStop delegate not being found. I removed it and copied the com.whisper.unity package folder from the downloaded Git zip instead, and the error went away.
     
  33. Macoron

    Macoron

    Joined:
    Mar 11, 2017
    Posts:
    31
    Yeah, try the "Initial Prompt" field in WhisperManager. Just type your words there and it should transcribe them better. You might need a bigger model than "tiny" for that.

    Weird, what is your Unity version? I just checked it on 2021.3.3f1 and it works fine.
     
  34. Strategos

    Strategos

    Joined:
    Aug 24, 2012
    Posts:
    255
    This worked, by the way. Thank you!