
AudioStreamSpeechWhisper [offline speech recognition system]

Discussion in 'Assets and Asset Store' started by r618, Apr 15, 2023.

  1. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,305
    AudioStreamSpeechWhisper: an offline speech recognition, transcription, translation to English, and language detection system originally based on OpenAI's Whisper, using the efficient whisper.cpp implementation and running entirely locally on the user's device

    - either manual or automatic processing based on a custom VAD (Voice Activity Detection) over an audio stream (can be used in automatic 'open mic' fashion)

    an example running in macOS Editor:



    Please also see the latest asset documentation
    Demo builds: Windows x64 | macOS | Linux (x64) | Android/ChromeOS
    Store page: Asset Store page
     
    Last edited: May 31, 2023
  2. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,305
    Initial version - w/14 days new release discount - just went live -
     
  3. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,305
    An update submitted, should be online hopefully shortly:

    mainly fixed model downloads, a VAD bugfix, and added CoreML support for Apple Silicon:

    V 1.4.7 092023 .250k
    - updated to (current) latest [1.4.7] Whisper.NET changes, also from now on following Whisper.NET versioning no.
    - updated model [HF] downloads & error handling
    - added models QuantizationType
    - updated iOS/macOS native libraries which now support CoreML
    - added automatic download of CoreML models
    - fixed VAD detection bug for open mic/continuous processing
    - see updated Docs for more about platform/macOS specific libraries
     
  4. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,305

    -~~`
    December '23 SALE `~~-

    - For two weeks - until about ~28th - the asset is 40% OFF

    Don't forget to download demos and enjoy the holidays - \o|
     
  5. mgear

    mgear

    Joined:
    Aug 3, 2010
    Posts:
    9,411
    btw. android demo link is 404
     
  6. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,305
    this was not built, new update should have Android demo link on the store page
     
  7. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,305
    submitted an update:

    V 1.5 012024 >300k
    - changes from latest [1.5] Whisper.net
    - removed custom model loading/usage (this was not used much, and removing it simplified the interface)
    - whisper native library messages can now be logged to Unity Console
    - on Windows/Linux it's now possible to use CLBlast/cuBLAS GPU accelerated versions of whisper libs

    will be reviewed hopefully soon
     
  8. JuanGuzmanH

    JuanGuzmanH

    Joined:
    Feb 8, 2018
    Posts:
    74
    Hi r618! I'm searching for a local Speech To Text solution. Every solution I have tried fails at least one of my requirements, each for a different reason, and I wonder if your plugin can fit my needs:

    - Target platform is Oculus 2 (Android)
    - Transcriptions should have correct punctuation (commas, ...)
    - We cannot know in advance how long the audio session to be transcribed will be (streaming)
    - I need to receive transcriptions while the player is talking (streaming) during the audio session
    - Even if the user goes silent (for an unknown number of seconds), the transcription should continue when the user starts talking again.
    - The Speech To Text processing should end only when the player clicks a button to finish the audio session.
    - I need to keep control over transcriptions that are not voice (responses like "background music", etc.). Ideally I would prefer not to get them at all, but if there is a way to know the possible responses in advance, I can just ignore those transcriptions.
    - At runtime I need to be able to change the language to recognize.

    I'm considering your plugin, but since there is no trial period, I feel I have to ask:
    Do you think your package fulfills my needs?
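
The requirement about non-voice responses can be illustrated independently of the asset. Whisper-family models sometimes emit annotation-only segments such as "[Music]" or "(background music)", so one option is to drop any segment that consists solely of bracketed or parenthesized text. The sketch below is a language-neutral illustration in Python, not the asset's API; the pattern and the function names `is_non_speech` and `keep_speech` are hypothetical, and the exact annotations a given model produces are not known in advance.

```python
import re

# Assumption: a non-speech segment consists only of bracketed or
# parenthesized annotations, e.g. "[Music]" or "(background music)".
# This pattern is illustrative and not taken from the asset.
_ANNOTATION_ONLY = re.compile(r"^\s*(?:[\[(][^\])]*[\])]\s*)+$")


def is_non_speech(segment: str) -> bool:
    """Return True when the segment is blank or contains only annotations."""
    return not segment.strip() or bool(_ANNOTATION_ONLY.match(segment))


def keep_speech(segments):
    """Drop blank and annotation-only segments; keep the rest unchanged."""
    return [s for s in segments if not is_non_speech(s)]
```

Segments that mix speech with an annotation (for example "Hi [laughs]") are kept as they are, since stripping text from inside a sentence is a separate decision.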
     
  9. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,305
    hi, please download the demo from the asset's store page and run it on the device
    - I've never run it on Oculus, though
    - I don't think it can fully and correctly punctuate all transcripts
    - see mainly the VoiceActivityDemo scene: it activates transcription automatically based on detected voice, otherwise it just keeps running. Pay attention to the dB threshold parameter (and to all text descriptions in the demo scenes): that's the only user parameter which should/can be changed
    - as for language: Whisper detects the language automatically with a proper model (that is, a model which is *not* EN only)
    depending on what you're doing this might work automatically, but for certain usages the language can be set (again, see the demos). Each Whisper detection session is independent, so the language can be changed at runtime
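
To give a feel for what a dB threshold does in voice activity detection, here is a minimal energy-based sketch in Python. It is not the asset's VAD, which is custom and not described in this thread; the function names, the default of -40 dB, and the `hangover_frames` parameter (how many quiet frames are tolerated before a span is closed) are all assumptions made for illustration.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def voiced_spans(frames, threshold_db=-40.0, hangover_frames=3):
    """Return (start, end) frame index ranges judged to contain voice.

    A span opens at the first frame at or above threshold_db and closes
    once more than hangover_frames consecutive frames fall below it.
    The end index is exclusive and excludes the trailing quiet frames.
    """
    spans = []
    start = None
    quiet = 0
    for i, frame in enumerate(frames):
        if rms_dbfs(frame) >= threshold_db:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover_frames:
                spans.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        # Stream ended while a span was still open.
        spans.append((start, len(frames) - quiet))
    return spans
```

Raising the threshold makes the detector ignore quieter input such as room noise; lowering it makes it trigger more easily. The hangover keeps a short pause inside a sentence from splitting it into two spans.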
     
  10. JoergUlrichZilles

    JoergUlrichZilles

    Joined:
    Feb 13, 2021
    Posts:
    2
    hi, I have the following problem:
    macOS 13.6, Unity 2023.2.3f1, AudioStreamSpeechWhisper_VoiceActivityDemo: the LargeV3 model throws an error after indicating 100% download, and downloading the Medium whisper model crashes Unity

    Error during / after download of whisper model LargeV3
    ArgumentOutOfRangeException: Length must be >= 0
    Parameter name: length
    Unity.Collections.LowLevel.Unsafe.NativeArrayUnsafeUtility.CheckConvertArguments[T] (System.Int32 length) (at /Users/bokken/build/output/unity/unity/Runtime/Export/NativeArray/NativeArray.cs:1115)
    Unity.Collections.LowLevel.Unsafe.NativeArrayUnsafeUtility.ConvertExistingDataToNativeArray[T] (System.Void* dataPointer, System.Int32 length, Unity.Collections.Allocator allocator) (at /Users/bokken/build/output/unity/unity/Runtime/Export/NativeArray/NativeArray.cs:1123)
    UnityEngine.Networking.DownloadHandler.CreateNativeArrayForNativeData (Unity.Collections.NativeArray`1[System.Byte]& data, System.Byte* bytes, System.Int32 length) (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:202)
    UnityEngine.Networking.DownloadHandler.InternalGetNativeArray (UnityEngine.Networking.DownloadHandler dh, Unity.Collections.NativeArray`1[System.Byte]& nativeArray) (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:183)
    UnityEngine.Networking.DownloadHandlerBuffer.GetNativeData () (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:239)
    UnityEngine.Networking.DownloadHandler.InternalGetByteArray (UnityEngine.Networking.DownloadHandler dh) (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:163)
    UnityEngine.Networking.DownloadHandler.GetData () (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:72)
    UnityEngine.Networking.DownloadHandler.get_data () (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:60)
    AudioStreamSpeechWhisper.AudioStreamSpeechWhisper+<DownloadModelIfNeeded>d__24.MoveNext () (at <1873ea4b4a574b0783dee64b31f79656>:0)
    UnityEngine.SetupCoroutine.InvokeMoveNext (System.Collections.IEnumerator enumerator, System.IntPtr returnValueAddress) (at /Users/bokken/build/output/unity/unity/Runtime/Export/Scripting/Coroutines.cs:17)

    Any idea what went wrong?
     
  11. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,305
    hi thanks for the report !
    I will have to replace UnityWebRequest with something else for large models, apparently

    If I may ask, was (one of the) Medium model(s) insufficient for recognition? I recommend using one of those meanwhile, they usually produce good results -

    Thanks, please let me know if this works for you
     
  12. JoergUlrichZilles

    JoergUlrichZilles

    Joined:
    Feb 13, 2021
    Posts:
    2
    ...I forgot to mention that the tinyEN model works well (although slowly on my machine, a Mac with a 3.6 GHz 8-Core Intel Core i9), but I need the precision of the bigger models, since the tiny model has a high WER on German (which I want to use). In my own tests the large models work best (tested in Python), but since I want to use them in Unity I went for your professional solution.
    P.S. Compared to e.g. the (experimental) Whisper implementation from Sentis, the speed of detection on my machine with your asset seems to be very low. Any explanation for that?
     
  13. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,305
    have a look at Medium (non En) if possible, from my experience it works significantly better than Tiny ones
    / also don't forget to use Language code parameter /
    please try to replace the macOS bits w/ the adequate library from https://github.com/sandrohanea/whisper.net/tree/main/Whisper.net.Runtime.CoreML
    I think the library included in the asset is Universal, but it might be lacking this
    ( currently the model downloader should also download the CoreML `mlmodelc` model and whisper should use it automatically, but I now realize I haven't tested this on Intel )
    Thanks !
     
  14. r618

    r618

    Joined:
    Jan 19, 2009
    Posts:
    1,305
    submitted an update, hope it will be found useful once it's live on the store ~
    demo builds are already updated : -

    ===========================================
    V 1.5.1 042024 >400k

    - replaced UnityWebRequest with HttpClient in order to overcome its max. download size limit
    ( StreamAsync/CopyAsync are used to write/extract download directly to disk, Large models can be now downloaded )

    - Windows and macOS/iOS builds of included whisper libraries built from its 1.5.1 release, additionally
    - macOS/iOS: updated/fixed whisper libraries to use corresponding CoreML model by default
    - whisper logging improved

    - Fixed models re/loading: entering/exiting playmode + editor reloads should now work correctly at all times
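
The change away from a buffered download can be sketched outside Unity. The stack trace earlier in the thread shows the download buffer length being passed as a 32-bit integer, which is consistent with a multi-gigabyte model not fitting into a single in-memory buffer. Copying the response stream to disk in fixed-size chunks avoids holding the whole file in memory. The Python sketch below only illustrates that idea; it is not the asset's implementation, and the function name `stream_to_file`, the 1 MiB chunk size, and the temporary `.part` file are assumptions.

```python
import os
import tempfile


def stream_to_file(source, dest_path, chunk_size=1 << 20):
    """Copy a binary stream to dest_path in chunks; return bytes written.

    Data goes to a temporary file in the destination directory first and is
    renamed into place at the end, so an interrupted download does not leave
    a truncated file under the final name.
    """
    directory = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as out:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:  # end of stream
                    break
                out.write(chunk)
                written += len(chunk)
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Remove the partial file before re-raising the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written
```

Any object with a `read(size)` method works as `source`, for example the response returned by `urllib.request.urlopen(url)` from the standard library, so memory use stays near `chunk_size` regardless of model size.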

    ------------------------------------------------------------------------
    should be also (much) more stable overall esp. in editor

    TT ~ !