AudioStreamSpeechWhisper [offline speech recognition system]

r618 · May 31, 2023

Offline speech recognition, transcription, translation to English and language detection system based on OpenAI's Whisper, using the efficient whisper.cpp implementation and running entirely locally on the user's device.

Either manual or automatic processing based on a custom VAD (Voice Activity Detection) over an audio stream (can be used in an automatic 'open mic' fashion).

an example running in macOS Editor:

Please also see the latest asset documentation
Demo builds: Windows x64 | macOS | Linux (x64) | Android/ChromeOS
Store page: Asset Store page

r618 · May 31, 2023

Initial version, with a 14-day new release discount, just went live!

r618 · Sep 11, 2023

Submitted an update; it should hopefully be online shortly:

Mainly fixed model downloads and a VAD bug, and added CoreML support for Apple Silicon:

V 1.4.7 092023 >250k
- updated to the (currently) latest [1.4.7] Whisper.NET changes; from now on also following Whisper.NET version numbering
- updated model [HF] downloads & error handling
- added model QuantizationType
- updated iOS/macOS native libraries which now support CoreML
- added automatic download of CoreML models
- fixed VAD detection bug for open mic/continuous processing
- see updated Docs for more about platform/macOS specific libraries

r618 · Dec 14, 2023

-~~` December '23 SALE `~~-

- For two weeks, until about the 28th, the asset is 40% OFF

Don't forget to download the demos, and enjoy the holidays \o|

mgear · Dec 14, 2023

btw, the Android demo link is 404

r618 · Jan 3, 2024

mgear said:

btw. android demo link is 404

This wasn't built; the new update should have the Android demo link on the store page.

r618 · Jan 3, 2024

submitted an update:

V 1.5 012024 >300k
- changes from latest [1.5] Whisper.net
- removed custom model loading/usage (it was not used much, and removing it simplified the interface)
- whisper native library messages can now be logged to the Unity Console
- on Windows/Linux it's now possible to use CLBlast/cuBLAS GPU-accelerated versions of the whisper libs

It will hopefully be reviewed soon.

JuanGuzmanH · Feb 5, 2024

Hi r618! I'm searching for a local speech-to-text solution. Every solution I have tried fails at least one of my requirements, for various reasons, and I wonder if your plugin can fit my needs:

- Target platform is Oculus Quest 2 (Android)
- Transcriptions should have correct punctuation (commas, ...)
- We cannot know in advance how long the audio session to be transcribed will be (streaming)
- I need to receive transcriptions while the player is talking (streaming) during the audio session
- Even if the user pauses (for an unknown number of seconds), transcription continues when they start talking again
- Speech-to-text processing ends only when the player clicks a button to finish the audio session
- I need control over transcriptions of non-speech audio (outputs like "background music", etc.). Ideally I would prefer not to get them at all, but if there is a way to know the possible outputs in advance, I can simply ignore those transcriptions.
- At runtime I need to be able to change the recognition language.

I'm considering your plugin, but since there is no trial period, I felt I had to ask:
Do you think your package fulfills my needs?

r618 · Feb 7, 2024

Hi, please download the demo from the asset's store page and run it on the device:
- I've never run it on Oculus, though
- I don't think it can fully and correctly punctuate all transcripts
- see mainly the VoiceActivityDemo scene: it starts transcription automatically based on detected voice, otherwise it just keeps running. Pay attention to the dB threshold parameter (and to all text descriptions in the demo scenes); that's the only user parameter which should/can be changed
- as for language: Whisper detects the language automatically with a proper model (that is, a model which is *not* English-only). Depending on what you're doing this might work automatically, but for certain usages the language can be set (again, see the demos). Each Whisper detection session is independent, so the language can be changed at runtime
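The dB-threshold idea can be illustrated with a minimal, hypothetical sketch in Python (not the asset's actual C# code): each audio frame's RMS level is converted to dBFS and compared against a threshold, and a "hangover" counter keeps a segment open through short pauses, so a speaker who stops briefly is not cut off. The names `frame_dbfs`, `detect_segments`, `threshold_db` and `hangover_frames` are illustrative assumptions; the asset's own VAD may work differently.

```python
# Hypothetical illustration of a dB-threshold VAD; not the asset's implementation.
import math
from typing import Iterable, List, Tuple


def frame_dbfs(frame: List[float]) -> float:
    """RMS level of one frame of float samples in [-1, 1], in dBFS."""
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def detect_segments(frames: Iterable[List[float]],
                    threshold_db: float = -40.0,
                    hangover_frames: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices (end exclusive) of voiced segments.

    A segment opens on the first frame louder than threshold_db and closes
    only after hangover_frames consecutive quieter frames, so short pauses
    do not split an utterance.
    """
    segments: List[Tuple[int, int]] = []
    start = None   # index of the first loud frame of the open segment
    quiet = 0      # consecutive quiet frames seen inside the open segment
    index = -1
    for index, frame in enumerate(frames):
        loud = frame_dbfs(frame) > threshold_db
        if start is None:
            if loud:
                start, quiet = index, 0
        elif loud:
            quiet = 0
        else:
            quiet += 1
            if quiet >= hangover_frames:
                # End right after the last loud frame.
                segments.append((start, index - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        # Stream ended while a segment was still open.
        segments.append((start, index - quiet + 1))
    return segments
```

With 10 ms frames, for example, `hangover_frames=50` would tolerate pauses of up to half a second; a longer hangover keeps an utterance together at the cost of reporting its end later.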

JoergUlrichZilles · Feb 28, 2024

Hi, I have the following problem:
macOS 13.6, Unity 2023.2.3f1, AudioStreamSpeechWhisper_VoiceActivityDemo: the LargeV3 model throws an error after the download indicates 100%, and downloading the Medium whisper model crashes Unity.

Error during / after download of whisper model LargeV3
ArgumentOutOfRangeException: Length must be >= 0
Parameter name: length
Unity.Collections.LowLevel.Unsafe.NativeArrayUnsafeUtility.CheckConvertArguments[T] (System.Int32 length) (at /Users/bokken/build/output/unity/unity/Runtime/Export/NativeArray/NativeArray.cs:1115)
Unity.Collections.LowLevel.Unsafe.NativeArrayUnsafeUtility.ConvertExistingDataToNativeArray[T] (System.Void* dataPointer, System.Int32 length, Unity.Collections.Allocator allocator) (at /Users/bokken/build/output/unity/unity/Runtime/Export/NativeArray/NativeArray.cs:1123)
UnityEngine.Networking.DownloadHandler.CreateNativeArrayForNativeData (Unity.Collections.NativeArray`1[System.Byte]& data, System.Byte* bytes, System.Int32 length) (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:202)
UnityEngine.Networking.DownloadHandler.InternalGetNativeArray (UnityEngine.Networking.DownloadHandler dh, Unity.Collections.NativeArray`1[System.Byte]& nativeArray) (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:183)
UnityEngine.Networking.DownloadHandlerBuffer.GetNativeData () (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:239)
UnityEngine.Networking.DownloadHandler.InternalGetByteArray (UnityEngine.Networking.DownloadHandler dh) (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:163)
UnityEngine.Networking.DownloadHandler.GetData () (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:72)
UnityEngine.Networking.DownloadHandler.get_data () (at /Users/bokken/build/output/unity/unity/Modules/UnityWebRequest/Public/DownloadHandler/DownloadHandler.bindings.cs:60)
AudioStreamSpeechWhisper.AudioStreamSpeechWhisper+<DownloadModelIfNeeded>d__24.MoveNext () (at <1873ea4b4a574b0783dee64b31f79656>:0)
UnityEngine.SetupCoroutine.InvokeMoveNext (System.Collections.IEnumerator enumerator, System.IntPtr returnValueAddress) (at /Users/bokken/build/output/unity/unity/Runtime/Export/Scripting/Coroutines.cs:17)

Any idea what went wrong?

r618 · Feb 28, 2024

Hi, thanks for the report!
Apparently I will have to replace UnityWebRequest with something else for large models.

If I may ask, was (one of) the Medium model(s) insufficient for recognition? I recommend using one of those in the meantime; they usually produce good results.

Thanks, please let me know if this works for you.
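A plausible reading of the stack trace above (an assumption, not confirmed in this thread): `DownloadHandler` hands the whole response to a `NativeArray` whose length is a `System.Int32`, and a signed 32-bit value cannot represent more than 2,147,483,647 bytes (about 2 GiB). A byte count between 2 GiB and 4 GiB truncated to 32 bits comes out negative, which would match `Length must be >= 0`. The arithmetic, sketched in Python with a hypothetical 3 GiB file size:

```python
# Hypothetical illustration of 32-bit length overflow; file size is made up.
INT32_MAX = 2**31 - 1  # largest length a signed 32-bit int can hold


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of an integer as a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


three_gib = 3 * 1024**3          # hypothetical 3 GiB download
print(three_gib > INT32_MAX)     # True
print(as_int32(three_gib))       # -1073741824
```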

JoergUlrichZilles · Feb 28, 2024

...I forgot to mention that the tinyEN model works well (although it is slow on my machine, a Mac with a 3.6 GHz 8-Core Intel Core i9), but I need the precision of the bigger models, since the tiny model has a high WER on German (the language I want to use). In my own tests the large models work best (tested in Python), but since I want to use them in Unity, I went for your professional solution.
P.S. Compared with e.g. the (experimental) whisper implementation from Sentis, the detection speed on my machine with your asset seems to be very low. Any explanation for that?

r618 · Feb 28, 2024

JoergUlrichZilles said:

but I need the precision of the bigger models, since the tiny model has a high WER on German language (which I want to use)..

Have a look at Medium (non-En) if possible; in my experience it works significantly better than the Tiny ones.
Also, don't forget to set the Language code parameter.

JoergUlrichZilles said:

P.S. in comparison to e.g. whisper implementation (experimental) from sentis, the speed of detection on my machine with your asset seems to be very low ...any explanation for that?

Please try replacing the macOS binaries with the corresponding library from https://github.com/sandrohanea/whisper.net/tree/main/Whisper.net.Runtime.CoreML
I think the library included in the asset is Universal, but it might be lacking CoreML support.
(Currently the model downloader should also download the CoreML `mlmodelc` model, and whisper should use it automatically, but I now realize I haven't tested this on Intel.)
Thanks!
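On whisper picking up the CoreML `mlmodelc` model automatically: as far as I know, whisper.cpp builds with CoreML enabled look for the encoder next to the ggml `.bin` model, deriving its name from the model's file name (extension removed, a `-qX_Y` quantization suffix stripped, `-encoder.mlmodelc` appended). A small Python sketch of that naming rule, handy for checking that a downloaded `mlmodelc` folder has the name whisper expects; `coreml_encoder_path` is a hypothetical helper, not part of the asset or Whisper.net.

```python
# Hypothetical helper mirroring (as an assumption) whisper.cpp's CoreML naming rule.
def coreml_encoder_path(model_bin_path: str) -> str:
    """Map a ggml model path to the CoreML encoder path whisper.cpp expects.

    'models/ggml-base.en.bin'     -> 'models/ggml-base.en-encoder.mlmodelc'
    'models/ggml-medium-q5_0.bin' -> 'models/ggml-medium-encoder.mlmodelc'
    """
    path = model_bin_path
    # Drop the extension (everything from the last '.').
    dot = path.rfind(".")
    if dot != -1:
        path = path[:dot]
    # Drop a quantization suffix of the form "-qX_Y" (e.g. "-q5_0").
    dash = path.rfind("-")
    if dash != -1:
        suffix = path[dash:]
        if len(suffix) == 5 and suffix[1] == "q" and suffix[3] == "_":
            path = path[:dash]
    return path + "-encoder.mlmodelc"
```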

r618 · Apr 1, 2024

Submitted an update; hope it will be found useful once it's live on the store ~
Demo builds are already updated :-)

===========================================
V 1.5.1 042024 >400k

- replaced UnityWebRequest with HttpClient in order to overcome its maximum download size limit
(StreamAsync/CopyAsync are used to write/extract the download directly to disk, so Large models can now be downloaded)

- Windows and macOS/iOS builds of the included whisper libraries are built from its 1.5.1 release; additionally:
- macOS/iOS: updated/fixed whisper libraries to use the corresponding CoreML model by default
- whisper logging improved

- Fixed model (re)loading: entering/exiting Play Mode and editor reloads should now work correctly at all times

------------------------------------------------------------------------
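The idea behind the download change (reading the response as a stream and copying it to disk in fixed-size chunks, so memory use and buffer lengths stay bounded regardless of model size) can be sketched in Python. This is an illustration under assumptions, not the asset's C# HttpClient code: `save_stream`, `download_model` and the 1 MiB chunk size are hypothetical, and writing to a temporary `.part` file before renaming is an added safeguard not mentioned in the changelog.

```python
# Hypothetical illustration of chunked download-to-disk; not the asset's code.
import os
import tempfile
import urllib.request
from typing import BinaryIO, Callable, Optional

CHUNK_SIZE = 1 << 20  # 1 MiB per read; only this much is held in memory


def save_stream(src: BinaryIO, dest_path: str,
                on_progress: Optional[Callable[[int], None]] = None) -> int:
    """Copy a readable binary stream to dest_path chunk by chunk.

    Writes to a temporary file in the same directory first and renames it
    when complete, so an interrupted download never leaves a truncated model
    under the final name. Returns the number of bytes written.
    """
    directory = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as out:
            while True:
                chunk = src.read(CHUNK_SIZE)
                if not chunk:
                    break
                out.write(chunk)
                written += len(chunk)
                if on_progress is not None:
                    on_progress(written)
        os.replace(tmp_path, dest_path)  # atomic rename on the same volume
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written


def download_model(url: str, dest_path: str) -> int:
    """Stream a (possibly multi-GB) file from url straight to disk."""
    with urllib.request.urlopen(url) as response:
        return save_stream(response, dest_path)
```

Because only one chunk is in memory at a time, the same loop handles a 75 MB Tiny model and a multi-gigabyte Large model alike.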
It should also be (much) more stable overall, especially in the editor.

TT ~ !
