Speech Recognition Engine / Recording Devices

mikewarren · May 22, 2019

I wasn't quite sure where to put this post as my target platform is an MS Hololens, but my questions are really about the Unity speech recognition API.

I'd like to be able to use speech recognition in a Unity application on a Hololens with a microphone other than the microphone array built into the HL device. I've seen and been told conflicting information on whether it's possible so I'm looking for more details.

The Unity speech API layer is fairly sparse. When a Unity recognition engine is instantiated, how is the audio input device chosen? On desktop systems, I assume it's the default recording device in the Sound panel. If so, is there a similar option on the Hololens, or an API to set the default recording device before creating a speech recognizer?

I'm assuming the Unity speech API is a layer on top of the underlying Windows speech API. Is there any reason I couldn't just use a Windows speech recognizer directly (for instance, problems resolving extra assemblies)? Which API is the Unity speech API built on?

Tautvydas-Zilys · May 23, 2019

It's built directly on top of this class: https://docs.microsoft.com/en-us/uwp/api/windows.media.speechrecognition.speechrecognizer

That API doesn't let us choose the audio input device. I assume it uses whatever is connected.

mikewarren · May 23, 2019

Thanks @Tautvydas-Zilys. That helps.

I know that on standard Windows 10 I've been able to change the audio input device via the Sound panel (default recording device). I don't see any such construct on the Hololens. Does anyone know if there's an API to enumerate and change the default audio device?

Tautvydas-Zilys · May 23, 2019

I unfortunately do not.

timke · May 23, 2019

From my understanding there's no API in Windows (desktop or otherwise) to change the default audio device; only the user is allowed to manipulate this setting. So the only way this could work is by hacking the Registry on the HL.

I don't know if it's even possible to manipulate the Registry on HL, but changing the active audio capture device under this key: HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Capture should do the trick. Here's a post with a few details on this: https://superuser.com/questions/1054594/switching-default-audio-device-with-a-batch-file
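For anyone who wants to look at that key without changing anything, here is a minimal read-only sketch for desktop Windows using Python's standard `winreg` module. It only lists the subkey names under the Capture key (on desktop Windows these are the capture endpoint IDs). The function name `list_capture_endpoint_ids` is made up for illustration, and whether the key is readable at all on a HoloLens is an open question.

```python
import sys

# Key path quoted in the post above (under HKEY_LOCAL_MACHINE).
CAPTURE_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Capture"


def list_capture_endpoint_ids():
    """Return the subkey names under the Capture key (read-only).

    Hypothetical helper for illustration; it never writes to the registry.
    """
    if sys.platform != "win32":
        raise OSError("The Windows registry is only available on Windows")

    import winreg  # standard library, Windows only

    # OpenKey defaults to read access, so nothing can be modified here.
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CAPTURE_KEY) as key:
        subkey_count = winreg.QueryInfoKey(key)[0]
        return [winreg.EnumKey(key, i) for i in range(subkey_count)]


if __name__ == "__main__":
    for endpoint_id in list_capture_endpoint_ids():
        print(endpoint_id)
```

This does not switch the default device; it only shows which capture endpoints the system has registered.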

However, even if you get past all this and get Speech Recognition to capture from an external microphone, it probably won't work very well (if at all). This is because the Speech system must be calibrated with the microphone to filter (cancel) out background noise, reverberation, etc. to produce a clean signal. Unlike Desktop Windows, HoloLens is designed to work only with the built-in microphone array, and (AFAIK) the calibration data cannot be changed, so it'll use the same filtering on the audio stream from the external microphone. That is, it'll apply incorrect filtering, producing a worse signal than if no filtering were used.

mikewarren · May 23, 2019

@timke Appreciate the feedback.

I don't know of an API to change the default audio device either, but if it involves registry manipulation, I don't want any part of it anyway. I have an open dialog with MS and I'm trying to get an informed determination.

I thought the audio processing (filtering, beamforming, etc.) was part of the device driver DSP, not the speech system, and that the audio sample data was pre-processed before hitting the recognition system. If so, I should (theoretically) be able to substitute any audio sample source, even recorded data.

https://docs.microsoft.com/en-us/wi...ssing-modes#available-signal-processing-modes

For instance, the Hololens produces a Communications, a Speech and an Other (environmental) stream from the built-in microphone array. The Communications and Speech streams use the beamforming / filtering technology you cite, whereas the Other stream does something different. (I've recorded samples from each stream in the same environment and it's startling how well the Speech stream filters noise.)

https://docs.microsoft.com/en-us/windows/mixed-reality/voice-input#communication
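To make the "any sample source" idea concrete, here is a small Python sketch of a consumer that only ever sees PCM samples, so a recorded WAV buffer can stand in for a live capture stream. Everything in it is hypothetical: `AudioSource`, `WavSource`, `rms_per_chunk` and `make_tone` are illustration names, and the RMS meter is just a stand-in for a recognizer. It says nothing about how the Windows or Unity speech stack is actually wired internally.

```python
import io
import math
import struct
import wave
from typing import Iterator, Protocol


class AudioSource(Protocol):
    """Anything that can yield chunks of 16-bit mono PCM bytes."""

    def frames(self, chunk: int) -> Iterator[bytes]: ...


class WavSource:
    """Recorded 16-bit mono PCM data standing in for a live microphone."""

    def __init__(self, wav_bytes: bytes) -> None:
        self._wav_bytes = wav_bytes

    def frames(self, chunk: int = 512) -> Iterator[bytes]:
        with wave.open(io.BytesIO(self._wav_bytes), "rb") as wav:
            while True:
                data = wav.readframes(chunk)
                if not data:
                    break
                yield data


def rms_per_chunk(source: AudioSource, chunk: int = 512) -> list[float]:
    """Stand-in for a recognizer: it consumes samples, not devices."""
    levels = []
    for data in source.frames(chunk):
        samples = struct.unpack(f"<{len(data) // 2}h", data)
        levels.append(math.sqrt(sum(s * s for s in samples) / len(samples)))
    return levels


def make_tone(seconds: float = 0.1, rate: int = 16000, freq: float = 440.0) -> bytes:
    """Build an in-memory WAV file containing a sine tone (test data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        count = int(seconds * rate)
        wav.writeframes(b"".join(
            struct.pack("<h", int(10000 * math.sin(2 * math.pi * freq * i / rate)))
            for i in range(count)
        ))
    return buf.getvalue()


if __name__ == "__main__":
    print(rms_per_chunk(WavSource(make_tone())))
```

A live microphone wrapper would only need to provide the same `frames` method. Whether the real recognizer exposes any such seam is exactly the open question in this thread.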

shaho1763 · Mar 26, 2020

Hi,
I want to work on virtual reality. I want to use speech-to-text conversion in Unity and work on conversations. Please guide me. Thanks.
My email: shaho1763@gmail.com
