Question Automatically capturing user speech in Meta Quest 3

Discussion in 'VR' started by Tyke18, Dec 7, 2023.

  1. Tyke18

    Joined: Oct 6, 2023
    Posts: 24
    Hi :)

    I'm creating an app in which a user chats with an NPC powered by OpenAI. I want the app to automatically detect when the user starts speaking, do stuff with the microphone input (e.g. send the audio to OpenAI for speech-to-text transcription), and detect when the speech has stopped.

    Meta's Wit AI can capture mic audio and transcribe it, but it offers no automatic voice detection feature: you have to press a key/button first to let it know you're speaking, and I don't want that. Can anyone point me toward something that does what I want, e.g. an existing software solution?
     
    FarmerInATechStack likes this.
  2. DevDunk

    Joined: Feb 13, 2020
    Posts: 5,251
    Maybe read the mic's volume level in decibels?
    Start recording when it goes over a threshold, then stop when it has stayed under a threshold for a few seconds.

    This is why most smart assistants use a wake phrase like "Hey Google" to start recording.
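
The threshold idea above can be sketched roughly as follows. This is only a minimal illustration in Python, not code from anyone in this thread: it assumes 16-bit little-endian mono PCM frames, and the names `rms_dbfs` and `ThresholdDetector` as well as the default thresholds are made up for the example. The same logic would port to a C# script in Unity.

```python
import math
import struct


def rms_dbfs(frame: bytes) -> float:
    """Return the RMS level of a 16-bit little-endian mono PCM frame in dBFS."""
    count = len(frame) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    if rms == 0:
        return float("-inf")
    return 20.0 * math.log10(rms / 32768.0)


class ThresholdDetector:
    """Hypothetical start/stop detector driven by a loudness threshold.

    start_db, stop_db and hold_frames are assumed values that would need
    tuning per microphone and environment.
    """

    def __init__(self, start_db=-35.0, stop_db=-45.0, hold_frames=50):
        self.start_db = start_db        # level that triggers "start"
        self.stop_db = stop_db          # level counted as quiet
        self.hold_frames = hold_frames  # quiet frames required before "stop"
        self.speaking = False
        self._quiet = 0

    def feed(self, frame: bytes):
        """Process one frame and return "start", "stop", or None."""
        level = rms_dbfs(frame)
        if not self.speaking:
            if level > self.start_db:
                self.speaking = True
                self._quiet = 0
                return "start"
            return None
        if level < self.stop_db:
            self._quiet += 1
            if self._quiet >= self.hold_frames:
                self.speaking = False
                self._quiet = 0
                return "stop"
        else:
            # Any loud frame resets the quiet counter.
            self._quiet = 0
        return None
```

Using a lower stop threshold than start threshold, plus a hold time, keeps short pauses between words from ending the recording. Background noise can still trigger it, which fits the report later in the thread that this approach was not reliable.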
     
    FarmerInATechStack likes this.
  3. Tyke18

    Joined: Oct 6, 2023
    Posts: 24
    Thanks for the suggestion. I tried that; it worked sometimes, but not reliably. I found a Python library that does pretty good voice detection, and I ran its code in a WebSocket server connected to my Unity app. A bit messy, but it works.
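
The library is not named in this thread, so the following is only a guess at the general shape of the server side of such a setup, with the WebSocket transport left out. It assumes audio arrives in arbitrary-size byte chunks, that the voice detector wants fixed-size frames, and that the detector can be wrapped as a callable `is_speech(frame) -> bool`. `FrameBuffer`, `collect_utterances` and `is_speech` are hypothetical names for the example, not part of any real library.

```python
from typing import Callable, Iterable, Iterator, List


class FrameBuffer:
    """Re-slice arbitrary-size byte chunks into fixed-size frames."""

    def __init__(self, frame_bytes: int):
        self.frame_bytes = frame_bytes
        self._pending = bytearray()

    def push(self, chunk: bytes) -> Iterator[bytes]:
        """Add a chunk and yield every complete frame now available."""
        self._pending.extend(chunk)
        while len(self._pending) >= self.frame_bytes:
            frame = bytes(self._pending[: self.frame_bytes])
            del self._pending[: self.frame_bytes]
            yield frame


def collect_utterances(
    chunks: Iterable[bytes],
    is_speech: Callable[[bytes], bool],  # placeholder for the real detector
    frame_bytes: int,
    max_silence_frames: int,
) -> List[bytes]:
    """Group frames into utterances, closing one after enough silent frames."""
    buffer = FrameBuffer(frame_bytes)
    current = bytearray()
    silence = 0
    utterances: List[bytes] = []
    for chunk in chunks:
        for frame in buffer.push(chunk):
            if is_speech(frame):
                current.extend(frame)
                silence = 0
            elif current:
                # Keep pauses inside an utterance; close it once they add up.
                current.extend(frame)
                silence += 1
                if silence >= max_silence_frames:
                    utterances.append(bytes(current))
                    current = bytearray()
                    silence = 0
    if current:
        utterances.append(bytes(current))
    return utterances
```

In a real server, each completed utterance would be handed to the transcription step and a "speech ended" message sent back to the Unity client.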
     
    FarmerInATechStack and DevDunk like this.
  4. FarmerInATechStack

    Joined: Dec 28, 2020
    Posts: 59
    @Tyke18 is this still working or have you come up with something better? Also:
    • Can you share which Python library helped?
    • Does Wit AI offer speech-to-text transcription too, or did you find that OpenAI was the only solution?
     
  5. carton22liu_unity

    Joined: May 15, 2023
    Posts: 1
    @Tyke18 @FarmerInATechStack I have the same questions as FarmerInATechStack.
    I'm trying to use Azure streaming speech-to-text for the transcription,
    and I also need automatic voice detection so that it stops correctly.
    I'd also like to ask:
    • Can you please point me to the Python library that helped?
    • (Same question) Does Wit AI offer speech-to-text transcription too, or did you find that OpenAI was the only solution?
     
  6. FarmerInATechStack

    Joined: Dec 28, 2020
    Posts: 59
    @carton22liu_unity I've gotten text-to-speech working using the OpenAI options and some scripts for microphone recording. However, I'm not doing automatic speech detection. I press a button to start recording from the mic.

    If interested, I'm also on Discord at farmerinatechstack
     
  7. colinleet

    Joined: Nov 20, 2019
    Posts: 193
    FarmerInATechStack likes this.
  8. FarmerInATechStack

    Joined: Dec 28, 2020
    Posts: 59
    Nice. You can probably skip that if you're up for just using the APIs directly, but it can also be really nice to have a solution that "just works" and that someone else maintains.