
Detecting Musical Notes from Vocal Input

Discussion in 'Scripting' started by theguywhodreams, Apr 6, 2015.

  1. theguywhodreams

    Joined:
    Apr 3, 2015
    Posts:
    7
    Hey there,

    I'm currently making an application that displays the note corresponding to what a user sings into their device. Essentially, the goal is to record vocal input through the microphone into a temporary audio track, collect spectrum data from this track using a Fast Fourier Transform (FFT), and get the fundamental frequency from the spectrum data.

    Right now, I'm following Kaappine's tutorials: I take input with the Microphone class, collect spectrum data using AudioSource.GetSpectrumData(), and then calculate the fundamental frequency from that data. The links are:

    Using Microphone Input in Unity3D: http://www.kaappine.fi/tutorials/using-microphone-input-in-unity3d/
    Fundamental Frequencies and Detecting Notes: http://www.kaappine.fi/tutorials/fundamental-frequencies-and-detecting-notes/

    I've got it all up and running. I'm positive that my laptop detects audio input, since the displayed frequencies change, and it sort of works, since those frequencies change when I play different notes. But I'm not sure it's accurate at all: first, it sometimes returns different frequencies when I play the same note, and second, the probable frequencies stay in the same range (around 230 to 430 Hz) even if I change the octave completely.

    I say "probable" because the frequencies sometimes spike to 800 Hz and above, or stay as low as 90 Hz, when I play notes whose frequencies should be in the 200s. Probable frequencies are the ones most likely to be correct when compared to the nearest notes. I can say this because some notes are fine and stable: they always return the same value at the expected frequency.
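    Since the goal is to display the note closest to the sung pitch, a small helper that rounds a frequency to the nearest equal-tempered note, and reports how many cents off it is, makes "probable" and off readings easy to compare. This is a hypothetical sketch, not code from the tutorials; it assumes A4 = 440 Hz and MIDI note numbers (C4 = 60).

```csharp
using System;

// Hypothetical helper (not from the thread): maps a frequency in Hz to the
// nearest equal-tempered note, assuming A4 = 440 Hz and MIDI numbering (C4 = 60).
public static class NoteMapper
{
    static readonly string[] Names =
        { "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B" };

    // Fractional MIDI note number: 69 + 12 * log2(f / 440).
    public static double FrequencyToMidi(double hz)
    {
        if (hz <= 0.0) throw new ArgumentOutOfRangeException(nameof(hz));
        return 69.0 + 12.0 * Math.Log(hz / 440.0, 2.0);
    }

    // Returns the nearest note name with octave (e.g. "C4") and, via `cents`,
    // how far the input is from that note (negative = flat, positive = sharp).
    public static string NearestNote(double hz, out double cents)
    {
        double midi = FrequencyToMidi(hz);
        int nearest = (int)Math.Round(midi);
        cents = (midi - nearest) * 100.0;
        int pitchClass = ((nearest % 12) + 12) % 12;      // 0 = C, 9 = A
        int octave = (int)Math.Floor(nearest / 12.0) - 1; // MIDI 60 -> octave 4
        return Names[pitchClass] + octave;
    }
}
```

    In Unity you would pass it the detected pitch in Hz each frame. For example, 255 Hz comes back as C4, roughly 44 cents flat.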

    To play notes, I'm using sonicviz's Unity implementation of the MIDI C#Synth Project from CodePlex, whose link can be found here:

    UnitySynth - full Xplatform midi synth: http://forum.unity3d.com/threads/unitysynth-full-xplatform-midi-synth.130104/

    I'm no expert at music, but I'm pretty sure it's playing notes in the correct order and their pitches are pretty accurate.

    My questions are:
    1. Is there any way I can improve the precision of the returned frequencies? I've tried playing with sample rates, bin lengths, loudness, etc., but it's mostly the same problem.
    2. Is there a better method of getting an audio track's fundamental frequency? I noticed that this does its calculations in real time, so I'm thinking GetSpectrumData() doesn't take the whole track into account when computing spectrum data.
    3. As I said, this is a real-time implementation, and that isn't exactly what I need: I want to record audio first, then analyze the resulting track. Is there a way to make this code do that? (I'm a beginner at Unity, sorry.)

    Any help with this is deeply appreciated. I'm under a lot of pressure to finish this so I really thank you in advance.

    Cheers,
    Justin
     
  2. theguywhodreams

    Joined:
    Apr 3, 2015
    Posts:
    7
    UPDATE:

    After a lot of testing, I'm not sure Kaappine's methods are accurate at all now, or else I'm doing something horribly wrong. I tried playing a C in every octave that humans can actually reach (C2 to C6), and it gave me nearly random frequencies for each, so not much useful data. It only approximates C4 correctly (at around 255 Hz). Can anyone suggest a better way to accomplish this?
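    For reference when testing, each octave doubles the frequency, so the C notes tried here should land near fixed values; C4 is about 261.63 Hz, which puts a 255 Hz reading roughly 44 cents flat. A hypothetical sketch that prints the expected C2 to C6 frequencies, assuming equal temperament with A4 = 440 Hz:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: expected equal-tempered frequencies of C2..C6,
// assuming A4 = 440 Hz and MIDI numbering (A4 = 69, C4 = 60).
public static class ExpectedPitches
{
    public static double MidiToFrequency(int midi) =>
        440.0 * Math.Pow(2.0, (midi - 69) / 12.0);

    // MIDI number of C in a given octave: C4 = 60, so C(n) = 12 * (n + 1).
    public static double CFrequency(int octave) => MidiToFrequency(12 * (octave + 1));

    public static void PrintCs()
    {
        for (int octave = 2; octave <= 6; octave++)
            Console.WriteLine("C" + octave + ": " +
                CFrequency(octave).ToString("F2", CultureInfo.InvariantCulture) + " Hz");
    }
}
```

    `PrintCs()` prints C2: 65.41 Hz, C3: 130.81 Hz, C4: 261.63 Hz, C5: 523.25 Hz and C6: 1046.50 Hz, one per line.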

    Thanks in advance,
    Justin
     
  3. theguywhodreams

    Joined:
    Apr 3, 2015
    Posts:
    7
    UPDATE:

    No one's answering so I'll just update what I find here for other people with the same problems. Hahaha.

    I switched to a different algorithm, found here:

    GetOutputData and GetSpectrumData, what represent the values returned?
    http://answers.unity3d.com/questions/157940/getoutputdata-and-getspectrumdata-they-represent-t.html

    It's a lot more accurate, but it still needs some tweaking. After modifying the code to analyze a whole recorded track instead of working in real time, I recorded data captured from a 61-key piano, which you can see here.

    [Screenshot: upload_2015-4-8_14-19-53.png, a table of the frequencies detected for each key of the 61-key piano]

    Red text marks notes I don't really care about, since Wikipedia says a human's vocal range normally spans from E2 (the lowest for basses) to C6 (the highest for sopranos). Highlighted cells are the ones that are way off from their actual frequencies.

    I did notice a pattern, though: the off frequencies are multiples of the actual frequencies. I did a little more research and found that there is such a thing as "secondary partial frequencies" (harmonics), so maybe this algorithm sometimes mistakes a partial for the fundamental? I don't really know.
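    The multiples fit the harmonic series: a sung or played note at f0 also has energy at 2·f0, 3·f0 and so on, and the loudest of these is not always f0. One well-known mitigation, not used anywhere in this thread, is the harmonic product spectrum (HPS): multiply the spectrum by copies of itself decimated by 2, 3, ..., so only a bin whose multiples all carry energy stays large. A hypothetical sketch on a plain float[] such as the one GetSpectrumData fills:

```csharp
using System;

// Hypothetical sketch of the harmonic product spectrum (HPS).
public static class HarmonicProductSpectrum
{
    // Returns the bin index of the estimated fundamental, or -1 if none.
    // spectrum: magnitudes for bins 0..N-1; harmonics: how many copies
    // (the original plus decimated ones) to multiply together.
    public static int FundamentalBin(float[] spectrum, int harmonics, int minBin = 1)
    {
        if (harmonics < 1) throw new ArgumentOutOfRangeException(nameof(harmonics));
        int n = spectrum.Length / harmonics; // bins for which every k*h is in range
        int best = -1;
        double bestValue = 0.0;
        for (int k = minBin; k < n; k++)
        {
            double product = spectrum[k];
            for (int h = 2; h <= harmonics; h++)
                product *= spectrum[k * h]; // energy at the h-th harmonic of bin k
            if (product > bestValue) { bestValue = product; best = k; }
        }
        return best;
    }
}
```

    For example, with energy at bins 10, 20 and 30, bin 20 the loudest and a low noise floor elsewhere, a plain maximum returns 20 (an octave too high), whereas `FundamentalBin(spectrum, 3)` returns 10. HPS has its own failure mode (it can land an octave low), so it is worth testing on your own recordings.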

    I'm not really expecting anyone to answer here anymore (sad though), but if anyone has any input on why this might be happening, it'd be greatly appreciated. :)

    Cheers,
    Justin
     
    Last edited: Apr 8, 2015
  4. GibTreaty

    Joined:
    Aug 25, 2010
    Posts:
    792
    I made a small 3D piano just to play around. When I get home I'll see if I can use the note algorithms I have to detect the notes that your voice makes.
     
  5. theguywhodreams

    Joined:
    Apr 3, 2015
    Posts:
    7
    Hey GibTreaty!

    Looking forward to your results. :)

    Cheers,
    Justin
     
  6. GibTreaty

    Joined:
    Aug 25, 2010
    Posts:
    792
    Wow, looking into this further, it's way more complicated than I thought. I figured I could just go into the audio data, pull a short piece of it out, and round it to the nearest note frequency. I'm not completely sure how to go about doing that using AudioListener.GetSpectrumData. How would you know what frequency to multiply the data by? If you want to look into the piano key script I made, you can check it out here...
    http://pastebin.com/WQZBufmL
     
  7. theguywhodreams

    Joined:
    Apr 3, 2015
    Posts:
    7
    Hey GibTreaty,

    Yes, it's actually very complicated! AudioSource.GetSpectrumData() applies a window function (in our case, Blackman-Harris) and then an FFT to convert the recorded waveform into a frequency distribution. This is essentially a graph in which the most prominent frequency corresponds to the array index holding the highest value. From there, you can convert that index into a frequency (our estimate of the fundamental) using your bin length and sample rate. The precision depends on the bin length: a larger bin length gives finer frequency resolution. The link I posted earlier shows how to continue from GetSpectrumData.
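    To make the index-to-frequency step concrete: the thread's scripts treat the binSize values from GetSpectrumData as spanning 0 Hz up to half the sample rate, so each bin is (sampleRate / 2) / binSize wide. A hypothetical helper using that same mapping:

```csharp
// Hypothetical helper: converts between spectrum bin indices and Hz, using the
// same mapping as the scripts in this thread (binSize bins covering 0 Hz to sampleRate / 2).
public static class SpectrumBins
{
    public static float BinWidthHz(int sampleRate, int binSize) => (sampleRate / 2f) / binSize;

    // index may be fractional after interpolating between neighbouring bins.
    public static float BinToHz(float index, int sampleRate, int binSize) =>
        index * BinWidthHz(sampleRate, binSize);

    public static float HzToBin(float hz, int sampleRate, int binSize) =>
        hz / BinWidthHz(sampleRate, binSize);
}
```

    At 48000 Hz with 1024 bins, each bin is 23.4375 Hz wide, while C4 (about 261.63 Hz) and B3 (about 246.94 Hz) are only about 14.7 Hz apart, so neighbouring notes can fall into the same bin unless you interpolate. With 8192 bins the width shrinks to about 2.93 Hz, at a higher cost per frame.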

    I've also read online that a Gaussian window might be more precise than a Blackman-Harris window, since the latter still gives me wrong frequencies, whether or not they follow a pattern. However, due to time constraints, I'll have to settle for what's available for now. I'll post whatever updates I can.

    Thanks for the script though!

    Cheers,
    Justin
     
  8. toutenvrak

    Joined:
    May 2, 2015
    Posts:
    1
    Hello, I joined today to reply to your thread.

    I just bought a Leap Motion, had a look at Unity3D while searching for MIDI information, and your thread caught my attention.

    I tried "Sonic Visualiser" http://www.sonicvisualiser.org/ and the plugins http://www.vamp-plugins.org/download.html

    In my point of view, it's actually the closest thing to an algorithm that detects notes (with MIDI output) from WAV files.

    Have a look :cool:

    Best regards
    Toutenvrak FrenchMad
     
  9. sushionthego

    Joined:
    Feb 1, 2016
    Posts:
    1
    Hi all! I'm also pretty interested in this subject matter.

    @theguywhodreams, any luck? :)

    @toutenvrak, can you elaborate on how you used Sonic Visualiser in conjunction with Unity? :) Beginner coder here.
     
  10. theguywhodreams

    Joined:
    Apr 3, 2015
    Posts:
    7
    UPDATE:

    It's been a while, and I recently upgraded to Unity 5, which broke a lot of my earlier work. I had to build pitch detection for another app, so I needed to find solutions to the problems the new version introduced.

    A huge problem was that the Microphone didn't work anymore. Through this I realized that muting the AudioSource also mutes the input from the Microphone, which blocks any possible output from AnalyzeSound().

    Through benjaminoutran in this post (http://answers.unity3d.com/questions/1113690/microphone-input-in-unity-5x.html), I found that you now need to route your Audio Source's output to an Audio Mixer to work around this problem, and you can then mute it through the mixer's Volume property.
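    Exposed mixer parameters such as this Volume are in decibels, so the -80 passed to SetFloat in the script is a level, not a linear factor. A small conversion sketch (hypothetical helper names), using the standard amplitude formula dB = 20·log10(gain), which is also what the script's dbValue line computes:

```csharp
using System;

// Hypothetical helpers: mixer volumes are in decibels, while sample
// amplitudes (and the RMS in the script) are linear.
public static class Decibels
{
    // Linear gain -> dB: 20 * log10(gain). A gain of 1 is 0 dB.
    public static double GainToDb(double gain) => 20.0 * Math.Log10(gain);

    // dB -> linear gain: 10^(dB / 20).
    public static double DbToGain(double db) => Math.Pow(10.0, db / 20.0);
}
```

    So `DbToGain(-80)` is 0.0001: the monitored signal is scaled to one ten-thousandth of its amplitude, which is effectively silent, while the AudioSource itself keeps producing data for analysis.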

    I simplified benjaminoutran's code and incorporated it into my existing code, which was based on aldonaletto's code in this thread (http://answers.unity3d.com/questions/157940/getoutputdata-and-getspectrumdata-they-represent-t.html). You'll see a lot of his computation in the code below, but I tweaked it a bit in an effort to get better results.

    I mentioned earlier that the code detects multiples of the expected frequencies. Apparently these are caused by the harmonics present in sounds produced by the guitar and the human voice, among other instruments, both of which I used to test the code below.

    What I tried was a peak selection system. Out of a distribution of frequencies, the idea is to take the first five peaks, i.e. the five lowest fundamental frequency candidates that appear, because (1) the notes we usually want to produce with the human voice are in the lower frequencies, and (2) the harmonics cause the algorithm to detect multiples of the expected frequency, ergo higher values.

    The intended selection process is as follows:
    1) Identify and isolate the peaks in the distribution
    2) Take the 5 lowest-frequency peaks
    3) Take the one with the highest amplitude out of the five; this is the fundamental frequency
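    The three-step process above can be sketched as follows. This is a hypothetical helper (its name and the default of five candidates are assumptions) that treats a "peak" as a bin louder than its left neighbour and at least as loud as its right one, working on a plain float[] such as the one GetSpectrumData fills. Like the described process, it can still return a harmonic if that harmonic is the loudest of the five.

```csharp
using System.Collections.Generic;

// Hypothetical helper sketching the intended three-step selection:
// 1) find local maxima above a threshold, 2) keep the lowest-frequency ones
// (at most maxCandidates), 3) return the loudest of those, or -1 if none.
public static class LowestPeaksPicker
{
    public static int PickBin(float[] spectrum, float threshold, int maxCandidates = 5)
    {
        var candidates = new List<int>();

        // Steps 1 and 2: scan upward so lower frequencies are collected first.
        for (int i = 1; i < spectrum.Length - 1 && candidates.Count < maxCandidates; i++)
        {
            bool isLocalMax = spectrum[i] > spectrum[i - 1] && spectrum[i] >= spectrum[i + 1];
            if (isLocalMax && spectrum[i] > threshold)
                candidates.Add(i);
        }

        // Step 3: the loudest of the collected peaks.
        int best = -1;
        foreach (int bin in candidates)
            if (best < 0 || spectrum[bin] > spectrum[best])
                best = bin;
        return best;
    }
}
```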

    But I realized peak detection is a much more arduous process that I personally did not know how to do, so what I did below was:
    1) Get at most the first five elements of the distribution, each with a higher amplitude than the one before it (lower index = lower frequency, so it made sense)
    2) Get the one with the highest amplitude as the fundamental frequency
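    The simplified two-step version can be sketched the same way. This hypothetical helper keeps a running maximum so that each recorded bin is louder than the one before it, which is how I read step 1. Note that in the posted code `maxV` is never updated inside its loop, so it collects every bin above `threshold` instead.

```csharp
// Hypothetical helper: scan from low to high bins, record each bin that is
// louder than every bin recorded before it (and above threshold), stop after
// maxCandidates records, and return the loudest recorded bin (-1 if none).
public static class RisingRunPicker
{
    public static int PickBin(float[] spectrum, float threshold, int maxCandidates = 5)
    {
        float runningMax = 0f;
        int best = -1;
        int recorded = 0;
        for (int i = 0; i < spectrum.Length && recorded < maxCandidates; i++)
        {
            if (spectrum[i] > runningMax && spectrum[i] > threshold)
            {
                runningMax = spectrum[i];
                best = i; // each new record is louder than all previous ones
                recorded++;
            }
        }
        return best;
    }
}
```

    Because each recorded bin is louder than the last, step 2's "highest amplitude" is simply the last bin recorded, and on a steadily rising slope the scan can stop before it reaches the top of a peak.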

    So far, this code works well enough, but I've only tested it on notes C4 to E5, because those were the only ones I needed. It sometimes struggles with F4 because it detects a harmonic instead of the actual note. I tested it with my own voice and a guitar. If anyone can test this solution or improve it in any way, it would be greatly appreciated.

    Regards,
    Justin

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.UI;
    using UnityEngine.Audio;
    using System;
    using System.Collections.Generic;

    // One candidate bin from the spectrum
    class Peak {
        public float amplitude;
        public int index;

        public Peak() {
            amplitude = 0f;
            index = -1;
        }

        public Peak( float _amplitude, int _index ) {
            amplitude = _amplitude;
            index = _index;
        }
    }

    // Sorts peaks by amplitude, highest first
    class AmpComparer : IComparer<Peak> {
        public int Compare (Peak a, Peak b) {
            return 0 - a.amplitude.CompareTo (b.amplitude);
        }
    }

    // Sorts peaks by bin index, lowest frequency first (only used in a commented-out line below)
    class IndexComparer : IComparer<Peak> {
        public int Compare (Peak a, Peak b) {
            return a.index.CompareTo (b.index);
        }
    }

    // RequireComponent belongs on the MonoBehaviour; it has no effect on a plain class like Peak
    [RequireComponent(typeof(AudioSource))]
    public class PitchTracker : MonoBehaviour {

        public float rmsValue;
        public float dbValue;
        public float pitchValue;

        public int qSamples = 1024;
        public int binSize = 1024; // you can change this up, I originally used 8192 for better resolution, but I stuck with 1024 because it was slow-performing on the phone
        public float refValue = 0.1f;
        public float threshold = 0.01f;

        private List<Peak> peaks = new List<Peak> ();
        float[] samples;
        float[] spectrum;
        int samplerate;
        AudioSource audioSource; // cached in Start()

        public Text display; // drag a Text object here to display values
        public bool mute = true;
        public AudioMixer masterMixer; // drag an Audio Mixer here in the inspector

        void Start() {
            audioSource = GetComponent<AudioSource>();
            samples = new float[qSamples];
            spectrum = new float[binSize];
            samplerate = AudioSettings.outputSampleRate;

            // starts the Microphone and attaches it to the AudioSource
            audioSource.clip = Microphone.Start(null, true, 10, samplerate);
            audioSource.loop = true; // Set the AudioClip to loop
            while (!(Microphone.GetPosition(null) > 0)){} // Wait until the recording has started
            audioSource.Play();

            // Mutes the mixer. You have to expose the Volume element of your mixer for this to work. I named mine "masterVolume".
            // The AudioSource's Output must be routed to this mixer for the mute to apply to it.
            masterMixer.SetFloat ("masterVolume", -80f);
        }

        void Update(){
            AnalyzeSound();
            if (display != null){
                display.text = "RMS: "+rmsValue.ToString("F2")+
                    " ("+dbValue.ToString("F1")+" dB)\n"+
                    "Pitch: "+pitchValue.ToString("F0")+" Hz";
            }
        }

        void AnalyzeSound(){
            audioSource.GetOutputData(samples, 0); // fill array with samples
            int i = 0;
            float sum = 0f;
            for (i=0; i < qSamples; i++){
                sum += samples[i]*samples[i]; // sum squared samples
            }
            rmsValue = Mathf.Sqrt(sum/qSamples); // rms = square root of average
            dbValue = 20*Mathf.Log10(rmsValue/refValue); // calculate dB
            if (dbValue < -160) dbValue = -160; // clamp it to -160dB min

            // get sound spectrum
            audioSource.GetSpectrumData(spectrum, 0, FFTWindow.BlackmanHarris);
            float maxV = 0f;
            // Collect candidate bins. maxV is never updated in this loop, so every bin above
            // threshold is added. Once more than five are collected the list is sorted
            // loudest-first; with five or fewer it stays in ascending bin order.
            for (i=0; i < binSize; i++){
                if (spectrum[i] > maxV && spectrum[i] > threshold){
                    peaks.Add (new Peak (spectrum[i], i));
                    if (peaks.Count > 5) {
                        peaks.Sort (new AmpComparer ()); // sort peak amplitudes from highest to lowest
                        //peaks.Remove (peaks [5]); // remove peak with the lowest amplitude
                    }
                }
            }
            float freqN = 0f;
            if (peaks.Count > 0) {
                //peaks.Sort (new IndexComparer ()); // sort indices in ascending order
                maxV = peaks [0].amplitude;
                int maxN = peaks [0].index;
                freqN = maxN; // pass the index to a float variable
                if (maxN > 0 && maxN < binSize - 1) { // interpolate index using neighbours
                    var dL = spectrum [maxN - 1] / spectrum [maxN];
                    var dR = spectrum [maxN + 1] / spectrum [maxN];
                    freqN += 0.5f * (dR * dR - dL * dL);
                }
            }
            pitchValue = freqN*(samplerate/2f)/binSize; // convert index to frequency
            peaks.Clear ();
        }
    }
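    For reference, the neighbour-based correction in this script (`0.5f * (dR * dR - dL * dL)`) comes from aldonaletto's answer. A commonly used alternative is quadratic (parabolic) interpolation: fit a parabola through the peak bin and its two neighbours and take its vertex, offset = 0.5·(a − c) / (a − 2b + c). A hypothetical sketch, not the thread's formula:

```csharp
// Hypothetical helper: quadratic (parabolic) interpolation of a spectral peak.
public static class PeakInterpolation
{
    // Fits a parabola through bins k-1, k, k+1 and returns the fractional bin
    // index of its vertex. Returns k unchanged at the array edges or when the
    // three points are not concave (no well-defined peak).
    public static float RefineBin(float[] spectrum, int k)
    {
        if (k <= 0 || k >= spectrum.Length - 1) return k;
        float a = spectrum[k - 1], b = spectrum[k], c = spectrum[k + 1];
        float denom = a - 2f * b + c;
        if (denom >= 0f) return k;
        return k + 0.5f * (a - c) / denom;
    }
}
```

    In `AnalyzeSound()` this would replace the `dL`/`dR` lines with `freqN = PeakInterpolation.RefineBin(spectrum, maxN);`, after which the existing conversion to Hz applies unchanged.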
     
  11. vikankraft

    Joined:
    Feb 25, 2016
    Posts:
    88
    Sorry to be a little off topic, but can I ask what microphone you use? This interests me quite a lot, but I'm presuming that if you don't use a really, really good microphone in a sound-isolated room, your sampling will not be precise enough and will pick up a lot of noise.
     
  12. theguywhodreams

    Joined:
    Apr 3, 2015
    Posts:
    7
    Hello!

    This has been tested with the built-in mics of the phones I deployed the app on and of the laptop I debug the app on. They're pretty generic if you ask me, but you can raise or lower the sensitivity of pitch detection by changing the threshold field in the code: only spectrum bins louder than threshold are considered, so a lower value picks up quieter input.
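    One way to make `threshold` less dependent on how loud the input is (a hypothetical variation, not part of the script) is to compute it per frame as a fraction of the loudest bin, with a fixed floor so silence is still rejected:

```csharp
using System;

// Hypothetical helper: per-frame threshold = max(noiseFloor, fraction * loudest bin).
public static class AdaptiveThreshold
{
    public static float Compute(float[] spectrum, float fraction, float noiseFloor)
    {
        float max = 0f;
        foreach (float v in spectrum)
            if (v > max) max = v;
        return Math.Max(noiseFloor, fraction * max);
    }
}
```

    In `AnalyzeSound()` you would call it right after `GetSpectrumData` and compare bins against the result instead of the fixed field. Both `fraction` (for example 0.1) and `noiseFloor` are values you would have to tune for your microphone.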

    Of course, an optimal environment for pitch detection will always be a relatively quiet room with little interference, but I guess for now this solution is what works the best for me. :)

    Cheers,
    Justin
     
  13. ovirta

    Joined:
    Mar 20, 2015
    Posts:
    42
    Thanks @theguywhodreams for sharing this code snippet. It works reasonably well compared to other voice detection systems available for Unity. Attached is a screenshot of me "singing" at different pitch levels ("ma, me, mi, mo..."). Male voices are quite difficult for pitch detection algorithms, but there are some trends visible there. Some basic smoothing algorithm might help quite a bit to start with. I need to come back to your work after a week of holidays.
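    The "basic smoothing" suggested here could be as simple as a running median over the last few pitch readings, which ignores a single-frame jump (such as an octave error) that a plain average would smear into the output. A hypothetical sketch; the class name and window size are assumptions:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical smoothing helper: median of the last `size` pitch readings.
public class PitchSmoother
{
    private readonly Queue<float> recent = new Queue<float>();
    private readonly int size;

    public PitchSmoother(int size)
    {
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
        this.size = size;
    }

    // Adds the newest reading (Hz) and returns the median of the stored readings.
    public float Add(float hz)
    {
        recent.Enqueue(hz);
        if (recent.Count > size) recent.Dequeue();

        float[] sorted = recent.ToArray();
        Array.Sort(sorted);
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : 0.5f * (sorted[mid - 1] + sorted[mid]);
    }
}
```

    You would feed it `pitchValue` once per `Update()`; skipping 0 Hz readings (silence) keeps them from dragging the median down.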

    Very good work!

    Please ping back if any questions on my testing (and if you make breakthroughs in optimizing the code ;) )

    UPDATE: There were a few build errors in the code snippet where the spectrum array was used without the [i] index.
     

    Last edited: Jun 26, 2016
  14. Jaden_Schneider

    Joined:
    May 9, 2016
    Posts:
    1
    Can you please post the code without the errors?
     
  15. UnityNinja007

    Joined:
    Feb 11, 2016
    Posts:
    1
    Hi there, theguywhodreams. I think you've done a great job, but I can't figure out which value I need to compare with the values in your piano table. Also, when I use your script the frequency is always 0; after some changes of my own it's around 50 Hz. Is that correct? A quick answer would be much appreciated. Thanks a lot!
     
  16. kurtdog

    Joined:
    Mar 31, 2014
    Posts:
    14
    Here is a bug-free version of theguywhodreams's code. Thanks!
    Code (CSharp):
    using UnityEngine;
    using UnityEngine.UI;
    using UnityEngine.Audio;
    using System;
    using System.Collections.Generic;

    // One candidate bin from the spectrum
    class Peak
    {
        public float amplitude;
        public int index;

        public Peak()
        {
            amplitude = 0f;
            index = -1;
        }

        public Peak(float _amplitude, int _index)
        {
            amplitude = _amplitude;
            index = _index;
        }
    }

    // Sorts peaks by amplitude, highest first
    class AmpComparer : IComparer<Peak>
    {
        public int Compare(Peak a, Peak b)
        {
            return 0 - a.amplitude.CompareTo(b.amplitude);
        }
    }

    // Sorts peaks by bin index, lowest frequency first (only used in a commented-out line below)
    class IndexComparer : IComparer<Peak>
    {
        public int Compare(Peak a, Peak b)
        {
            return a.index.CompareTo(b.index);
        }
    }

    // RequireComponent belongs on the MonoBehaviour; it has no effect on a plain class like Peak
    [RequireComponent(typeof(AudioSource))]
    public class PitchTracker : MonoBehaviour
    {

        public float rmsValue;
        public float dbValue;
        public float pitchValue;

        public int qSamples = 1024;
        public int binSize = 1024; // you can change this up, I originally used 8192 for better resolution, but I stuck with 1024 because it was slow-performing on the phone
        public float refValue = 0.1f;
        public float threshold = 0.01f;

        private List<Peak> peaks = new List<Peak>();
        float[] samples;
        float[] spectrum;
        int samplerate;
        AudioSource audioSource; // cached in Start()

        public Text display; // drag a Text object here to display values
        public bool mute = true;
        public AudioMixer masterMixer; // drag an Audio Mixer here in the inspector

        void Start()
        {
            audioSource = GetComponent<AudioSource>();
            samples = new float[qSamples];
            spectrum = new float[binSize];
            samplerate = AudioSettings.outputSampleRate;

            // starts the Microphone and attaches it to the AudioSource
            audioSource.clip = Microphone.Start(null, true, 10, samplerate);
            audioSource.loop = true; // Set the AudioClip to loop
            while (!(Microphone.GetPosition(null) > 0)) { } // Wait until the recording has started
            audioSource.Play();

            // Mutes the mixer. You have to expose the Volume element of your mixer for this to work. I named mine "masterVolume".
            // The AudioSource's Output must be routed to this mixer for the mute to apply to it.
            masterMixer.SetFloat("masterVolume", -80f);
        }

        void Update()
        {
            AnalyzeSound();
            if (display != null)
            {
                display.text = "RMS: " + rmsValue.ToString("F2") +
                    " (" + dbValue.ToString("F1") + " dB)\n" +
                    "Pitch: " + pitchValue.ToString("F0") + " Hz";
            }
        }

        void AnalyzeSound()
        {
            // reuses the samples buffer allocated in Start() instead of allocating one every frame
            audioSource.GetOutputData(samples, 0); // fill array with samples
            int i = 0;
            float sum = 0f;
            for (i = 0; i < qSamples; i++)
            {
                sum += samples[i] * samples[i]; // sum squared samples
            }
            rmsValue = Mathf.Sqrt(sum / qSamples); // rms = square root of average
            dbValue = 20 * Mathf.Log10(rmsValue / refValue); // calculate dB
            if (dbValue < -160) dbValue = -160; // clamp it to -160dB min

            // get sound spectrum
            audioSource.GetSpectrumData(spectrum, 0, FFTWindow.BlackmanHarris);
            float maxV = 0f;
            // Collect candidate bins. maxV is never updated in this loop, so every bin above
            // threshold is added. Once more than five are collected the list is sorted
            // loudest-first; with five or fewer it stays in ascending bin order.
            for (i = 0; i < binSize; i++)
            {
                if (spectrum[i] > maxV && spectrum[i] > threshold)
                {
                    peaks.Add(new Peak(spectrum[i], i));
                    if (peaks.Count > 5)
                    {
                        peaks.Sort(new AmpComparer()); // sort peak amplitudes from highest to lowest
                        //peaks.Remove (peaks [5]); // remove peak with the lowest amplitude
                    }
                }
            }
            float freqN = 0f;
            if (peaks.Count > 0)
            {
                //peaks.Sort (new IndexComparer ()); // sort indices in ascending order
                maxV = peaks[0].amplitude;
                int maxN = peaks[0].index;
                freqN = maxN; // pass the index to a float variable
                if (maxN > 0 && maxN < binSize - 1)
                { // interpolate index using neighbours
                    var dL = spectrum[maxN - 1] / spectrum[maxN];
                    var dR = spectrum[maxN + 1] / spectrum[maxN];
                    freqN += 0.5f * (dR * dR - dL * dL);
                }
            }
            pitchValue = freqN * (samplerate / 2f) / binSize; // convert index to frequency
            peaks.Clear();
        }
    }
     
  17. kurtdog

    Joined:
    Mar 31, 2014
    Posts:
    14
  18. dee-gelbart

    Joined:
    Jul 18, 2015
    Posts:
    1
    I expect this one will be more reliable because it uses autocorrelation:

    https://pitchtracker.codeplex.com/

    The problem with the peak-picking approach that theguywhodreams used is that the harmonic with the most energy is not necessarily the fundamental frequency (i.e., the one whose frequency equals the pitch frequency).

    The pitch is the distance between one harmonic and the next. Autocorrelation essentially measures this distance, by finding the time lag at which the waveform best matches a shifted copy of itself (the period, whose inverse is the harmonic spacing).
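    A minimal sketch of the autocorrelation idea, written from scratch for illustration (it is not the CodePlex library's code): correlate a time-domain frame, such as the one GetOutputData fills, with shifted copies of itself and take the first lag that scores close to the best one. Multiples of the period score about as high as the period itself, so taking the global maximum would risk octave-low answers. The search range (70 to 1100 Hz, roughly covering the E2 to C6 vocal range mentioned earlier in the thread) and the 0.9 ratio are assumptions to tune; low notes also need a frame several periods long, so a larger qSamples helps at the bottom of the range.

```csharp
using System;

// Hypothetical sketch of pitch estimation by normalized autocorrelation.
public static class AutocorrelationPitch
{
    // Returns the estimated pitch in Hz, or 0 if no periodicity is found.
    public static float Estimate(float[] frame, int sampleRate,
                                 float minHz = 70f, float maxHz = 1100f, double peakRatio = 0.9)
    {
        int minLag = Math.Max(1, (int)(sampleRate / maxHz));
        int maxLag = Math.Min((int)(sampleRate / minHz), frame.Length - 2);
        if (maxLag - minLag < 2) return 0f; // frame too short for this range

        // r[lag] = normalized correlation between the frame and itself shifted by lag.
        var r = new double[maxLag + 1];
        double best = 0.0;
        for (int lag = minLag; lag <= maxLag; lag++)
        {
            double sum = 0.0, e1 = 0.0, e2 = 0.0;
            for (int i = 0; i + lag < frame.Length; i++)
            {
                double a = frame[i], b = frame[i + lag];
                sum += a * b;
                e1 += a * a;
                e2 += b * b;
            }
            r[lag] = (e1 > 0.0 && e2 > 0.0) ? sum / Math.Sqrt(e1 * e2) : 0.0;
            if (r[lag] > best) best = r[lag];
        }
        if (best <= 0.0) return 0f; // silence or no periodicity

        // First local maximum that comes close to the best score = one period.
        for (int lag = minLag + 1; lag < maxLag; lag++)
        {
            if (r[lag] >= peakRatio * best && r[lag] > r[lag - 1] && r[lag] >= r[lag + 1])
                return (float)sampleRate / lag;
        }
        return 0f;
    }
}
```

    With a 220 Hz sine sampled at 44100 Hz it reports roughly 220 Hz; since lags are whole samples, precision is limited, and interpolating around the chosen lag could refine it.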

    I did a very quick test between the codeplex code and the code in this thread (kurtdog version) and the codeplex code seemed a lot better. But it would be good to hear from other people who have tried both since I rushed through the comparison.
     
    Last edited: Jan 21, 2017
  19. TuanHA

    Joined:
    Dec 18, 2015
    Posts:
    1
    I've tried the code, and it can detect the loudest note (though not very accurately).

    Is it possible to detect a chord?
     
  20. rmib200

    Joined:
    Jan 19, 2017
    Posts:
    7
    Can you explain how you used it? I have the pitch class in my code and I'm trying to show the pitch from my mic recording in a Text element in my scene, but I haven't been able to. I've used the PitchTracker.CurrentPitchRecord property to retrieve the pitch, but it's not working, and I can't really understand how to use the code from the documentation.
     
    Last edited: Aug 22, 2018
  21. ovirta

    Joined:
    Mar 20, 2015
    Posts:
    42
  22. xKyungsoo

    Joined:
    Nov 9, 2018
    Posts:
    1
    Hello, I am using this post's code (with the Peaks) for my application. However, I literally need to put my mic in my mouth before it calculates a pitch. I tried it on other devices and found that it generally expects a (too) high volume from the microphone input. Is there something I can do so that I can keep my mouth at a comfortable distance from my mic?
     
  23. timmehhhhhhh

    Joined:
    Sep 10, 2013
    Posts:
    157
    not sure why some felt the need to freak out over the way you asked your question, but i would agree it wasn't super clear how this thing should be consumed within the context of a unity app, especially from a mic feed.

    i've gone ahead and started bringing it to life within this context here. i wouldn't expect phenomenal support, it's partly just a vehicle for me to dive back into unity's latest take on ecs. but, it works, and there's a demo scene.

    i'd also second what @ovirta recommended - it seems to work well and looks to be behind a bunch of their games. if you want something a bit more stable and with better support, it's a very affordable price :).