
Klattersynth TTS - Support Thread

Discussion in 'Assets and Asset Store' started by tonic, Aug 10, 2017.

  1. tonic


    Oct 31, 2012
    :eek: Klattersynth TTS
Learn more from the official website of the asset:

Klattersynth TTS is the first asset of its kind available for the Unity cross-platform engine:
    a small, fully embedded speech synthesizer.

    What features does Klattersynth TTS have?
• It does not use the OS or browser speech synth, so it sounds the SAME on all platforms. :cool:
    • Dynamically speaks whatever text you give it.
    • Generates and plays streamed speech in real-time.
    • In WebGL builds the AudioClips are quickly pre-generated and then played.
    • Contains an English text-to-speech algorithm (text-to-phoneme conversion).
    • Alternatively you can enter documented phonemes directly, skipping the English TTS conversion rules.
    • You can query the current loudness of the speech, for tying effects to the audio.
    • Uses normal AudioSource components: 3D spatialization, audio filters and reverb zones work like usual!
    • Contained in one ~100 KB cross-platform DLL file.
    • When embedded with your game or app and compressed for distribution, it compresses down to less than 30 KB. o_O
    • Supports all Unity versions starting from 5.0.0, and is available for practically all platforms targeted by Unity.
    Why is Klattersynth TTS different from many other speech-related assets for Unity?
    • No need for the underlying platform to offer speech features (OS or browser).
    • No need for a network connection for external generation of audio clips.
    • No need to pre-generate the samples before creating a build of your app or game. The clips are either streamed in realtime or generated on the fly while the app or game is running.
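    To make the text-vs-phoneme distinction above concrete, here is a minimal hypothetical usage sketch in Unity C#. The component name Speech, the method Speak, and the bracketed phoneme string are illustrative assumptions only, not the asset's confirmed API (the thread later shows that brackets can mark phoneme input via a bracketsAsPhonemes flag):

    ```csharp
    using UnityEngine;

    public class TalkOnStart : MonoBehaviour
    {
        // Hypothetical reference to the Klattersynth speech component (name assumed).
        public Speech speech;

        void Start()
        {
            // Plain English text: converted to phonemes by the built-in English TTS rules.
            speech.Speak("hello world");

            // Alternatively, documented phonemes could be entered directly,
            // bypassing the English text-to-phoneme conversion.
            // (Phoneme string below is illustrative only.)
            // speech.Speak("[hEl'oU]");
        }
    }
    ```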
    Visit the official website of the asset to try out a WebGL build yourself!

    Demo videos of Klattersynth TTS:
  2. Obsurveyor


    Nov 22, 2012
    Is this considered done or are you still working on the phonemes? The F's sound more like static and Th's are kind of just a pop. Also, in the WebGL demo, the base frequency doesn't seem to affect whisper very much. Are there more audio tweaks available?
  3. tonic


    Oct 31, 2012
Hi @Obsurveyor, I won't be actively working on the sounds of the phonemes. It's only a distant possibility that I'd add 1-2 more later, or try to adjust them. But with this technique there aren't going to be huge improvements in that area; a synth this small is bound to have some limitations.

The example voices in the "Text Entry" demo are made by adjusting the three available parameters: "Ms Per Speech Frame" (which effectively controls the speed), "Flutter" and "Flutter Speed" (which can add a bit of unsteady weirdness to the sound; normally the flutter is just a somewhat inaudible variance in the voice wave).

    Here's an image from the inspector:
    (this is the "Slow and unsteady" voice of the text entry demo)

    Attached Files:

  4. DbDibs


    May 23, 2015
Very interesting, a couple of questions though. Since it's being generated in realtime, is it possible to adjust the actual speed/pitch in realtime as well? (e.g. in the WebGL demo, being able to adjust "Base Voice Frequency" and have it change in realtime instead of having to prerender it, though I understand WebGL HAS to have it prerendered). If so, this would be PERFECT for my needs! And as for my second question - I completely forgot what it was! haha.
  5. tonic


    Oct 31, 2012
    Hi @DbDibs, you're correct - WebGL has to have audio prerendered, so in WebGL builds Klattersynth will need to generate the whole clip just before playing it. It doesn't take long, but it is pre-generated before actually starting to play the clip.

However, it is of course possible to adjust the pitch parameter of the AudioSource playing the generated clip, as you can with any AudioClip. Note that lowering the pitch both deepens the voice and slows the playback down at the same time (and vice versa).
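    Adjusting playback pitch on a generated clip is just standard Unity AudioSource usage, for example:

    ```csharp
    using UnityEngine;

    public class PitchTweak : MonoBehaviour
    {
        void Start()
        {
            // Standard Unity API: halving the pitch both deepens the voice
            // and slows the playback down to half speed.
            AudioSource source = GetComponent<AudioSource>();
            source.pitch = 0.5f;
        }
    }
    ```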

When used in streaming mode, the synth latches to the parameters given at the time it starts to speak that particular line (msPerSpeechFrame is also locked in at initialization time, to minimize any extra memory allocations needed later). Even real-time streamed audio is generated in batches, so fine-grained control of parameters would need to be specified in advance (unless the batch size is very small). That's not a feature of the API now, but it's a possibility for a future version.

However, the currently supported way is to simply instruct the synth to speak e.g. a single word at a time, and adjust the base frequency for each word once the previous one has finished. This works both with streamed and with pre-generated (and possibly cached) speech clips.
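    The word-at-a-time approach described above could be sketched as a coroutine roughly like this. This is a hypothetical sketch: Speech, Speak, isSpeaking, and baseVoiceFrequency are assumed names for illustration, not the asset's confirmed API:

    ```csharp
    using System.Collections;
    using UnityEngine;

    public class PerWordFrequency : MonoBehaviour
    {
        // Hypothetical Klattersynth speech component reference (name assumed).
        public Speech speech;

        IEnumerator SpeakWithRisingPitch(string sentence)
        {
            float frequency = 120f; // starting base voice frequency in Hz (illustrative value)
            foreach (string word in sentence.Split(' '))
            {
                speech.baseVoiceFrequency = frequency; // assumed parameter name
                speech.Speak(word); // assumed method name

                // Wait until the word has finished before changing parameters,
                // since the synth latches its parameters when it starts speaking a line.
                while (speech.isSpeaking) // assumed property
                    yield return null;

                frequency += 10f; // raise the base frequency a little for each word
            }
        }
    }
    ```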
  6. lzt120


    Apr 13, 2010
Does this plugin support Chinese words?
  7. tonic


    Oct 31, 2012
    @lzt120, short answer: No.

Long answer: the text-to-speech only has an approximate mapping for the English language, and no other languages. There's support for entering phonemes directly (the documentation has a list of them). It may be possible to compose some Chinese words using the phonemes directly (which would take time and experimentation). But even then there's no way to express the tones of Chinese pronunciation.

    Thanks for the question.
  8. IceBeamGames


    Feb 9, 2014
    Hey Tonic. I am getting this error: "Can't pre-gen speech clips while speech is being streamed (synth is active)".

    I am trying to pre-generate a load of speech clips using this function:

    Code (CSharp):
        SpeechClip[] GenerateSpeechClipArray(string[] speechStrings)
        {
            SpeechClip[] rtn = new SpeechClip[speechStrings.Length];
            StringBuilder speakSB = new StringBuilder();
            for (int i = 0; i < speechStrings.Length; i++)
            {
                speakSB.Length = 0;
                speakSB.Append(speechStrings[i]);
                rtn[i] = speechSynth.pregenerate(speakSB, voiceFrequency, voicingSource, bracketsAsPhonemes, true);
            }
            return rtn;
        }
I'm not entirely sure what I'm doing wrong. Do I need to wait for a short time while the speechSynth pregenerates?
  9. tonic


    Oct 31, 2012
Hi @IceBeamGames,

    By a quick glance that looks fine to me.

    Could you verify that the speechSynth instance which you're using is not playing some other speech clip right at the time when you're asking it to pregenerate stuff?

Also, if the speech synth is flagged to use streaming mode, then the AudioSource component used by the Speech also isn't allowed to be playing anything when the synth is asked to pregenerate a clip.

    Does the included Pangrams Example work for you? It pre-generates its clips in a batch, so you can use it as a reference. Please check the KlattersynthTTS_Example_Pangrams_Controller.cs and the IEnumerator pangramsDemo() method. There's the if (!clipsGenerated) { ... } code block which contains the batch generation.
(Note 1: It's a coroutine, but only to update the progress info while the clips are being generated; it would work just as well without being inside a coroutine. Note 2: There are 3 different speech synths used in the batch generation, but it works just as well if the code is modified to use a single one.)
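    Putting the two conditions above together, a defensive wrapper around pregeneration might look roughly like this. The pregenerate call and its arguments follow the snippet quoted earlier in the thread; isSpeaking and Stop are assumed member names, not confirmed API:

    ```csharp
    // Sketch: make sure nothing is streaming or playing before pregenerating.
    SpeechClip PregenerateSafely(StringBuilder text)
    {
        // Assumed property: true while the synth is actively streaming speech.
        if (speechSynth.isSpeaking)
            speechSynth.Stop(); // assumed method

        // In streaming mode, the AudioSource used by the synth must not be
        // playing anything either when pregeneration is requested.
        AudioSource source = speechSynth.GetComponent<AudioSource>();
        if (source != null && source.isPlaying)
            source.Stop();

        return speechSynth.pregenerate(text, voiceFrequency, voicingSource,
                                       bracketsAsPhonemes, true);
    }
    ```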