[Generative AI] DeepVoice - Text To Voice

Discussion in 'Assets and Asset Store' started by AiKodex, Jul 9, 2023.

  1. ZealUnity

    ZealUnity

    Joined:
    Apr 13, 2014
    Posts:
    64
    Why is it not possible to ship a large TTS model with a "high end GPU required" disclaimer? Continue offering the cloud service as an option for everyone else.

    Please, this asset looks amazing, but it needs an offline mode. You will get a HUGE boost in sales (and take a huge load off your servers).
     
    Last edited: Nov 15, 2023
    sirleto likes this.
  2. leanerdesigner

    leanerdesigner

    Joined:
    Nov 10, 2021
    Posts:
    3
    I'd buy it if it had an offline mode.
     
    sirleto likes this.
  3. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello ZealUnity,

    Thank you for bringing up this question. This makes it easier for us to explain why we went with a cloud inference method.

    The maximum upload size inside Unity is 6 GB, which is far too small for AI models. Secondly, most models are written in Python. Even though it is possible to convert a model to ONNX or rewrite it in C# for inference, the workflow is not easy, and there isn’t nearly enough documentation on doing so, especially with such a nascent technology. We have already tried this in our asset Ai.Fy, where we have ONNX AI models with 10,000 parameters. Any higher and there are issues. A decent output from a diffusion model currently requires on the order of a billion parameters.
    Thirdly, inference on these models requires high-end NVIDIA GPUs with a specific CUDA version, which limits the number of users who can actually use this asset. Users may even need to download specific CUDA drivers for the asset to work correctly.

    Currently, it’s an easy-to-use asset accessible to everyone, including on personal laptops and even in mobile device builds. There is nothing to set up or install after you download the asset; it is essentially plug and play. The asset occupies a few megabytes. Updates take very little time to upload and are downloaded by users in seconds, compared with days for a package that is a gigabyte or bigger.
     
  4. Barritico

    Barritico

    Joined:
    Jun 9, 2017
    Posts:
    382
    Excuse my ignorance.
    What does the 60,000 characters mean?

    My game is car racing. The co-pilot tells the pilot, by voice, the directions. Do you mean that I have 60,000 characters for all indications? Do you mean that if I have 1000 players they can only have 60 characters per player?

    Sorry for the question, I prefer to "exaggerate clumsy" to remove all doubts.

    Thanks
     
  5. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello Barritico,

    60,000 characters per month means that you can create audio clips from text totaling 60,000 characters. This includes spaces and punctuation as well. Every time you press the generate button, the number of characters in the text box is deducted from the total count for the month. 60,000 characters is around 16-18 pages of text, so you can rest assured you have enough characters for the month for your game script.
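As a rough illustration of the accounting described above, the sketch below deducts the length of each submitted text, spaces and punctuation included, from a 60,000-character monthly allowance. It is written in Python rather than Unity C#, the class and method names are hypothetical, and it is not part of DeepVoice: the real quota is tracked by the service, which may count characters differently from Python's `len`.

```python
MONTHLY_QUOTA = 60_000  # characters per month, as stated in the post above


class QuotaTracker:
    """Hypothetical local mirror of the monthly character quota."""

    def __init__(self, quota: int = MONTHLY_QUOTA) -> None:
        self.remaining = quota

    def charge(self, text: str) -> int:
        """Deduct the length of `text` (spaces and punctuation count) and return what is left."""
        cost = len(text)
        if cost > self.remaining:
            # Refuse the request instead of letting the balance go negative.
            raise ValueError(f"text needs {cost} characters, only {self.remaining} left")
        self.remaining -= cost
        return self.remaining
```

Calling `charge` once per press of the generate button gives a local estimate that can be compared with the figure shown by the asset itself.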
     
    Barritico likes this.
  6. Barritico

    Barritico

    Joined:
    Jun 9, 2017
    Posts:
    382

    Thanks!!!!
     
  7. ZealUnity

    ZealUnity

    Joined:
    Apr 13, 2014
    Posts:
    64
    To your three points...

    1- I was unaware of any 6 GB Unity asset limit, but you wouldn't be distributing the model itself with the asset bundle. Just host it on your website so those users who want it can get it directly from you.

    2- Not sure why you would have to recode anything. Anyone interested in running the model locally should have no problem running a Python script (similar to how you run AUTOMATIC1111 locally for Stable Diffusion).

    3- "Disclaimer: NVIDIA 40x series GPU and Python 3.10.6 required to run the model locally, or you may use our FREE cloud service". Problem solved.

    Nobody is saying to stop offering the free cloud service. It's a good solution for many cases.

    However, the fear of quota limits, your company/servers shutting down, privacy concerns, etc... it's stopping a lot of people from purchasing this otherwise amazing asset. An offline option will give everyone (power users and casual users) tremendous peace of mind.

    Like I said, it's a one time investment (building the locally deployable version/python scripts), in exchange for a huge boost in sales and a huge reduction in your server/bandwidth costs.
     
    sirleto, lazydev999 and jacko93 like this.
  8. Pen4711

    Pen4711

    Joined:
    Feb 14, 2018
    Posts:
    4
    The ability to save a custom voice setting by name would be great for long term projects when you have to use the same voice multiple times. :D
     
    sirleto likes this.
  9. sirleto

    sirleto

    Joined:
    Sep 9, 2019
    Posts:
    149
    i think it is like this: if you offer an offline solution, however "complicated" (get this specific runtime, download that huge model, require this specific driver and that hardware which is quite expensive, too) ... people can download that and just keep it as "backup" whenever your server fails in the future.

    we dont know if you will survive 5 years in the market, or your server, or your pricing, etc.

    but most proper games developed need new audio produced for more than a few years ... so if you want to get the bulk of careful indies + AA + AAA then offering any offline solution is the right thing to do.

    for me, i am looking at your software right now at only 30% price (flash sale) and STILL can not decide if i want to buy it. not because its not cheap, but because if i invest into anything it must continue to be useable for a long time. either its that good and reliable, or i dont even need to try and toy around with it ...
     
  10. claudius_I

    claudius_I

    Joined:
    May 28, 2017
    Posts:
    254
    Hello
    Where can I see how many characters I have left of the 60 thousand for the month?
     
  11. cyanb

    cyanb

    Joined:
    Aug 1, 2022
    Posts:
    2
    The audio trimmer/joiner/equalizer UI is broken for me. (unity 2022.3.5f1)
    Unable to preview the changes there and the trimmer preview also doesnt match the slider.
     
  12. gadoneitor15

    gadoneitor15

    Joined:
    Jun 13, 2019
    Posts:
    1
    How can I make an evil laugh and then have them say an angry phrase?
    This is what I'm using:
    (jajaja - "acaben con ellos" gritó enojado)
     
  13. Haapavuo

    Haapavuo

    Joined:
    Sep 19, 2015
    Posts:
    97
    Hi, and thanks for the amazing asset and being active on Unity Forums!

    I have 2 remarks:
    1) Same bug as above: The audio trimmer preview slider does not match the audio.
    2) Are you planning to add more multi voices? I am especially interested in whispering female & male voices, and perhaps a more shouting style voice (imagine some sort of a war leader).

    Thanks!
     
  14. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Thank you for summarizing your points and possible solutions.

    1. The issue of a model being "hosted" somewhere will arise again if we decide to offer the model through our website. The website's streaming speed will be affected by multiple users downloading a rather large file from the server. Hosting on public platforms like Google Drive, Mega, and GitHub is not an option either, as we would have to implement their authorization methods, adding another hurdle to accessing the service.

    2. Running the model locally requires dozens of Python packages to be installed on your PC as a prerequisite. This leads to more chances of running into compatibility issues.

    3. Adding this disclaimer is a good solution. However, previous points make it impractical to run the model locally.
    We agree with the arguments you have put forward in your discussion, and we are grateful that you have shared your concerns, even from our viewpoint. We understand the concerns and are working on a system that will enable us to offer the model locally.
     
  15. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello Claudius,

    You can click on the "status" button underneath the text box to see the number of characters you have used.
     
  16. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello Cyanb,

    We have sent you an email regarding this.
     
  17. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello Haapavuo,

    Yes of course, we are working on adding more voices. Please stay tuned, we will be pre-announcing the updates here.

    Yes, we are aware of the scaling issue on larger screen sizes.
    Of course, you should be able to work with any screen size and resolution and we understand that changing the scale of your screen to use an asset better is not a solution.

    However, if you can, could you set the scale to 150% and try to adjust the sliders? Editor programming in Unity does not provide much flexibility when it comes to designing Editor UI (hence at times you can see clipping and UI errors pop up in the updates from Unity itself).

    We will work on increasing compatibility with larger screen sizes as an update.
     
  18. lgarczyn

    lgarczyn

    Joined:
    Nov 23, 2014
    Posts:
    68
    A bit disappointed by the setup.

    The example scene is simply a strange render of the editor window? I'd expected scripts showing example script usage.

    The plugin is located directly at the root instead of at /Plugins, AND cannot be moved without breaking everything. So many hardcoded paths everywhere. A search and replace appears to have fixed it, but no idea if things still work correctly.

    Occasional floods of errors: upload_2023-11-23_20-3-44.png

    Does not properly handle the absence of the Unity audio engine, which affects anyone using FMOD.

    Just one 1000 line long block of code, with a LOT of warnings for basic things, like unused variables, incorrect indentation, etc. It obviously says nothing about how good the package is, but it makes it hard to do anything without breaking the package.

    The embedded sound player plays sound distorted, crackling, and slowed down massively.

    The range sliders are very hard to click, and for some reason you end up clicking the wrong handle every single time. The range overlay on the audio wave is also offset and displayed incorrectly.

    I'd expected an API to trim and join clips. Instead all of that is hidden in editor code, except for the bare bones in waveutils.

    Clean UI, OK voice selection (a bit sad at the impossibility to create custom voices), but the code is not up to any standard.
     
    Last edited: Nov 23, 2023
  19. VeryBadPenny

    VeryBadPenny

    Joined:
    Jun 3, 2018
    Posts:
    41
    Hello, I am using this tool a lot and I like it. One challenge we are facing is that we lose track of the source settings for a given voice sample (additionally, it forgets the editor settings each time we close the Editor). We do not want to copy/paste/annotate everything by hand (i.e. ^C, ^V every time we make a voice clip we like) and we want that to be automatic. Ideally I would like the "Generate" button to create something with embedded metadata.

    So... I am wondering if anyone has written a ScriptableObject framework which encapsulates the metadata around a generated AI voice? I use SO's a lot for audio but haven't dug into the DeepVoice code. Ideally I'd like to generate a single object holding the WAV file, the voice model settings, the text string used and the version of DeepVoice used in generation, etc.... I know how to do all of this except extracting the metadata from the Editor tool and am posting here to ask before I start trying to write it myself. I'd also add a "Generate" button to the scriptable object to regenerate the voice I guess.

    I also looked into capturing the metadata either in the WAV file (as LIST chunks or whatever) ... that's less appealing than a wrapper for DeepVoice which just creates a ScriptableObject with the WAV file as a component. But I could be persuaded.
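A language-neutral sketch of the kind of record described above might look like the following. It uses a Python dataclass serialized to a JSON sidecar file rather than a Unity ScriptableObject, so it only illustrates the shape of the data. Every field name here is an assumption, and nothing in it reads from or writes to DeepVoice itself.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VoiceClipMetadata:
    """Hypothetical record of how one generated voice clip was produced."""

    wav_path: str       # where the generated WAV file was saved
    text: str           # the exact text that was sent for generation
    voice: str          # name of the selected voice
    tool_version: str   # version of the generating tool, if known
    settings: dict = field(default_factory=dict)  # any other voice settings

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the WAV file."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "VoiceClipMetadata":
        """Rebuild a record from its JSON form, for example to regenerate the clip."""
        return cls(**json.loads(payload))
```

Storing the JSON next to the WAV file keeps the audio untouched, which is the main difference from embedding the same fields in LIST chunks.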

    Anyway if nobody else did this, and I write something useful, I'll post it back here, might be helpful.
     
    Last edited: Nov 23, 2023
  20. sirleto

    sirleto

    Joined:
    Sep 9, 2019
    Posts:
    149
    if i would continue to use this product, i think the easiest thing i would do myself is to encode the few parameters into the filename + selected voice + first 50 or so characters of the text that is being said.
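The filename idea above could be sketched roughly as follows, in Python rather than Unity C#. The function name, the separator choices, and the example parameter name are all hypothetical; only the 50-character cut-off comes from the post.

```python
import re


def clip_filename(voice: str, params: dict, text: str, max_text: int = 50) -> str:
    """Build a filename from the voice, its parameters, and the start of the spoken text."""

    def slug(value: str) -> str:
        # Replace every run of characters other than ASCII letters and digits
        # with a single hyphen, trim hyphens at the ends, and lowercase the result.
        return re.sub(r"[^A-Za-z0-9]+", "-", value).strip("-").lower()

    # Sort the parameters so the same settings always give the same name.
    param_part = "_".join(slug(f"{key}-{value}") for key, value in sorted(params.items()))
    parts = [slug(voice), param_part, slug(text[:max_text])]
    # Skip empty parts (for example when there are no parameters).
    return "_".join(part for part in parts if part) + ".wav"
```

Because the text is cut before it is cleaned up, the text portion of the name never exceeds `max_text` characters. Two lines that share their first 50 characters and settings would collide, so a real project may want a counter or hash as well.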

    but i do not use this product anymore, as the quality doesnt work for me. i think the few good voices sound truly good, but the rest are either unusable from an acoustic POV or the actors (well, the females honestly) are too annoying and not versatile enough.

    but the major problem i have quality wise, is that it sounds just too much like audio books. i understand it was trained from audio books, and all sounds like that. any mistake the "AI" does on reading, isnt one where the "voice actor" misunderstood the emotion but one where the "audio book reader" does a misreading.

    so its perfect for narrator speech in games, but any dialogues are not "voice acting" at all.
    i never managed to give instructions that change this "tone", and to my ear its not useable.

    i purchased the product at 30% of original price at sale, and i feel like: okay, nice concept, good to support them, but i wont use it. would i have bought it at full price of usd 80, i would be pissed about the unusability in anything but a hobbyist game.

    also i have used 2 hours on the first evening after sale to test it, and came to about 30% of my monthly quota.
    this means to use it in a real production with proper messages, texts, dialogues, etc. one would need to write a batch system, that can be run every month to produce all needed until the quota is used up.
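The planning half of such a batch system could be sketched like this, in Python rather than Unity C#. The function name is hypothetical, the actual generation call is left out because the thread does not describe a scripting API for it, and the cost of a line is assumed to be its character count.

```python
from typing import Iterable


def plan_batch(lines: Iterable[str], remaining_quota: int) -> tuple[list[str], list[str]]:
    """Split pending lines into those to generate this month and those to defer."""
    now: list[str] = []
    later: list[str] = []
    for line in lines:
        cost = len(line)  # spaces and punctuation count toward the quota
        # Keep script order: once one line is deferred, defer everything after it.
        if not later and cost <= remaining_quota:
            now.append(line)
            remaining_quota -= cost
        else:
            later.append(line)
    return now, later
```

Running this once a month with the current remaining quota gives the lines to generate now, and the deferred list becomes the input for the next run.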

    so in this regard, i am on the side of those people saying: if its not offline, its neither future proof nor truly useable for anything bigger than a few experiments.
     
    VeryBadPenny likes this.
  21. VeryBadPenny

    VeryBadPenny

    Joined:
    Jun 3, 2018
    Posts:
    41
    Thanks for your input in reply to my question.
    Personally I dislike long filenames, but (despite this) I already use the text of the voice clip for subtitles, e.g. vo_hello would automatically be recognised as a voice clip (for ducking) and would put the subtitle "hello" on the screen. I am looking for a better solution for more extensive voice clips. I find the DeepVoice quite powerful for my simplistic applications and just need the additional metadata. I already started working on it.
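The naming convention described above can be sketched as a small lookup, shown here in Python rather than Unity C#. The function name is hypothetical, and turning underscores into spaces for multi-word subtitles is an assumption that the post does not state.

```python
from pathlib import PurePath
from typing import Optional

VOICE_PREFIX = "vo_"  # prefix that marks a clip as a voice clip, as in the post


def subtitle_for(clip_name: str) -> Optional[str]:
    """Return the subtitle encoded in a voice clip name, or None for other clips."""
    stem = PurePath(clip_name).stem  # drop any directory and file extension
    if not stem.startswith(VOICE_PREFIX):
        return None  # not a voice clip: no subtitle and no ducking
    # Assumption: underscores after the prefix stand for spaces.
    return stem[len(VOICE_PREFIX):].replace("_", " ")
```

A caller can treat a non-`None` result as the signal to duck other audio and show the returned text on screen.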
     
  22. ZealUnity

    ZealUnity

    Joined:
    Apr 13, 2014
    Posts:
    64
    Went ahead and picked up the asset during the Black Friday sale. Even though our current project can't rely on a purely cloud-based service, it sounds like you guys are actively working on an offline mode.

    This asset deserves support, it's great (and with an offline mode, it would be AMAZING great).
     
  23. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    [Announcement]

    There seems to have been an outage in the service from UTC 02:00 to 03:00.
    The most likely cause is the authentication system from Unity. We checked our server logs and it seems that there was a server error on Unity's invoice verification and payment systems.

    The service is back to normal now. Please check and let us know.
     
  24. VINNUSAURUS

    VINNUSAURUS

    Joined:
    Jul 6, 2017
    Posts:
    21
    Hi, can it make moaning sounds?
     
  25. jacko93

    jacko93

    Joined:
    Feb 23, 2016
    Posts:
    4
    Hi, I've noticed a significant change in Lily's voice, and it appears to have been swapped out with an entirely different one. I had put in a considerable amount of time into shaping dialogues for my character using her original voice. However, I now find myself having to recreate all the conversations I had crafted before with a different voice. This not only demands extra effort, but it also depletes my character quota.
    Could you please consider reverting to the old voice?
     
  26. UnicornsRock420

    UnicornsRock420

    Joined:
    Dec 16, 2021
    Posts:
    3
    In the Window > DeepVoice > Text field, the AI ignores the instructions and the generated voice reads them aloud. Any suggestions?
     

  27. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello UnicornsRock420,

    That is correct, as you can hear, the voice shouts "Test" as instructed. The only drawback is that the instruction is also said out loud but can be trimmed away using the in-built voice trimmer.
     
  28. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello jacko93,

    We profusely apologize for this mistake. Lily's voice is unrecoverable due to a technical error. Rest assured that this will not happen with other voices. Please let us know your invoice number via email info@aikodex.com and we will recover your quota for the month.
     
  29. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    [Announcement]

    Automatic Quota Reset to 60,000 characters. [Quota changes]

    60,000 characters allotted for the period 1-01-2024 to 31-01-2024.
     
  30. lofwyre

    lofwyre

    Joined:
    Apr 19, 2007
    Posts:
    179
    Hi, this is a great product, working very well, thanks. If I could make a request regarding voices: is it possible to get a few effect-style voices like monsters, demons, orcs, dwarves, etc.? It would really help fill out fantasy-style games.

    Cheers
     
  31. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Yes of course, we’ve been working on an entirely new model and that will have a very wide variety of voices you can use. Including all the characters you’ve mentioned! However, it could take up to 2 weeks or a bit longer to deliver as it is an entirely separate asset. Please stay tuned!
     
  32. NewCo-Tech

    NewCo-Tech

    Joined:
    Jan 28, 2020
    Posts:
    7
    Hello, we are very interested in your plugin, and we have a few questions.
    We are working on an application with products and product descriptions, and the idea is to use a text-to-speech plugin that reads each description aloud so the user does not have to.

    1. Is it possible to use this plugin on iOS and Android devices?
    2. If so, do we need an internet connection, or can we use the plugin in offline mode?
    3. How many languages can we use in one application? We are interested only in iOS and Android apps.

    Thank you in advance! :)
     
  33. Rafal_Marchewka

    Rafal_Marchewka

    Joined:
    Apr 1, 2021
    Posts:
    5
    I would like to ask if it's possible to stress individual words in a phrase. The documentation only mentions how to make pauses, add emotion and change tempo, but I didn't find any information on how to place the accent within individual words or emphasize particular words when generating voices.
     
  34. ZenWayne

    ZenWayne

    Joined:
    May 8, 2020
    Posts:
    20
    I also hope for Chinese language support. According to Steam reports, Chinese users make up over 25% of the user base, I believe many game developers would benefit from this support. As for server network issues, if you're targeting Chinese developers, I don't think it's a significant problem. In fact, many of them use VPNs to address network issues. Perhaps you only need to make a simple statement regarding network issues.
     
    Last edited: Apr 19, 2024
  35. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello Wuyan_ren,

    Thank you for the statistics, and we know that Chinese developers make up a very large chunk of the user base. The good news is that DeepVoice already supports Chinese, so if you type or paste Chinese text (as plain text) into DeepVoice, the output language should be Chinese. The reason we say we do not officially support it is that, at times, the AI might mistake it for Japanese text. This occurs when the input uses characters common to both languages. We are still trying to resolve this issue. We have had complaints from users that the AI was confusing the two languages, which led to a loss of characters from their quota. Please let us know your experience with the Chinese language generated by DeepVoice if you plan to try it out. You can send us feedback at info@aikodex.com
     
    ZenWayne likes this.
  36. biscito

    biscito

    Joined:
    Apr 3, 2013
    Posts:
    138
    Is that 60,000 per user of my app, or for all the users of my app in total?
     
  37. biscito

    biscito

    Joined:
    Apr 3, 2013
    Posts:
    138
    I like that
     
  38. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    60,000 characters are offered per month to you. You can distribute them to your users further but they will be using your character quota of 60,000.
     
  39. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    [Announcement]

    Automatic Quota Reset to 60,000 characters. [Quota changes]

    60,000 characters allotted for the period 1-02-2024 to 29-02-2024.
     
  40. Shoebox11

    Shoebox11

    Joined:
    Sep 25, 2015
    Posts:
    6
    When I imported this asset I get an error right off the bat and Deep Voice does not show up under the Window menu.

    Error is Assets\DeepVoice\Editor\Scripts\DeepVoiceEditor.cs(7,13): error CS0234: The type or namespace name 'EditorCoroutines' does not exist in the namespace 'Unity' (are you missing an assembly reference?)
     
  41. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    To resolve the issue with the missing Editor Coroutines package, please go to the Package Manager, search for Editor Coroutines in the Unity Registry and click Install. The package is offered for free by Unity in 2020.x and is built into versions 2021.x and above in Unity SRPs.
     
  42. Shoebox11

    Shoebox11

    Joined:
    Sep 25, 2015
    Posts:
    6
    That got rid of that error. I am using 2022.3, but I am now getting several other errors on different script lines:

    1. Assets\DeepVoice\Editor\Scripts\DeepVoiceEditor.cs(1178,62): error CS0029: Cannot implicitly convert type 'System.Type' to 'Type'

    2. Assets\DeepVoice\Editor\Scripts\DeepVoiceEditor.cs(1174,48): error CS1061: 'Type' does not contain a definition for 'GetMethod' and no accessible extension method 'GetMethod' accepting a first argument of type 'Type' could be found (are you missing a using directive or an assembly reference?)

    I have some other assets installed already, so I will try a fresh install and see if I get any errors, to rule out other assets causing the issue.
     
  43. biscito

    biscito

    Joined:
    Apr 3, 2013
    Posts:
    138
    Well, I'm in need of an on-device real-time version. You have the best quality, no doubt. I'll buy it when there is a real-time version available.

    VoiceGPT ??

     
    Last edited: Feb 9, 2024
  44. lofwyre

    lofwyre

    Joined:
    Apr 19, 2007
    Posts:
    179
    Hey @AiKodex, how is the update (or separate package) coming along? Keen for those more fantasy-oriented voice types.

    Cheers
     
  45. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello Shoebox11,

    Apologies for the late response. Yes please, could you try a fresh install in a new project and let us know the results? We have not come across the errors you are facing. They could be caused by interference with other assets (even though that shouldn't be the case due to different namespaces).
     
  46. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello biscito,

    Yes, VoiceGPT will come out next week if all goes well!
    The generation is quite fast compared to DeepVoice. You can even use it on any device as we now run on HTTPS with an SSL certificate.
     
  47. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hey lofwyre,

    The new asset, VoiceGPT, will be released this week or the next if all goes well!
     
    TomLeeLive likes this.
  48. Micho_Dev

    Micho_Dev

    Joined:
    Nov 5, 2019
    Posts:
    9
    Have you changed or updated Lily's voice on mono? I can't get the example voice you used.
     
  49. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    It seems that there was a delay regarding this. The asset will be released soon according to the Asset Store Team. We will keep you informed.
     
  50. AiKodex

    AiKodex

    Joined:
    Jan 21, 2021
    Posts:
    373
    Hello Micho_dev,

    Yes, Lily's voice was changed. It is the only voice that was changed, and it has now been replaced with a similar voice.