Advice Please: Character Animation + Facial Animations + Lip Sync

Discussion in 'Animation' started by plmx, Sep 25, 2018.

  1. plmx

    Joined: Sep 10, 2015 | Posts: 192

    Hi all,

    I have been developing with Unity for a while now, but have not needed character animation yet. Now is the time and I am unsure which combination of software products to use. Obviously I am using Unity as the game engine, but AFAIK I need additional software. This is for a 3D Virtual Reality Game with close-up interactions of the player with the characters.

    I have the need to create and animate multiple human characters; I would like to do this by modifying existing assets if possible and not start from scratch. I need to include facial animations (laughing, crying, etc.) plus lip synchronization (or animating the lips) for cut scenes.

    Now, I am unsure which tools to use for creating/rigging the actual characters, for creating/using animations like walking, crouching, etc., and for syncing lips to text. I looked at Poser, DAZ Studio, Unity itself (with Asset store assets), ...

    ...what would you recommend for the above requirements and why? If you have any relevant tutorials please share :)

    Thanks,

    Philip
     
  2. RichardKain

    Joined: Oct 1, 2012 | Posts: 1,032

    Wow, not doing anything by halves, are you? That's a pretty ambitious set of requirements.

    I've done a fair amount of work on lip-sync animations, so I might be able to point you in the right direction where that's concerned. Sadly, lip-sync animation is an under-utilized portion of animation, mainly because it's so bloody difficult. For the purposes of VR, I'm assuming 3D modeling. For the most realistic approach, you would probably want to use a combination of bone animation and shape-key animation. The human jaw works as a bone, and should usually be animated by bone. But the lips are too complicated for bone animation, and work better as a series of small shape keys. It's usually a good idea to define and create shape-keys for different flex positions. Then you combine those flex positions in-engine to create different mouth-shape animations. Combine this with the bone jaw and it should be possible to create some very convincing facial positions for phonemes. If your 3D model also has a tongue, you can probably use bones to animate that. Since you don't see the tongue all that often, you can probably get away with only three or four bones for tongue animations.
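
    A minimal sketch of the pose-combination idea above, written in Python for brevity; the same logic should port to a Unity C# script. The shape-key names, weights, and jaw angles are made up for illustration. Weights are normalized to 0..1 here, whereas Unity's SkinnedMeshRenderer.SetBlendShapeWeight typically works in a 0..100 range, so you would scale them.

```python
from dataclasses import dataclass, field


@dataclass
class MouthPose:
    """One phoneme pose: shape-key weights (0..1) plus a jaw-bone angle in degrees."""
    shape_weights: dict = field(default_factory=dict)
    jaw_angle: float = 0.0


# Hypothetical flex positions and values, for illustration only.
POSES = {
    "rest": MouthPose({}, 0.0),
    "AA": MouthPose({"lips_open": 0.8, "lips_wide": 0.3}, 18.0),
    "OO": MouthPose({"lips_pucker": 0.9, "lips_open": 0.3}, 8.0),
    "MM": MouthPose({"lips_press": 1.0}, 0.0),
}


def blend(a, b, t):
    """Linearly interpolate from pose a (t=0) to pose b (t=1)."""
    t = max(0.0, min(1.0, t))  # clamp so we never overshoot either pose
    keys = set(a.shape_weights) | set(b.shape_weights)
    weights = {
        k: (1 - t) * a.shape_weights.get(k, 0.0) + t * b.shape_weights.get(k, 0.0)
        for k in keys
    }
    jaw = (1 - t) * a.jaw_angle + t * b.jaw_angle
    return MouthPose(weights, jaw)
```

    Calling blend(POSES["rest"], POSES["AA"], 0.5) gives a half-open mouth: every shape key and the jaw angle land halfway between the two poses.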

    All of that is just the rigging. Actually creating the animations themselves is immensely costly if you are doing it by hand. Unfortunately, there aren't all that many options for automatic generation of these animations. But not many is not the same as none.

    Annosoft Command-Line program

    This is a common open-source program that is used for extracting phonetic animation information from recorded audio. It also has an option for providing a text transcription of the audio, which improves the recognition and speeds up the process. This command-line tool uses the Windows SAPI programming library, so it only runs on Windows. I've used this program in the past for automatic lip-sync animation. While it isn't perfect, it is reasonably fast and can provide a decent basis for automating the process. Specifically, it provides phoneme recognition and timing, which is what you really need for higher-quality lip sync animation.

    Once you have your timing, you will have to apply it to your pre-created Unity animation poses. For that you are largely on your own. I cooked up a plug-in a few years back that handled a lot of that for you, but I haven't had time to revisit that plug-in yet, so you will have to tackle the scripting on your own. Thankfully, the scripting shouldn't be that hard, and you can customize it to your needs. Unity is really great for cooking up quick-and-dirty tools. The process is much faster if you don't have to worry about bundling a tool for other people to use.
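
    To give a rough idea of that scripting step, here is a small Python sketch that reads phoneme timings into a sorted list. The input format (one "start_ms end_ms phoneme" triple per line) is an assumption for illustration, not the tool's documented output, so adjust the parsing to whatever your files actually contain. PhonemeCue and parse_cues are hypothetical names.

```python
from typing import NamedTuple


class PhonemeCue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    phoneme: str


def parse_cues(text):
    """Parse lines of the ASSUMED form '<start_ms> <end_ms> <phoneme>'."""
    cues = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unrecognized lines
        try:
            start_ms, end_ms = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # skip lines whose first two fields are not numbers
        cues.append(PhonemeCue(start_ms / 1000.0, end_ms / 1000.0, parts[2]))
    cues.sort(key=lambda c: c.start)  # keep cues ordered by start time
    return cues
```

    Each cue can then be matched to one of your pre-created poses by phoneme name during playback.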

    As VR becomes more common, the demand for effective lip-syncing solutions is going to continue to rise. I've been looking into the possibilities for more cross-platform solutions, but there aren't many options. I've got the beginnings of a solution in the works, but it would utilize a mobile-only library, so it would have to run on smartphones. With any luck, more effective tools will continue to shape up in the future.
     
  3. plmx

    Joined: Sep 10, 2015 | Posts: 192

    Hi Richard,

    thanks very much for your insights; the Annosoft program looks very promising. I will probably need to go the combined bone/shape key approach.

    So, effectively, I will a) need morph and bone positions for all phonemes in my human model, and b) have to create a script which translates the anno output of phonemes+timing into a Unity animation to play back?

    I did a fair amount of scripting for the Unity editor already, and I agree with your assessment both of the quick cooking up of scripts and about bundling, or not bundling, the result ;-)

    Thanks again,

    Philip
     
  4. RichardKain

    Joined: Oct 1, 2012 | Posts: 1,032

    If you are ambitious, and you are developing on a Windows machine, it is actually possible to write a C# script for Unity that will run the Annosoft command-line program and parse the results right there in Unity. With this approach you don't need to leave the Unity environment. The one downside is that the Annosoft program requires WAV files as input, and no one really uses WAV files natively in Unity projects, or at least not for speech files. (They're great for sampling, but not so great for extended playback, like speech and music.)

    Here's a quick suggestion. If you are planning on exporting your project to a mobile platform at any point, you will want to configure your playback script to take its current play-head position from the audio file, and not calculate it purely by time. On mobile platforms, there is frequently a slight delay in playback for compressed audio files such as MP3s. If your lip-sync playback is based on just the update loop, it might start playing the animation before the audio kicks in. If you base the animation playback on the current position of the playing audio, the animation will always be perfectly in sync with the spoken audio.
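
    A minimal Python sketch of that lookup, assuming the cues are (start, end, phoneme) tuples in seconds, sorted by start time. The function name and the "rest" default are hypothetical. In Unity the audio_time argument would come from the playing clip (for example AudioSource.time) rather than from accumulated frame time.

```python
import bisect


def active_phoneme(cues, audio_time, default="rest"):
    """Return the phoneme whose [start, end) interval contains audio_time.

    cues: list of (start_sec, end_sec, phoneme) tuples sorted by start time.
    audio_time: the audio clip's current playback position in seconds,
                NOT time accumulated in the update loop.
    """
    starts = [c[0] for c in cues]
    # Index of the last cue that starts at or before audio_time.
    i = bisect.bisect_right(starts, audio_time) - 1
    if i >= 0 and audio_time < cues[i][1]:
        return cues[i][2]
    return default  # before the first cue, in a gap, or after the last cue
```

    Because the lookup depends only on the audio position, a late-starting clip simply keeps returning the default pose until the audio actually begins to advance.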
     
  5. RichardKain

    Joined: Oct 1, 2012 | Posts: 1,032

    Also, another quick pointer for facial animation. When a face changes to make a sound, the movement starts BEFORE the sound itself is made. When you get timing based on audio, it generally gives you the time that the individual phonemes start and end, but it doesn't necessarily give you the facial lead-in. Thankfully, this lead-in time tends to be fairly brief, and you can easily add a setting to your script that creates these lead-ins and lead-outs automatically from an adjustable lead-in/lead-out variable (most likely a float). Just have the face start animating toward a particular pose a tenth of a second before it has to hit that pose, and then give it a tenth of a second after the sound ends to return to the default position. This will give you facial animation that looks more natural and believable.
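
    One way to sketch that lead-in/lead-out idea is a simple weight envelope per phoneme, shown here in Python. The linear ramp and the function name are assumptions; an eased curve may look better in practice.

```python
def pose_weight(t, start, end, lead=0.1):
    """Weight (0..1) of a pose at time t, for a sound spanning [start, end].

    Ramps up linearly during `lead` seconds before `start`, holds at 1.0
    while the sound plays, then ramps back down over `lead` seconds after `end`.
    """
    if lead <= 0.0:
        # No lead-in/lead-out: the pose simply snaps on and off.
        return 1.0 if start <= t <= end else 0.0
    if t < start - lead or t > end + lead:
        return 0.0
    if t < start:
        return (t - (start - lead)) / lead  # lead-in ramp
    if t > end:
        return ((end + lead) - t) / lead    # lead-out ramp
    return 1.0
```

    With the default lead of 0.1, a phoneme that starts at 1.0 s begins pulling the face toward its pose at 0.9 s and is fully released 0.1 s after the sound ends.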

    For 2D animation using a simple set of sprites, this level of nuance is not necessary. But once you start getting into 3D animation, it helps to include these kinds of details, as the human eye is good at discerning discrepancies when it comes to natural human speech.
     
  6. plmx

    Joined: Sep 10, 2015 | Posts: 192

    Hi Richard,

    thanks again for your insights! Running everything through Unity/C# seems like a good way to go, since I am on Windows. And thanks for the heads-up regarding the lead-in.

    If I may ask, how did you create your 3D characters and in particular the shape-keys for the mouth/lip area? Did you create your characters from scratch in Blender or a similar program, or did you work by adaptation in Poser, Daz, or similar?

    Philip
     
  7. RichardKain

    Joined: Oct 1, 2012 | Posts: 1,032

    I've been a Blender-head for quite some time. I just cooked up a quick low-polycount character in Blender for testing purposes. I'm not a fan of the "builder" programs like Poser, DAZ, and the like. I would consider using MakeHuman, but then I would customize the generated models in Blender after the fact. With the Blender approach, you manually create everything. I created a Preston-Blair set of blend shapes for the facial poses, which is a less flexible approach, but does involve producing the smallest number of shape keys.
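
    For illustration, here is a small Python sketch of how recognized phonemes might be collapsed onto a Preston Blair style set of mouth shapes. The shape names follow one common naming convention for that set, and the phoneme labels and their grouping are assumptions, not the mapping used in the test character described above.

```python
# Ten mouth shapes in a common naming convention for the Preston Blair set;
# rename them to match your own shape keys.
PRESTON_BLAIR_SHAPES = ["AI", "E", "O", "U", "etc", "FV", "L", "MBP", "WQ", "rest"]

# ASSUMED, partial mapping from ARPAbet-style phoneme labels to shapes.
PHONEME_TO_SHAPE = {
    "AA": "AI", "AE": "AI", "AH": "AI", "AY": "AI",
    "EH": "E", "IY": "E", "IH": "E", "EY": "E",
    "AO": "O", "OW": "O",
    "UW": "U", "UH": "U",
    "F": "FV", "V": "FV",
    "L": "L",
    "M": "MBP", "B": "MBP", "P": "MBP",
    "W": "WQ",
    "SIL": "rest",
}


def shape_for(phoneme):
    """Return the mouth shape for a phoneme; unlisted consonants fall back to 'etc'."""
    return PHONEME_TO_SHAPE.get(phoneme.upper(), "etc")
```

    The catch-all "etc" shape is what keeps the shape-key count small: most consonants share one neutral, slightly open mouth.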
     
  8. plmx

    Joined: Sep 10, 2015 | Posts: 192

    Cool - I used Blender in the past for (non-character) modeling, so I will give it a try.

    Thanks again for your insights!
     
  9. Automoda

    Joined: Apr 27, 2017 | Posts: 89

    Hey, if you ever write a script that uses this data, shoot me a copy. I'm an animator and a terrible programmer. Still, I'd like to give lip-sync a try sometime.