Principles of Facial Animation in LE3 - ME3Tweaks/LegendaryExplorer GitHub Wiki

What is FaceFX and What Does it Do?

FaceFX is a proprietary facial animation system designed for use with Unreal Engine and created by OC3 Entertainment. It was used in the creation of both the original Mass Effect trilogy and the Legendary Edition. The exact version used was 1.7.3.1, and BioWare modified it to accommodate Wwise (the audio encoding system used by ME2, ME3, LE2 and LE3) as well as an internal animation tool, “RoboBrad.” RoboBrad is a system that assists in generating base performances and setting up default camera shots for scenes. A version of RoboBrad was also used for Dragon Age titles.

The editing and creation of new FaceFX Animations (FXA) for use in the Mass Effect Legendary Edition is possible through the Legendary Explorer (LEX) and its bespoke FaceFX Editor (FFXE).

All characters' lip-sync is controlled through FaceFX animation files, referred to in the toolset as FXAs. These files contain instructions that control expressions and mouth shapes, and they offer more precise control over head movements than can be achieved through Gestures alone. FXAs work in conjunction with, but separately from, Gestures and Poses. See the documentation on LEX's Matinee Editor for more information on these aspects of animation.

FFXE is capable of importing and exporting animation data as JSON files.

Tips for Learning

  • A basic working familiarity with LEX's Dialogue Editor and Sound Explorer is highly recommended before getting to grips with FFXE. Facial animation is an integral part of dialogue editing. A general understanding of 3D animation also helps greatly, although the basic principles are explained here.
  • FFXE's main complexity comes from the fact that, unlike conventional animation environments, results cannot yet be previewed live. Additionally, lines are animated shape by shape, and results are seen by viewing the conversation in-game. With practise, fewer iterations are necessary before making a line look believable, but be prepared to take some time with FFXE.
  • The key to good custom facial animation in FFXE is understanding that it is less about strict accuracy and more about good landmarking. Getting the larger shapes in place at the right time will go a long way.
  • Focus on getting m_Jaw+ right before moving on to other shapes. The jaw is a good landmarker for visualising the line and identifying where to place other shapes.
  • To get a good sense of jaw movements, try saying the words aloud with your finger held to your chin.
  • Have your line's audio file open in an audio editor like Audacity that shows its data along a marked timeline. This allows for better coordination between what you see and hear.

Basic Anatomy of the FFXE

The basic layout of the FFXE, running on LEX's predecessor, ME3Explorer, displaying a line from ME3

The Animations Column

The Animations column looks very similar across all human and asari FXAs, and displays each type of animation available in the FXA. Individually, these are called Tracks.
The first 15 tracks in the list are called phonemes in animation terms, and these correspond to the shapes human mouths make when they speak. Not every letter appears on the list, and this is because many sounds share the same shape. Humans and asari share phoneme sets. It should be noted that turians, krogan and salarians each have their own, and their FXAs will look different as a result. For more information on specific phonemes, see the Table of Phonemes section.

The tracks labelled Orientation Head and Emphasis Head control the positioning of the head in different ways. Pitch controls up and down, Yaw controls side to side, and Roll deals with tilt. Gaze Eye controls the positioning of the eyes.

Tracks after these are Gestures and Emotion sets. Many of the Gestures that can be found in FXAs are the same as those controlled using the Matinee Editor, but the animator has more control over their speed, duration and intensity in FFXE.

The Lines Column and Info Pane

The Lines Column shows all FXAs assigned to a specific speaker within a given conversation. It shows their subtitles if available, and the length of the line's associated audio asset. The list is gender-specific. If a Female FXA was opened, only the character's FXAs used when speaking as/to a female will be shown in the column, and vice versa for Male. See “FFXE and Gender” for more information on why this is.

The Info Pane displays the FXA's basic name, its WwiseEvent name, and the InterpID it is associated with. It is possible to rename FXA files from within this pane and associate them with different lines, but the applications for this procedure are currently beyond the scope of this article.

The Timeline Display

The grey graph area displays a selected track's information, shown as patterns of dots and lines. These dots are called keys, and the lines between them are called tangents. Animations have minimum and maximum weight values. A key tells the game how strongly an animation should be present on the face at a given point in time. The tangent shows how the shape will change in strength over time to get to the next key. Sharp, spiky movements represent rapid, dramatic shifts, and smooth curves represent slower transitions. The movement that occurs between keys is called tweening, in animation terms. Understanding and controlling the relationship between keys and tangents is essential for creating convincing animation. On the graph, left and right is backwards and forwards in time respectively, and up and down is weight value. 
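
To make the relationship between keys and tweening concrete, the sketch below samples a track at an arbitrary time. It is an illustration only and makes a simplifying assumption: it tweens in straight lines, whereas the real curves are shaped by bezier tangents. The `Key` type and the `sample` function are hypothetical names, not part of LEX or FaceFX.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    time: float    # seconds from the start of the line
    weight: float  # strength of the animation at that time


def sample(keys: list[Key], t: float) -> float:
    """Return the tweened weight at time t (straight-line tangents assumed)."""
    if not keys:
        return 0.0
    keys = sorted(keys, key=lambda k: k.time)
    # Before the first key or after the last one, hold the nearest weight.
    if t <= keys[0].time:
        return keys[0].weight
    if t >= keys[-1].time:
        return keys[-1].weight
    # Find the pair of keys that surround t.
    i = bisect_right([k.time for k in keys], t)
    a, b = keys[i - 1], keys[i]
    frac = (t - a.time) / (b.time - a.time)
    return a.weight + frac * (b.weight - a.weight)


# A jaw that opens to 0.40 and closes again over 0.20 seconds.
jaw = [Key(0.0, 0.0), Key(0.10, 0.40), Key(0.20, 0.0)]
print(sample(jaw, 0.05))  # weight halfway up the first slope
```

Moving the middle key closer in time to the first makes the slope steeper, which corresponds to the sharp, spiky movements described above.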

Basic Controls

Left-Click
Selects keys. Displays a key's tangent bezier manipulator, used for fine control on curves.

Right Click Menu:
Add Key: Places a key at the exact position of the cursor in both time and weight value. This can be adjusted after the fact.
Add Key with 0 Weight: Places a key at the exact position of the cursor in time, but at 0 weight value. Excellent for quickly establishing peaks and valleys.
Offset All Keys After This Point: Pushes or pulls a track's keys back and forth in time by defined amounts past the point of the cursor only. Entering negative numbers will pull things back in time, positive numbers pushes forward.
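
The effect of Offset All Keys After This Point can be sketched as follows. This is a minimal illustration, not the editor's actual code: keys are assumed to be simple (time, weight) pairs, times are assumed to be in seconds, and the function name is hypothetical. Whether a key sitting exactly on the cursor is moved is also an assumption; here only keys strictly after the cursor are shifted.

```python
def offset_keys_after(keys, cursor_time, amount):
    """Shift every key later than cursor_time by amount seconds.

    keys is a list of (time, weight) tuples. A negative amount pulls keys
    back in time; a positive amount pushes them forward.
    """
    return [
        (time + amount, weight) if time > cursor_time else (time, weight)
        for time, weight in keys
    ]


track = [(0.0, 0.0), (0.5, 0.4), (1.0, 0.0)]
shifted = offset_keys_after(track, cursor_time=0.25, amount=0.5)
print(shifted)  # [(0.0, 0.0), (1.0, 0.4), (1.5, 0.0)]
```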

Key Slide:
To slide a key up or down in weight value (vertical) left-click and drag up or down.
To slide a key backwards or forwards in time (horizontal), hold Shift, left-click and drag.
To slide a key in both directions at once, hold CTRL, left-click and drag.

 

Fixed Time Span

The effects of all tracks stack together to produce an animation. By default, the timeline graph shows each track's information from the first key to the last key in its individual sequence. This means that by default, tracks appear at different proportional ranges to each other. Fixed Time Span is a feature that locks the display to defined values in time, and this persists across all tracks, showing them in proportion with each other. This allows for a better understanding of where each track is at any given point in time, and allows the animator to focus in or out.
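
The difference between the default view and Fixed Time Span can be illustrated with a short sketch. The (time, weight) tuples, the track data and the function names are all assumptions made for the example, not anything taken from LEX.

```python
def track_span(keys):
    """Default view: from the track's first key to its last key."""
    times = [time for time, _ in keys]
    return min(times), max(times)


def keys_in_window(keys, start, end):
    """Fixed Time Span view: only the keys inside a shared time window."""
    return [(time, weight) for time, weight in keys if start <= time <= end]


jaw = [(0.2, 0.0), (0.4, 0.4), (0.9, 0.0)]
head_pitch = [(0.0, 0.0), (2.5, 0.3)]

# By default each track is framed by its own keys, so the scales differ.
print(track_span(jaw))         # (0.2, 0.9)
print(track_span(head_pitch))  # (0.0, 2.5)

# With a fixed span of 0.0 to 1.0 seconds, both tracks share one window.
print(keys_in_window(jaw, 0.0, 1.0))         # all three jaw keys
print(keys_in_window(head_pitch, 0.0, 1.0))  # [(0.0, 0.0)]
```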

Reference Curve

Any track may be set as a Reference Curve by right-clicking it in the Animations column. This displays the track's information behind that of any other track, drawn in proportion to whichever track is selected.


FFXE and Gender

Every line of dialogue has two FXA assets, a male and a female variant. Shepard always uses the Male FXA when male and the Female FXA when female.
Mark Meer and Jennifer Hale deliver identical lines with different emphasis and timings. This means that player character lines must often be animated twice, because information that works for one will very often not be accurate for the other. 

Although NPC audio is always stored and delivered from the Male asset section, if the player's Shepard is female, the Female FXA will always be used. This is an artefact of the localisation infrastructure to accommodate languages other than English. A potentially useful side effect of this is that characters can deliver the same audio asset with different expressions depending on Shepard's gender. For stability reasons, always ensure dialogue lines have a male and female FXA variant. If the dialogue line cannot be encountered by both sexes, both FXA variants should still be present, but only one need be accurate.

The original animators will have had access to an auto-generation system to compensate for this issue. As of yet, no form of lip-sync automation exists for FFXE.
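
The selection rules described in this section can be summarised in a few lines. This is only a restatement of the behaviour described above, not engine code, and the function and parameter names are hypothetical.

```python
def assets_for_line(speaker_is_shepard: bool, shepard_is_female: bool):
    """Return (audio_variant, fxa_variant) for one dialogue line."""
    # The FXA always follows the player character's gender,
    # for Shepard and for NPCs alike.
    fxa = "Female" if shepard_is_female else "Male"
    # Shepard's audio follows Shepard's gender; NPC audio is always
    # stored in, and delivered from, the Male section.
    audio = fxa if speaker_is_shepard else "Male"
    return audio, fxa


print(assets_for_line(speaker_is_shepard=False, shepard_is_female=True))
# ('Male', 'Female')
```

The NPC case with a female Shepard is the one that allows the same audio asset to be delivered with a different expression.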

Understanding Emotion Sets

In general terms, Emotion Sets are “shortcuts” for changing a character's general expression as they speak. Technically speaking, each Emotion Set references a facial morph and its intensity, in much the same way that phonemes work.

In the Animations Column, an Emotion Set is a track with a name such as E_Neutral_Perplexed, and it generally exhibits values from 0 to 1.

Emotion Sets have different letters in front of them. Presumably these affect different areas of the facial morph, but their exact purpose is unknown:
  • E S Stern1
  • E B Stern1
  • E Y Stern1
  • E WB Stern1
All of these are individual tracks, and often have very similar curves if present together in the same FXA. Experimentation is required, but often any one of the correct Emotion Sets will give a generally acceptable effect.

Understanding m_Jaw+

m_Jaw+ is the track controlling the opening and shutting of the character's jaw, and it is the biggest landmarker when it comes to speech. Getting good at understanding this track will allow you to read along with a line just by looking at its keys. At a value of 1 the jaw is all the way open, and at a value of 0 it is closed. It is very rare to see the jaw open at a value greater than 0.40 during typical speech, with 0.60 considered the maximum threshold for a widely open jaw. Values higher than this may start to look strange.
Once a character begins speaking, jaw movements usually fluctuate between 0.20 and 0.40, with sharp dips towards 0 for plosives and other sounds requiring a closed jaw. Keeping your jaw movements in this range will keep them closer to vanilla animation values.
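
As a rough aid, the ranges above can be turned into a simple check. The sketch assumes the jaw keys are available as (time, weight) tuples with times in seconds; the function name and the constants are hypothetical and merely encode the values quoted in this section.

```python
TYPICAL_MAX = 0.40   # rarely exceeded during typical speech
ABSOLUTE_MAX = 0.60  # threshold for a widely open jaw


def check_jaw_keys(keys):
    """Return warnings for m_Jaw+ keys outside the suggested ranges."""
    warnings = []
    for time, weight in keys:
        if weight > ABSOLUTE_MAX:
            warnings.append(f"{time:.2f}s: {weight:.2f} is above {ABSOLUTE_MAX:.2f} and may look strange")
        elif weight > TYPICAL_MAX:
            warnings.append(f"{time:.2f}s: {weight:.2f} is wider than typical speech")
    return warnings


jaw_keys = [(0.00, 0.0), (0.10, 0.35), (0.25, 0.45), (0.40, 0.75)]
for warning in check_jaw_keys(jaw_keys):
    print(warning)
```

A warning is not necessarily an error: a shout or a scream may justify a wider jaw than ordinary speech.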

Jaw- also exists, and appears to be used only to make the jaw flutter closed further during plosive sounds. Almost everything required for animation can be achieved using Jaw+ alone.

Lip-Sync Recipes in LE3/ME3

The following is a table providing baseline suggestions for how to create various shapes utilising this system. Examples have been collected by studying vanilla FXAs. It should be noted that these are generalised suggestions intended as a starting point, and are not definitive. Further alterations may be required due to delivery of the line. Languages other than English may have different values that work better. These will work for humanoid characters, including humans, asari, and EDI.

| Target Sound (IPA) | As In | m_Jaw+ Movements | Other Phonemes | Notes |
|---|---|---|---|---|
| b, p | buy, cab, pie, cap | At no more than 0.06 throughout sound | M: peak at 0.70; Jaw-: peak at 0.11; Open: peak cascades off M's completion at 0.11 | These sounds are referred to as plosives. Sometimes, adding OW at a low value can help push the shape further if necessary. |
| d, t, dj, tj | dye, cad, tea, cat, dew, tune | - | Flap: peak at 0.60 | The surrounding sounds dictate jaw state. Flap controls tongue touching front teeth. |
| dʒ | giraffe, justice, edge, joke | At no more than 0.11 throughout sound | OW: peak at 0.30 | - |
| ʃ | Shepard, wish, issue, motion | At no more than 0.20 throughout sound | OW: peak at 0.40 | - |
| θ, ð | thorn, throw, wrath, sheathe, thy, father, breathe | At no more than 0.20 throughout sound | TH: peak at 0.95 | Add Open at a low value for a stronger appearance. |
| h, æ, aɪ, eɪ, ɪ, ʌ | high, hat, price, face, kit, strut | Peak at 0.40 over target sound | - | Most straight vowel sounds only need the jaw to open wider to look effective. If the shape needs more, use EH or Open at low values. |
| j | yes, you, hallelujah | Peak should cascade off EE's completion with peak at 0.20 | EE: peak at 0.20 | If the next sound is any vowel other than O, release EE to 0 as normal. If the next letter is O, maintain EE at 0.20 throughout the OW sound. |
| f, v | love, strafe, violin | - | FV: peak at 0.95 | This phoneme is slow to form. Whilst most sounds operate well within 0.10s, FV looks best when it is allowed to “dominate” the word's animation. Recommend 0.20. |
| u, hw, oʊ, uə | you, brew, woo, whine, follower, influence | At no more than 0.11 for duration of sound | OW: peak at 0.60; Open: peak at 0.13 | - |
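
As a starting point, a few rows of the table above can be expressed as data and turned into simple rise-and-fall keys. This is a hedged sketch only. The recipe names, the (time, weight) tuples, the three-key triangle shape and the 0.10 s default duration are assumptions made for illustration, and cascading tracks such as Open are left out. Only the peak values and track names come from the table.

```python
# Peak weights taken from the table; the dictionary keys are hypothetical labels.
RECIPES = {
    "plosive": {"M": 0.70, "Jaw-": 0.11},  # b, p
    "sh": {"OW": 0.40},                    # ʃ
    "th": {"TH": 0.95},                    # θ, ð
    "fv": {"FV": 0.95},                    # f, v
}


def triangle_keys(start, duration, peak):
    """Three keys: 0 at the start, peak at the midpoint, 0 at the end."""
    return [(start, 0.0), (start + duration / 2, peak), (start + duration, 0.0)]


def keys_for_sound(sound, start, duration=0.10):
    """Build rise-and-fall keys for every track in the chosen recipe."""
    return {
        track: triangle_keys(start, duration, peak)
        for track, peak in RECIPES[sound].items()
    }


# FV is slow to form, so it is given 0.20 s as the table recommends.
for track, keys in keys_for_sound("fv", start=1.0, duration=0.20).items():
    print(track, keys)
```

The generated keys would still need to be placed by ear against the audio and adjusted in-game, as with any other suggestion in this section.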

Video Tutorial

If you learn better by watching, have a look at this resource video which explains these and other topics in detail, on YouTube.