Client: Formosa Interactive

Need: Seamless simultaneous facial and audio capture

Author: Chris McMahon


Facial capture within a voiceover booth is sometimes seen as a technological constraint on artistic performance – but this needn’t be the case. By integrating Faceware’s ProHD Headcam hardware into its pipeline, sound studio Formosa Interactive can deliver simultaneous facial and audio capture without breaking the rhythm of performance or production.

“Voiceover artist” is an apt term for a profession that’s just as much an art form as that witnessed on set or stage. Within the intimate confines of a VO booth, artists pour heart and soul into performances comedic, dramatic and everything in-between. The four walls of the booth comprise an almost sacred space, where actors must remain unperturbed to give themselves over to the performance at hand.

Considering this, it may seem counterintuitive to introduce a head-mounted tracking system into the process – a piece of gear that could potentially hinder a performance rather than empower it. But this need not be the case.

Using Faceware’s ProHD Headcam hardware, Los Angeles-based sound studio Formosa Interactive has found simultaneous voice and face capture to be a swift, simple, and adaptable process – one that enables sound engineers and actors alike to do their jobs without any of the hassle.

Read on to learn how Formosa has implemented Faceware into its voiceover pipeline, from initial capture to delivering results in post.

Immersive Performance

Formosa Interactive was established as a dedicated division of Formosa Group in 2013 following considerable demand from the video game industry.

“We have nearly 20 seasoned and experienced full-time employees who have provided world-class, high-quality post-production audio content for video games over the last two decades,” explains William “Chip” Beaman, vice president of Formosa.

Formosa’s clients are some of the biggest in the industry, with projects including Activision’s Call of Duty: Infinite Warfare, Bethesda and id Software’s Doom reboot, Microsoft’s animated series Halo: The Fall of Reach, and indie-developer Camouflaj’s Republique.

Beaman’s own history stretches back more than a decade. He’s seen firsthand how advances in technology have minimized the hardware and hassle required to capture effective facial performances: “You used to need huge amounts of cumbersome equipment that didn’t allow for any flexibility in the capture process,” he says. “Now it’s very little – solutions like Faceware’s head-mounted cameras make things incredibly simple. They’re easy to set up, they have a robust end-to-end pipeline, and most importantly they don’t slow things down.”

Ultimately, this empowers Formosa to achieve simultaneous voice and face capture within its VO booths in a way that’s comfortable for the actors and streamlined for the sound engineers.

“It’s an incredibly versatile setup,” says Dave Natale, Formosa’s recordist and lead engineer.

“Faceware’s solutions are easy to get up and running and easy to tear down, and it all integrates seamlessly into our workflow. In the world of voiceover work, where you have to get good results fast, that ease-of-use is vital.”

Fast Results, Fluid Production

Formosa relies on Faceware’s ProHD Headcam hardware to capture raw data right in the booth. Formosa’s clients then take that data into post, using Analyzer software to track the performance without the need for facial markers. The performances can then be sent to Retargeter for final animation prep.

Capturing voicework and facial performance simultaneously has both creative and practical benefits. For starters, the capture is more true-to-life than it would be if the voice work and facial movements were recorded at separate times.

“There are huge advantages to getting that true lip-sync,” says Natale. “If you’re putting together asynchronous audio and facial capture, you could end up with one foot in the uncanny valley. With Faceware Headcams, since the video is recorded in combination with the audio in the VO booth, the output is an exact frame lock. That results in a more vivid recreation of reality that’s also true to the actor’s original performance.”

This approach is possible because Faceware’s ProHD Headcams are lightweight, comfortable, and unobtrusive, meaning actors can wear them without being drawn out of the performance they’re delivering.

“We use the exact same mics and headcams that are used on the mocap stage, so the actors are comfortable and familiar with them,” says Natale. “That means a more natural final performance: things are totally organic. It allows us to create those perfect moments that really sell a scene.”

In Call of Duty: Infinite Warfare, for instance, Formosa was brought on board to capture audio and facial performances for picture-in-picture comms sequences. Here, the faces of non-player characters appear on screen, creating a focal point around those elements and the dialogue being delivered. Thankfully, with Formosa’s in-booth Faceware ProHD Headcam setup, the actors were fully comfortable in their performances, delivering audio and video that perfectly matched the subtle movements and inflections of their characters’ faces.

In practical terms, considering the quick-and-easy setup of Faceware’s hardware, it’s not just the rhythm of the actor’s performance that remains uninterrupted, but that of the production itself. The wheels of the studio can continue to turn while clients receive great quality capture.

“The downtime between scenes recorded on stage is much longer compared to just recording VO in a booth with us,” says Natale. “Faceware doesn’t slow down how we work – logistically it’s much simpler. Our sessions tend to be more focused and fluid, allowing the team to run through the recording agenda at a much faster clip.”

Staying Synchronised

There’s another reason why Formosa chooses Faceware ProHD Headcams: “Faceware is by far the easiest to work with, because I don’t have to run two recording rigs,” explains Natale. “It’s 100 times easier with Faceware than it is with the other tech, because, technically-speaking, it just runs and records video in the background. It’s so simple to get up and running.”

This ease is exemplified in the ability to quickly track specific takes: “While other vendors require a unique time code for every shot of the day which, in turn, requires running and tracking multiple clocks and having expensive recorders, Faceware’s hardware allows us to use the same time code data for multiple takes,” affirms Beaman.

Having that locked time code makes it easier to go back and deliver enhancements or tweaks to a performance after the fact, saving even more time later on in the post process: “We can get the exact same time over and over; the data doesn’t foul up and require someone to go through by hand with each file and adjust it,” explains Natale. “It saves about half of the editing time on our side, because we don’t have to match between multiple sessions with different operating clocks.”

Keeping Rhythm

In-booth facial capture with Faceware ProHD Headcams is not only cheaper and easier than asynchronous capture of audio and performance, but it also saves the studio many headaches down the line. Formosa calls Faceware “second-to-none” in providing brilliantly lifelike performances across all types of digital media, reducing time investment and increasing creative iteration later on in the production cycle. And that, in turn, means great, immersive games.

“Faceware hardware cuts down on time, tracking, and expense,” says Beaman. “It’s a precise method offering reliability, affordability, and truly excellent results. The versatility of taking it into a VO booth too is just the icing on the cake.”


Using Faceware in a VO Booth

Setting up Faceware’s tools in a voice booth is an incredibly simple process – and the setup utilized by Formosa will work in 99% of all similar booths. Below, performance capture supervisor Christopher Jones describes the process and approach.

How do you set up a Faceware ProHD Headcam?

“The ProHD Headcam arrives in a Pelican case and can be unpacked and set up within 15 minutes. Fitting an actor is an intuitive process, and helmet sizing is simple: with three different helmet sizes and three different thicknesses of padding for the front, top, and back, a secure fit can be found by trying different choices until the actor finds the one that’s comfortable.

“Next, an adjustable belt holding the battery pack is put on the actor, and finally the camera is attached to the helmet. An operator will frame up the camera directly in front of the actor’s face and set focus. The directors and producers can then view the realtime HD feed from the camera in full resolution in the control room to gauge the nuance of performance.”

What are the benefits of simultaneous voice and face capture?

“It’s a growing trend we’ve seen for years. From a creative standpoint the actor knows the cohesiveness of their emotional expression will stay true to their intended performance. This includes really hitting the lip sync and bringing more of their true intent to their character’s animation.

“From a practical standpoint, there will be certain jobs that cannot be won without that capacity for simultaneous capture. It’s also a possible service revenue stream: a VO studio can bill clients for capture equipment, labor, post work, and media management.”

How do you sync the performance with Pro Tools software?

“This process is straightforward, and can be accomplished in many ways, in part due to the HD-SDI video signal produced by the ProHD Headcam. Most often that signal is routed into a Digital Disk Recorder that accepts audio and timecode from Pro Tools. At Formosa, their deck is set up to record automatically when Pro Tools records, using the timecode to trigger recording and keep things perfectly in sync for post.”
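For readers who want to see what a shared timecode buys in post, the arithmetic behind frame-locked audio and video can be sketched in a few lines of Python. This is only an illustration, not Formosa’s or Faceware’s tooling: the 30 fps non-drop frame rate, the 48 kHz sample rate, and every function name below are assumptions made for the example.

```python
# Illustrative sketch only: the frame rate, sample rate, and function names
# are assumptions for this example, not part of any Faceware or Pro Tools API.

FPS = 30               # assumed non-drop-frame video rate
SAMPLE_RATE = 48_000   # assumed audio sample rate in Hz


def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert a non-drop 'HH:MM:SS:FF' timecode into an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"invalid timecode: {tc}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frames_to_samples(frames: int, fps: int = FPS,
                      sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a frame count into the matching number of audio samples."""
    return frames * sample_rate // fps


def take_offset_samples(session_start: str, take_start: str) -> int:
    """Audio sample offset of a take, measured from the session start timecode."""
    delta = timecode_to_frames(take_start) - timecode_to_frames(session_start)
    return frames_to_samples(delta)


if __name__ == "__main__":
    # A take that starts 10 seconds and 15 frames into the session.
    print(take_offset_samples("01:00:00:00", "01:00:10:15"))
```

Because the video recorder and the audio session stamp their media from the same clock, one subtraction is enough to find where a take’s video sits against its audio, with no hand alignment between separately clocked files.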

How is the data then processed in post using Faceware’s Analyzer and Retargeter?

“It’s in post that the Faceware Creative Suite Pipeline plays its part. Once the performance is captured, the data is run through Faceware’s Analyzer software to track even the most minute details of an actor’s facial performance. This is performed quickly and easily via an intuitive workflow; technicians can easily read and trim their imported footage by time-code, frame, or job length, streamlining the animation pipeline.

“The Autotrack feature furthers this efficiency by tracking the performance with one button, or the technician can create a custom Tracking Model to refine and pinpoint the exact performance desired. These global tracking models can be shared between users for greater efficiency across an entire animation team.

“The data is then exported to Faceware’s Retargeter – a plugin to Autodesk Maya, MotionBuilder, and 3ds Max. Retargeter uses the tracking data exported from Analyzer to produce high-quality, lifelike facial animation.

“Retargeter operates with any character or rig; if you can keyframe it, Retargeter can drive it. Users can easily teach Retargeter how they want their rig to work with a simple Character Setup process. Just load your Analyzer performance data and use the intuitive pose-based workflow to achieve realistic, high-quality animation. The Shared Pose Libraries can further increase consistency and speed. If time is short, the AutoSolve feature can automate the animation process with command-line access and batch commands.”