Facial data capture
The Facial Capture extension collects quantitative data such as facial feature points, head rotation, and head translation. You can use this data to drive 3D facial stickers, headgear, pendant applications, or digital humans, adding more vivid expressions to virtual images.
This guide is intended for use cases where the Facial Capture extension is used independently to capture facial data, while a third-party rendering engine is used to animate a virtual human.
To avoid collecting facial data through callbacks and building your own collection, encoding, and transmission framework, use the Agora MetaKit extension for facial capture.
Prerequisites
Ensure that you have:
- Implemented the SDK quickstart in your project.
- Integrated Video SDK version 4.3.0, including the face capture extension dynamic library libagora_face_capture_extension.so.
- Obtained the face capture authentication parameters authentication_information and company_id by contacting Agora technical support.
Implement the logic
This section shows you how to integrate the Facial Capture extension in your app to capture facial data.
Enable the extension
To enable the Facial Capture extension, call enableExtension:
- Java
    rtcEngine.enableExtension(
        "agora_video_filters_face_capture",
        "face_capture",
        true,
        Constants.MediaSourceType.PRIMARY_CAMERA_SOURCE);
- Kotlin
    rtcEngine.enableExtension(
        "agora_video_filters_face_capture",
        "face_capture",
        true,
        Constants.MediaSourceType.PRIMARY_CAMERA_SOURCE)
When you enable the Facial Capture extension for the first time, a delay may occur. To ensure smooth operation during a session, call enableExtension before joining a channel.
Set authentication parameters
To ensure that the extension functions properly, call setExtensionProperty to pass the necessary authentication parameters.
- Java
    // The value is a JSON string, so its inner quotes must be escaped.
    rtcEngine.setExtensionProperty(
        "agora_video_filters_face_capture",
        "face_capture",
        "authentication_information",
        "{\"company_id\":\"xxxxx\",\"license\":\"xxxxx\"}",
        Constants.MediaSourceType.PRIMARY_CAMERA_SOURCE);
- Kotlin
    // The value is a JSON string, so its inner quotes must be escaped.
    rtcEngine.setExtensionProperty(
        "agora_video_filters_face_capture",
        "face_capture",
        "authentication_information",
        "{\"company_id\":\"xxxxx\",\"license\":\"xxxxx\"}",
        Constants.MediaSourceType.PRIMARY_CAMERA_SOURCE)
Retrieve facial data
Retrieve the raw video data containing facial information through the onCaptureVideoFrame callback.
- Java
    public boolean onCaptureVideoFrame(int sourceType, VideoFrame videoFrame) {
        VideoFrameMetaInfo metaInfo = videoFrame.getMetaInfo();
        if (metaInfo != null) {
            // Face capture results are stored under the "FaceCaptureInfo" key.
            SparseArray<IMetaInfo> customMetaInfo =
                metaInfo.getCustomMetaInfo("FaceCaptureInfo");
            if (customMetaInfo != null && customMetaInfo.size() >= 1) {
                String faceInfo =
                    ((FaceCaptureInfo) customMetaInfo.get(0)).getInfoStr();
                Log.d(TAG, "Face Info: " + faceInfo);
            }
        }
        return true;
    }
- Kotlin
    override fun onCaptureVideoFrame(sourceType: Int, videoFrame: VideoFrame): Boolean {
        videoFrame.metaInfo?.let { metaInfo ->
            // Face capture results are stored under the "FaceCaptureInfo" key.
            val customMetaInfo = metaInfo.getCustomMetaInfo("FaceCaptureInfo")
            if (customMetaInfo != null && customMetaInfo.size() >= 1) {
                val faceInfo = (customMetaInfo[0] as FaceCaptureInfo).infoStr
                Log.d(TAG, "Face Info: $faceInfo")
            }
        }
        return true
    }
Currently, the facial capture function outputs data for only one face at a time. After the callback is triggered, you must allocate memory separately to store the facial data and process it in a separate thread. Otherwise, the raw data callback may lead to frame loss.
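The hand-off to a separate thread can be sketched as follows. This is a minimal illustration in plain Java, not an Agora API: FaceDataDispatcher, submit, shutdownAndWait, and the queue capacity are hypothetical names chosen for this example. It assumes the JSON string has already been read with getInfoStr() inside the callback.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper, not part of the Agora SDK: hands face data from the
// capture callback to a single background thread.
final class FaceDataDispatcher {
    private final ThreadPoolExecutor executor;
    private final Consumer<String> handler;

    FaceDataDispatcher(int queueCapacity, Consumer<String> handler) {
        this.handler = handler;
        // One worker thread and a bounded queue. When the queue is full, the
        // oldest pending item is dropped instead of making the caller wait.
        this.executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.DiscardOldestPolicy());
    }

    // Call this from onCaptureVideoFrame with the string from getInfoStr().
    // A Java String is immutable, so keeping the reference is enough here.
    void submit(String faceInfoJson) {
        executor.execute(() -> handler.accept(faceInfoJson));
    }

    // Stops accepting new items and waits for queued ones to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch, the callback would call submit(faceInfo) instead of logging or parsing in place, and the handler passed to the constructor would do the parsing and rendering work. Dropping the oldest item under load is one possible design choice; it favors fresh facial data over completeness.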
Use facial information to drive virtual humans
The output facial data is in JSON format and includes quantitative information such as facial feature points, head rotation, and head translation. This data follows the Blend Shape (BS) format in compliance with the ARKit standard. You can use a third-party 3D rendering engine to further process the BS data. The key elements are:
- faces: An array of objects, each representing recognized facial information for one face.
  - detected: A float representing the confidence level of face recognition, ranging from 0.0 to 1.0.
  - blendshapes: An object containing the face capture coefficients. The keys follow the ARKit standard, with each key-value pair representing a blendshape coefficient, where the value is a float between 0.0 and 1.0.
  - rotation: An array of objects representing head rotation. It contains three key-value pairs. All values are floating points between -180.0 and 180.0.
    - pitch: The pitch angle of the head. Positive values represent head lowering, negative values represent head raising.
    - yaw: The angle of head rotation. Positive values represent left rotation, negative values represent right rotation.
    - roll: The tilt angle of the head. Positive values represent right tilt, negative values represent left tilt.
  - translation: An object representing head translation, with three key-value pairs: x, y, and z. The values are floats between 0.0 and 1.0.
  - faceState: An integer indicating the current face capture control state:
    - 0: The algorithm is in face capture control.
    - 1: The algorithm control returns to the center.
    - 2: The algorithm is restored and not in control.
- timestamp: A string representing the output result's timestamp, in milliseconds.
This data can be used to animate virtual humans by applying the blendshape coefficients and head movement data to a 3D model.
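As a rough sketch of this last step, the following plain-Java example applies already-parsed values to a model. It assumes the JSON has been parsed elsewhere into a map of blendshape names to coefficients. MorphTargetSink, FaceFrameApplier, and the confidence threshold are hypothetical stand-ins for whatever your rendering engine provides. They are not Agora or engine APIs.

```java
import java.util.Map;

// Hypothetical stand-in for a third-party rendering engine's model interface.
interface MorphTargetSink {
    void setBlendshapeWeight(String name, float weight);
    void setHeadRotation(float pitch, float yaw, float roll);
}

// Hypothetical helper: applies the values of one entry in "faces" to a model.
final class FaceFrameApplier {
    private final MorphTargetSink sink;
    private final float minConfidence;

    FaceFrameApplier(MorphTargetSink sink, float minConfidence) {
        this.sink = sink;
        this.minConfidence = minConfidence;
    }

    // Returns true if the frame was applied, false if it was skipped because
    // the "detected" confidence is below the configured threshold.
    boolean apply(float detected, Map<String, Float> blendshapes,
                  float pitch, float yaw, float roll) {
        if (detected < minConfidence) {
            return false;
        }
        // Blendshape coefficients are documented as floats in [0.0, 1.0].
        for (Map.Entry<String, Float> entry : blendshapes.entrySet()) {
            sink.setBlendshapeWeight(entry.getKey(), clamp(entry.getValue(), 0f, 1f));
        }
        // Rotation angles are documented as floats in [-180.0, 180.0].
        sink.setHeadRotation(
                clamp(pitch, -180f, 180f),
                clamp(yaw, -180f, 180f),
                clamp(roll, -180f, 180f));
        return true;
    }

    private static float clamp(float value, float low, float high) {
        return Math.max(low, Math.min(high, value));
    }
}
```

Skipping low-confidence frames and clamping to the documented ranges are defensive choices made for this sketch. Your rendering engine may also expect a different sign convention or unit for rotation, so check its documentation before mapping pitch, yaw, and roll directly.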
Disable the extension
To disable the Facial Capture extension, call enableExtension:
- Java
    rtcEngine.enableExtension(
        "agora_video_filters_face_capture",
        "face_capture",
        false,
        Constants.MediaSourceType.PRIMARY_CAMERA_SOURCE);
- Kotlin
    rtcEngine.enableExtension(
        "agora_video_filters_face_capture",
        "face_capture",
        false,
        Constants.MediaSourceType.PRIMARY_CAMERA_SOURCE)
Reference
This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.
Sample project
Agora provides an open-source sample project on GitHub for your reference. Download or view Face Capture for a more detailed example.