Multi-view 3D Pose Measure: Millimeter-Level Motion Evaluation and Measurement Based on Multi-View Motion Capture

July 11, 2022

This technology combines traditional computer vision multi-view 3D reconstruction with GAN-based motion capture. Starting from coarse monocular 3D motion capture, the footage from multiple cameras is automatically frame-synchronized, producing 3D motion capture data from several angles on a shared timeline. The 3D motion data is then filtered and corrected using the principle of binocular depth detection, which yields a fine-grained, measurable multi-camera 3D motion capture system (see the notes at the end for the specific technical principles).

From athlete video recorded by multiple cameras, the system produces detailed motion analysis results, such as the distance between the ankles at each time point or the speed of the wrist. With cameras of at least 2K resolution recording at 120 frames per second, it delivers millimeter-level motion measurements in real time.

The technology is suited to fine-grained motion analysis in professional training, for example javelin, long jump, pole vault and other indoor and outdoor field events. It also places few restrictions on the setup. Four cameras are enough and are easy to install, and the system can be deployed directly as long as a single athlete is being measured and the capture area is no larger than a basketball court.
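Once corrected key point coordinates are available, measurements such as ankle distance and wrist speed reduce to simple geometry. The following is a minimal sketch, not the system's actual code: the per-frame dictionary layout, the joint names and the millimeter units are assumptions made for illustration.

```python
import math

# Hypothetical layout: each frame maps a joint name to an (x, y, z) position in millimeters.

def joint_distance(frame, joint_a, joint_b):
    """Euclidean distance between two joints within one frame."""
    return math.dist(frame[joint_a], frame[joint_b])

def joint_speeds(frames, joint, fps):
    """Speed of one joint between consecutive frames, in position units per second."""
    return [
        math.dist(prev[joint], cur[joint]) * fps
        for prev, cur in zip(frames, frames[1:])
    ]

# Two illustrative frames recorded at 120 frames per second (made-up values).
frames = [
    {"left_ankle": (0.0, 0.0, 0.0), "right_ankle": (300.0, 0.0, 400.0),
     "right_wrist": (0.0, 1000.0, 0.0)},
    {"left_ankle": (0.0, 0.0, 0.0), "right_ankle": (300.0, 0.0, 400.0),
     "right_wrist": (30.0, 1000.0, 40.0)},
]

ankle_gap = joint_distance(frames[0], "left_ankle", "right_ankle")
wrist_speed = joint_speeds(frames, "right_wrist", fps=120)
print(ankle_gap, wrist_speed)
```

Speed is computed as a finite difference between neighboring frames, so its precision depends directly on the frame rate and on how well the coordinates have been filtered.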


A deep learning model built on the Human3.6M dataset estimates the 3D motion of 16 human body key points. With a single camera, this step yields coarse 3D coordinates of the key points in space.

The 3D joint positions of every action sequence are rotated so that the body faces the same direction (determined by the angle of the shoulder joints relative to the longitudinal plane). The sequences are then compared against each other at different time offsets, and the offset with the smallest gap between the poses is taken as the time-synchronized match. All image sequences are trimmed accordingly, producing one group of time-synchronized multi-angle image sequences.
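The synchronization step can be sketched as follows. This is an illustrative reading of the description rather than the original implementation: it assumes the y axis is vertical, that the shoulders sit at hypothetical joint indices 0 and 1, and that the "gap" is the mean joint-to-joint distance. All function names are made up for the example.

```python
import math

L_SHOULDER, R_SHOULDER = 0, 1  # hypothetical joint indices

def face_forward(pose):
    """Center a pose on the shoulder midpoint and rotate it about the vertical
    (y) axis so that the shoulder line points along +x."""
    (lx, ly, lz), (rx, ry, rz) = pose[L_SHOULDER], pose[R_SHOULDER]
    cx, cy, cz = (lx + rx) / 2, (ly + ry) / 2, (lz + rz) / 2
    angle = math.atan2(rz - lz, rx - lx)  # heading of the shoulder line in the x-z plane
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in pose:
        x, y, z = x - cx, y - cy, z - cz
        out.append((c * x + s * z, y, -s * x + c * z))  # rotate by -angle
    return out

def pose_gap(pose_a, pose_b):
    """Mean distance between corresponding joints of two poses."""
    return sum(math.dist(p, q) for p, q in zip(pose_a, pose_b)) / len(pose_a)

def best_offset(seq_a, seq_b, max_shift):
    """Return the shift s for which seq_a[i + s] best matches seq_b[i]."""
    best = None
    for s in range(-max_shift, max_shift + 1):
        pairs = [(seq_a[i + s], seq_b[i])
                 for i in range(len(seq_b)) if 0 <= i + s < len(seq_a)]
        if not pairs:
            continue
        cost = sum(pose_gap(a, b) for a, b in pairs) / len(pairs)
        if best is None or cost < best[0]:
            best = (cost, s)
    return best[1]

# Synthetic check: camera B sees the same motion, turned by 0.7 rad and starting 3 frames late.
def demo_pose(t):
    return [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.1 * t, 0.5)]

def turn(pose, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * z, y, s * x + c * z) for x, y, z in pose]

cam_a = [face_forward(demo_pose(t)) for t in range(20)]
cam_b = [face_forward(turn(demo_pose(t), 0.7)) for t in range(3, 17)]
shift = best_offset(cam_a, cam_b, max_shift=5)
print(shift)
```

In this synthetic case the search recovers the 3-frame delay. After the shift is known, both sequences would be trimmed to their overlapping frames. A brute-force search over a small window is used here for clarity; real footage may need a wider window or a coarse-to-fine search.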

Using the principle of binocular triangulation, the cameras are matched in pairs and each pair performs a binocular measurement. The pairwise results are then combined into a multi-camera stereo vision measurement that gives the distance from each key point to the cameras.
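For a single camera pair, binocular triangulation comes down to the relation depth = focal length × baseline / disparity. The sketch below assumes an idealized, rectified pair (parallel optical axes, identical intrinsics, matched key points on the same image row). The function name and all numbers are illustrative, and a real setup would first need calibration and rectification.

```python
import math

def triangulate_rectified(u_left, v_left, u_right, focal_px, baseline_m, cx, cy):
    """Recover a 3D point (in the left camera frame, meters) from one key point
    seen in both images of a rectified stereo pair."""
    disparity = u_left - u_right  # horizontal pixel shift between the two views
    if disparity <= 0:
        raise ValueError("non-positive disparity: point cannot be triangulated")
    z = focal_px * baseline_m / disparity  # depth along the optical axis
    x = (u_left - cx) * z / focal_px
    y = (v_left - cy) * z / focal_px
    return (x, y, z)

# Illustrative values: 1000 px focal length, cameras 0.5 m apart, 1920x1080 image center.
point = triangulate_rectified(
    u_left=1100.0, v_left=540.0, u_right=1000.0,
    focal_px=1000.0, baseline_m=0.5, cx=960.0, cy=540.0,
)
distance = math.dist(point, (0.0, 0.0, 0.0))  # key point to left camera center
print(point, distance)
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters more for distant points, which is one reason higher camera resolution improves measurement precision.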

The 3D pose data of the human key points in the synchronized time series are refined with a Kalman filter and linear regression, or alternatively with deep learning, to obtain key point data that closely match the actual motion. The action evaluation and measurement calculations are then carried out on the coordinates of these key points.
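As one possible illustration of the filtering step, the sketch below runs a constant-velocity Kalman filter over a single coordinate of a single key point. The motion model, the noise parameters and the function name are assumptions for the example; the source does not specify how its filter is configured.

```python
def kalman_smooth(measurements, dt, meas_var, vel_noise_var, init_vel_var=1e3):
    """Filter one coordinate track with a constant-velocity Kalman filter.

    State is (position, velocity); only the position is measured.
    """
    p, v = measurements[0], 0.0
    # Symmetric 2x2 covariance stored as three numbers.
    p00, p01, p11 = meas_var, 0.0, init_vel_var
    smoothed = [p]
    for z in measurements[1:]:
        # Predict with F = [[1, dt], [0, 1]]; process noise enters the velocity term only.
        p = p + dt * v
        p00 = p00 + 2.0 * dt * p01 + dt * dt * p11
        p01 = p01 + dt * p11
        p11 = p11 + vel_noise_var
        # Update with H = [1, 0].
        s = p00 + meas_var
        k0, k1 = p00 / s, p01 / s
        residual = z - p
        p += k0 * residual
        v += k1 * residual
        p00, p01, p11 = (1.0 - k0) * p00, (1.0 - k0) * p01, p11 - k1 * p01
        smoothed.append(p)
    return smoothed

# Illustrative track: a coordinate moving 2 units per frame, with dt measured in frames.
track = [2.0 * k for k in range(50)]
smoothed = kalman_smooth(track, dt=1.0, meas_var=1.0, vel_noise_var=0.01)
print(smoothed[-1])
```

In practice the same filter would be applied to each coordinate of each key point, and `meas_var` could be set from the spread between the per-camera estimates, so that frames where the cameras disagree are trusted less.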


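  1. Principle of multi-view visual depth detection
  2. Principle of 3D motion capture
  3. Automatic frame synchronization of multiple image sequences
  4. Principle of error removal based on least squares, Kalman filtering or deep learning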