Multi-view 3D Pose Measurement: Millimeter-level Motion Assessment Based on Multi-camera Motion Capture

July 11, 2022

This system combines classical multi-view 3D reconstruction from computer vision with mature motion-capture technology. Starting from each camera's rough monocular 3D motion-capture data, the streams from multiple cameras are automatically frame-synchronized to form multi-angle 3D motion-capture data on a single time line; the 3D motion data are then filtered and corrected based on the principle of binocular depth estimation, yielding a fine-grained, measurable multi-camera 3D motion-capture system (see the notes below for the technical principles). From athlete videos shot by multiple cameras, the system produces detailed motion analyses, such as the distance between the ankles at each time point or the movement speed of the wrist. With 2K camera resolution at 120 frames per second, it delivers millimeter-level real-time motion measurements. It is suited to fine-grained motion analysis in professional training, for example throwing events such as the javelin, and indoor and outdoor field events such as the long jump and pole vault. Deployment requirements are modest: a site where four cameras can be set up, a single athlete under test, and a field no larger than a basketball court are sufficient.


Based on deep learning and the Human3.6M dataset, 3D pose estimation is performed for 16 human-body keypoints. From a single camera, this yields rough 3D spatial coordinates of the keypoints.
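As a rough sketch of the per-camera output this step produces: the joint list and ordering below are illustrative only (models trained on Human3.6M differ in their exact joint sets), and `estimate_pose_3d` is a hypothetical stand-in for the trained network, not the actual estimator.

```python
import numpy as np

# Illustrative 16-joint, Human3.6M-style skeleton.  The exact joint set
# and ordering vary between models trained on this dataset.
JOINTS = ["pelvis", "r_hip", "r_knee", "r_ankle", "l_hip", "l_knee",
          "l_ankle", "spine", "neck", "head", "l_shoulder", "l_elbow",
          "l_wrist", "r_shoulder", "r_elbow", "r_wrist"]

def estimate_pose_3d(frames: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the monocular deep-learning estimator.

    frames: (T, H, W, 3) video.  Returns rough camera-space joint
    coordinates of shape (T, 16, 3), here zero-filled so the rest of
    the pipeline can be exercised without the trained network.
    """
    return np.zeros((frames.shape[0], len(JOINTS), 3))

poses = estimate_pose_3d(np.zeros((120, 256, 256, 3)))  # 1 s at 120 fps
```

Each camera thus contributes one `(T, 16, 3)` array; the later stages operate on these arrays.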

The 3D joint positions of every action sequence are rotated to face a common direction (determined by the angle of the shoulder line relative to the longitudinal plane). A matching computation then treats the time offset with the smallest fitting gap between sequences as the correct temporal alignment. Taking the latest start time among the sequences as the common start and the earliest end time as the common end, all image sequences are trimmed and matched into one group of time-synchronized multi-angle sequences.
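A minimal numpy sketch of the two alignment ingredients, assuming `(T, 16, 3)` joint arrays; the shoulder indices `L_SH` and `R_SH` are assumptions, not the actual joint ordering:

```python
import numpy as np

L_SH, R_SH = 10, 13  # illustrative left/right shoulder indices

def face_forward(seq: np.ndarray) -> np.ndarray:
    """Rotate each (16, 3) pose about the vertical (y) axis so the
    shoulder line lies along the x axis, removing the viewpoint yaw."""
    out = np.empty_like(seq)
    for t, pose in enumerate(seq):
        sh = pose[R_SH] - pose[L_SH]
        yaw = np.arctan2(sh[2], sh[0])          # shoulder angle in x-z plane
        c, s = np.cos(yaw), np.sin(yaw)
        rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        out[t] = pose @ rot.T                   # maps angle yaw -> 0
    return out

def best_offset(a: np.ndarray, b: np.ndarray, max_shift: int = 60) -> int:
    """Integer frame offset of b relative to a that minimizes the mean
    squared joint gap over the overlapping window."""
    best_k, best_err = 0, np.inf
    for k in range(-max_shift, max_shift + 1):
        ia, ib = max(0, k), max(0, -k)
        n = min(len(a) - ia, len(b) - ib)
        if n < 1:
            continue
        err = np.mean((a[ia:ia + n] - b[ib:ib + n]) ** 2)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```

Once offsets are known for every pair, each sequence is trimmed to the window from the latest common start to the earliest common end, giving one synchronized group.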

Based on the epipolar-geometry principle of binocular triangulation, cameras are matched in pairs for pairwise stereo measurement, and the pairwise results are then combined into a multi-camera stereo-vision measurement of the distance from each keypoint to the cameras.
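The pairwise measurement can be sketched as standard linear (DLT) triangulation; with more than two cameras the same linear system simply gains two rows per extra view. The calibration values below are made-up numbers for illustration, not the system's actual setup:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one keypoint from two views.

    P1, P2 : (3, 4) camera projection matrices.
    uv1, uv2 : pixel coordinates of the same keypoint in each image.
    Returns the 3D point; its offset from a camera center gives the
    point-to-camera distance used by the correction step.
    """
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)      # null vector = homogeneous 3D point
    X = vt[-1]
    return X[:3] / X[3]

# Illustrative calibrated pair: 1000 px focal length, 200 mm baseline.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-200.0], [0.0], [0.0]])])
```

Triangulating a keypoint's projections in both images recovers its 3D position, from which per-camera distances follow directly.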

The 3D positions of the human keypoints in the synchronized time series are de-jittered either by Kalman filtering and linear regression, or by deep learning trained on professional 3D motion datasets, yielding keypoint data that closely match the actual motion. Various motion evaluations and measurements are then computed from the keypoint coordinates.
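A minimal sketch of the Kalman-filter de-jitter on a single coordinate track, plus one example measurement; the constant-velocity motion model and the noise parameters are assumptions to be tuned, not the system's actual settings, and the ankle joint indices are illustrative.

```python
import numpy as np

def kalman_smooth(z, dt=1.0 / 120.0, q=1e3, r=9.0):
    """Constant-velocity Kalman filter over one coordinate track.

    z : noisy per-frame positions (mm); r : measurement-noise variance
    (mm^2); q : process-noise scale.  Returns the filtered positions.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])            # pos/vel transition
    H = np.array([[1.0, 0.0]])                       # we observe position
    Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                      [dt**3 / 2, dt**2]])           # white-acceleration model
    x = np.array([z[0], 0.0])
    P = np.eye(2) * 100.0
    out = np.empty(len(z))
    for t, meas in enumerate(z):
        x = F @ x                                    # predict
        P = F @ P @ F.T + Q
        y = meas - (H @ x)[0]                        # innovation
        S = (H @ P @ H.T)[0, 0] + r
        K = (P @ H.T)[:, 0] / S                      # Kalman gain
        x = x + K * y                                # update
        P = (np.eye(2) - np.outer(K, H[0])) @ P
        out[t] = x[0]
    return out

def ankle_distance(poses, l_ankle=6, r_ankle=3):
    """Per-frame distance between the ankles; poses: (T, 16, 3) in mm.
    The joint indices are illustrative defaults."""
    return np.linalg.norm(poses[:, l_ankle] - poses[:, r_ankle], axis=1)
```

Running each of the 16 x 3 coordinate tracks through the filter, then evaluating quantities such as `ankle_distance` (or wrist speed via frame-to-frame differences) on the smoothed arrays, yields the per-frame measurements the system reports.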
