Continuous 3D Hand Pose Tracking using Machine Learning & Monado OpenXR

Marcus Edel
April 20, 2021

Our hands are our primary operating tools, so their location, orientation, and articulation in space are vital for many human-computer interfaces. Automated hand pose estimation is useful for diverse applications such as virtual/augmented reality (XR), sign language recognition, gesture recognition, and robotics. Collabora is particularly interested in using hand pose estimation in XR, as this application meshes nicely with our work on Monado, the world’s first open-source OpenXR runtime.

Recent interest in hand pose estimation is driven by the marked advantage it can give to many fields, such as virtual sports coaching and factory worker safety. Pose estimation has the potential to create a new generation of automated tools designed to precisely measure human movement. It also enhances existing applications in a broad range of areas, including augmented reality, animation, gaming, and robotics. This is by no means an exhaustive list, but it covers some of the primary ways in which pose estimation is shaping our future.

Although hand pose and body pose estimation overlap significantly in their objectives and difficulties, hand pose estimation has its own set of problems, such as a lack of characteristic local features, pose ambiguity, and substantial self-occlusion, which make it challenging to solve.

Pose Tracking using Machine Learning

Traditional vision-based markerless methods for 3D hand pose estimation rely heavily on depth information, requiring either multi-view configurations or depth cameras. Such hardware requirements severely limit possible applications by significantly increasing setup overhead and cost. Depth cameras have the added constraints of only working in indoor scenes and of consuming relatively high power.

To circumvent this problem, we have tackled the challenging task of estimating the joint locations of both hands using only monocular RGB input images. More concretely, as part of a project backed by INVEST-AI, a program managed by IVADO Labs, we have developed a multi-stage neural network-based solution that accurately locates and tracks the hands despite complex background noise and occlusion between hands. Our system estimates 2D and 3D joint locations without any depth information.
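To make the idea of a multi-stage pipeline more tangible, here is a minimal sketch in Python. It is not our implementation: the split into a detection stage and a per-hand joint estimation stage, the names `track_frame`, `crop_to_image`, `fake_detect` and `fake_estimate`, and the choice of 21 joints per hand are all assumptions made for illustration, and the two "models" are stubs that return fixed values.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]        # (x, y, width, height) in image pixels
Point2D = Tuple[float, float]          # (x, y)
Point3D = Tuple[float, float, float]   # (x, y, z) in hypothetical hand-relative units

NUM_JOINTS = 21  # assumption: a common hand-skeleton convention, not stated in this post


@dataclass
class HandPose:
    box: Box
    joints_2d: List[Point2D]  # full-image pixel coordinates
    joints_3d: List[Point3D]


def crop_to_image(points: List[Point2D], box: Box) -> List[Point2D]:
    """Map normalized [0, 1] crop coordinates back to full-image pixels."""
    x, y, w, h = box
    return [(x + u * w, y + v * h) for u, v in points]


def track_frame(
    image,
    detect: Callable[[object], List[Box]],
    estimate: Callable[[object, Box], Tuple[List[Point2D], List[Point3D]]],
) -> List[HandPose]:
    """Stage 1 finds hand boxes; stage 2 estimates joints inside each box."""
    poses = []
    for box in detect(image):
        joints_2d_norm, joints_3d = estimate(image, box)
        poses.append(HandPose(box, crop_to_image(joints_2d_norm, box), joints_3d))
    return poses


# Stub stages standing in for trained networks (hypothetical, fixed outputs).
def fake_detect(image) -> List[Box]:
    return [(100, 50, 200, 200), (400, 60, 180, 180)]  # e.g. left and right hand


def fake_estimate(image, box: Box):
    # Every joint is placed at the crop centre, with zero 3D offset.
    return [(0.5, 0.5)] * NUM_JOINTS, [(0.0, 0.0, 0.0)] * NUM_JOINTS


if __name__ == "__main__":
    for pose in track_frame(None, fake_detect, fake_estimate):
        print(pose.box, pose.joints_2d[0], len(pose.joints_3d))
```

Keeping the stages behind plain callables means each one can be replaced or retrained independently, and the only shared contract is the coordinate convention handled by `crop_to_image`.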

Below is a preview of what the hand tracking looks like. We are currently working on integrating it into the Monado XR codebase, so it can be used out-of-the-box with different devices.

Hand Pose Dataset

In tandem, we are working on a large-scale real-world hand pose dataset that will allow us to train a pipeline that outperforms current work on 3D hand pose estimation using only RGB data.

Custom-built camera rig with 12 cameras to capture hand poses from different viewpoints.

Last but not least, our solution is built to run on resource-constrained devices such as the VIM 3 and the Rock PI N10.

In our next post, we will dive into the machine learning details of our innovative pipeline. This will be followed by a third post covering how we recorded and annotated our large-scale real-world hand pose dataset. Stay tuned!

Comments (1)

  1. Trevor Flowers:
    Apr 20, 2021 at 06:44 PM

    Excellent and timely now that the WebXR Hand Input API is progressing rapidly.
