V-Hands: Touchscreen-based Hand Tracking
for Remote Whiteboard Interaction
UIST 2024

  • Xinshuang Liu1,2,†
    xinsliu01@gmail.com
  • Yizhong Zhang2,‡
    yizhongzhang1989@gmail.com
  • Xin Tong2
    xtong.gfx@gmail.com
  • 1UC San Diego
  • 2Microsoft Research Asia
  • †Work done during an internship at Microsoft Research Asia
  • ‡Corresponding author

V-Hands: An innovative technique for enhancing remote communication through real-time hand gesture visualization using touchscreens. (a) The presenter interacts with a digital whiteboard, with their hand gestures captured and interpreted by the touchscreen in real time. (b) The audience can directly visualize these hand gestures, providing a seamless and interactive experience. (c) With an edge-mounted camera, the system functions as a lightboard, allowing both parties to write and draw as if separated by a piece of transparent glass. This technique requires only a touchscreen for hand tracking, making it compatible with various devices such as cell phones, iPads, and laptops. V-Hands opens up significant potential for commercial applications in remote communication.

Abstract

In whiteboard-based remote communication, the seamless integration of drawn content and hand-screen interactions is essential for an immersive user experience. Previous methods either require bulky device setups to capture hand gestures or fail to accurately track hand poses from capacitive images. In this paper, we present a real-time method for precisely tracking the 3D poses of both hands from capacitive video frames. To this end, we develop a deep neural network that identifies hands and infers hand-joint positions from capacitive frames, and then recover 3D hand poses from the hand-joint positions via a constrained inverse kinematics solver. Additionally, we design a device setup for capturing high-quality hand-screen interaction data and obtain a more accurate synchronized dataset of capacitive video and hand poses. Our method improves the accuracy and stability of 3D hand tracking from capacitive frames while maintaining a compact device setup for remote communication. We validate our design choices, demonstrate superior performance on 3D hand pose tracking, and show the effectiveness of our method in whiteboard-based remote communication.
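To illustrate the second stage of the pipeline, the sketch below shows the general idea of a constrained inverse kinematics solve: given joint positions (here standing in for the network's predictions), recover joint angles subject to joint-limit bounds. This is a minimal, hypothetical example using a planar 3-link "finger" and SciPy's bounded optimizer; the paper's actual solver operates on a full 3D hand model, and all names, link lengths, and limits here are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed segment lengths of a toy planar 3-link finger (not from the paper).
LINK_LENGTHS = np.array([0.45, 0.25, 0.20])

def forward_kinematics(angles):
    """Return the 2D positions of the three joints distal to the base."""
    pts, pos, theta = [], np.zeros(2), 0.0
    for length, a in zip(LINK_LENGTHS, angles):
        theta += a  # each angle is relative to the previous segment
        pos = pos + length * np.array([np.cos(theta), np.sin(theta)])
        pts.append(pos)
    return np.array(pts)

def solve_ik(target_pts, joint_limits):
    """Find joint angles within the given limits whose forward
    kinematics best matches the target joint positions."""
    def cost(angles):
        return np.sum((forward_kinematics(angles) - target_pts) ** 2)
    # Bounds act as the anatomical joint-limit constraints.
    result = minimize(cost, x0=np.zeros(3), bounds=joint_limits)
    return result.x

# Simulated "network output": joint positions of a slightly flexed finger.
true_angles = np.array([0.3, 0.5, 0.4])
targets = forward_kinematics(true_angles)

# Joint limits in radians keep the recovered pose plausible.
limits = [(-0.1, 1.6)] * 3
recovered = solve_ik(targets, limits)
print("recovered joint angles:", recovered)
```

In the full system, the residual would compare a 3D hand model's projected joints against the network's predictions for every frame, with temporal smoothness and anatomical constraints regularizing the solve.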

Examples

Given the capacitive frames of a subject's two hands interacting with the touchscreen (first row), our method reconstructs the corresponding 3D hand poses (second row) in real time. The ground-truth hand poses for the captured capacitive frames are displayed in the third row for comparison. Note that our method accurately reconstructs a wide variety of two-hand poses, even for overlapping hands, as demonstrated in (c).

Acknowledgements

The website template was borrowed from Michaël Gharbi and MipNeRF360.