A new attack framework aims to infer the keystrokes typed by a target user at the other end of a video conference call, using only the video feed to correlate observable body movements with the text being typed.
The research was undertaken by Mohd Sabra and Murtuza Jadliwala of the University of Texas at San Antonio and Anindya Maiti of the University of Oklahoma. They say the attack can be extended beyond live video feeds to videos streamed on YouTube and Twitch, as long as the webcam’s field of view captures the target user’s visible upper body movements.
“With the recent ubiquity of video capturing hardware embedded in many consumer electronics, such as smartphones, tablets, and laptops, the threat of information leakage through visual channel[s] has amplified,” the researchers said. “The adversary’s goal is to utilize the observable upper body movements across all the recorded frames to infer the private text typed by the target.”
To achieve this, the recorded video is fed into a video-based keystroke inference framework that goes through three stages:
- Pre-processing, where the background is removed, the video is converted to grayscale, followed by segmenting…
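
To make the pre-processing stage more concrete, here is a minimal Python sketch of its first two operations, background removal followed by grayscale conversion. It is not the researchers' implementation: the function names, the reference-frame differencing approach, and the `threshold` value are all assumptions chosen for illustration, and the segmentation step is left out because the description above does not detail it. Frames are modeled as nested lists of `(R, G, B)` tuples so the example needs no third-party libraries.

```python
# Illustrative sketch only. The helper names, the threshold, and the
# background-differencing approach are assumptions, not the paper's method.

def remove_background(frame, background, threshold=20):
    """Black out pixels whose channels all stay within `threshold`
    of a reference background frame (hypothetical approach)."""
    cleaned = []
    for row, bg_row in zip(frame, background):
        new_row = []
        for pixel, bg_pixel in zip(row, bg_row):
            # Largest per-channel difference between frame and background.
            diff = max(abs(c - b) for c, b in zip(pixel, bg_pixel))
            new_row.append(pixel if diff > threshold else (0, 0, 0))
        cleaned.append(new_row)
    return cleaned


def to_grayscale(frame):
    """Convert (R, G, B) pixels to single luma values using the
    standard BT.601 weights (0.299, 0.587, 0.114)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]


def preprocess(frame, background, threshold=20):
    """Apply background removal, then grayscale conversion, in the
    order the article lists them."""
    return to_grayscale(remove_background(frame, background, threshold))


if __name__ == "__main__":
    # Tiny 2x2 example: a uniform background and a frame where only
    # the right-hand column differs from it.
    background = [[(10, 10, 10), (10, 10, 10)],
                  [(10, 10, 10), (10, 10, 10)]]
    frame = [[(10, 10, 10), (200, 100, 50)],
             [(12, 9, 10), (255, 255, 255)]]
    print(preprocess(frame, background))
```

A real pipeline would more likely work on arrays decoded from video with a computer-vision library, but the order of operations would be the same.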