Lip-syncing robot learns to speak and sing using AI
Columbia engineers create a robotic face that learns to lip-sync with speech and singing by observing itself and humans, aiming for more natural, human-like interactions. The technology could transform communication in entertainment, education, and care settings.
Reuters
January 22, 2026

FILE PHOTO: A message reading "AI artificial intelligence", a keyboard, and robot hands are seen in this illustration taken January 27, 2025.
Dado Ruvic/Reuters
Columbia University engineers have built a robotic face that can learn to move its lips in sync with speech and singing by watching itself in a mirror and then observing humans in online videos, aiming to make humanoid robots appear less "uncanny" in face-to-face interaction.
In a study published in Science Robotics, the team describes a two-step "observational learning" approach rather than programming fixed rules for facial motion.
"We used AI in this project to train the robot, so that it learned how to use its lips correctly," said Hod Lipson, James and Sally Scapa Professor of Innovation in the Department of Mechanical Engineering and director of Columbia’s Creative Machines Lab.
First, a robotic face driven by 26 motors generated thousands of random expressions while facing a mirror, learning how motor commands change its visible mouth shapes.
Next, the system watched recordings of people talking and singing and learned how human mouth movements relate to emitted sounds.
"That learning is a sort of motor-to-face kind of model," added Lipson.
"Then using that learned information of how it moves and how humans move, it could sort of combine these together and learn how to move its motors in response to various sounds and different audio."
With both models combined, the robot could translate incoming audio into coordinated motor actions and lip-sync across a range of languages and contexts without understanding the audio's meaning, the researchers said.
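The pipeline described above can be illustrated with a toy sketch. This is not the researchers' actual method (the paper uses learned neural models on camera images and raw audio); it is a minimal linear stand-in, assuming made-up feature dimensions, that shows the same structure: a forward model fit from self-observation, an audio-to-mouth model fit from human examples, and an inversion step that turns new audio into motor commands.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 26 motors (as in the article), plus invented
# sizes for a mouth-shape descriptor and an audio feature vector.
N_MOTORS, N_SHAPE, N_AUDIO = 26, 8, 12

# Stand-in "physics": how motor commands produce visible mouth shapes.
# The robot never sees this matrix; it only observes the outcomes.
TRUE_FORWARD = rng.normal(size=(N_MOTORS, N_SHAPE))

def observe_in_mirror(motors):
    """What the robot's camera sees when it moves its own face."""
    return motors @ TRUE_FORWARD

# Step 1: self-observation. Random motor "babbling" in front of a mirror
# yields (motor command, mouth shape) pairs; fit a forward model to them.
motor_samples = rng.normal(size=(5000, N_MOTORS))
shape_samples = observe_in_mirror(motor_samples)
forward_model, *_ = np.linalg.lstsq(motor_samples, shape_samples, rcond=None)

# Step 2: watching humans. Pairs of (audio features, mouth shape) taken
# from videos; fit an audio-to-mouth model. The mapping here is invented.
TRUE_AUDIO_TO_SHAPE = rng.normal(size=(N_AUDIO, N_SHAPE))
audio_samples = rng.normal(size=(5000, N_AUDIO))
human_shapes = audio_samples @ TRUE_AUDIO_TO_SHAPE
audio_model, *_ = np.linalg.lstsq(audio_samples, human_shapes, rcond=None)

# Combination: for incoming audio, predict the target mouth shape, then
# solve for motor commands that the forward model maps onto that shape.
# No understanding of the audio's meaning is needed at any point.
def lipsync(audio_features):
    target_shape = audio_features @ audio_model
    motors, *_ = np.linalg.lstsq(forward_model.T, target_shape, rcond=None)
    return motors

new_audio = rng.normal(size=N_AUDIO)
cmd = lipsync(new_audio)
achieved = observe_in_mirror(cmd)
wanted = new_audio @ audio_model
print(np.allclose(achieved, wanted, atol=1e-6))
```

In this toy version both models are exact least-squares fits, so the inversion reproduces the target mouth shape almost perfectly; the real system works with noisy vision and audio, which is why sounds like "B" and "W" remain hard.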
They also demonstrated the robot using its articulation abilities to sing a song called "metalman" from its AI-generated debut album "hello world_."
The results are not perfect: the team reported difficulty with sounds such as "B" and puckering sounds like "W," and said performance should improve with more exposure.
Lipson said their lip motion research is part of a broader push toward more natural robot communication in applications such as entertainment, education and care settings.
"I guarantee you, before long, these robots are going to look so human. People will start connecting them and it's going to be an incredibly powerful and disruptive technology," he added.
Production: Matt Stock/Reuters







