How AI Could Solve the Metaverse’s Language Problem
LAS VEGAS — A key promise of the metaverse is its inherent democratization: Everybody on the planet theoretically has access to the same virtual world(s), with none of the usual barriers to entry that geography imposes.
Except for one. If the people meeting in the metaverse don’t speak the same language, sharing an experience could get difficult. Sure, there are services like Google Translate and Skype Translator that do a good job of offering written or spoken translation, but the issue quickly becomes one of scale: Those services are generally designed for one-to-one conversations, whereas a metaverse experience often wants to incorporate dozens, if not hundreds, of people. If each person is speaking their own language, that’s a hard problem to solve.
Enter Onemeta AI. Its Verbum service, which is debuting at CES 2023, can provide real-time translation for up to 50 individuals, each speaking a different language (it supports 82 languages and 40 dialects, the company says). And it doesn’t just deliver real-time transcripts: the AI can provide voice, too.
“You could have 50 people on a Zoom call, and they could each have their own native tongue,” explains David Politis, spokesperson for Onemeta. “They would hear someone speak in Japanese, but they would then hear it in English or in Italian or in Russian and onscreen they would see it in their language as well.”
We got a chance to demo Verbum at CES on Thursday night. As we spoke through a headset to a woman in Central America, the system translated our words to Spanish and her responses to English. Although there was a slight delay, the conversation felt natural and flowed well. Words were transcribed within a second of being spoken, and the AI voice, which sounded as good as, if not better than, the TikTok lady, came on about a second after that.
Onemeta is initially aiming Verbum at group meetings for international teams, but the service is clearly applicable to metaverse experiences as well: Imagine an MMORPG where the users are all over the world and want to talk to each other quickly in real-time situations (think Call of Duty), or an esports tournament where the audience wants to both understand the action and socialize with each other at the same time.
“The most commonly spoken language is English,” says Politis. “But if your native language is Portuguese or Russian, your English is rarely going to be the same as your native language. And so there is going to be miscommunication — it’s just going to happen. We can eliminate almost all of that.”
There’s definitely a need for what Onemeta is offering with Verbum, but its success will depend on whether others — in particular Microsoft and Google, which have resources Onemeta doesn’t — rise to meet the same challenge.