

Google researchers have created a new version of the Translatotron AI translation model that recreates a speaker’s voice in a different language. Translatotron 2 performs better as a translator and voice mimicker but deliberately cuts out the potential for synthesizing someone else’s voice as a convincing deepfake, which was raised as a concern after the 2019 release of the first Translatotron. The researchers published details of Translatotron 2 in a paper this month.

Translatotron and its successor are designed to listen to someone speaking in one language, translate what they are saying into a second tongue, then broadcast the translated speech as though the original speaker were now fluent in another language. The system encodes the source speech, picks out the right word sounds, known as phonemes, and synthesizes the decoded results into whatever language the user chooses. The result sounds more natural and friendlier than a pure text or artificially voiced translation. Translatotron 2 translates more accurately than its predecessor, and it processes and recites speech faster and with fewer errors.

“Experimental results suggest that Translatotron 2 outperforms the original Translatotron by a large margin in terms of translation quality and predicted speech naturalness, and drastically improves the robustness of the predicted speech by mitigating over-generation, such as babbling or long pause,” the researchers wrote.

The update also resolves an issue in the first Translatotron, where people could exploit the technology to speak as themselves in one language and have the translation sound like an entirely different person, even just using samples played from standard TVs and radios. By skipping the speaker identification step of the previous Translatotron, the AI will ignore attempts to translate someone’s words into a different voice.

“The trained model is restricted to retain the source speaker’s voice, and unlike the original Translatotron, it is not able to generate speech in a different speaker’s voice, making the model more robust for production deployment, by mitigating potential misuse for creating spoofing audio artifacts,” the researchers wrote. “The performance of voice conversion has progressed rapidly in the recent years, and is reaching a quality that is hard for automatic speaker verification systems to detect. Such progress poses concerns on related techniques being misused for creating spoofing artifacts, so we designed Translatotron 2 with the motivation of avoiding such potential misuse.”

Google’s need to address the possibility of deepfakes is more than just technical, though. Google has been keen to promote its translation services, adding new features and availability regularly. Last year, Google Translate added a real-time transcription feature not long after incorporating an instant translation feature for Google Assistant on Android. It’s also a service that other communication platforms want to have, sparking Zoom’s acquisition of Kites for that purpose in June.
