For over 15 years, Mango Languages has prided itself on providing our users with a high-quality language learning experience. Over the years, we have done this by developing a teaching methodology grounded in best practices from Second Language Acquisition and Learning Science research, curating a group of highly trained linguists and language teachers to manually develop and edit course content, and incorporating engaging and effective app features. And believe it or not, we have relied exclusively on audio recordings from real, native-speaking humans.
That’s right! Our 130+ language courses contain hundreds of thousands of words, phrases, and sentences in over 70 languages… and every single one of those words, phrases, and sentences was recorded by a native speaker. In fact, even the slower, articulated recordings available for each word were recorded by native speakers just… speaking slowly and clearly articulating. Unlike competing language learning apps, we don’t artificially slow down audio.
In all honesty, this is by no means the fastest or cheapest way to get audio, and many companies happily rely on language generated by artificial intelligence (AI), such as text-to-speech (TTS) technology. So, why has Mango stuck with native speaker recordings? Well, when it comes to language learning, we know that it’s not just the quantity but the quality of language input that matters. For us, that has always meant real, native-speaker recordings. (By the way, you certainly can get quality input from non-native speakers, too. But that’s the subject of another blog post.)
But times are changing! By now, you’ve probably noticed that AI has been getting dramatically more powerful over the past few years, and especially in the past year, with the arrival of sophisticated ‘large language models’ (LLMs) like OpenAI’s ChatGPT or Google’s Bard. The truth is, in many situations, AI-generated language can pass for the real thing! In fact, it can be hard to tell apart from a human speaker. Listen to the two sentences below. Can you tell which one was recorded by a native speaker, and which one was generated by AI?
How can we leverage TTS to support more learners?
Feeling more confident in the quality of audio generated by TTS, we’ve been exploring the myriad of benefits that come with these automated yet sophisticated tools. Here are a few solutions that neural TTS (that’s a fancy way to say TTS powered by AI!) brings to some of the challenges we face in language technology (and some caveats).
Build new courses quickly. When a political crisis or natural disaster displaces thousands of people overnight, learning a new language becomes a necessity for both refugees and humanitarian workers. And sometimes unexpected cultural phenomena drive language learning demands. TTS allows us to respond to such needs (or wants) within weeks or even days, instead of months or years.
Make updates quickly so that our users can benefit from them sooner. Language and culture are continually evolving and changing, which means that we are constantly updating our courses. TTS will allow us to adapt more quickly and update our courses without waiting on recording processes and the availability of in-demand voice talent. While it’s easy enough to make changes to, say, our Spanish course because we practically have native Spanish-speaking voice talent on speed dial (and we love our recording studio!), this isn’t the case for languages for which we don’t have ready access to native speakers. TTS libraries are available for nearly 150 languages, which makes it much faster and easier to get audio for those less-common languages.
Support personalized learning content. We want to enable users to take a more active role in their learning process, which means making choices about what they are learning. There are several ways in which TTS can support this. We just recently launched a new vocabulary tool. This includes Mango-curated vocabulary lists — and we sourced thousands of native-speaker recordings for these words. But users can also create their own vocabulary lists consisting of words and phrases that are important to them! We can’t possibly add native-speaker recordings for all of the new words that our users add to their lists in real time, but TTS can!
Another tool, Mango Reader, allows learners to read pretty much anything on the internet (in 12 available languages at the time of this publication) and instantly translate and listen to words they don’t know. This tool is powered by Google’s state-of-the-art translation engines and TTS models.
The long and short of it is that we can leverage TTS to create more content more quickly and meet the needs of more learners. We’ve thoroughly tested the TTS that we use to generate audio, and we are confident in its quality. Does this mean that we’re just going to hand over the reins to computers? Of course not. We’ll still always send humans in for quality control. And a lot of the time, we will replace some of the TTS (like placeholder audio) with language recorded by native speakers. But this takes time! With the help of TTS, we can produce a working course that you can use much sooner.
While the technology around AI, including TTS, is incredibly sophisticated and can perform what may seem like linguistic magic, it has its limits. Our internal stress test of this technology has exposed many of its weaknesses and shortcomings. And so, while we’re optimistic about this technology, we are approaching it with caution to ensure that our content developers and our learners experience the best of it.
Ultimately, the costs of introducing AI, including TTS, into our courses are vastly overshadowed by the benefits that AI brings, most notably that it allows us to provide a more agile product and respond more quickly to user needs. We’re excited to see where the future will take us!