Last year, we wrote about how we solved the problem of extracting complex information from email texts with the level of accuracy needed by our email scheduling tool.
In this post, we will consider how we used our setup to go from being able to extract information in one language (English) to being able to extract it in N different languages… in a single shot!
For context, x.ai is a scheduling system that helps people get meetings on their calendar. Scheduling a meeting typically requires some annoying back-and-forth emails between the parties involved. One popular way to use x.ai is to simply CC the Scheduling AI on your email to your guests. The scheduling engine then automatically detects the relevant meeting information in your email text and follows up with your guests with a set of times that work for you. If you haven't tried it, you should sign up (it's free, forever!) and give it a whirl.
How x.ai’s Scheduling Engine Works
Because x.ai is a fully automated scheduling engine, it must be able to "understand" English and automatically extract the relevant pieces of meeting information from it, such as the proposed times, the meeting location and the people involved.
Automatically extracting this information from email text falls under natural language processing (NLP), a branch of data science that has seen impressive progress in recent years, driven by a relentless sequence of technical and theoretical breakthroughs in deep learning. At x.ai we make heavy use of models based on recent transformer architectures (e.g. BERT); the transformer itself was originally developed for neural machine translation (NMT). You can read more about this in our original blog post.
Solving the initial technical challenge was very expensive, because getting our models to the high level of accuracy the application demands required massive amounts of hand-labeled training data. (To give a sense of the magnitude of this endeavor, we employed around 100 people over about four years to hand-label around 32 million data points.) What if we now wanted to support the same functionality in Spanish, or Afrikaans, or Chinese? Would we need to repeat the same expensive process of putting humans in the loop to collect training data in each of those languages? Fortunately, there is another solution.
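To put those labeling figures in perspective, here is the rough per-person arithmetic. The inputs are the approximate numbers quoted above; the 250 working days per year is our own assumption, not a figure from the labeling project.

```latex
\begin{aligned}
\frac{32 \times 10^{6}\ \text{data points}}{100\ \text{people} \times 4\ \text{years}} &= 80{,}000\ \text{data points per person-year},\\
\frac{80{,}000\ \text{data points}}{250\ \text{working days (assumed)}} &= 320\ \text{data points per person per working day}.
\end{aligned}
```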
x.ai’s Use of Translation
The transformer models we use at x.ai stem precisely from the problem of translating one language into another. What if we could also detect the language of an incoming email and, when it is not English, use an NMT model to translate it into English before running it through our information extraction algorithms?
The success of this approach relies on two assumptions:
- The translation algorithm is “good enough” to produce an English text where the most relevant bits of information (times, locations, people involved) are properly translated.
- Our English language trained models are robust enough against language “noise” to be able to detect the relevant pieces of information even if the contextual text around them is a little “funny.”
For the first assumption, the translated English does not need to be grammatically correct. In fact, it could be so far off that it would confuse a human. The beauty of it is that the translator's output can be thought of as an internal meeting-scheduling language that is never shown to anyone, since only our internal models consume it.
The second assumption rests, at least in part, on the breadth of our initial data-labeling effort: whether it captured enough variance to train models that are effective at extracting the relevant information from a semantic background dominated by noise.
How We Began Testing Our Approach
Given the diverse demographics of our user base, about 3 percent of the meeting requests our system receives are already written in languages other than English, which provided a perfect test bed for these assumptions. Our approach was to insert a language detection/translation model into our production pipeline. If the incoming text was detected to be in a foreign language, it was translated into English, and the translated text was then used as the input to our information extraction models.
Say the incoming text was:
Amy, encuentra 30 minutos el martes que viene para una llamada con Manolo y Elias. Elias es opcional.
The translated text was:
Amy, find 30 minutes next Tuesday for a call with Manolo and Elias. Elias is optional.
This is a perfectly reasonable-sounding English email. Our information extraction models then correctly picked out the 30-minute duration, the fact that Elias is optional, and the next-Tuesday date.
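The detect/translate/extract pipeline described above can be sketched roughly as follows. This is a minimal illustration, not x.ai's production code: `process_request`, `toy_detect_language`, `toy_translate` and `toy_extract` are hypothetical names, and the toy stand-ins only "know" the single Spanish example from this post. In a real system each callable would be backed by an actual model (a language-identification model, an NMT system, and the English-trained extraction models).

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PipelineResult:
    source_lang: str                 # language code reported by the detector
    english_text: str                # text actually fed to the extraction models
    meeting_info: Dict[str, object]  # whatever the extractor returns


def process_request(
    text: str,
    detect_language: Callable[[str], str],
    translate_to_english: Callable[[str, str], str],
    extract_meeting_info: Callable[[str], Dict[str, object]],
) -> PipelineResult:
    """Detect the language, translate non-English text, then run English-only extraction."""
    lang = detect_language(text)
    # English passes straight through; anything else is translated first.
    english = text if lang == "en" else translate_to_english(text, lang)
    # The translated text is internal: only the extractor ever consumes it.
    return PipelineResult(lang, english, extract_meeting_info(english))


# --- Hypothetical toy stand-ins for the real models ---

SPANISH_MARKERS = {"encuentra", "martes", "para", "una", "llamada", "con", "es"}


def toy_detect_language(text: str) -> str:
    """Crude keyword-overlap detector that only distinguishes Spanish from English."""
    words = set(re.findall(r"\w+", text.lower()))
    return "es" if len(words & SPANISH_MARKERS) >= 2 else "en"


SPANISH_EXAMPLE = ("Amy, encuentra 30 minutos el martes que viene para una llamada "
                   "con Manolo y Elias. Elias es opcional.")
TOY_TRANSLATIONS = {
    SPANISH_EXAMPLE: ("Amy, find 30 minutes next Tuesday for a call with Manolo "
                      "and Elias. Elias is optional."),
}


def toy_translate(text: str, source_lang: str) -> str:
    """Lookup-table 'translator'; raises KeyError for any text it has not seen."""
    return TOY_TRANSLATIONS[text]


WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def toy_extract(text: str) -> Dict[str, object]:
    """Regex 'extractor' pulling a duration, a day and optional attendees from English text."""
    duration = re.search(r"(\d+)\s+min", text)
    day = re.search(r"\b(?:next\s+)?(?:" + "|".join(WEEKDAYS) + r")\b", text)
    optional = re.findall(r"(\w+) is optional", text)
    return {
        "duration_minutes": int(duration.group(1)) if duration else None,
        "day": day.group(0) if day else None,
        "optional_attendees": optional,
    }


result = process_request(SPANISH_EXAMPLE, toy_detect_language, toy_translate, toy_extract)
print(result.source_lang)   # es
print(result.meeting_info)  # {'duration_minutes': 30, 'day': 'next Tuesday', 'optional_attendees': ['Elias']}
```

Passing the three stages in as functions is a design choice for the sketch: it keeps the English-only extractor untouched while the detector or translator is swapped out, which is the property the whole approach depends on.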
Our scheduling system verifies the information captured from the email with the user before taking a final action, so we can track how often the user modifies the detected information and use that as a measure of accuracy. In this way, we systematically collected data for our top 10 languages (Chinese, Spanish, German and Japanese, among others) and measured the accuracy of this approach. We found that it worked exceptionally well: the accuracy was essentially the same as that of running our models on regular English-language emails.
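As a rough illustration of that metric, suppose each user confirmation is logged as a (language code, user modified the extracted fields?) pair. That log format and the `accuracy_by_language` helper are hypothetical, not x.ai's actual schema. Per-language accuracy is then simply the share of confirmations the user left unmodified.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_language(confirmations: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return, per language, the fraction of confirmations the user did not modify.

    Each confirmation is a hypothetical (language_code, user_modified) pair.
    """
    totals: Dict[str, int] = defaultdict(int)
    unmodified: Dict[str, int] = defaultdict(int)
    for lang, user_modified in confirmations:
        totals[lang] += 1
        if not user_modified:
            unmodified[lang] += 1
    return {lang: unmodified[lang] / totals[lang] for lang in totals}


# Made-up sample log, purely to show the computation (not real x.ai numbers).
sample_log = [
    ("en", False), ("en", False), ("en", True), ("en", False),
    ("es", False), ("es", True), ("es", False), ("es", False),
    ("de", False), ("de", False),
]
print(accuracy_by_language(sample_log))  # {'en': 0.75, 'es': 0.75, 'de': 1.0}
```

Comparing each non-English figure against the English baseline is the comparison described in the paragraph above; with real traffic one would also want enough confirmations per language for the rates to be meaningful.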
Our Models Have Been Training For This Challenge
In hindsight, this is not a huge surprise. Our models are trained on hundreds of thousands of hand-labeled emails, so they have been exposed to a great deal of variance in the way people write, and they have "learned" to extract the useful, relevant meeting information from a very noisy language context. Everyone has their own email-writing style, and email text is often full of ungrammatical constructions and unclear intents.
Because of this, the additional "noise" introduced by translation errors did not noticeably affect model performance. When we hand-checked a sample of translated emails as a sanity check, we observed that the time constraints, the location details and the people involved (in other words, the information relevant to our use case) were translated with high fidelity, even when the surrounding text had some weirdness introduced by the translator.
For example, the email in Spanish:
Hola Alvaro, vamos a quedar la semana que viene. Meto al Scheduler para que nos encuentre un rato el lunes por la tarde. yo te llamo.
was internally translated to:
Hello Alvaro, we are going to meet next week. I put Amy to find us for a while the Monday in the afternoon. I call you.
Any native Spanish speaker will notice that "we are going to meet next week", "find us for a while" and "I call you" are imperfections of the translation, not fully faithful to the meaning of the original. However, our model can still extract, for example, that this is a phone call in which the sender of the email will call Alvaro, because it has seen all sorts of ways (grammatically correct or not) of expressing "I will call you".
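The kind of hand-check described above, confirming that the meeting-relevant pieces survive translation regardless of the wording around them, can be partly automated. A minimal sketch, assuming a human has listed the expected English value of each slot for an email (the `slot_fidelity` function and the slot names are hypothetical), is case-insensitive substring matching against the translated text, shown here on the Alvaro example:

```python
from typing import Dict


def slot_fidelity(translation: str, expected_slots: Dict[str, str]) -> Dict[str, bool]:
    """For each hand-labeled slot, report whether its English value appears in the translation."""
    lowered = translation.lower()
    return {slot: value.lower() in lowered for slot, value in expected_slots.items()}


translation = ("Hello Alvaro, we are going to meet next week. I put Amy to find us "
               "for a while the Monday in the afternoon. I call you.")
expected = {
    "attendee": "Alvaro",
    "day": "Monday",
    "time_of_day": "afternoon",
    "week": "next week",
}
report = slot_fidelity(translation, expected)
print(report)  # {'attendee': True, 'day': True, 'time_of_day': True, 'week': True}
missing = [slot for slot, ok in report.items() if not ok]
print(missing)  # []
```

Substring matching is deliberately crude. It ignores the garbled phrasing around each slot, which mirrors the point above, but it would flag correct paraphrases (e.g. "Mon." for "Monday") as missing and cannot check intents such as who calls whom, so it is a screening aid rather than a replacement for reading the output.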
Good-Enough Translation, Excellent Results
Even if the full meaning of the original email is sometimes lost in translation, the specific pieces of information that are actually relevant to the meeting, such as phone numbers and date ranges, are not. It is hard for us to say whether this approach will let others generalize their NLP support from one language to N languages in one shot, without training models in every single language. It did for us, thanks to the particulars of our use case and the sheer volume of our training data (which exposed our models to sufficient noise and variance). At the very least, it should let anyone take a good first stab at the challenge of going "global."
We hope that this approach is useful to other data scientists doing NLP out there. With good-enough models in English, it is very possible that the important pieces of information your English-based model can extract are not lost in translation.
Feel free to ping me and the team on Twitter for a discussion and give us some feedback over on Product Hunt.