Deep learning is a field within machine learning that uses algorithms with many layers of processing and transformation. x.ai data scientist Adam Kleczewski created the visualization above by training a type of deep learning model called a Recurrent Neural Network (RNN) on the scheduling-related emails in our database.
Blue – Noun
Purple – Verb
Orange – Name
Green – Adjective
Red – Conjunction
Yellow – Adverb
An RNN makes predictions based on sequential data. When an RNN is trained on sequences of words, it learns to represent each word as a high-dimensional vector that encodes the model’s understanding of that word. By projecting these high-dimensional vectors into a two-dimensional space, it’s possible to visualize their relationships and glean insights into the concepts the model has learned.
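To make the idea of a learned word vector concrete, here is a minimal sketch of a plain (Elman) RNN forward pass. It assumes the model is trained to predict the next word in a sequence. The source does not say which RNN variant, training objective, or vector sizes were used, and the vocabulary, dimensions, and weights below are hypothetical random placeholders. Training (backpropagation through time) is omitted. The point is that each row of the embedding matrix `E` is the vector for one word, and those rows are what a visualization like the one above would project.

```python
import numpy as np

# Hypothetical toy vocabulary; the real model was trained on an email corpus.
vocab = ["meet", "monday", "tuesday", "great", "harvard"]
word_to_id = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
V, D, H = len(vocab), 8, 16  # vocabulary, embedding, hidden sizes (assumed)

# Each row of E is the vector for one word. Training would adjust these
# values; here they are random placeholders.
E = rng.normal(scale=0.1, size=(V, D))
W_xh = rng.normal(scale=0.1, size=(D, H))  # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(H, H))  # hidden-to-hidden (recurrent) weights
W_hy = rng.normal(scale=0.1, size=(H, V))  # hidden-to-output weights


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def predict_next(words):
    """Run the RNN over a word sequence and return a probability
    distribution over the next word in the vocabulary."""
    h = np.zeros(H)
    for w in words:
        x = E[word_to_id[w]]               # look up the word's vector
        h = np.tanh(x @ W_xh + h @ W_hh)   # update the hidden state
    return softmax(h @ W_hy)


probs = predict_next(["meet", "monday"])
print(vocab[int(np.argmax(probs))], probs.sum())
```

With untrained random weights the predicted word is meaningless; only after training on real text would similar words end up with similar rows in `E`.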
In the visualization above, the position of each word is determined by the two-dimensional projection of its word vector. The size of each word reflects how frequently it appears in our dataset, and its color indicates its part of speech.
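The three visual encodings described above (position, size, color) can be sketched as follows. The source does not say which projection method was used; t-SNE is a common choice for plots like this, but this sketch uses principal component analysis (PCA) via NumPy's SVD as a simpler, deterministic stand-in. The word list, vectors, counts, tags, and the log scaling of font size are all hypothetical; only the color mapping comes from the legend above.

```python
import numpy as np

# Hypothetical inputs: word vectors, corpus counts, and part-of-speech tags.
words = ["monday", "tuesday", "great", "good", "harvard", "meet"]
rng = np.random.default_rng(1)
vectors = rng.normal(size=(len(words), 50))  # placeholder 50-d vectors
counts = np.array([900, 850, 400, 600, 30, 1200])
pos = ["noun", "noun", "adjective", "adjective", "noun", "verb"]

# Colors follow the legend in the post.
POS_COLORS = {"noun": "blue", "verb": "purple", "name": "orange",
              "adjective": "green", "conjunction": "red", "adverb": "yellow"}


def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


def font_sizes(counts, min_size=8.0, max_size=40.0):
    """Map word counts to font sizes on a log scale (an assumed choice)."""
    logc = np.log(counts)
    span = logc.max() - logc.min()
    scaled = (logc - logc.min()) / span if span > 0 else np.zeros_like(logc)
    return min_size + scaled * (max_size - min_size)


xy = pca_2d(vectors)
sizes = font_sizes(counts)
colors = [POS_COLORS[p] for p in pos]
for w, (x, y), s, c in zip(words, xy, sizes, colors):
    print(f"{w:8s} x={x:6.2f} y={y:6.2f} size={s:5.1f} color={c}")
```

A log scale keeps very frequent words from dwarfing everything else; a linear scale would also work but tends to make rare words unreadably small.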
Viewed as a whole, the image’s large-scale structure is determined by the words’ parts of speech. Nouns tend to lie in the center of the image, verbs tend to lie in the upper right, and first names form a large orange cluster in the bottom left.
If you look more closely, you will find many clusters and regions that contain words with similar meanings. For example, the days of the week and the months of the year each form a tight cluster on the far left side of the image. Positive words expressing approval (“right”, “correct”, “great”, “good”, etc.) form a group in the upper left. City names and other location-related words are located to the right of the orange cluster of names. Further to the right of the location region, you’ll find a group containing the names of a number of universities (“Harvard”, “Stanford”, “NYU”, “USC”, etc.). Below the universities, the names of several popular social media companies (“Google”, “Facebook”, “LinkedIn”, “Twitter”, “YouTube”, and “Pinterest”) appear.
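Clusters like these can also be checked directly in the original high-dimensional space, without relying on the 2-D projection, by ranking words by cosine similarity to a query word. The sketch below does this with hand-picked, hypothetical 3-d vectors chosen so that day names and university names fall into separate groups; it is not output from the x.ai model.

```python
import numpy as np


def nearest_neighbors(query, words, vectors, k=3):
    """Return the k words whose vectors have the highest cosine
    similarity to the query word's vector (excluding the query itself)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = unit[words.index(query)]
    sims = unit @ q                    # cosine similarity to every word
    order = np.argsort(-sims)          # most similar first
    return [(words[i], float(sims[i])) for i in order if words[i] != query][:k]


# Hypothetical vectors: day names point one way, universities another.
words = ["monday", "tuesday", "friday", "harvard", "stanford"]
vectors = np.array([
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.00, 0.90, 0.40],
    [0.10, 0.80, 0.50],
])

print(nearest_neighbors("monday", words, vectors, k=2))
print(nearest_neighbors("harvard", words, vectors, k=1))
```

Because any 2-D projection distorts some distances, neighbors computed this way can differ from what the image suggests; the two views are complementary.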
The RNN learned all of this without a human ever having to code a definition of concepts like nouns, verbs, universities, cities, meetings, or social media. This is the power of deep learning algorithms: they can discover relevant structure in a dataset without a programmer explicitly telling them what to focus on. With millions of emails in our database, x.ai has entered a regime where deep learning has become a viable means of tackling our most difficult natural language processing challenges.