Computing, Data, Startup

Reflecting on our tech stack in 2017

It has been three and a half years since we founded x.ai. We have built an AI scheduling assistant. Our assistant (which goes by the names Amy and Andrew Ingram) is a fully autonomous agent. To build Amy and Andrew, we’ve needed to collect and process an enormous number of scheduling-related emails (5 million and counting).

For that reason, the technology choices we made early on have been critical to our success so far. I’d like to take a moment to reflect on these choices and their evolution. Though we are constantly evaluating and changing our setup, this post might be useful for entrepreneurs launching their own AI startups and looking for inspiration.

Computation

Back in 2014, AWS was really THE game in town, with some of the other players starting to play catch-up. We were on AWS in our past venture and saw no reason to change. Today we’re still 100% on AWS and super happy with it. We love seeing the continued innovation and new features that come out year after year. Having a Virtual Private Cloud right out of the gate set us on the right footing. We have continued to buy AWS Reserved Instances to optimize for cost. We even sold some mismatched Reserved Instances in the marketplace. Recently, we’ve been tightening our security, and having a feature like AWS CloudTrail just a switch away has saved us countless hours.

Programming Languages

We started x.ai with 2 languages in production: Scala and JavaScript. On the one hand, we leveraged Scala’s strength in functional programming and type safety to build complex decision logic and machine learning modules. On the other, we leveraged JavaScript’s powerful libraries and vast community to build all of our web apps and the APIs that connect to the outside world. Over time, we introduced Python to the stack as well. The cutting edge of Machine Learning is happening in Python, and we don’t want to miss out there. Having 3 languages in production does have its challenges. Throughout the years, we have continued to battle with keeping schemas in sync across languages and their associated frameworks and tools. We have some contract tests, but much work is still needed.
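
To make the idea of a contract test concrete, here is a minimal sketch in Python. It is not our actual test suite: the contract, its field names, and the `contract_violations` helper are all hypothetical, chosen only to show how a shared schema can be checked against a payload produced by a service written in another language.

```python
from typing import Any, Dict, List

# Hypothetical shared contract: field name -> expected Python type.
# In practice this would be derived from a schema file that every
# language in the stack reads, rather than being hand-written here.
MEETING_REQUEST_CONTRACT: Dict[str, type] = {
    "meeting_id": str,
    "participants": list,
    "duration_minutes": int,
}


def contract_violations(payload: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the payload conforms."""
    problems: List[str] = []
    # Every field in the contract must be present with the expected type.
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    # Fields the contract does not know about are flagged too, since they
    # usually mean one side changed its schema without the other.
    for field in payload:
        if field not in contract:
            problems.append(f"unexpected field: {field}")
    return problems
```

A test on each side of a service boundary could then assert that `contract_violations(sample_payload, MEETING_REQUEST_CONTRACT)` returns an empty list, so a schema change in one codebase fails the build in the others.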

Code Deployment and Task Management

AWS CodeDeploy was our initial choice for deploying many of our applications. Like many of the AWS services, CodeDeploy was easy to set up and met many of our early needs. It certainly helped that we were 100% on AWS. As our task management needs grew, we layered in Mesos and Marathon for resource management and started running various Singularity jobs on it. At this point in time, we have a small number of applications in Docker and we are actively looking to expand Docker’s footprint in our system. We also got started early with CircleCI for continuous integration. CircleCI worked well with both our Scala and JavaScript codebases. Over time, we migrated to CircleCI 2.0 and built additional integrations with it.

Logging, Monitoring and Analytics

We have all kinds of services in these areas. When it came to metrics and visualization, we started with AWS CloudWatch + Librato for engineering and BIME for business, selections driven primarily by cost. Librato was nicely integrated with CloudWatch, provided the basic time series visualization we needed, and came with basic arithmetic capabilities. We felt Librato was a pretty good value. BIME was painful. Its integration with our MongoDB database was clunky and the UI was simply horrible. Not too long ago, we upgraded our stack. We dropped Librato and BIME. We added Datadog and New Relic for engineering metrics and monitoring and Sisense for business analytics. We are much happier now and plan to keep this stack for a while. Lastly, we’ve had Loggly for centralized logging since the beginning. It’s growing well with us, and we’re happy enough to continue to be a customer 🙂

Queues and Storage

Very early on in designing our infrastructure, we knew we wanted to incorporate queues into our system to decouple our various services. We picked AWS SQS. We loved that right out of the box it was highly scalable and available while being quite cheap. We were up and running quickly and it just worked. Over the years, we have built additional functionality around SQS to address more complex use cases, but SQS remains a core part of our system.
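
As a rough illustration of what decoupling services with a queue looks like, here is a minimal Python sketch. It uses the standard library's in-process `queue.Queue` as a stand-in for SQS, so none of this is the SQS API or our production code; the `publish` and `drain` helpers and the message shape are assumptions made for the example.

```python
import json
import queue
from typing import Any, Callable, Dict, List


def publish(q: "queue.Queue[str]", event_type: str, body: Dict[str, Any]) -> None:
    """Producer side: serialize an event and enqueue it without knowing who consumes it."""
    q.put(json.dumps({"type": event_type, "body": body}))


def drain(
    q: "queue.Queue[str]",
    handlers: Dict[str, Callable[[Dict[str, Any]], Any]],
) -> List[Any]:
    """Consumer side: pull messages until the queue is empty and dispatch by event type."""
    results: List[Any] = []
    while True:
        try:
            raw = q.get_nowait()
        except queue.Empty:
            break
        message = json.loads(raw)
        handler = handlers.get(message["type"])
        # Messages with no registered handler are skipped in this sketch;
        # a real system would more likely route them to a dead-letter queue.
        if handler is not None:
            results.append(handler(message["body"]))
        q.task_done()
    return results
```

The producer only needs to agree with the consumer on the message format, not on where or when the consumer runs, which is the property that lets each service be deployed and scaled on its own.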

Since day one, MongoDB has been our primary database. To read more of our thoughts about Mongo, check out this post. In addition, we leverage AWS S3 to store data that we don’t need real-time access to, such as EBS images and data backups.

Final Thoughts

Building an applied AI solution from scratch is really challenging. We knew it would be hard going into it, but it turned out to be even harder than we anticipated (entrepreneurs are eternal optimists, after all). Picking a tech stack is similar to many decisions in a startup. Making a decision and moving forward quickly is usually the best course of action. Do some research and then jump in. Iterate later when you have a better grasp of your specific problems and needs.