Challenge:
Our client is an established Gates Foundation-backed EdTech company offering an online learning and evaluation platform for educators. They approached APrime to modernize their platform architecture, which consisted of multiple Rails applications hosted on dedicated application server VMs – all of which were out-of-support.
Maintaining virtual machines consumed most of the DevOps team’s time, and each code release took multiple days of manual effort and needed to be shepherded by both engineers and ops. With compatibility issues preventing upgrades of the outdated VMs and applications, the company faced two major challenges: security vulnerabilities and slow feature rollouts.
The client asked us to modernize their infrastructure, allowing them to more easily upgrade versions, scale out their services, and reduce the operational toil of running their platform.
Solution:
Our team quickly identified that the biggest threats to business continuity were the security vulnerabilities stemming from an inability to upgrade outdated infrastructure and packages. In response, we planned a two-step approach:
- Containerize the applications and migrate them off the out-of-support Ubuntu VMs, enabling the Ruby upgrades needed to address the immediate security threat.
- Then reduce operational toil by migrating the containers to run on serverless infrastructure.
We started the containerization effort by building a series of custom Docker base images on Alpine Linux for several Ruby versions, as well as a handful of Node.js versions needed to build static frontend assets. This opened the path for upgrading and testing newer Ruby versions. We then containerized each application by splitting its processes into individual components and creating an image for each; for example, we split the background processor from the web server. This allowed us to move each component on its own, reducing the risk of the migration and also enabling the engineers to scale each process separately.
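To illustrate the per-component split, here is a minimal Python sketch of how one image per process might be named and scaled independently. The app name, start commands, replica counts, Ruby version, and tag scheme are all hypothetical and are not taken from the client's setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str      # e.g. "web" or "worker"
    command: str   # hypothetical start command for this process
    replicas: int  # each component is scaled independently of the others


def image_ref(app, component, ruby_version):
    """Build a hypothetical image reference: one image per component."""
    return f"{app}-{component.name}:ruby{ruby_version}"


def image_refs(app, components, ruby_version):
    """List the image references for every component of one application."""
    return [image_ref(app, c, ruby_version) for c in components]
```

Because each component gets its own image and replica count, the web server and the background processor can be migrated and scaled one at a time.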
As the applications were containerized, we launched Ubuntu VMs in EC2 that were running an in-support release, and ran the images for the application in Docker on the VMs. We progressively added these VM targets to the Application Load Balancer, and were able to entirely decommission the old VMs that were riddled with security vulnerabilities. As an added benefit, containerization also simplified the local development setup by reducing build and run dependencies to just Docker.
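The progressive cutover can be sketched as an ordered plan in which a new VM is always registered with the load balancer before an old one is removed. This is a simplified illustration only: the target names are hypothetical, and a real rollout would also wait for health checks to pass between steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str  # "register" or "deregister"
    target: str  # hypothetical VM identifier


def cutover_plan(old_targets, new_targets):
    """Interleave changes so a new VM is registered before an old one is removed."""
    steps = []
    remaining_old = list(old_targets)
    for new in new_targets:
        steps.append(Step("register", new))
        if remaining_old:
            steps.append(Step("deregister", remaining_old.pop(0)))
    # If there are fewer new VMs than old ones, the leftovers are removed last.
    steps.extend(Step("deregister", old) for old in remaining_old)
    return steps


def min_capacity(old_targets, steps):
    """Replay the plan and return the smallest number of registered targets seen."""
    registered = set(old_targets)
    lowest = len(registered)
    for step in steps:
        if step.action == "register":
            registered.add(step.target)
        else:
            registered.discard(step.target)
        lowest = min(lowest, len(registered))
    return lowest
```

With equal numbers of old and new VMs, replaying the plan shows that serving capacity never drops below its starting level during the migration.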
With the immediate security threat mitigated, we shifted focus to reducing operational toil by moving the containers to serverless infrastructure. To remove the burden that comes with managing EC2 instances directly, we moved the applications to run on fully managed, abstracted hardware. Because the client’s DevOps team was small and had almost no Kubernetes experience but was well established in AWS, we decided to use AWS ECS with Fargate serverless compute.
We built several extensible modules onto their existing Terraform repository to spin up ECS clusters, ECS services, and load balancing to run applications in ECS for their various pre-production and singleton production environments. We also built modules for observability and alerting, since this new infrastructure exposed a whole new set of metrics and log streams. Once the ECS clusters were initialized, configuring an application with a load balancer, dashboards, alerts, secrets, etc., required modifying a single HCL file in their Terraform repository.
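The idea of one per-application definition fanning out into a full set of resources in every environment can be modeled roughly as follows. This is a Python sketch of the concept rather than the actual Terraform modules; the field names, default port, and resource naming scheme are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AppSpec:
    name: str                   # hypothetical application name
    environments: list          # e.g. ["staging", "production"]
    container_port: int = 3000  # assumed default, not from the source
    secrets: list = field(default_factory=list)


def expand(spec):
    """Return the resource names that one spec would imply in each environment."""
    resources = {}
    for env in spec.environments:
        prefix = f"{env}-{spec.name}"
        resources[env] = [
            f"{prefix}-service",
            f"{prefix}-target-group",
            f"{prefix}-dashboard",
            f"{prefix}-alerts",
        ] + [f"{prefix}-secret-{s}" for s in spec.secrets]
    return resources
```

Adding an environment or a secret to the spec changes the derived resources everywhere at once, which is the property that made a single-file change sufficient.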
In the previous setup, it would take a DevOps engineer multiple days to spin up an application in a single environment, but with ECS and Terraform they could spin up an application in all of their environments in only half a day – with the added benefit of out-of-the-box dashboards and alerts.
Our team worked with the DevOps and engineering teams to convert three of their highest-churn applications, and then provided runbooks and training so the DevOps team could execute the remaining migrations independently.
Transformation:
By the end of our engagement, the client’s DevOps team was leveraging their new infrastructure platform for easier, faster and more frequent deployments. Deployment time dropped from a full week to just a few hours – mostly spent on manual feature validation – leading to significant productivity gains and faster delivery of bug fixes and new features to users.
With a dramatically improved security posture, an easy path for future upgrades and increased team bandwidth, the client is now poised to achieve their ambitious innovation and scaling goals.