How we reduced our annual server costs by 80% — from $1M to $200k — by moving away from AWS

An interview with Zsolt Varga, the tech lead and general manager at Prerender

Trey Huffine
Level Up Coding

--

This week we interviewed Zsolt Varga, the lead engineer and manager at Prerender.io. He shares how Prerender saved $800k by removing their reliance on AWS and building in-house infrastructure to handle traffic and cached data.

“The goal was to reduce costs while maintaining the same speed of rendering and quality of service. Migrations like this need to be carefully planned and executed, as incorrect configuration or poor execution would cause downtime for customer web pages and social media clicks, hurt their search rankings, and potentially increase our churn rate.”


Can you describe Prerender and the most interesting technical problem you’re solving?

Prerender, in simple terms, caches and prerenders your JavaScript pages so search engines have a pure HTML file to crawl and index. All it needs is the proper middleware installed on the site, sparing users the pain of costly and lengthy JavaScript workarounds.

However, all of this data and processing has to live on a server and, of course, we used AWS for it. A few years of growth later, we’re handling over 70,000 pages per minute, storing around 560 million pages, and paying well over $1,000,000 per year.

Or at least we would be paying that much if we stayed with AWS. Instead, we were able to cut costs by 80% in a little over three months with some out-of-the-box thinking and a clear plan. Here’s how you could too.

Planning a Migration: Our Step-by-Step Guide

Until recently, Prerender stored the pages it caches and renders for its clients on servers and services hosted by Amazon Web Services (AWS), one of the largest cloud providers, which offers virtual servers and managed services.

Prerender used AWS to store the pages it cached until they were ready to be picked up by Google, Facebook, or any other bot or spider looking for content to index. This provided much of Prerender’s functionality: delivering static HTML to Google and other search engines, and dynamic, interactive JavaScript to human users.

The problem? Storing multiple terabytes of prerendered web page content on a third-party server is hugely expensive. It was costing Prerender astronomical amounts of money in maintenance and hosting fees alone.

But there was another catch that few start-ups take into account and that rarely gets discussed: traffic cost.

Getting data into AWS is technically free, but what good is static data for most software? Moving the data around became a huge cost for Prerender, and we started to notice the bottleneck that was holding us back.

The solution? Migrate the cached pages and traffic onto Prerender’s own internal servers and cut our reliance on AWS as quickly as possible.

When we did a cost projection, we estimated that we could reduce our hosting fees by 40%, and decided a server migration would save both Prerender and our clients money.

The goal was to reduce costs while maintaining the same speed of rendering and quality of service. Migrations like this need to be carefully planned and executed, as incorrect configuration or poor execution would cause downtime for customer web pages and social media clicks, hurt their search rankings, and potentially increase our churn rate.

To mitigate the potential consequences, we planned a three-phase process in which we could easily revert to the previous step if anything went wrong. If for whatever reason the new servers didn’t work, we could roll back our changes without any downtime or service degradation noticeable to customers.

The caveat with continual and systematic testing is that it takes place over weeks and months.

Moving Prerender Away From AWS: a Weekly Overview

Phase 1 — Testing (4 to 6 Weeks)

Phase 1 mostly involved setting up the bare metal servers and testing the migration in a smaller, more manageable setting before scaling. This phase required minimal software adaptation, and we decided to run the software on KVM virtualization on Linux.

In early May, the first batch of servers was running, and 1% of Prerender traffic was directed to the new servers. Two weeks into the migration, we were already saving $800 a day. By the end of the month, we’d migrated most of the traffic workloads away from AWS, reducing the daily Chrome rendering workload costs by 45%.

On the server side, our cost was at $13K per month at this point. Combined with AWS, we had already cut our expenses by 22%.
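As an illustration of how a small slice of traffic can be steered to a new fleet, here is a minimal sketch of a deterministic percentage split. It is not Prerender's actual routing code: the function name `route_request`, the fleet labels, and the hashing scheme are all assumptions made for the example. Hashing the URL means the same page always lands on the same fleet, and raising the percentage only ever moves pages from the old fleet to the new one.

```python
import hashlib


def route_request(url: str, new_fleet_percent: float) -> str:
    """Pick a fleet for a URL (hypothetical helper, not Prerender's code).

    new_fleet_percent is the share of URLs, from 0 to 100, that should be
    rendered on the new bare metal servers.
    """
    if not 0 <= new_fleet_percent <= 100:
        raise ValueError("new_fleet_percent must be between 0 and 100")

    # Hash the URL into one of 10,000 stable buckets (0..9999).
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000

    # 1% corresponds to the first 100 buckets, 100% to all of them.
    threshold = round(new_fleet_percent * 100)
    return "bare_metal" if bucket < threshold else "aws"


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(20_000)]
    on_new_fleet = sum(route_request(u, 1) == "bare_metal" for u in urls)
    print(f"{on_new_fleet / len(urls):.2%} of sample URLs go to the new fleet")
```

Rolling back in this scheme is just a matter of setting the percentage back to 0, which sends every URL to the old fleet again.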

The testing phase was crucial to make sure the following processes would run smoothly. We worked on improving the system robustness with more monitoring & better error handling. Besides the server monitoring dashboard we already had, we also set up a new rendering monitoring dashboard to be able to spot any error or performance issue that occurred.

Thanks to our constant monitoring and clear communication, tests were successful, our savings projections were exceeded and everything was in place to start phase 2 of the migration.

Phase 2 — Technical Set-Up (4 Weeks)

The migration period between June and early July was mostly technical set-up after the first phase of the migration served as a proof of concept. Implementation of the second phase mostly involved moving the cache storage to the bare metal servers.

When the migration reached mid-June, we had 300 servers running very smoothly with a total of 200 million cached pages. We used Apache Cassandra nodes on each of the servers that were compatible with AWS S3.

We broke the online migration into four steps, each a week or two apart. After testing whether Prerender pages could be cached in both S3 and MinIO, we slowly diverted traffic away from AWS S3 and towards MinIO. When the writes to S3 had been stopped completely, Prerender was saving $200 a day on S3 API costs, which signaled that we were ready to start deleting data that was already cached in our Cassandra cluster.

However, the big reveal came at the end of this phase, around June 24th. In the last four weeks, we moved most of the cache workload from AWS S3 to our own Cassandra cluster. The daily cost of AWS was reduced to $1.1K per day, projecting to $35K per month, and the new servers’ monthly recurring cost was estimated to be around $14K.
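The dual-caching step described above can be sketched roughly as follows. This is an illustrative pattern only, not Prerender's code: `PageStore`, `InMemoryStore`, and `MigratingCache` are hypothetical names, and plain dictionaries stand in for the old and new storage backends. The two flags model the stages of the cut-over: write to both stores, start reading from the new one first, then stop writing to the old one.

```python
from typing import Dict, Optional, Protocol


class PageStore(Protocol):
    """Minimal interface both backends are assumed to share."""

    def put(self, key: str, html: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """Dictionary-backed stand-in for a real object store."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}

    def put(self, key: str, html: str) -> None:
        self._pages[key] = html

    def get(self, key: str) -> Optional[str]:
        return self._pages.get(key)


class MigratingCache:
    """Writes to the new store, optionally mirrors to the old one,
    and falls back to the other store on a read miss."""

    def __init__(self, old: PageStore, new: PageStore,
                 write_old: bool = True, read_new_first: bool = False) -> None:
        self.old = old
        self.new = new
        self.write_old = write_old
        self.read_new_first = read_new_first

    def put(self, key: str, html: str) -> None:
        self.new.put(key, html)
        if self.write_old:
            self.old.put(key, html)

    def get(self, key: str) -> Optional[str]:
        first, second = ((self.new, self.old) if self.read_new_first
                         else (self.old, self.new))
        page = first.get(key)
        return page if page is not None else second.get(key)


if __name__ == "__main__":
    cache = MigratingCache(InMemoryStore(), InMemoryStore())
    cache.put("https://example.com/", "<html>cached</html>")
    cache.read_new_first = True   # later step: prefer the new store
    cache.write_old = False       # final step: stop writing to the old store
    print(cache.get("https://example.com/"))
```

Because each stage is a flag flip, reverting to the previous step is a configuration change and does not require moving data back.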

At this point, there were still some leftovers on S3 which cost around $60 per day and would completely die out naturally in a few weeks. Although we could have moved all the data out to cut it to zero immediately, it would have left us a one-time “money waste” of $5K to move data out of AWS.

Moving data around is where you’ll start running into huge bottlenecks. In the words of our new CTO (Zsolt Varga):

“The true hidden price of AWS comes from the traffic cost. They sell reasonably priced storage, and it’s even free to upload to it. But when you get it out, you pay an enormous cost.

Small startups often don’t calculate the traffic cost, even though it can be 90% of their bill.”

For example, if you are in the US West (Oregon) region, you have to shell out $0.080/GB, whereas in the Asia Pacific (Seoul) region it bumps up to $0.135/GB.

In our case, it was easily around $30k to $50k per month. By the end of phase two, we had reduced our total monthly server costs by 41.2%.
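To put the per-gigabyte rates quoted above in perspective, here is a small back-of-the-envelope sketch. It assumes a flat rate per GB and ignores any volume tiers or free allowances, so treat the results as rough orders of magnitude rather than what an invoice would show. The function names are made up for the example; the two rates are the ones quoted in this article.

```python
# Rates as quoted in the article, in USD per GB of data leaving AWS.
US_WEST_OREGON_USD_PER_GB = 0.080
ASIA_PACIFIC_SEOUL_USD_PER_GB = 0.135


def egress_cost_usd(gigabytes: float, usd_per_gb: float) -> float:
    """Cost of moving `gigabytes` out, assuming a flat per-GB rate."""
    return gigabytes * usd_per_gb


def implied_egress_gb(monthly_bill_usd: float, usd_per_gb: float) -> float:
    """Volume that would produce a given bill at a flat per-GB rate."""
    return monthly_bill_usd / usd_per_gb


if __name__ == "__main__":
    for bill in (30_000, 50_000):
        gb = implied_egress_gb(bill, US_WEST_OREGON_USD_PER_GB)
        print(f"${bill:,} per month at the Oregon rate is about {gb / 1000:,.0f} TB")
```

Under these simplifying assumptions, a traffic bill of $30k to $50k per month at the Oregon rate corresponds to roughly 375 TB to 625 TB leaving AWS each month, which is why egress can dominate the bill of a cache-serving product.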

Phase 3 — Implementation and Scaling (4 to 6 weeks)

At this stage, the migration was well underway and was already saving Prerender a considerable amount of money. The only thing left to do was migrate all the other data onto the native servers.

This step involved moving all the Amazon RDS instances shard by shard. This was the most error-prone part of the whole process, but since a fair amount of the data had already been migrated, any hiccups or bottlenecks wouldn’t have brought the whole migration crashing down.
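A shard-by-shard move like the one described above can be organized as a loop that copies, verifies, and only then switches each shard, stopping at the first failure. The sketch below is a generic illustration under that assumption, not the procedure Prerender actually ran: `migrate_shards` and the three callables are hypothetical placeholders for whatever copy, verification, and cut-over steps a real migration would use.

```python
from typing import Callable, Iterable, List


def migrate_shards(
    shard_ids: Iterable[str],
    copy_shard: Callable[[str], None],
    verify_shard: Callable[[str], bool],
    switch_reads: Callable[[str], None],
) -> List[str]:
    """Migrate shards one at a time and return the ones that were switched.

    If verification fails for a shard, the loop stops. That shard keeps
    being served from the old database, and the shards that were already
    switched are left as they are.
    """
    migrated: List[str] = []
    for shard_id in shard_ids:
        copy_shard(shard_id)            # copy data to the new database
        if not verify_shard(shard_id):  # e.g. compare row counts or checksums
            break
        switch_reads(shard_id)          # point traffic for this shard at the new copy
        migrated.append(shard_id)
    return migrated


if __name__ == "__main__":
    done = migrate_shards(
        ["shard-1", "shard-2", "shard-3"],
        copy_shard=lambda s: print(f"copying {s}"),
        verify_shard=lambda s: True,
        switch_reads=lambda s: print(f"switching {s}"),
    )
    print(done)
```

Working one shard at a time keeps the blast radius of any mistake to a single shard, which matches the point above that a hiccup at this stage would not bring the whole migration down.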

Here’s a big picture view of this last stage in the migration process:

  • We mirrored PostgreSQL shards storing cached_urls tables in Cassandra
  • We switched service.prerender.io to Cloudflare load balancer to allow dynamic traffic distribution
  • We set up new EU private-recache servers
  • We kept performing stress tests to solve any performance issues

The migration proved to be a resounding success in the end. By the time all the cached pages were redirected, our monthly server fees had dropped by a full 80%, double our initial estimate of 40%.

What We Learned

There is a lot at stake in a server migration if things go wrong or fall behind schedule. That’s why we made sure to implement fail safes at each stage of the migration to make sure we could fall back if something were to go wrong. It’s also why we tested on a small scale before proceeding with the rest of the migration.

We avoided the dangers by carefully planning each stage of the migration, testing each stage of implementation before scaling, and making it easy to correct any errors should anything go wrong. That way, we could reap the benefits of saving on server fees while keeping any potential risks to a minimum.

What motivated you to work on the problem that Prerender solves?

I was excited by the idea of working on a platform that helps move the web forward.

You see, with Prerender our customers are rolling out user experience-focused websites, and instead of concentrating on SEO they provide the best for their customers. In past years, any time we built a new landing page we used WordPress just to get the best SEO out of it, and reserved the power of SPAs for non-indexed pages like the administration section. But now I work with a company that helps solve the problems that held me back in the past ^.^

What technology stack do you use, and why did you choose this stack?

We use JavaScript everywhere. Since we solve the “issues” caused by JavaScript rendering, we want to build as much expertise as possible in this field. For the other parts, we take advantage of Cloudflare’s distributed system for fast response and global scalability, while our uptime guarantees are supported by DigitalOcean’s cloud platform. We also use a myriad of other SaaS providers to maximize our effectiveness.

What will the world look like once your company achieves its vision?

When the question “Can we use React for our new site?” comes up, the answer will be “For sure!” Right now, marketing departments veto anything that could reduce SEO ranking, and I would say rightfully so. For our customers, losing even 1% of effectiveness would mean pumping hundreds of thousands of dollars more into their ad budgets.

What does a typical day look like for you?

Haha, lots of customer calls! As we aim to keep our dedicated team small and effective, I am more often than not in the onboarding calls with them. But it’s fun for me! I always loved to talk with customers, learn about their situation, and talk about solutions. This makes my job a lot easier, since we don’t have to come up with ideas, our customers are telling us everything we need to know. And I believe this is the best kind of situation, to be customer driven and my KPI is the number of happy customers.

Describe your computer hardware setup

Oh my, this would be worth an article itself. I am kind of a geek and have 8 dedicated servers at home, though I mostly work on my MacBook for convenience. When I get time for programming I spin up my “workstation”, which runs Manjaro. And on the rare occasion I get a bit of me time, I secretly turn on my Windows PC for gaming. At the time of writing, I am surrounded by laptops, Raspberry Pis, and tablets as well.

Building machines and running downscaled tests is my late-night hobby.

Describe your computer software setup

VSCode is the definitive solution for me. I am not tied to any particular programming language, and it gives me the freedom to just install an extension and write IDE-supported code in seconds. Also, I had the luck to be in the beta group for Copilot, and it is a definite game changer.

For source control GitHub is awesome, but I would never discount other solutions either. GitLab has become a really awesome tool in recent years.

Messaging: I think Slack is still the most widespread professional choice, and since it does its job, there is no reason to switch away from it. But recently I found a very interesting piece of software called Spike, and for the past 3 months I have been using it as my de facto email client, as it makes email conversations much easier.

Essential tools: Docker. There is no other way; it changed the industry for the better. I still remember the dark old days when we had to install dependencies and solve package conflicts…

But yeah, Kubernetes is slowly reaching the same level of adoption.

Do you have any advice for software engineers who are just starting out?

Don’t be afraid to talk with the customers. Throughout my career, the best software engineers were the ones who worked with the customer to solve their problems. Sometimes you can save half a year of development time just by learning that you can solve the customer’s issue with a single line of code. I think the best engineers create solutions for real-world problems.

Are you hiring and for what roles?

Always! We aim to hire only when we can ensure that our new colleagues will have a meaningful role and make a definite contribution. But at the moment we have grown so much that we need to grow our team in every department. So, instead of listing the roles, just check our career page :D https://saas.group/career

Where can we go to learn more?

Check out our site at prerender.io. If you are interested in having a call with me about prerendering and how it changes the web, reach me by email at varga@prerender.io. I am always happy to jump on a call and learn about your situation and use cases ^.^

Zsolt Varga is the General Manager of Prerender, a Google-recommended software tool used by more than 12,000 companies that allows search engines to better crawl and index JavaScript websites.
