How to back up Firestore easily and automatically

tzar · Published in Level Up Coding · Aug 24, 2020 · 7 min read

Firebase and Firestore provide a great, fully managed platform for developers to quickly prototype and launch their products. Unfortunately, backups seem to be a neglected part of the platform, with no easy-to-use controls for capturing periodic snapshots of your database. While you’re unlikely to lose any data due to hardware failure, backups are still critical because of the other source of failures: developers.

Migrations go wrong, code has bugs. Even if you strive for perfection (and attain it!), will the next person to work on it be so dedicated? You need backups, and as there’s no button to do it in Firestore, we’re just going to have to do it ourselves. Fortunately, it can now be implemented fairly simply by using a combination of Firebase Functions (to initiate and automate the backups) and Google Cloud Storage (to store them).

Step 1: Plan your Backup Strategy

Before you configure anything, you’re going to need to consider the following:

  • How frequently do you want to take your backups? More frequently = more reads & higher data stored = higher cost. Less frequently = higher potential data loss.
  • Can you run them on-demand? Given our primary failure mode is developer failure, backing up as a pre-deploy step might be justified.
  • How often do you want to delete backups? For certain industries, you’ll have specific time windows that you’ll need to retain data for, which includes backups. You may also not be allowed to retain data after that window ends, which again includes backups. So there needs to be a way to automatically remove them too.
  • Does the GDPR apply? The Right to be Forgotten may apply to backups too, which is why I (like many developers) did a double take when it was first announced and mentally crossed the EU off my future target market. My position has since mellowed, but you may not be able to store backups containing identifying information long term and stay GDPR compliant unless you have some way to forget personal data in old backups, such as keeping all personal information in a separate database/service with a different configured backup cycle.
It’s a real headache… (Photo courtesy of Unsplash)

With that considered, you should now have a retention time and a regulation-induced headache. Let’s get to actually implementing the backups.

Step 2: A place for our backups to go

Create a bucket in Google Cloud Storage. Note that this bucket MUST be in the same region or multi-region as your Firestore database.

  • Go to Google Cloud Storage
  • Make sure you have the correct project selected in the topbar!
  • Click “Create Bucket”
  • Give it a name — I like using a mixture of an obvious description and a non-obvious suffix for a little bonus security by obscurity
  • Choose the same region or multi-region as your Firestore database. For example, the US multi-region.
  • Choose a storage class. If you’re only planning on keeping your backups for a couple of months at a time, Nearline is ideal as it has a minimum retention period of only 30 days. Otherwise, Coldline has a 90 day retention period and offers further cost savings.
  • For access control, as this bucket will only store backups, “Uniform” access control makes the most sense and is the easiest to maintain. We don’t want our backups accessible to anyone other than us, and this makes it easy to enforce that.
  • Finally, the advanced settings. This is where we set up our retention period — tick “Set a retention policy” and add your desired configuration.
  • Click “Create”.
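
If you prefer the command line, the same bucket can be created with gsutil. The following is just a sketch: the bucket name is made up, and the flags assume the choices described above (US multi-region, Nearline storage, uniform access control and a 30-day retention policy), so substitute your own values.

# Create the bucket (replace the project ID and bucket name with your own)
gsutil mb -p yourproject -l US -c nearline -b on gs://myapp-backups-x7k1
# Apply a 30-day retention policy to match the Nearline minimum
gsutil retention set 30d gs://myapp-backups-x7k1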

Assuming your bucket ended up in the same project as your Firebase application, your application’s service account should automatically have permission to read from and write to your bucket. You can double-check this by opening your bucket in the console, going to the “Permissions” tab and looking for yourproject@appspot.gserviceaccount.com. It should have the “Storage Admin” role.
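
You can also do a quick sanity check from the command line by listing the roles the default service account holds at the project level (replace yourproject in both places):

# Lists every role bound to the App Engine default service account
gcloud projects get-iam-policy yourproject \
--flatten="bindings[].members" \
--filter="bindings.members:serviceAccount:yourproject@appspot.gserviceaccount.com" \
--format="value(bindings.role)"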

Step 3: Coding our Automated Backups

Time to write some code to trigger a backup. The easiest way to accomplish this is to implement a Firebase Function — specifically, a Scheduled Function. I’m assuming that you’ve already used Firebase Functions before and have it set up on your project. If not, do that first.

We need to make sure that our new function can run data exports by granting our service account yourproject@appspot.gserviceaccount.com the “Cloud Datastore Import Export Admin” role. You can either do this through the Google Cloud Console’s IAM interface or from the command line (replace both instances of yourproject):

gcloud projects add-iam-policy-binding yourproject \
--member serviceAccount:yourproject@appspot.gserviceaccount.com \
--role roles/datastore.importExportAdmin

In the function we implement, we’re going to use the Firestore Admin client from the @google-cloud/firestore package (yarn add @google-cloud/firestore to add it to your functions project) to trigger the creation of a backup job. As we’re running this from inside Firebase Functions, we can get most of the project configuration we need from the environment. Just make sure to replace gs://your-bucket-name with your actual bucket name! You can also change the schedule to match your own requirements; I have it set as daily in this example:
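
Here’s a minimal sketch of such a function, following the standard export pattern for the Firestore Admin client. The function name scheduledFirestoreBackup and the daily schedule are just examples, and error handling is kept to a bare minimum:

const functions = require('firebase-functions');
const firestore = require('@google-cloud/firestore');

// The Firestore Admin client is what actually kicks off the export job
const client = new firestore.v1.FirestoreAdminClient();

// Replace with the bucket you created in Step 2
const bucket = 'gs://your-bucket-name';

exports.scheduledFirestoreBackup = functions.pubsub
  .schedule('every 24 hours')
  .onRun(async () => {
    // The project ID is available from the Functions environment
    const projectId = process.env.GCP_PROJECT || process.env.GCLOUD_PROJECT;
    const databaseName = client.databasePath(projectId, '(default)');

    // Start the export; this returns a long-running operation
    const [operation] = await client.exportDocuments({
      name: databaseName,
      outputUriPrefix: bucket,
      // An empty array exports every collection; list specific
      // collection IDs here to back up selectively (see below)
      collectionIds: [],
    });

    console.log(`Started Firestore export: ${operation.name}`);
  });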

Once we have this in place, our scheduled backup function will now be deployed with the rest of our application with a simple firebase deploy. It should then happily run according to the schedule you set. To call it manually, we can either use the web UI:

  • Go to the Firebase Console for your project and then choose Functions from the sidebar
  • Find your function, then click the three dots on the right hand side of the row and select “View in Cloud Scheduler”
  • Find your schedule and click “Run Now”

Or we can use the gcloud CLI (e.g. as part of your CI/CD process):

  • To get a list of your scheduled functions: gcloud scheduler jobs list
  • To run one: gcloud scheduler jobs run your-scheduler-name
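
For example, assuming the function from the sketch above is named scheduledFirestoreBackup and deployed to us-central1, the generated job will likely be called something like firebase-schedule-scheduledFirestoreBackup-us-central1 (scheduled functions get a firebase-schedule- prefix), so a pre-deploy backup could look like:

# Take a backup, then deploy (check "gcloud scheduler jobs list" for the exact job name)
gcloud scheduler jobs run firebase-schedule-scheduledFirestoreBackup-us-central1
firebase deploy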

Once your function has run, have a look in the bucket you created earlier. You may need to wait a little while for the backup to complete (gcloud firestore operations list to check on recent operations), but eventually you should see a timestamped folder containing the backup:

A successful backup!

Let’s talk about cost

Firestore itself has an interesting pricing model. It bills based on reads and writes, total data stored, and total data transferred. But the really expensive part? The reads and writes. They’ve recently added a warning to their documentation clarifying that reads are definitely charged when exporting data (i.e. backing up), but this is something I had to find out the hard way.

Depending on how much data you’re storing and how frequently you back up, the cost of taking backups may exceed the actual running costs of your application by several orders of magnitude. In my current project, I need to store a great deal of infrequently accessed currency conversion rate data and this means extremely expensive backups.

Unfortunately, the only way around this at this stage is to either back up less often or only selectively back up your data. You can explicitly state in your backup code which collections you want to include by modifying the collectionIds parameter. So if, for example, you only want to back up your users and restaurants collections, you’d specify collectionIds: ['users', 'restaurants'].

A big gotcha is that nested collections (e.g. users/{userId}/posts) need to be added as well, or they will not be backed up. It’s not well documented, but to save you some trial & error, this is how you back up subcollections:

collectionIds: [
  // This will back up users/{userId}
  'users',
  // This will back up users/{userId}/posts/{postId}
  'posts'
]
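
Plugged into the exportDocuments call from the earlier sketch, a selective backup of users and their posts subcollection would look roughly like this (again, only a sketch; databaseName and bucket are the same variables as before):

const [operation] = await client.exportDocuments({
  name: databaseName,
  outputUriPrefix: bucket,
  // Subcollections are matched by their own collection ID, not their full
  // path, so 'posts' here also covers users/{userId}/posts/{postId}
  collectionIds: ['users', 'posts'],
});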

For the data that you’ve decided is too expensive to back up, make sure you have a disaster recovery plan for how you’re going to be able to get that data back if something goes wrong.

Afterthoughts

I hope that at some stage Firebase can improve their backup tooling. At the moment it’s expensive for some common workloads and not user or developer friendly, and I would say it’s one of the weaker parts of an otherwise great developer experience. Backups are something you need to have in place for your service from day zero, so in my opinion every responsible managed DB service should include prominent tools for backup and recovery, and at the very least offer a basic, simple-to-set-up automated backup solution. Making backups cost-prohibitive or too technically challenging leads to a lot of clients shooting themselves in the foot or deciding to risk it, and I fear this is what Firestore has done.

Whether this applies to you or you’re just starting out, hopefully this article has shown you how to put together a robust backup system for your own Firebase application. And don’t forget to test that your backups work every so often. There are too many horror stories out there of projects that put together a backup system and then never actually tried restoring from it until a disaster had already occurred…
