Photo credit: https://unsplash.com/@brizmaker

Enterprise Serverless 🚀 Security

Serverless Advocate · Published in Level Up Coding · Jul 6, 2020 · 9 min read

This section in the series on Enterprise Serverless specifically covers some of the security aspects and tooling when it comes to serverless applications from my own perspective. You can read the ideas behind the series by clicking on the 'Series Introduction' link below.

The series is split over the following sections:

  1. Series Introduction 🚀
  2. Tooling 🚀
  3. Architecture 🚀
  4. Databases 🚀
  5. Hints & Tips 🚀
  6. AWS Limits & Limitations 🚀
  7. Security 🚀
  8. Useful Resources 🚀

Security

Below are some of the key areas that spring to mind based on my own previous large cloud projects, although security is certainly not limited to this list (I'll no doubt add further sections as they come to mind):

API Validation ✅

On large-scale serverless projects it is important to validate incoming API requests up front, as the public-facing API is the typical target for bad actors and attackers (under-posting, over-posting, bad payloads, etc.). The AWS serverless services typically used here are AWS API Gateway (REST) and AWS AppSync (GraphQL).

AWS AppSync

With AWS AppSync I have typically added a validation function as the first step of the pipeline for each endpoint, for example on a fictitious 'get customer' query:

Mapping Template

{
  type: 'Query',
  field: 'getCustomer',
  request: resolve(
    'src/customer/resolvers/pipelines/Before.req.vtl'
  ),
  response: resolve(
    'src/customer/resolvers/pipelines/After.res.vtl'
  ),
  kind: 'PIPELINE',
  functions: ['validateGetCustomer', 'getCustomer'],
},

Validate Get Customer (basic example)

## placeholders such as ${customerId} are regex/message values substituted at deploy time
#set($errors = [])
#set($valid = $util.matches("${customerId}", $ctx.args.customerId))
#if (!$valid)
  #if ($util.isNullOrEmpty($ctx.args.customerId))
    ## the argument is missing entirely
    #set($propertyRequiredError = "${propertyRequiredError}")
    $util.qr($errors.add($propertyRequiredError.replace("{0}", "Customer ID")))
  #else
    ## the argument is present but fails the regex
    #set($propertyNotValidError = "${propertyNotValidError}")
    $util.qr($errors.add($propertyNotValidError.replace("{0}", $ctx.args.customerId).replace("{1}", "customer ID")))
  #end
#end
#if ($errors.size() > 0)
  ## fail fast before the request reaches the next pipeline function
  $utils.error($util.toJson("${validationError}"), null, null, $errors)
#end
{}

This ensures that we validate the payload and/or arguments of the request up front using regexes, lengths, types, etc. before passing to the next part of the pipeline, which may be a DynamoDB or Lambda resolver. GraphQL itself offers some basic type validation before the request even hits your resolvers, but you can't rely on that alone.

💡 I typically use the before step of the pipeline for setting up correlation IDs (shared VTL file), and the after step for any post error transposing if required.

AWS API Gateway

With API Gateway you can perform basic request validation up front using JSON schema and request validators, although this can be a minefield when choosing integration types with Lambda. It's worth spending some time playing with each to understand how they work and what suits your solution (it's a post in its own right).
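As a minimal sketch of the JSON schema route using the Serverless Framework (the handler path and schema file below are hypothetical, and older framework versions use a singular 'schema' key instead of 'schemas'):

functions:
  createCustomer:
    handler: src/customer/handlers/create.handler # hypothetical handler
    events:
      - http:
          path: customers
          method: post
          request:
            # API Gateway rejects payloads failing this JSON schema
            # before the lambda is ever invoked
            schemas:
              application/json: ${file(schemas/create-customer.json)}

This gives you the same 'fail fast at the front door' behaviour as the AppSync pipeline validation above.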

IAM least privilege

When it comes to building out any cloud services on AWS one of the fundamental security tasks in my opinion is ensuring that specific services only have the IAM permissions to do the task at hand (and no more!).

For example, if you're allowing a lambda to put a message on a queue, ensure you tie down the lambda role so it can only add the message to the specific queue, and not every SQS queue in the account.
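As a minimal Serverless Framework sketch of that queue example (OrderQueue is a hypothetical queue resource defined elsewhere in the stack):

provider:
  name: aws
  iamRoleStatements:
    - Effect: Allow
      Action:
        - sqs:SendMessage # only the single action required, not sqs:*
      Resource:
        - Fn::GetAtt: [OrderQueue, Arn] # only the specific queue, never '*'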

📖 "With Lambda functions, it's recommended that you follow least-privileged access and only allow the access needed to perform a given operation. Attaching a role with more permissions than necessary can open up your systems for abuse. With the security context, having smaller functions that perform scoped activities contribute to a more well-architected serverless application. Regarding IAM roles, sharing an IAM role within more than one Lambda function will likely violate least privileged access." - AWS Serverless Application Lens

A real-world example could be a DynamoDB resolver in AWS AppSync, where the service role associated with the Data Source only has the DynamoDB policies for the specific table and action type required (query, scan, update, etc.).

This could be a GraphQL endpoint for querying a specific order, for example:

Data Source

{
  type: 'AMAZON_DYNAMODB',
  name: 'OrderDataSource',
  description: 'Order Data Source',
  config: {
    tableName: {
      'Fn::ImportValue': 'order-table-${self:custom.prefix}',
    },
    serviceRoleArn: {
      'Fn::GetAtt': ['OrderServiceRole', 'Arn'],
    },
  },
}

Service Role

OrderServiceRole: {
  Type: 'AWS::IAM::Role',
  Properties: {
    RoleName: '${self:custom.prefix}-order-sr',
    AssumeRolePolicyDocument: {
      Version: '2012-10-17',
      Statement: [
        {
          Effect: 'Allow',
          Principal: {
            Service: ['appsync.amazonaws.com'],
          },
          Action: ['sts:AssumeRole'],
        },
      ],
    },
    Policies: [
      {
        PolicyName: '${self:custom.prefix}-order-sr-policy',
        PolicyDocument: {
          Version: '2012-10-17',
          Statement: [
            {
              Effect: 'Allow',
              Action: ['dynamodb:Query'],
              Resource: [
                {
                  'Fn::ImportValue':
                    'order-table-${self:custom.prefix}-arn',
                },
                ...

This ensures that when the Data Source is linked to the Function Configuration, the GraphQL resolver in this instance only has the permissions required to perform the specific query task at hand on the specific order table, and can't do any more (preventing inadvertent damage).

💡 In this specific example above, one issue you may face on large projects is running out of IAM roles across the account with this level of fine-grained access (limits can be found in the AWS IAM service quotas documentation). Say you have 100 individual GraphQL endpoints in your solution and 20 developers working with ephemeral environments; that's 2,000 potential IAM roles straight away at any given point in time (without the rest of your solution's roles).

AWS White papers and resources

I personally find the following useful:

  1. Security, Identity, and Compliance on AWS Overview
  2. AWS Serverless Applications Lens White Paper
  3. AWS Security Pillar White Paper

Runtime and Package Versions

A key aspect of any large cloud project is ensuring your packages, project dependencies and runtime versions do not have any known vulnerabilities.

The great thing with AWS Lambda is that Amazon manages the underlying patching of runtimes; however, if you are using containers, for example with AWS Fargate, you will need to manage the base image versions yourself.

It is also imperative that packages and project dependencies are scanned as part of your CI/CD pipelines using open source solutions to make the team aware of any security vulnerabilities so they can be mitigated.
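As a hedged example, a dependency audit step in a CI pipeline might look like the following (a hypothetical GitHub Actions fragment; tools such as Snyk or OWASP Dependency-Check slot in the same way):

steps:
  - name: Install dependencies
    run: npm ci
  - name: Scan dependencies for known vulnerabilities
    run: npm audit --audit-level=high # fail the build on high severity findings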

Don't log secrets!

When it comes to logging, be sure not to log secrets or personally identifiable information such as email addresses or bank details, as this is a bad actor's gold mine 😈. It may also go against your company's policies, GDPR, or the terms and conditions of your services for your customers.

An example when using AWS AppSync with the serverless-appsync-plugin would be to change the logging level in your serverless YAML/JSON based on the stage you're deploying to:

logConfig:
  loggingRoleArn: { Fn::GetAtt: [AppSyncLoggingServiceRole, Arn] } # a role with CloudWatch Logs write access
  level: ERROR # logging level: NONE | ERROR | ALL
  excludeVerboseContent: false

This maps through CloudFormation to the underlying AppSync log configuration. I have tended to switch the verbosity and level of the logging depending on whether we are in ephemeral/QA/PP or Prod, as previously, on the default settings, we could see bearer tokens being logged in the AppSync CloudWatch request logs.. 🙈 In ephemeral environments and QA the extra logging can be useful for debugging.
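As a sketch of that per-stage approach (the custom stage-to-level map below is hypothetical, and the framework falls back to ERROR if the stage is not listed):

custom:
  appSyncLogLevel: # hypothetical stage-to-level map
    dev: ALL
    qa: ALL
    prod: ERROR

logConfig:
  loggingRoleArn: { Fn::GetAtt: [AppSyncLoggingServiceRole, Arn] }
  level: ${self:custom.appSyncLogLevel.${self:provider.stage}, 'ERROR'}
  excludeVerboseContent: true # hide headers/context (where bearer tokens appear) outside of ephemeral/QA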

📖 "We strictly advise against sending, logging, and storing unencrypted sensitive data, either as part of HTTP request path/query strings or in standard output of a Lambda function." - AWS Serverless Application Lens

VPC Configurations ⚙️

When working on large serverless cloud projects you may need to connect resources such as Lambda to AWS services that reside in a VPC (for example RDS or DocumentDB). In this scenario it is important that inbound and outbound traffic rules are set up correctly.
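As a minimal sketch, attaching a lambda to private subnets with the Serverless Framework looks like the following (the handler, security group and subnet IDs are placeholders):

functions:
  queryReportingData:
    handler: src/reporting/query.handler # hypothetical handler
    vpc:
      securityGroupIds:
        - sg-0123456789abcdef0 # placeholder: allow outbound to the database only
      subnetIds:
        - subnet-0123456789abcdef0 # placeholders: private subnets only
        - subnet-0fedcba9876543210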

Rate Limiting

Setting up rate limiting correctly on your APIs, in line with your customers' usage, is an important task on any large-scale project and shouldn't be forgotten. It can be done at various levels of your infrastructure (API Gateway, AWS WAF, CloudFront, AWS AppSync, Cloudflare, etc.) and helps prevent DoS and denial-of-wallet attacks, for example.
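As one hedged example at the API Gateway level, the Serverless Framework lets you attach a usage plan with throttle and quota settings to your API keys (the key name and figures below are purely illustrative):

provider:
  apiKeys:
    - partner-api-key # hypothetical key name
  usagePlan:
    quota:
      limit: 10000 # maximum requests per month
      period: MONTH
    throttle:
      burstLimit: 200 # maximum burst of concurrent requests
      rateLimit: 100 # steady-state requests per second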

💡 On a previous project with AWS AppSync I contacted AWS about the default rate limiting, as it was not specified in the documentation early on. It is now, and is set at 1,000 requests per second across the full account, although it can be increased by speaking to Amazon.

MFA on AWS Accounts - all team members, always!

One of the first tasks you should do when creating a new AWS account is ensuring that you have MFA enabled as standard. That doesn't just stop with the admin setting it up; it should be enforced for all team members throughout the lifetime of the project and account!

Your security is only as good as your weakest link; one team member without MFA enabled being hacked opens up all of your assets to the hacker, as well as exposing you to denial-of-wallet attacks and more through your account being used for rogue activities, regardless of how stringent the rest of the team has been. 😈
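One well-known way to enforce this is an IAM policy that denies everything except MFA setup until the user has authenticated with MFA; the sketch below is trimmed down from the pattern in the AWS documentation:

ForceMFAPolicy:
  Type: AWS::IAM::ManagedPolicy
  Properties:
    Description: Deny all actions except MFA management until MFA is present
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Sid: DenyAllExceptMFASetupIfNoMFA
          Effect: Deny
          NotAction: # the only actions still allowed without MFA
            - iam:CreateVirtualMFADevice
            - iam:EnableMFADevice
            - iam:ListMFADevices
            - sts:GetSessionToken
          Resource: '*'
          Condition:
            BoolIfExists:
              aws:MultiFactorAuthPresent: 'false'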

S3 bucket configuration

Check your 'private' buckets are not public… 🙏🏼
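The quickest safety net here is the bucket-level public access block; a minimal CloudFormation sketch (the bucket resource name is hypothetical):

PrivateAssetsBucket:
  Type: AWS::S3::Bucket
  Properties:
    PublicAccessBlockConfiguration:
      BlockPublicAcls: true # reject public ACLs on upload
      IgnorePublicAcls: true # ignore any existing public ACLs
      BlockPublicPolicy: true # reject public bucket policies
      RestrictPublicBuckets: true # restrict public and cross-account access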

Continuous threat modelling

On the previous large-scale cloud projects I have managed, one of the key areas that has worked well has been continuous threat modelling throughout the project, i.e. during development and whenever any part of the service changes significantly after go-live.

This has worked well because, in my experience, threat modelling is typically a task left to the end of a project before full security sign-off, and it can be arduous to do in one go at the end of a large-scale project with many moving parts.

It's also beneficial to mitigate any issues while you're working in that area of code at the time (prior to or during development), as opposed to facing a large amount of potential re-work at the end of the project. It also gives the security team confidence, knowing you are baking this into your processes throughout the full SDLC.

Budgets

Set budgets in the AWS console on your account to be alerted proactively if costs start to spiral for any reason.
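Budgets can also be codified rather than clicked together in the console; a minimal CloudFormation sketch (the amount, threshold and email address are placeholders):

MonthlyCostBudget:
  Type: AWS::Budgets::Budget
  Properties:
    Budget:
      BudgetType: COST
      TimeUnit: MONTHLY
      BudgetLimit:
        Amount: 1000 # illustrative monthly cap in USD
        Unit: USD
    NotificationsWithSubscribers:
      - Notification:
          NotificationType: ACTUAL
          ComparisonOperator: GREATER_THAN
          Threshold: 80 # alert at 80% of the budget
          ThresholdType: PERCENTAGE
        Subscribers:
          - SubscriptionType: EMAIL
            Address: team@example.com # hypothetical alert address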

Encryption at rest and in transit

When storing customer data it is really important to consider encryption of the data both at rest and in transit. Great examples of services which can store data encrypted at rest are DynamoDB, AWS DocumentDB, AWS Aurora, AWS Elasticsearch, AWS Parameter Store, and AWS S3, to name a few.

So why encrypt the data at rest? At a basic level this means that as the data is stored it is encrypted into a different format using encryption keys, rendering it useless in a data breach (unless of course the attacker has access to decrypt the data by some means 🔐).
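Sticking with the order table example from earlier, a minimal sketch of server-side encryption on a DynamoDB table in CloudFormation (the table definition is illustrative):

OrderTable:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST
    AttributeDefinitions:
      - AttributeName: id
        AttributeType: S
    KeySchema:
      - AttributeName: id
        KeyType: HASH
    SSESpecification:
      SSEEnabled: true # encrypt the table at rest with a KMS-managed key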

Data in transit is essentially 'live data' being transferred between services. Many AWS services, such as AWS EFS, AWS Glue and AWS Redshift, offer configuration to protect data in transit against snooping, unauthorised access or modification, and many services use SSL/TLS for communication by default.

Early Penetration Testing

Running tools such as Burp Suite during the development stages of a large-scale cloud project can prevent rework at the end of the project, in the same manner as the threat modelling above. In my experience you will pick up most issues early on and mitigate them through development work, prior to having a professional outsourced company run penetration tests at the end of the project. This may include resolving issues around rate limiting, response headers, etc., which, when resolved for one endpoint, typically fixes them for all. 👍

💡 Looking through the feature documentation, Burp Suite Pro can be run as part of your CI integration, although it's not something I have done on a project so far (it's always been done manually outside of the pipeline at different stages of the project work); something for me to look into.

Load Testing

When it comes to confidence in your system, in my opinion it's imperative to test it under load, utilising expected figures plus future growth and taking into consideration seasonal spikes (sales, holidays, etc.). It's also beneficial to test earlier in the project if possible, to prevent rework as above. In my experience, security issues can typically be found off the back of load testing in conjunction with setting up rate limiting, ensuring an attacker can't grind your system to a halt.

There are a number of tools which can be used, such as JMeter or Artillery, when working with APIs; but it is also important to test downstream processes and backend integrations which are not public facing, perhaps integrations using queues or batch processing triggered by CloudWatch events, to ensure they also work under high load as expected.
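As a hedged sketch of an Artillery test definition against a public API (the target URL, figures and path are placeholders):

config:
  target: 'https://api.example.com' # hypothetical API endpoint
  phases:
    - duration: 300 # sustain the load for five minutes
      arrivalRate: 50 # 50 new virtual users per second
scenarios:
  - flow:
      - get:
          url: '/orders/12345' # hypothetical order lookup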

💡 The results of the load testing may indicate customer response times under peak load, or, in event-driven systems, how long an async process takes to complete before, for example, emails are sent out to customers. In my opinion it doesn't just need to be about the load at which the API becomes unresponsive; it's all about the customer experience.

Conclusion

The one thing I would say with regards to security is that, in my opinion, it is the role of every person on the team, and it needs to be considered from day one of the project and throughout.

Next section: Useful Resources 🚀
Previous section: AWS Limits & Limitations 🚀

Wrapping up

Let's connect on any of the following:

https://www.linkedin.com/in/lee-james-gilmore/
https://twitter.com/LeeJamesGilmore

If you found the articles inspiring or useful please feel free to support me with a virtual coffee https://www.buymeacoffee.com/leegilmore and either way let's connect and chat! ☕️

If you enjoyed the posts please follow my profile Lee James Gilmore for further posts/series, and don't forget to connect and say Hi 👋

About me

"Hi, I'm Lee, an AWS certified technical architect and polyglot software engineer based in the UK, working as a Technical Cloud Architect and Serverless Lead, having worked primarily in full-stack JavaScript on AWS for the past 5 years.

I consider myself a serverless evangelist with a love of all things AWS, innovation, software architecture and technology."

** The information provided represents my own personal views, and I accept no responsibility for the use of this information.

