The complexities of multi-application, multi-environment, multi-region, multi-account

By Kevin Teague
Published on Oct 15, 2019

When you host your applications on the AWS cloud you open the door to a multiverse of considerations: multi-account, multi-region, multi-environment and multi-application. It’s easy to underestimate the work, time, cost, and pain-factor of managing this multiplication of considerations. 

Most companies’ AWS cloud journey goes something like this:

  1. Learn the basics of AWS.
    Get initial hands-on experience. Create a single account and use the AWS console to manually provision the resources for a single application in a single region.
  2. Adopt Infrastructure as Code (IaC) automation and provision to multiple environments.
    The benefits of being able to automate the provisioning of your infrastructure in a repeatable manner become apparent when you want to have multiple environments (e.g. development, staging and production).
  3. Provision applications or resources to multiple regions.
    As the importance of the reliability and responsiveness of a production application increases, it makes sense to provision some or all of the application in more than one AWS region. Regulatory requirements, disaster recovery and supporting global dev teams are other key reasons for going multi-region.
  4. Provision applications or resources to multiple accounts.
    Isolating applications and environments in their own dedicated AWS accounts is a foundation of good security. Depending upon a company’s priorities, multi-account may happen before multi-region, or often both of these aspects are considered at the same time.

As each multi-<thing> is considered, the complexity of the AWS resources provisioned increases. With a single application in only one environment and in a single region and account, a set of resources is provisioned just once. But consider what happens as each multi-<thing> is layered on:

  1. Multi-environment: Resources would be provisioned 3 times in a typical development, staging and production environment set-up for a single application.
  2. Multi-region: Resources would be provisioned 7 times for an application that has 3 environments and is using two regions for development and three for production.
  3. Multi-account: Adding multiple accounts doesn’t increase the total number of resources provisioned, but it splits the resources into their own configuration considerations. If there were accounts for production, staging, development, tools, security, disaster recovery and shared services, configured groups of resources could need to be provisioned 10 times. However, the overall complexity is significantly increased, as different resources need different considerations depending upon which account they belong in. Cross-account communication also opens up its own can of worms.
  4. Multi-application: All of the above is for a single application. A company might only have one or two primary applications, but typically has many supporting applications for internal business use that it wants to run in AWS. Consider a medium-sized organization with 4 applications provisioned across 3 regions and 6 accounts. That is a total of 72 distinct configurations where provisioned resources can appear!
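The arithmetic in the last item can be checked with a minimal sketch that enumerates every combination. The application, region and account names here are hypothetical placeholders; only the counts (4 applications, 3 regions, 6 accounts) come from the example above.

```python
from itertools import product

# Hypothetical names for illustration; only the counts (4 x 3 x 6) come from the text.
applications = ["app-1", "app-2", "app-3", "app-4"]
regions = ["us-west-2", "us-east-1", "eu-central-1"]
accounts = ["prod", "staging", "dev", "tools", "security", "shared"]


def configuration_slots(applications, regions, accounts):
    """Return every (application, region, account) combination."""
    return list(product(applications, regions, accounts))


slots = configuration_slots(applications, regions, accounts)
print(len(slots))  # 72
```

Not every slot will hold resources in practice, but each one is a place where resources can appear and therefore a place that may need its own configuration.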

One final consideration is that AWS offers many services: EC2, S3, CloudWatch, CloudFormation, ECS, EKS, Lambda, SQS and more. Just provisioning resources to support one application across all those services gave rise to the term AWS sprawl. Multiply that sprawl across environments, regions and accounts and it can feel impossible to have a handle on everything you have in AWS.

When it comes to tackling the complexity of multitudes, multi-application and multi-environment are the first two that arise in a typical Infrastructure as Code project. It’s easy to write a CloudFormation or Terraform project that deploys an application once. Adding one layer of configuration files for each environment of an application is also straightforward. But when it comes time to consider regions and accounts, these can add significant complexity to the project. Most Infrastructure as Code projects don’t automate everything. Setting up a new account and managing cross-account access control are tasks that only need to be done a few times, so it simply doesn’t make sense to spend development time on something that can be done manually much quicker. Similarly, with multi-region provisioning, cleanly handling small configuration differences between two regions often requires significant refactoring and development effort. Instead, it is common to simply copy and paste complete trees of code and configuration, despite the maintenance burden this can lead to down the road.
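As an illustration of the alternative to copy and paste, here is a minimal sketch of layered configuration: shared defaults, then per-environment settings, then small per-region overrides. The keys, values and function names are hypothetical and not taken from any particular tool.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict in which values from override replace or extend base."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_config(base, environments, env_name, region):
    """Layer environment settings, then region overrides, on top of the defaults."""
    env = environments.get(env_name, {})
    env_settings = {k: v for k, v in env.items() if k != "regions"}
    region_overrides = env.get("regions", {}).get(region, {})
    return deep_merge(deep_merge(base, env_settings), region_overrides)


# Hypothetical settings: only the differences are written down at each layer.
base = {"instance_type": "t3.small", "autoscaling": {"min": 1, "max": 2}}
environments = {
    "dev": {},
    "prod": {
        "instance_type": "m5.large",
        "autoscaling": {"min": 2, "max": 6},
        "regions": {"eu-central-1": {"autoscaling": {"max": 4}}},
    },
}

config = resolve_config(base, environments, "prod", "eu-central-1")
print(config)
```

The merge itself is short. The refactoring effort described above tends to come from restructuring an existing project so that every template reads its values from layers like these.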

One of the driving concepts behind the creation of Waterbear Cloud was that, while it might not make sense for an individual organization to invest in the months of development time to automate the complexities of multiple accounts and regions, it does make sense to spend that development effort if the number of Infrastructure as Code projects is large enough. When a customer chooses to go with Waterbear Cloud, a portion of their spend is pooled with the other Waterbear Cloud customers, and the burden of handling the complexities of multi-account and multi-region is shared between many.

This is especially valuable to Waterbear Cloud customers, as the complexities of being multi-account and multi-region are often iceberg tasks in an IaC project. While a cloud engineer’s initial gut reaction to going multi-account might be, “this isn’t going to be too hard”, the hidden complexities that are discovered under the surface can cause much head banging.

One example that we experienced while building multi-account support was with the Waterbear Cloud Notification service. With our Notification service, we envisioned all AWS CloudWatch Alarms notifying a single central Lambda function, regardless of which account or region an alarm incident was generated from. By routing all alarms into a single place, we can build intelligence into alarm handling. For example, an alarm triggered by an application’s nightly data back-up job can be identified as noise and filtered out.

However, when sending alarm notifications in AWS, there are two wrinkles: an alarm must send to an SNS Topic that is in the same region as the alarm, and if the SNS Topic is in another AWS account, there needs to be a custom access policy that allows the source account to send notifications to that topic. In addition, for security reasons, that policy should only allow your own AWS accounts to access it. The final solution was to provision SNS Topics in every active region in every account, and have all of those SNS Topics notify a central Lambda function (as SNS Topic to Lambda notification can work cross-account and cross-region). The CloudWatch Alarms have an AlarmDescription field, which we populate with JSON that includes the final SNS Topics that contain the end recipients of the notifications.
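To make the routing idea concrete, here is a minimal sketch of how a central Lambda function might read that JSON back out of an incoming notification. It assumes the standard shape of an SNS-triggered Lambda event and of a CloudWatch alarm message, which carries an `AlarmDescription` key. The `topic_arns` key, the function names, the account ID and the topic name are hypothetical, and this is not the actual Waterbear Cloud implementation.

```python
import json


def sns_topic_arn(region, account_id, topic_name):
    """Build an SNS topic ARN from its parts."""
    return f"arn:aws:sns:{region}:{account_id}:{topic_name}"


def build_alarm_description(recipient_topic_arns):
    """Serialize routing metadata for AlarmDescription (hypothetical schema)."""
    return json.dumps({"topic_arns": recipient_topic_arns})


def recipients_from_event(event):
    """Collect recipient topic ARNs from an SNS-triggered Lambda event."""
    recipients = []
    for record in event.get("Records", []):
        # The SNS message body is the CloudWatch alarm notification as a JSON string.
        alarm = json.loads(record["Sns"]["Message"])
        description = json.loads(alarm.get("AlarmDescription") or "{}")
        recipients.extend(description.get("topic_arns", []))
    return recipients


# Simulated event with only the fields this sketch reads; real events carry more.
ops_topic = sns_topic_arn("us-west-2", "111111111111", "ops-team")
alarm_message = {
    "AlarmName": "cpu-high",
    "AlarmDescription": build_alarm_description([ops_topic]),
    "NewStateValue": "ALARM",
}
event = {"Records": [{"Sns": {"Message": json.dumps(alarm_message)}}]}

print(recipients_from_event(event))
```

A real handler would go on to publish to each recipient topic, and would need to handle alarms whose description is not valid JSON.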

Overall, developing that solution took us about 70 hours of effort. However, we encapsulated the work in improvements to our open source AIM and to our proprietary Notification service project. Now, by configuring in an AIM Project’s YAML files the region and account where the notification Lambda will live, along with the list of active regions for the project, and then running ‘aim provision service notification’, we can deploy a tested, robust cross-account, cross-region notification solution in less than 10 minutes.

In conclusion, our past experiences with the pain of enabling Infrastructure as Code projects to be multi-region and multi-account informed how we designed and built Waterbear Cloud tooling. Building these considerations into our tooling involved no small amount of blood, sweat, and tears, but now, when we provision complex multi-account, multi-region environments from a simple set of configuration files and it just works... well, that gives us a “Wow!” feeling every time.

Interested in saving yourself some major pain, time, and expense? Get in touch and let us help you get started.