Conventions

Here’s a summary of all of our conventions across Terraform, Stacks, Catalogs, etc.

SweetOps Conventions

SweetOps is built on top of a number of high-level concepts and terminology that are critical to understand before getting started. In this document, we break down these concepts to help you better understand our conventions as we introduce them.

Components

Components are opinionated, self-contained units of infrastructure as code that solve one specific problem or use case. SweetOps has two flavors of components:

  1. Terraform: Stand-alone root modules that implement some piece of your infrastructure. For example, typical components might be an EKS cluster, RDS cluster, EFS filesystem, S3 bucket, DynamoDB table, etc. You can find the full library of SweetOps Terraform components on GitHub. We keep these types of components in the components/terraform/ directory within the infrastructure repository.

  2. Helmfiles: Stand-alone applications deployed to Kubernetes using helmfile. For example, typical helmfiles might deploy the DataDog agent, cert-manager controller, nginx-ingress controller, etc. Similarly, the full library of SweetOps Helmfile components is on GitHub. We keep these types of components in the components/helmfile/ directory within the infrastructure repository.

One important distinction about components that is worth noting: components are opinionated “root” modules that typically call other child modules. Components are the building blocks of your infrastructure. This is where you define all the business logic for how to provision some common piece of infrastructure like ECR repos (with the ecr component) or EKS clusters (with the eks component). Our convention is to stick components in the components/terraform directory and to use a modules/ subfolder to provide child modules intended to be called by the components.

caution

We do not recommend consuming one terraform component inside of another as that would defeat the purpose; each component is intended to be a loosely coupled unit of IaC with its own lifecycle. Furthermore, since each component defines its own state backend, terraform does not support calling it as a module from another component.

Additional Considerations

  • Components should be opinionated. They define how your company wants to deliver a service.

  • Components should try not to rely on more than 2 providers, in order to keep the configuration as modular as possible. Terraform does not support passing a list of providers via a variable; instead, all providers must be statically listed inside the module (see https://github.com/hashicorp/terraform/issues/24476). Using 1-2 providers ensures that a simple way exists to create any number of architectures with a given component (e.g. “primary” and “delegated” resources). There are few if any architectures with ternary/quaternary/etc. relationships between accounts, which is why we recommend sticking to 1-2 providers.

  • Components should not have a configuration setting in their names (e.g. components/terraform/eks-prod is poor convention). Prod is a type of configuration, and the component shouldn’t differ by stage, only by configuration. The acceptable exception is components named ...-root, which can only be provisioned in the root account (e.g. AWS Organizations).

  • Components should try to expose the same variables as the upstream child modules unless it would lead to naming conflicts.

  • Components should use context.tf (see The context.tf Mixin Pattern under Terraform).

  • Components should have a README.md with sample usage.

  • Components should be well formatted (e.g. terraform fmt).

  • Components should use remote-state where possible to obtain values automatically from other components. All remote-state lookups belong in the remote-state.tf file. See How to Use Terraform Remote State.

  • Components should try to upstream as much business logic as possible to child modules to promote reuse.

  • Components should use strict version pinning in components and lower-bound pinning in terraform modules. See our best practice for this. See How to Keep Everything Up to Date after pinning. See Proposed: Use Strict Provider Pinning in Components for more context.

Stacks

We use Stacks to define and organize configurations. We place terraform “root” modules in the components/terraform directory (e.g. components/terraform/s3-bucket). Then we define one or more catalog archetypes for using the component (e.g. catalog/s3-bucket/logs.yaml and catalog/s3-bucket/artifacts).

Stacks are a way to express the complete infrastructure needed for an environment using a standard YAML configuration format that has been developed by Cloud Posse. Stacks consist of components and the variable inputs to those components. For example, you configure a stack for each AWS account and then reference the components which comprise that stack. The more modular the components, the easier it is to quickly define a stack without writing any new code.

Here is an example stack defined for a Dev environment in the us-west-2 region:

# Filename: stacks/uw2-dev.yaml
import:
  - eks/eks-defaults

vars:
  stage: dev

terraform:
  vars: {}

helmfile:
  vars:
    account_number: "1234567890"

components:
  terraform:

    dns-delegated:
      vars:
        request_acm_certificate: true
        zone_config:
          - subdomain: dev
            zone_name: example.com

    vpc:
      vars:
        cidr_block: "10.122.0.0/18"

    eks:
      vars:
        cluster_kubernetes_version: "1.19"
        region_availability_zones: ["us-west-2b", "us-west-2c", "us-west-2d"]
        public_access_cidrs: ["72.107.0.0/24"]

    aurora-postgres:
      vars:
        instance_type: db.r4.large
        cluster_size: 2

    mq-broker:
      vars:
        apply_immediately: true
        auto_minor_version_upgrade: true
        deployment_mode: "ACTIVE_STANDBY_MULTI_AZ"
        engine_type: "ActiveMQ"

  helmfile:

    external-dns:
      vars:
        installed: true

    datadog:
      vars:
        installed: true
        datadogTags:
          - "env:uw2-dev"
          - "region:us-west-2"
          - "stage:dev"
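
To make the layering in the stack above more concrete, here is a minimal Python sketch of how the variables for a single component might be resolved. It assumes that global vars are applied first, then the vars for the component type (terraform or helmfile), then the component's own vars, with later values winning. The helper names deep_merge and component_vars are hypothetical, and this is an illustration of the idea rather than how atmos is actually implemented. Imports are not followed in this sketch.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which `override` wins; nested dicts merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def component_vars(stack: dict, kind: str, name: str) -> dict:
    """Resolve vars for one component (assumed precedence: global < kind < component)."""
    resolved = deep_merge({}, stack.get("vars", {}))
    resolved = deep_merge(resolved, stack.get(kind, {}).get("vars", {}))
    component = stack.get("components", {}).get(kind, {}).get(name, {})
    return deep_merge(resolved, component.get("vars", {}))


# A trimmed-down copy of the stack above, already parsed from YAML into a dict.
stack = {
    "vars": {"stage": "dev"},
    "terraform": {"vars": {}},
    "helmfile": {"vars": {"account_number": "1234567890"}},
    "components": {
        "terraform": {"vpc": {"vars": {"cidr_block": "10.122.0.0/18"}}},
        "helmfile": {"external-dns": {"vars": {"installed": True}}},
    },
}

vpc_vars = component_vars(stack, "terraform", "vpc")
dns_vars = component_vars(stack, "helmfile", "external-dns")
```

Under these assumptions, the vpc component receives the global stage plus its own cidr_block, while external-dns additionally inherits account_number from the helmfile-level vars.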

Great, so what can you do with a stack? Stacks are meant to be a language and tool agnostic way to describe infrastructure, but how to use the stack configuration is up to you. We provide the following ways to utilize stacks today:

  1. atmos: atmos is a command-line tool that enables CLI-driven stack utilization and supports workflows around terraform, helmfile, and many other commands

  2. terraform-provider-utils: our terraform provider for consuming stack configurations from within HCL/terraform.

  3. Spacelift: By using the terraform-spacelift-cloud-infrastructure-automation module, you can configure Spacelift to continuously deliver components. Read up on why we Use Spacelift for GitOps with Terraform.

Global (Default) Region

The global region, annotated gbl, is an environment or region we use to deploy unique components. A component may be deployed in the gbl region for any of the following reasons:

  1. The AWS Service itself is global (e.g. S3 bucket, CloudFront)
  2. The AWS Service is forced into a specific region (IAM, Route 53 - similar to AWS declaring something as "global")
  3. The AWS Service should only be deployed exactly once across regions (AWS Identity Center)
  4. The resource isn't in AWS, Kubernetes, or anywhere we can reasonably assign to a region. This is common with third-party providers such as Spacelift.

However, the AWS provider still needs a region to be defined. By default, we set the global region to the primary region. This is intended to cause the least confusion when looking for resources, yet the "global region" can be any region.

Catalogs

Catalogs in SweetOps are collections of sharable and reusable configurations. Think of the configurations in catalogs as defining archetypes (a very typical example of a certain thing) of configuration (E.g. s3/public and s3/logs would be two kinds of archetypes of S3 buckets). They are also convenient for managing Terraform. These are typically YAML configurations that can be imported and provide solid baselines to configure security, monitoring, or other 3rd party tooling. Catalogs enable an organization to codify its best practices of configuration and share them. We use this pattern both with our public terraform modules as well as with our stack configurations (e.g. in the stacks/catalog folder).

SweetOps provides many examples of how to use the catalog pattern to get you started.

Today SweetOps provides several important catalogs:

  1. DataDog Monitors: Quickly bootstrap your SRE efforts by utilizing some of these best practice DataDog application monitors.

  2. AWS Config Rules: Quickly bootstrap your AWS compliance efforts by utilizing hundreds of AWS Config rules that automate security checks against many common services.

  3. AWS Service Control Policies: define what permissions in your organization you want to permit or deny in member accounts.

In the future, you’re likely to see additional open-source catalogs for OPA rules and tools to make sharing configurations even easier. But it is important to note that how you use catalogs is really up to you to define, and the best catalogs will be specific to your organization.

Collections

Collections are groups of stacks.

Segments

Segments are interconnected networks. For example, a production segment connects all production-tier stacks, while a non-production segment connects all non-production stacks.

Primary vs Delegated

Primary vs Delegated is a common implementation pattern in SweetOps. This is most easily described when looking at the example of domain and DNS usage in a multi-account AWS organization: SweetOps takes the approach that the root domain (e.g. example.com) is owned by a primary AWS account where the apex zone resides. Subdomains on that domain (e.g. dev.example.com) are then delegated to the other AWS accounts via an NS record on the primary hosted zone which points to the delegated hosted zone’s name servers.

You can see examples of this pattern in the dns-primary, dns-delegated and iam-primary-roles / iam-delegated-roles components.
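
As a rough illustration of the DNS example, the Python sketch below builds the NS record that a primary zone would need in order to delegate a subdomain. The HostedZone type, the delegation_record function, the record layout, and the name server values are all hypothetical and chosen only for this example; they do not correspond to the dns-primary or dns-delegated components or to any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostedZone:
    name: str            # zone name, e.g. "dev.example.com"
    name_servers: tuple  # name servers assigned to this zone


def delegation_record(primary: HostedZone, delegated: HostedZone, ttl: int = 300) -> dict:
    """Describe the NS record placed in the primary zone for a delegated subdomain."""
    if not delegated.name.endswith("." + primary.name):
        raise ValueError(f"{delegated.name} is not a subdomain of {primary.name}")
    return {
        "zone": primary.name,      # the record lives in the primary (apex) zone
        "name": delegated.name,    # ...and is named after the delegated subdomain
        "type": "NS",
        "ttl": ttl,
        "values": list(delegated.name_servers),
    }


# Made-up name servers, for illustration only.
primary = HostedZone("example.com", ("ns-1.primary.invalid",))
delegated = HostedZone("dev.example.com", ("ns-1.delegated.invalid", "ns-2.delegated.invalid"))
record = delegation_record(primary, delegated)
```

The point of the sketch is the direction of the relationship: the primary account only needs to know the delegated zone's name servers, and everything under dev.example.com is then managed from the delegated account.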

Live vs Model (or Synthetic)

Live represents something that is actively being used. It differs from stages like “Production” and “Staging” in the sense that both stages are “live” and in use. Terms like “Model” and “Synthetic”, by contrast, refer to something which is similar but not in use by end users. For example, a live production vanity domain of acme.com might have a synthetic vanity domain of acme-prod.net.

Docker Based Toolbox (aka Geodesic)

In the landscape of developing infrastructure, there are dozens of tools that we all need on our personal machines to do our jobs. In SweetOps, instead of having you install each tool individually, we use Docker to package all of these tools into one convenient image that you can use as your infrastructure automation toolbox. We call it Geodesic and we use it as our DevOps automation shell and as the base Docker image for all of our DevOps tooling.

Geodesic is a DevOps Linux Distribution packaged as a Docker image that provides users the ability to utilize atmos, terraform, kubectl, helmfile, the AWS CLI, and many other popular tools that comprise the SweetOps methodology without having to invoke a dozen install commands to get started. It’s intended to be used as an interactive cloud automation shell, a base image, or in CI/CD workflows to ensure that all systems are running the same set of versioned, easily accessible tools.

Vendoring

Vendoring is a strategy of importing external dependencies into a local source tree or VCS. Many languages (e.g. NodeJS, Golang) natively support the concept. However, there are many other tools which do not address how to do vendoring, namely terraform.

There are a few reasons to do vendoring. Sometimes the tools we use do not support importing external sources. Other times, we need to make sure to have full-control over the lifecycle or versioning of some code in case the external dependencies go away.

Our current approach to vendoring third-party software dependencies is to use vendir when needed.

Example use-cases for Vendoring:

  1. Terraform is one situation where it’s needed. While terraform supports child modules pulled from remote sources, components (aka root modules) cannot be pulled from remotes.

  2. GitHub Actions do not currently support importing remote workflows. Using vendir we can easily import remote workflows.

Generators

Generators in SweetOps are the pattern of producing code or configuration when existing tools have shortcomings that cannot be addressed through standard IaC. This is best explained through our use-cases for generators today:

  1. In order to deploy AWS Config rules to every region enabled in an AWS Account, we need to specify a provider block and consume a compliance child module for each region. Unfortunately, Terraform does not currently support the ability to loop over providers, which results in needing to manually create these provider blocks for each region that we’re targeting. On top of that, not every organization uses the same types of accounts, so a hardcoded solution is not easily shared. Therefore, to avoid tedious manual work, we use the generator pattern to create the .tf files which specify a provider block for each module and the corresponding AWS Config child module.

  2. Many tools for AWS work best when profiles have been configured in the AWS Configuration file (~/.aws/config). If we’re working with dozens of accounts, keeping this file current on each developer’s machine is error prone and tedious. Therefore we use a generator to build this configuration based on the accounts enabled.

  3. Terraform backends do not support interpolation. Therefore, we define the backend configuration in our YAML stack configuration and use atmos as our generator to build the backend configuration files for all components.
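
The first use-case above can be sketched in a few lines of Python. This is only an illustration of the generator pattern, not the generator SweetOps actually uses: the template, the render_region_blocks function, the module name prefix aws_config_, and the module source path ./modules/aws-config are all assumptions made for this example.

```python
# Template for one region: an aliased provider plus a module that uses it.
# Doubled braces are literal braces in the rendered Terraform code.
REGION_TEMPLATE = '''provider "aws" {{
  alias  = "{alias}"
  region = "{region}"
}}

module "aws_config_{alias}" {{
  source = "{module_source}"

  providers = {{
    aws = aws.{alias}
  }}
}}
'''


def render_region_blocks(regions, module_source="./modules/aws-config"):
    """Render a provider block and a module block for every region, sorted by name."""
    blocks = []
    for region in sorted(regions):
        alias = region.replace("-", "_")  # e.g. us-east-1 -> us_east_1
        blocks.append(
            REGION_TEMPLATE.format(alias=alias, region=region, module_source=module_source)
        )
    return "\n".join(blocks)


# The region list would normally come from the accounts and regions you have enabled.
generated = render_region_blocks(["us-west-2", "us-east-1"])
```

The resulting string could be written to a .tf file in the component directory. Because the output is derived from a list of regions, adding a region means re-running the generator rather than hand-editing provider blocks.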

The 4-Layers of Infrastructure

We believe that infrastructure fundamentally consists of 4 layers. We build infrastructure starting from the bottom layer and work our way up.

Each layer builds on the previous one and our structure is only as solid as our foundation. The tools at each layer vary and augment the underlying layers. Every layer has its own SDLC and is free to update independently of the other layers. The 4th and final layer is where your applications are deployed. While we believe in using terraform for layers 1-3, we believe it’s acceptable to introduce another layer of tools to support application developers (e.g. Serverless Framework, CDK, etc.) since we’ve built a solid, consistent foundation.

Terraform

Mixins

Terraform does not natively support the object-oriented concepts of multiple inheritance or mixins, but we can simulate them by convention. For our purposes, we define a mixin in terraform as a controlled way of adding functionality to modules. When a mixin file is dropped into a folder of a module, the code in the mixin starts to interact with the code in the module. A module can have as many mixins as needed. Since terraform does not support this directly, we instead use a convention of exporting what we want to reuse.

We achieve this currently using something we call an export in our terraform modules, which publishes some reusable terraform code that we copy verbatim into modules as needed. We use this pattern with our terraform-null-label using the context.tf file pattern (see below). We also use this pattern in our terraform-aws-security-group module with the security-group-variables.tf export: https://github.com/cloudposse/terraform-aws-security-group/blob/main/exports/security-group-variables.tf.

To follow this convention, create an exports/ folder with the mixin files you wish to export to other modules. Then simply copy them over (e.g. with curl). We recommend giving the installed files a .mixin.tf suffix so it’s clear they are external assets.

Resource Factories

Resource Factories provide a custom declarative interface for defining multiple resources using YAML and then terraform for implementing the business logic. Most of our new modules are developed using this pattern so we can decouple the architecture requirements from the implementation.

See https://medium.com/google-cloud/resource-factories-a-descriptive-approach-to-terraform-581b3ebb59c for a related discussion.

To better support this pattern, we implemented native support for deep merging in terraform using our https://github.com/cloudposse/terraform-provider-utils provider as well as implemented a module to standardize how we use YAML configurations https://github.com/cloudposse/terraform-yaml-config.

Examples of modules using Resource Factory convention:

Naming Conventions (and the terraform-null-label Module)

Naming things is hard. We’ve made it easier by defining a programmatically consistent naming convention, which we use in everything we provision. It is designed to generate consistent human-friendly names and tags for resources. We implement this using a terraform module which accepts a number of standardized inputs and produces an output with the fully disambiguated ID. This module establishes the common interface we use in all of our terraform modules in the Cloud Posse ecosystem. Use terraform-null-label to implement a strict naming convention. We use it in all of our Components and export something we call the context.tf pattern.

https://github.com/cloudposse/terraform-null-label

Here’s a video from our https://cloudposse.atlassian.net/wiki/spaces/CP/pages/1170014234 where we talk about it.

There are 6 inputs considered "labels" or "ID elements" (because the labels are used to construct the ID):

  1. namespace

  2. tenant

  3. environment

  4. stage

  5. name

  6. attributes

This module generates IDs using the following convention by default: {namespace}-{environment}-{stage}-{name}-{attributes}. However, it is highly configurable. The delimiter (e.g. -) is configurable. Each label item is optional (although you must provide at least one).
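
The convention above can be illustrated with a short Python sketch. It is a simplified model of the idea and not the terraform-null-label implementation: it only joins the labels that are set, in the default order quoted above, with a configurable delimiter, and it does not attempt to reproduce any other behavior of the module. The function name label_id and the sample values (acme, uw2, and so on) are assumptions made for this example.

```python
# Default order as described above; the order is configurable in this sketch too.
DEFAULT_LABEL_ORDER = ("namespace", "environment", "stage", "name", "attributes")


def label_id(labels: dict, label_order=DEFAULT_LABEL_ORDER, delimiter: str = "-") -> str:
    """Join the labels that are set, in `label_order`, using `delimiter`."""
    parts = []
    for key in label_order:
        value = labels.get(key)
        if not value:
            continue  # every label is optional, so unset ones are skipped
        if key == "attributes":
            parts.extend(str(item) for item in value if item)  # attributes is a list
        else:
            parts.append(str(value))
    if not parts:
        raise ValueError("at least one label must be provided")
    return delimiter.join(parts)


full_id = label_id(
    {"namespace": "acme", "environment": "uw2", "stage": "dev", "name": "eks", "attributes": ["blue"]}
)
short_id = label_id({"namespace": "acme", "name": "logs"})
custom_id = label_id({"namespace": "acme", "stage": "dev", "name": "vpc"}, delimiter="_")
```

With these inputs, full_id is acme-uw2-dev-eks-blue, short_id is acme-logs, and custom_id is acme_dev_vpc.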

Tenants

Tenants are a Cloud Posse construct used to describe a collection of accounts within an Organizational Unit (OU). An OU may have multiple tenants, and each tenant may have multiple AWS accounts. For example, the platform OU might have two tenants named dev and prod. The dev tenant can contain accounts for the staging, dev, qa, and sandbox environments, while the prod tenant only has one account for the prod environment.

By separating accounts into these logical groupings, we can organize accounts at a higher level, follow AWS Well-Architected Framework recommendations, and enforce environment boundaries easily.

The context.tf Mixin Pattern

Cloud Posse Terraform modules all share a common context object that is meant to be passed from module to module. A context object is a single object that contains all the input values for terraform-null-label and every cloudposse/terraform-* module uses it to ensure a common interface to all of our modules. By convention, we install this file as context.tf which is why we call it the context.tf pattern. By default, we always provide an instance of it accessible via module.this, which makes it always easy to get your context. 🙂

Every input value can also be specified individually by name as a standard Terraform variable, and the value of those variables, when set to something other than null, will override the value in the context object. In order to allow chaining of these objects, where the context object input to one module is transformed and passed on to the next module, all the variables default to null or empty collections.
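
The override and chaining behavior described above can be modeled with a small Python sketch. This is a simplification for illustration only: the resolve_context function is hypothetical, it treats both None and empty collections as "not set", and it replaces values outright, which may differ from how the real module combines individual inputs.

```python
CONTEXT_KEYS = ("namespace", "tenant", "environment", "stage", "name", "attributes")


def resolve_context(context: dict, **overrides) -> dict:
    """Return a new context in which explicitly set inputs override inherited values."""
    resolved = dict(context)  # copy, so the parent context is left untouched
    for key, value in overrides.items():
        if key not in CONTEXT_KEYS:
            raise KeyError(f"unknown context key: {key}")
        is_empty_collection = isinstance(value, (list, tuple, dict)) and len(value) == 0
        if value is None or is_empty_collection:
            continue  # assumption: unset inputs inherit from the context
        resolved[key] = value
    return resolved


# Chain contexts: each step inherits everything it does not explicitly set.
base = resolve_context({}, namespace="acme", stage="dev")
vpc = resolve_context(base, name="vpc", stage=None, attributes=[])
vpc_prod = resolve_context(vpc, stage="prod")
```

Here vpc inherits namespace and stage from base and only adds a name, while vpc_prod overrides stage without affecting the contexts it was derived from.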

Atmos CLI

We predominantly call terraform from atmos. However, by design, all of our infrastructure code runs without any task runners. This is in contrast to tools like terragrunt that manipulate the state of infrastructure code at run time.

See How to use Atmos

Helm

Charts as an Interface

Typically, in a service-oriented architecture (SOA) aka microservices architecture, there will be dozens of very similar services. Traditionally, companies would develop a “unique” helm chart for each of these services. In reality, the charts were generated by running the helm create (https://helm.sh/docs/helm/helm_create/ ) command that would generate all the boilerplate. As a result, the services would share 99% of their DNA with each other (e.g. like monkeys and humans), and 1% would differ. This led to a lot of tech debt, sprawl, and copy & paste 🍝 mistakes.

For proprietary apps deployed by your organization, we recommend taking a different tactic when developing helm charts. Instead, treat charts like an interface - the way you want to deploy apps to Kubernetes. Develop 1-2 charts based on the patterns you want your developers to use (e.g. microservice, batchjob, etc). Then parameterize things like the image, env values, resource limits, healthcheck endpoints, etc. Think of charts like developing your own Heroku-like mechanism to deploy an application. Instead of writing 2 dozen charts, maintain one. Make your apps conform to the convention. Push back on changes to the convention unless necessary.

What if we need more than one deployment (or XYZ) in a chart? That’s easy. You have a few options: a) Deploy the same chart twice; b) Decide if as an organization you want to support that interface and then extend the chart; c) Develop an additional chart interface.

What if we want to add a feature to our chart and don’t want to disrupt other services? No problem. Charts are versioned. So think of the version of a chart as the version of your interface. Every time you change the chart, bump the version. Ensure all your services pin to a version of the chart. Now when you change the version of the chart in your service, you know that you’re upgrading your interface as well.

What if we need some backing services? Good question. You can still use the features of umbrella charts, and even feature flag common things like deploying a database backend for development environments by using a condition in the requirements.yaml that can be toggled in the values.yaml. Pro-tip: Use https://artifacthub.io/ to find ready-made charts you can use.

# requirements.yaml
dependencies:
  - name: elasticsearch
    version: ^1.17.0
    repository: https://kubernetes-charts.storage.googleapis.com/
    condition: elasticsearch.enabled
  - name: kibana
    version: ^1.1.0
    repository: https://kubernetes-charts.storage.googleapis.com/
    condition: kibana.enabled
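
To show how the condition flags in the dependency list above interact with values.yaml, here is a minimal Python sketch. It models only the single-path case and assumes that a flag which is missing or not a boolean leaves the dependency enabled; the function names lookup and enabled_dependencies are hypothetical, and this is not Helm's own code.

```python
def lookup(values: dict, dotted_path: str):
    """Follow a dotted path such as "elasticsearch.enabled"; return None if it is absent."""
    node = values
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def enabled_dependencies(dependencies: list, values: dict) -> list:
    """Return the names of dependencies that are not switched off by their condition."""
    enabled = []
    for dep in dependencies:
        flag = lookup(values, dep["condition"]) if "condition" in dep else None
        if flag is False:
            continue  # explicitly disabled in the values
        enabled.append(dep["name"])  # assumption: anything else stays enabled
    return enabled


# The dependency list above and an example values.yaml, both parsed into Python objects.
dependencies = [
    {"name": "elasticsearch", "version": "^1.17.0", "condition": "elasticsearch.enabled"},
    {"name": "kibana", "version": "^1.1.0", "condition": "kibana.enabled"},
]
dev_values = {"elasticsearch": {"enabled": True}, "kibana": {"enabled": False}}

dev_enabled = enabled_dependencies(dependencies, dev_values)
default_enabled = enabled_dependencies(dependencies, {})
```

With dev_values, only elasticsearch is deployed alongside the service; with no flags set at all, this sketch leaves both dependencies enabled.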

Jira

Every task is tied (or should be tied) to a Confluence page. Homework Assignments. Day-two operations are tutorials and upgrade guides. Almost every FAQ can be covered in How-to Guides.

Implementation guides and Deployment guides may coexist or there could be multiple types of similar implementations.

Implementation Tasks

An implementation includes the step by step instructions to go from Design Decisions to Implementation and ultimately Deployment.

Example: Implement AWS Cold Start

Provisioning Tasks

Every implementation consists of decisions that result in provisioning something so we can deploy applications or configurations to it.

Deployment Tasks

Deployment tasks come after all provisioning tasks because they operate on something which has been previously provisioned.

Design Decision Tasks

Design Decisions are the form of requirements gathering that we do with our customers. See Design Decisions.

caution

We used to move decisions to PENDING REVIEW after discussing with the customer on our weekly calls. Instead, we should now move them to READY TO IMPLEMENT as soon as we have the requirements.

  1. BACKLOG decisions that have not yet been reviewed and covered with customers should be in the backlog.

  2. READY TO IMPLEMENT decisions have requirements and now need to be documented as ADRs. See How to write ADRs.

  3. PENDING REVIEW decisions have an associated Pull Request for the ADR that has not yet been merged.

  4. DONE decisions have been decided and codified as an ADR in the main branch.