Query Cloud SQL through Big Query

This article demonstrates Cloud SQL federated queries for Big Query, a neat and simple to use feature.

Connecting to Cloud SQL

One of the challenges presented when using Cloud SQL on a private network (VPC) is providing access to users. There are several ways to accomplish this which include:

  • open the database port on the VPC Firewall (5432 for example for Postgres) and let users access the database using a command line or locally installed GUI tool (may not be allowed in your environment)
  • provide a web based interface deployed on your VPC such as PGAdmin deployed on a GCE instance or GKE pod (adds security and management overhead)
  • use the Cloud SQL proxy (requires additional software to be installed and configured)

In additional, all of the above solutions require direct IP connectivity to the instance which may not always be available. Furthermore each of these operations requires the user to present some form of authentication – in many cases the database user and password which then must be managed at an individual level.

Enter Cloud SQL federated queries for Big Query…

Big Query Federated Queries for Cloud SQL

Big Query allows you to query tables and views in Cloud SQL (currently MySQL and Postgres) using the Federated Queries feature. The queries could be authorized views in Big Query datasets for example.

This has the following advantages:

  • Allows users to authenticate and use the GCP console to query Cloud SQL
  • Does not require direct IP connectivity to the user or additional routes or firewall rules
  • Leverages Cloud IAM as the authorization mechanism – rather that unmanaged db user accounts and object level permissions
  • External queries can be executed against a read replica of the Cloud SQL instance to offload query IO from the master instance

Setting it up

Setting up big query federated queries for Cloud SQL is exceptionally straightforward, a summary of the steps are provided below:

Step 1. Enable a Public IP on the Cloud SQL instance

This sounds bad, but it isn’t really that bad. You need to enable a public interface for Big Query to be able to establish a connection to Cloud SQL, however this is not accessed through the actual public internet – rather it is accessed through the Google network using the back end of the front end if you will.

Furthermore, you configure an empty list of authorized networks which effectively shields the instance from the public network, this can be configured in Terraform as shown here:

This configuration change can be made to a running instance as well as during the initial provisioning of the instance.

As shown below you will get a warning dialog in the console saying that you have no authorized networks – this is by design.

Cloud SQL Public IP Enabled with No Authorized Networks

Step 2. Create a Big Query dataset which will be used to execute the queries to Cloud SQL

Connections to Cloud SQL are defined in a Big Query dataset, this can also be use to control access to Cloud SQL using authorized views controlled by IAM roles.

Step 3. Create a connection to Cloud SQL

To create a connection to Cloud SQL from Big Query you must first enable the BigQuery Connection API, this is done at a project level.

As this is a fairly recent feature there isn’t great coverage with either the bq tool or any of the Big Query client libraries to do this so we will need to use the console for now…

Under the Resources -> Add Data link in the left hand panel of the Big Query console UI, select Create Connection. You will see a side info panel with a form to enter connection details for your Cloud SQL instance.

In this example I will setup a connection to a Cloud SQL read replica instance I have created:

Creating a Big Query Connection to Cloud SQL

More information on the Big Query Connections API can be found at: https://cloud.google.com/bigquery/docs/reference/bigqueryconnection/rest

The following permissions are associated with connections in Big Query:


These permissions are conveniently combined into the following predefined roles:

roles/bigquery.connectionAdmin    (BigQuery Connection Admin)         
roles/bigquery.connectionUser     (BigQuery Connection User)          

Step 4. Query away!

Now the connection to Cloud SQL can be accessed using the EXTERNAL_QUERY function in Big Query, as shown here:

Querying Cloud SQL from Big Query

Google Cloud SQL – Availability for PostgreSQL – Part II (Read Replicas)

In this post we will look at read replicas as an additional method to achieve multi zone availability for Cloud SQL, which gives us – in turn – the ability to offload (potentially expensive) IO operations such as user created backups or read operations without adding load to the master instance.

In the previous post in this series we looked at Regional availability for PostgreSQL HA using Cloud SQL:

Recall that this option was simple to implement and worked relatively seamlessly and transparently with respect to zonal failover.

Now let’s look at read replicas in Cloud SQL as an additional measure for availability.

Deploying Read Replica(s)

Deploying read replicas is slightly more involved than simple regional (high) availability, as you will need to define each replica or replicas as a separate Cloud SQL instance which is a slave to the primary instance (the master instance).

An example using Terraform is provided here, starting by creating the master instance:

Next you would specify one or more read replicas (typically in a zone other than the zone the master is in):

Note that several of the options supplied are omitted when creating a read replica database instance, such as the backup and maintenance options – as these operations cannot be performed on a read replica as we will see later.

Cloud SQL Instances – showing master and replica
Cloud SQL Master Instance

Voila! You have just set up a master instance (the primary instance your application and/or users will connect to) along with a read replica in a different zone which will be asynchronously updated as changes occur on the master instance.

Read Replicas in Action

Now that we have created a read replica, lets see it in action. After connecting to the read replica (like you would any other instance), attempt to access a table that has not been created on the master as shown here:

SELECT operation from the replica instance

Now create the table and insert some data on the master instance:

Create a table and insert a record on the master instance

Now try the select operation on the replica instance:

SELECT operation from the replica instance (after changes have been made on the master)

It works!

Some Points to Note about Cloud SQL Read Replicas

  • Users connect to a read replica as a normal database connection (as shown above)
  • Google managed backups (using the console or gcloud sql backups create .. ) can NOT be performed against replica instances
  • Read replicas can be used to offload IO intensive operations from the the master instance – such as user managed backup operations (e.g. pg_dump)
pg_dump operation against a replica instance
  • BE CAREFUL Despite their name, read replicas are NOT read only, updates can be made which will NOT propagate back to the master instance – you could get yourself in an awful mess if you allow users to perform INSERT, UPDATE, DELETE, CREATE or DROP operations against replica instances.

Promoting a Read Replica

If required a read replica can be promoted to a standalone Cloud SQL instance, which is another DR option. Keep in mind however as the read replica is updated in an asynchronous manner, promotion of a read replica may result in a loss of data (hopefully not much but a loss nonetheless). Your application RPO will dictate if this is acceptable or not.

Promotion of a read replica is reasonably straightforward as demonstrated here using the console:

Promoting a read replica using the console

You can also use the following gcloud command:

 gcloud sql instances promote-replica  <replica_instance_name>

Once you click on the Promote Replica button you will see the following warning:

Promoting a read replica using the console

This simply states that once you promote the replica instance your instance will become an independent instance with no further relationship with the master instance. Once accepted and the promotion process is complete, you can see that you now have two independent Cloud SQL instances (as advertised!):

Promoted Cloud SQL instance

Some of the options you would normally configure with a master instance would need to be configured on the promoted replica instance – such as high availability, maintenance and scheduled backups – but in the event of a zonal failure you would be back up and running with virtually no data loss!

Full source code for this article is available at: https://github.com/gamma-data/cloud-sql-postgres-availability-tutorial

Google Cloud SQL – Availability, Replication, Failover for PostgreSQL – Part I

In this multi part blog we will explore the features available in Google Cloud SQL for High Availability, Backup and Recovery, Replication and Failover and Security (at rest and in transit) for the PostgreSQL DBMS engine. Some of these features are relatively hot of the press and in Beta – which still makes them available for general use.

We will start by looking at the High Availability (HA) options available to you when using the PostgreSQL engine in Google Cloud SQL.

Most of you would be familiar with the concepts of High Availability, Redundancy, Fault Tolerance, etc but let’s start with a definition of HA anyway:

High availability (HA) is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.


Higher than a normal period is quite subjective, typically this is quantified by a percentage represented by a number of “9s”, that is 99.99% (which would be quoted as “four nines”), this would allot you 52.60 minutes of downtime over a one-year period.

Essentially the number of 9’s required will drive your bias towards the options available to you for Cloud SQL HA.

We will start with Cloud SQL HA in its simplest form, Regional Availability.

Regional Availability

Knowing what we know about the Google Cloud Platform, regional availability means that our application or service (in this case Cloud SQL) should be resilient to a failure of any one zone in our region. In fact, as all GCP regions have at least 3 zones – two zones could fail, and our application would still be available.

Regional availability for Cloud SQL (which Google refers to as High Availability), creates a standby instance in addition to the primary instance and uses a regional Persistent Disk resource to store the database instance data, transaction log and other state files, which is synchronously replicated to a Persistent Disk resource local to the zones that the primary and standby instances are located in.

A shared IP address (like a Virtual IP) is used to serve traffic to the healthy (normally primary) Cloud SQL instance.

An overview of Cloud SQL HA is shown here:

Cloud SQL High Availability

Implementing High Availability for Cloud SQL

Implementing Regional Availability for Cloud SQL is dead simple, it is one argument:

availability_type = "REGIONAL"

Using the gcloud command line utility, this would be:

gcloud sql instances create postgresql-instance-1234 \
  --availability-type=REGIONAL \
  --database-version= POSTGRES_9_6

Using Terraform (with a complete set of options) it would look like:

resource "google_sql_database_instance" "postgres_ha" {
  provider = google-beta
  region = var.region
  project = var.project
  name = "postgresql-instance-${random_id.instance_suffix.hex}"
  database_version = "POSTGRES_9_6"
  settings {
   tier = var.tier
   disk_size = var.disk_size
   activation_policy = "ALWAYS"
   disk_autoresize = true
   disk_type = "PD_SSD"
   availability_type = "REGIONAL"
   backup_configuration {
     enabled = true
     start_time = "00:00"
   ip_configuration  {
     ipv4_enabled = false
     private_network = google_compute_network.private_network.self_link
   maintenance_window  {
     day = 7
     hour = 0
     update_track = "stable"

Once deployed you will notice a few different items in the console, first from the instance overview page you can see that the High Availability option is ENABLED for your instance.

Second, you will see a Failover button enabled on the detailed management view for this instance.


Failovers and failbacks can be initiated manually or automatically (should the primary be unresponsive). A manual failover can be invoked by executing the command:

gcloud sql instances failover postgresql-instance-1234

There is an --async option which will return immediately, invoking the failover operation asynchronously.

Failover can also be invoked from the Cloud Console using the Failover button shown previously. As an example I have created a connection to a regionally available Cloud SQL instance and started a command which runs a loop and prints out a counter:

Now using the gcloud command shown earlier, I have invoked a manual failover of the Cloud SQL instance.

Once the failover is initiated, the client connection is dropped (as the server is momentarily unavailable):

The connection can be immediately re-established afterwards, the state of the running query is lost – importantly no data is lost however. If your application clients had retry logic in their code and they weren’t executing a long running query, chances are no one would notice! Once reconnecting normal database activities can be resumed:

A quick check of the instance logs will show that the failover event has occured:

Now when you return to the instance page in the console you will see a Failback button, which indicates that your instance is being served by the standby:

Note that there may be a slight delay in the availability of this option as the replica is still being synched.

It is worth noting that nothing comes for free! When you run in REGIONAL or High Availability mode – you are effectively paying double the costs as compared to running in ZONAL mode. However availability and cost have always been trade-offs against one another – you get what you pay for…

More information can be found at: https://cloud.google.com/sql/docs/postgres/high-availability

Next up we will look at read replicas (and their ability to be promoted) as another high availability alternative in Cloud SQL.

Google Cloud Storage Object Notifications using Slack

This article describes the steps to integrate Slack with Google Cloud Functions to get notified about object events within a specified Google Cloud Storage bucket.

Google Cloud Storage Object Notifications using Slack

Events could include the creation of new objects, as well as delete, archive or metadata operations performed on a given bucket.

This pattern could be easily extended to other event sources supported by Cloud Functions including:

  • Cloud Pub/Sub messages
  • Cloud Firestore and Firebase events
  • Stackdriver log entries

More information can be found at https://cloud.google.com/functions/docs/concepts/events-triggers.

The prerequisite steps to configure Slack are provided here:

1. First you will need to create a Slack app (assuming you have already set up an account and a workspace). The following screenshots demonstrate this process:

Create a Slack app
Give the app a name and associate it with an existing Slack workspace

2. Next you need to Enable and Activate Incoming Webhooks to your app and add this to your workspace. The following screenshots demonstrate this process:

Enable Incoming Web Hooks for the app
Activate incoming webhooks
Add the webhook to your workspace

3. Next you need to specify a channel for notifications generated from object events.

Select a channel for the webhook

4. Now you need to copy the Webhook url provided, you will use this later in your Cloud Function.

Copy the webhook URL to the clipboard

Treat your webhook url as a secret, do not upload this to a public source code repository

Next you need to create your Cloud Function, this example uses Python but you can use an alternative runtime including Node.js or Go.

This example templates the source code using the Terraform template_file data source. The function source code is shown here:

Within your Terraform code you need to render your Cloud Function code substituting the slack_webhook_url for it’s value which you will supply as a Terraform variable. The rendered template file is then placed in a local directory along with a requirements.txt file and zipped up. The resulting Zip archive is uploaded to a specified bucket where it will be sourced to create the Cloud Function.

Now you need to create the Cloud Function, the following HCL snippet demonstrates this:

The event_trigger block in particular specifies which GCS bucket to watch and what events will trigger invocation of the function. Bucket events include:

  • google.storage.object.finalize (the creation of a new object)
  • google.storage.object.delete
  • google.storage.object.archive
  • google.storage.object.metadataUpdate

You could add additional logic to the Cloud Function code to look for specific object names or naming patterns, but keep in mind the function will fire upon every event matching the event_type and resource criteria.

To deploy the function, you would simply run:

terraform apply -var="slack_webhook_url=https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXX"

Now once you upload a file named test-object.txt, voilà!:

Slack notification for a new object created

Full source code is available at: https://github.com/gamma-data/gcs-object-notifications-using-slack

Ansible Tower for Continuous Infrastructure

As infrastructure and teams scale, effective and robust configuration management requires growing beyond manual processes and local conventions. Fortunately, Ansible Tower (or the upstream Open Source project Ansible AWX) provides a perfect platform for configuration management at scale.

The Ansible Tower/AWX documentation and tutorials provide comprehensive information about the individual components.  However, assembling all the moving pieces into a whole working solution can involve some trial and error and reverse engineering in order to understand how the components relate to one another.  Ansible Tower, like the core Ansible solution, offers flexibility in how features assembled to support different typed of workflows. The types of workflows can include once-off initial configurations, ad-hoc system maintenance, or continuous convergence.

Continuous convergence, also referred to as desired state, regularly re-applies the defined configuration to infrastructure. This tends to ‘correct the drift’ often encountered when only applying the configuration on infrastructure setup. For example, a continuous convergence approach to configuration management could apply the desired configuration on a recurring schedule of every 30 minutes.  

Some continuous convergence workflow characteristics can include:

  • Idempotent Ansible roles. If there are no required configuration deviations, run will report 0 changes.
  • A source code repository per Ansible role, similar to the Ansible Galaxy approach,
  • A source code repository for Ansible playbooks that include the individual Ansible roles,
  • A host configured to provide one unique service function only,
  • An Ansible playbook defined for each unique service function that gets applied to the host,
  • Playbooks applied to each host on a repeating schedule.

One way to achieve a continuous convergence workflow combines the Ansible Tower components according to the following conceptual model.

The Workflow Components

Playbook and Role Source Code

Ansible roles contain the individual tasks, handlers, and content with a role responsible for the installation and configuration of a particular software service.

Ansible playbooks configure a host for a particular service function in the environment acting as a wrapper for the individual role based configurations.  All the roles expected to be applied to a host must be defined in the playbook.

Source Code Repositories

Role git repositories contain the versioned definition of a role, e.g. one git repository per individual role.  The roles are pulled into the playbooks using the git reference and tags, which pegs the role version used within a playbook.

Project git repositories group the individual playbooks into single collection, e.g. one git repository per set of playbooks.  As with roles, specific versions of project repositories are also identified by version tags. 

Ansible Tower Server

Two foundational concepts in Ansible Tower are projects and inventories. Projects provide access to playbooks and roles. Inventories provide the connection to “real” infrastructure.  Inventories and projects also provide authorisation scope for activities in Ansible Tower. For example, a given group can use the playbooks in Project X and apply jobs to hosts in Inventory Y.

Each Ansible Tower Project is backed by a project git repository.  Each repository contains the playbooks and included roles that can be applied by a given job.  The Project is the glue between the Ansible configuration tasks and the plays that apply the configuration.

Ansible Tower Inventories are sets of hosts grouped for administration, similar to inventory sets used when applying playbooks manually.  One option is to group hosts into Inventories by environment.  For example, the hosts for development may be in one Inventory while the hosts for production may be in another Inventory.  User authorisation controls are applied at the Inventory level.

Ansible Tower Inventory Groups define sub-sets of hosts within the larger Inventory.  These subsets can then be used to limit the scope of a playbook job.  One option is to group hosts within an Inventory by function.  For example, the hosts for web servers may be in one Inventory Group and the hosts for databases may be in another Inventory Group.  This enables one playbook to target one inventory group.  Inventory groups effectively provide metadata labels for hosts in the Inventory.

An Ansible Job Template determines the configuration to be applied to hosts.  The Job Template links a playbook from a project to an inventory.   The inventory scope can be optionally further limited by specifying inventory group limits.  A Job Template can be invoked either on an ad-hoc basis or via a recurring schedule.

Ansible Job Schedules define the time and frequency at which the configuration specified in the Job Template is applied.  Each Job Template can be associated with one or more Job Schedules.  A schedule supports either once-off execution, for example during a defined change window, or regularly recurring execution.  A job schedule that applies the desired state configuration with a frequency of 30 minutes provides an example of a job schedule used for a continuous convergence workflow.

“Real” Infrastructure

An Ansible Job Instance defines a single invocation of an Ansible Job Template, both for scheduled and ad-hoc invocations of the job template.  Outside of Ansible Tower, the Job Instance is the equivalent of executing the ansible-playbook command using an inventory file.

Host is the actual target infrastructure resources configured by the job instance, applying an ansible playbook of included roles.

A note on Ansible Variables

As with other features of Ansible and Ansible Tower, variables also offer flexibility in defining parameters and context when applying a configuration.  In addition to declaring and defining variables in roles and playbooks, variable definitions can also be defined in Ansible Tower job templates, inventory and inventory groups, and individual hosts.  Given the plethora of options for variable definition locations, without a set of conventions for managing variable values, debugging runtime issues with roles and playbooks can become difficult.  E.g. which value defined at which location was used when applying the role?

One example of variable definitions conventions could include:

  • Variables shall be given default values in the role, .e.g. in the ../defaults/main.yml file.
  • If the variable must have a ‘real’ value supplied when applying the playbook, the variable shall be defined with an obvious placeholder value which will fail if not overridden.
  • Variables shall be described in the role README.md documentation
  • Do not apply variables at the host inventory level as host inventory can be transient.
  • Variables that select specific capabilities within a role shall be defined at the Ansible Tower Inventory Group.  For example, a role contains the configuration definition for both master and work nodes.  The Inventory Group variables are used to indicate which hosts must have the master configuration and applied and which must have the worker configuration applied.
  • Variables that define the environment context for configuration shall be defined in the Ansible Tower Job Template.

Following these conventions, each of the possible variable definition options serves a particular purpose.  When an issue with variable definition does arise, the source is easily identified.

Scalable, Secure Application Load Balancing with VPC Native GKE and Istio

At the time of this writing, GCP does not have a generally available non-public facing Layer 7 load balancer. While this is sure to change in the future, this article outlines a design pattern which has been proven to provide scalable and extensible application load balancing services for multiple applications running in Kubernetes pods on GKE.

When you create a service of type LoadBalancer in GKE, Kubernetes hooks into the provider (GCP in this case) on your behalf to create a Google Load Balancer, while this may be specified as INTERNAL, there are two issues:

Issue #1:

The GCP load balancer created for you is a Layer 4 TCP load balancer.

Issue #2:

The normal behaviour is for Google to enumerate all of the node pools in your GKE cluster and “automagically” create mapping GCE instance groups for each node pool for each zone the instances are deployed in. This means the entire surface area of your cluster is exposed to the external network – which may not be optimal for internal applications on a multi tenanted cluster.

The Solution:

Using Istio deployed on GKE along with the Istio Ingress Gateway along with an externally created load balancer, it is possible to get scalable HTTP load balancing along with all the normal ALB goodness (stickiness, path-based routing, host-based routing, health checks, TLS offload, etc.).

An abstract depiction of this architecture is shown here:

Istio Ingress Design Pattern for VPC Native GKE Clusters

This can be deployed with a combination of Terraform and kubectl. The steps to deploy at a high level are:

  1. Create a GKE cluster with at least two node pools: ingress-nodepool and service-nodepool. Ideally create these node pools as multi-zonal for availability. You could create additional node pools for your Egress Gateway or an operations-nodepool to host Istio, etc as well.
  2. Deploy Istio.
  3. Deploy the Istio Ingress Gateway service on the ingress-nodepool using Service type NodePort.
  4. Create an associated Certificate Gateway using server certificates and private keys for TLS offload.
  5. Create a service in the service-nodepool.
  6. Reserve an unallocated static IP address from the node network range.
  7. Create an internal TCP load balancer:
    1. Specify the frontend as the IP address reserved in step 6.
    2. Specify the backend as the managed instance groups created during the node pool creation for the ingress-nodepool (ingress-nodepool-ig-a, ingress-nodepool-ig-b, ingress-nodepool-ig-c).
    3. Specify ports 80 and 443.
  8. Create a GCP Firewall Rule to allow traffic from authorized sources (network tags or CIDR ranges) to a target of the ingress-nodepool network tag.
  9. Create a Cloud DNS A Record for your managed zone as *.namespace.zone pointing to the IP Address assigned to the load balancer frontend in step 7.1.
  10. Enable Health Checks through the GCP firewall to reach the ingress-nodepool network tag at a minimum – however there is no harm in allowing these to all node pools.

The service should then be resolvable and routable from authorized internal networks (peered private VPCs or internal networks connected via VPN or Dedicated Interconnect) as:


The advantages of this design pattern are…

  1. The Ingress Gateway provides fully functional application load balancing services.
  2. Istio provides service discovery and routing using names and namespaces.
  3. The Ingress Gateway service and ingress gateway node pool can be scaled as required to meet demand.
  4. The Ingress Gateway is multi zonal for greater availability

Test Driven Infrastructure and Test Automation with Ansible, Molecule and Azure

A few years back, before the rise of the hyper-scalers, I had my first infracode ‘aha moment’ with OpenStack. The second came with Kitchen.

I had already been using test driven development for application code and configuration automation for infrastructure but Kitchen brought the two together. Kitchen made it possible to write tests, spin up infrastructure, and then tear everything down again – the Red/Green/Refactor cycle for infrastructure. What made this even better was that it wasn’t a facsimile of a target environment, it was the same – same VM’s, same OS, same network.

Coming from a Chef background for configuration automation, Kitchen is a great fit to the Ruby ecosystem. Kitchen works with Ansible and Azure, but a Ruby environment and at least a smattering of Ruby coding skills are required.

Molecule provides a similar red-green development cycle to Kitchen, but without the need to step outside of the familiar Python environment.

Out of the box, Molecule supports development of Ansible roles using either a Docker or Virtual Box infrastructure provider. Molecule also leverages the Ansible drivers for private and public cloud platforms.

Molecule can be configured to test an individual role or collections of roles in Ansible playbooks.

This tutorial demonstrates how to use Molecule with Azure to develop and test an individual Ansible role following the red/green/refactor infracode workflow, which can be generalised as:

  • Red– write a failing infrastructure test
  • Green – write the Ansible tasks needed to pass the test
  • Refactor – repeat the process

The steps required for this tutorial are as follows:

Azure setup

Ensure there is an existing Azure Resource Group that will be used for infracode development and testing. Within the resource group, ensure there is a single virtual network (vnet) with a single subnet. Ansible will use these for the default network setup.

Setup a working environment

There are a number of options for setting up a Python environment for Ansible and Molecule, including Python virtualenv or a Docker container environment.

Create a Docker image for Ansible+Molecule+Azure

This tutorial uses a Docker container environment. A Dockerfile for the image can be found in ./molecule-azure-image/Dockerfile. The image sets up a sane Python3 environment with Ansible, Ansible[azure], and Molecule pip modules installed.

Create a Docker workspace

Setup a working environment using the Docker image with Ansible, Molecule, and the azure-cli installed.

This example assumes the following:

  • a resource group already exists with access rights to create virtual machines; and
  • the resource group contains a single vnet with a single subnet

Log into an Azure subcription

Ansible supports a number of different methods for authenticating with Azure. This example uses the azure-cli to login interactively.

Create an empty Ansible role with Molecule

Molecule provides an init function with defaults for various providers. The molecule-azure-role-template creates an empty role with scaffolding for Azure.

Check that the environment is working by running the following code:

The output should look be similar to…

Spin up an Azure VM

Spin up a fresh VM to be used for infra-code development.

Molecule provides a handy option for logging into the new VM:

There is now a fresh Ubuntu 18.04 virtual machine ready for infra-code development. For this example, a basic Nginx server will be installed and verified.

Write a failing test

Testinfra provides a pytest based framework for verifying server and infrastructure configuration. Molecule then manages the execution of those testinfra tests. The Molecule template provides a starting point for crafting tests of your own. For this tutorial, installation of the nginx service is verified. Modify the tests file using vi molecule/default/tests/test_default.py

Execute the failing test

The Ansible task needed to install and enable nginx has not yet been written, so the test should fail:

If the initial sample tests in test_default.py are kept, then 3 tests should fail and 2 tests should pass.

Write a task to install nginx

Add a task to install the nginx service using vi tasks/main.yml:

Apply the role

Apply the role to the instance created using Molecule.

The nginx package should now be installed, both enabled and started, and listening on port 80. Note that the nginx instance will not be accessible from the Internet due to the Azure network security rules. The nginx instance can be confirmed manually by logging into the instance and using curl to make a request to the nginx service.

Execute the passing test

After applying the Ansible task to the instance, the testinfra tests should now pass.


Now that the Ansible role works as defined in the test specification, the development environment can be cleaned up.

Molecule removes the Azure resources created to develop and test the configuration role. Note that deletion may take a few minutes.

Finally, once you are done, exit the container environment. If the container was started with the --rm switch, the container will also be removed, leaving you with a clean workspace and newly minted Ansible role with automated test cases.

Full source code can be found at: https://github.com/gamma-data/json-wrangling-with-golang

Really Simple Terraform – S3 Object Notifications using Lambda and SES

Following on from the previous post in the Really Simple Terraform series simple-lambda-ec2-scheduler, where we used Terraform to deploy a Lambda function including the packaging of the Python function into a ZIP archive and creation of all supporting objects (roles, policies, permissions, etc) – in this post we will take things a step further by using templating to update parameters in the Lambda function code before the packaging and creation of the Lambda function.

S3 event notifications can be published directly to an SNS topic which you could create an email subscription, this is quite straightforward. However the email notifications you get look something like this:

Email Notification sent via an SNS Topic Subscription

There is very little you can do about this.

However if you take a slightly different approach by triggering a Lambda function to send an email via SES you have much more control over content and formatting. Using this approach you could get an email notification that looks like this:

Email Notification sent using Lambda and SES

Much easier on the eye!


You will need verified AWS SES (Simple Email Service) email addresses for the sender and recipient’s addresses used for your object notification emails. This can be done via the console as shown here:

SES Email Address Verification

Note that SES is not available in every AWS region, pick one that is generally closest to your particular reason (but it really doesn’t matter for this purpose).


The Terraform module creates an IAM Role and associated policy for the Lambda function as shown here:

Variables in the module are substituted into the function code template, the rendered template file is then packaged as a ZIP archive to be uploaded as the Lambda function source as shown here:

As in the previous post, I will reiterate that although Terraform is technically not a build tool, it can be used for simple build operations such as this.

The Lambda function is deployed using the following code:

Finally the S3 object notification events are configured as shown here:

Use the following commands to run this example (I have created a default credentials profile, but you could supply your API credentials directly, use STS, etc):

cd simple-notifications-with-lambda-and-ses
terraform init
terraform apply

Full source code for this article can be found at: https://github.com/avensolutions/simple-notifications-with-lambda-and-ses

Really Simple Terraform – Infrastructure Automation using AWS Lambda

There are many other blog posts and examples available for either scheduling infrastructure tasks such as the starting or stopping of EC2 instances; or deploying a Lambda function using Terraform. However, I have found many of the other examples to be unnecessarily complicated, so I have put together a very simple example doing both.

The function itself could be easily adapted to take other actions including interacting with other AWS services using the boto3 library (the Python AWS SDK). The data payload could be modified to pass different data to the function as well.

The script only requires input variables for schedule_expression (cron schedule based upon GMT for triggering the function – could also be expressed as a rate, e.g. rate(5 minutes)) and environment (value passed to the function on each invocation). In this example the Input data is the value for the “Environment” key for an EC2 instance tag – a user defined tag to associate the instance to a particular environment (e.g. Dev, Test. Prod). The key could be changed as required, for instance if you wanted to stop instances based upon their given name or part thereof you could change the tag key to be “Name”.

When triggered, the function will stop all running EC2 instances with the given Environment tag.

The Terraform script creates:

  • an IAM Role and associated policy for the Lambda Function
  • the Lambda function
  • a Cloudwatch event rule and trigger

The IAM role and policies required for the Lambda function are deployed as shown here:

The function source code is packaged into a ZIP archive and deployed using Terraform as follows:

Admittedly Terraform is an infrastructure automation tool and not a build/packaging tool (such as Jenkins, etc), but in this case the packaging only involves zipping up the function source code, so Terraform can be used as a ‘one stop shop’ to keep things simple.

The Cloudwatch schedule trigger is deployed as follows:

Use the following commands to run this example (I have created a default credentials profile, but you could supply your API credentials directly, use STS, etc):

cd simple-lambda-ec2-scheduler
terraform init
terraform apply
Terraform output

Full source code is available at: https://github.com/avensolutions/simple-lambda-ec2-scheduler

Stay tuned for more simple Terraform deployment recipes in coming posts…