Bigtable is one of the foundational services in the Google Cloud Platform and to this day one of the greatest contributions to the big data ecosystem at large. It is also one of the least known services available, with all the headlines and attention going to more widely used services such as BigQuery.
In 2006 (pre-Google Cloud Platform), Google released a white paper called “Bigtable: A Distributed Storage System for Structured Data”, which set out the reference architecture for what was to become Cloud Bigtable. It followed several other white papers, including the GoogleFS and MapReduce papers released in 2003 and 2004, which provided abstract reference architectures for the Google File System (now known as Colossus) and the MapReduce algorithm. These papers inspired a generation of open source distributed processing systems, including Hadoop. Google has long had a pattern of publicising generalized overviews of its approach to solving storage and processing challenges at scale through white papers.
The Bigtable white paper inspired a wave of open source distributed key/value oriented NoSQL data stores including Apache HBase and Apache Cassandra.
What is Bigtable?
Bigtable is a distributed, petabyte scale NoSQL database. More specifically, Bigtable is…
At its core Bigtable is a distributed map or an associative array indexed by a row key, with values in columns which are created only when they are referenced. Each value is an uninterpreted byte array.
Row keys are stored in lexicographic order, akin to a clustered index in a relational database.
A given row can have any number of columns; not every column must have a value, and NULLs are not stored. There may also be gaps between row keys.
All values are versioned with a timestamp (or a configurable integer). Data is not updated in place; it is instead superseded by another version.
When (and when not) to use Bigtable
Use Bigtable if…
You need to do many thousands of operations per second on TB+ scale data
Your access patterns are well known and simple
You need to support random write or random read operations (or sequential reads) – each using a row key as the primary identifier
Don’t use Bigtable if…
You need explicit JOIN capability, that is, joining two or more tables
You need to do ad-hoc analytics
Your access patterns are unknown or not well defined
Bigtable vs Relational Database Systems
The following table compares and contrasts Bigtable against relational databases (both transaction oriented and analytic oriented databases):
[Comparison table not fully recovered. The attributes compared include storage orientation (Bigtable: column family oriented), transaction scope (Bigtable: single row only), secondary index support (Bigtable: row key only) and maximum practical data size (relational databases: hundreds of GB to TB; Bigtable: petabytes+).]
Bigtable Data Model
Tables in Bigtable are composed of rows and columns (sounds familiar so far…). Every row is uniquely identified by a row key (like a primary key… again, sounds familiar so far).
Columns belong to Column Families and only exist when inserted; NULLs are not stored – this is where Bigtable starts to differ from a traditional RDBMS. The following image demonstrates the data model for a fictitious table in Bigtable.
In the previous example, we created two Column Families (cf1 and cf2). These are created during table definition or update operations (akin to DDL operations in the relational world). In this case, we have chosen to store primary attributes (like name) in cf1 and features or derived attributes (like indicators) in cf2.
Each cell has a timestamp/version associated with it, and multiple versions of a cell can exist. Versions are naturally stored in descending timestamp order.
Properties such as the maximum age of a cell or the maximum number of versions to be stored for any given cell are set on the Column Family. Expired or excess versions are removed through a process called Garbage Collection – not to be confused with Java Garbage Collection (albeit a similar idea).
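As a minimal sketch of how this looks in practice (the instance and table names are assumptions; cf1 and cf2 are the column families from the example above), the cbt command line tool can create the table, add column families and set garbage collection policies on them:
# create a table in an existing Bigtable instance
cbt -instance my-instance createtable customer
# add the two column families
cbt -instance my-instance createfamily customer cf1
cbt -instance my-instance createfamily customer cf2
# keep at most two versions of any cell in cf1
cbt -instance my-instance setgcpolicy customer cf1 maxversions=2
# expire cells in cf2 after 30 days
cbt -instance my-instance setgcpolicy customer cf2 maxage=30d
# write and read a cell (columns are addressed as family:qualifier)
cbt -instance my-instance set customer row-1001 cf1:name=Jane
cbt -instance my-instance read customer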
Bigtable Instances, Clusters, Nodes and Tables
Bigtable is a “no-ops” service, meaning you do not need to configure machine types or details about the underlying infrastructure, save for a few sizing and performance options – such as the number of nodes in a cluster or whether to use solid state drives (SSD) or the magnetic alternative (HDD). The following diagram shows the relationships and cardinality for Cloud Bigtable.
Clusters and nodes are the physical compute layer for Bigtable. These are zonal assets; zonal and regional availability can be achieved through replication, which we will discuss later in this article.
Instances are a virtual abstraction over clusters, and tables belong to instances (not clusters). This is due to Bigtable's underlying architecture, which is based upon a separation of storage and compute, as shown below.
Bigtable's separation of storage and compute allows it to scale horizontally; because nodes are stateless, they can be added to increase query performance. The underlying storage system is inherently scalable.
Physical Storage & Column Families
Data (Columns) for Bigtable is stored in Tablets (as shown in the previous diagram), which store “regions” of row keys for a particular Column Family. Columns consist of a column family prefix and a qualifier, for instance cf1:name (the name qualifier within the cf1 column family).
A table can have one or more Column Families. Column families must be declared at schema definition time (could be a create or alter operation). A cell is an intersection of a row key and a version of a column within a column family.
Storage settings (such as the compaction/garbage collection properties mentioned before) can be specified for each Column Family – which can differ from other column families in the same table.
Bigtable Availability and Replication
Replication is used to increase availability and durability for Cloud Bigtable – this can also be used to segregate read and write operations for the same table.
Data and changes to tables are replicated across multiple regions or multiple zones within the same region; this replication can be blocking (single-row transactions) or non-blocking (eventually consistent). However, all clusters within a Bigtable instance are considered primary (writable).
Requests are routed using Application Profiles: a single-cluster routing policy can be used for manual failover, whereas multi-cluster routing provides automatic failover.
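As a hedged sketch of how this is configured (the profile, instance and cluster names are assumptions), application profiles can be created with gcloud:
# multi-cluster routing: requests automatically fail over to the nearest available cluster
gcloud bigtable app-profiles create auto-failover-profile \
  --instance=my-instance \
  --route-any \
  --description="multi-cluster routing"
# single-cluster routing: requests are pinned to one cluster (manual failover)
gcloud bigtable app-profiles create pinned-profile \
  --instance=my-instance \
  --route-to=my-cluster-c1 \
  --description="single-cluster routing"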
Backup and Export Options for Bigtable
Managed backups can be taken at the table level, and new tables can be created from backups. Backups cannot be exported; however, table-level export and import operations are available via pre-baked Dataflow templates for data stored in GCS in formats such as Avro, Parquet and SequenceFile.
Bigtable data and admin functions are available via:
cbt (optional component of the Google SDK)
hbase shell (REPL shell)
Happybase API (Python API for HBase)
SDK libraries for:
C#, C++, PHP, and more
As Bigtable is not a cheap service, there is a local emulator available which is great for application development. This is part of the Cloud SDK, and can be started using the following command:
gcloud beta emulators bigtable start
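Once the emulator is running, clients and tools need to be pointed at it. One way to do this (a sketch, assuming the emulator's default host and port) is to set the BIGTABLE_EMULATOR_HOST environment variable using the env-init helper:
# prints an export statement for BIGTABLE_EMULATOR_HOST; evaluate it in the current shell
$(gcloud beta emulators bigtable env-init)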
In the next article in this series we will demonstrate admin and data functions as well as the local emulator.
This article demonstrates creating a site to site IPSEC VPN connection between a GCP VPC network and an Azure Virtual Network, enabling private RFC1918 network connectivity between virtual networks in both clouds. This is done using a single PowerShell script leveraging Azure PowerShell and gcloud commands in the Google SDK.
Additionally, we will use Azure Private DNS to enable private access from Azure hosts to GCP APIs (such as Cloud Storage or BigQuery).
An overview of the solution is provided here:
One note before starting – site to site VPN connections between GCP and Azure currently do not support dynamic routing using BGP, however creating some simple routes on either end of the connection will be enough to get going.
Let’s go through this step by step:
Step 1 : Authenticate to Azure
Azure's equivalent of an account is a subscription; the Connect-AzAccount command from Azure PowerShell is used to authenticate a user to one or more subscriptions.
This command will open a browser window prompting you for Microsoft credentials, once authenticated you will be returned to the command line.
Step 2 : Create a Resource Group (Azure)
A resource group is roughly equivalent to a project in GCP. You will need to supply a Location (equivalent to a GCP region):
Step 3 : Create a Virtual Network with Subnets and Routes (Azure)
An Azure Virtual Network is the equivalent of a VPC network in GCP (or AWS); you must define subnets before creating a Virtual Network. In this example we will create two subnets: one Gateway subnet (which needs to be named accordingly) where the VPN gateway will reside, and one subnet named ‘default' where we will host VMs which will connect to GCP services over the private VPN connection.
Before defining the default subnet we must create and attach a Route Table (the equivalent of a Route in GCP); this particular route will be used to route ‘private' requests to services in GCP (such as BigQuery).
Step 4 : Create Network Security Groups and Rules (Azure)
Network Security Groups in Azure are stateful firewalls, much like Firewall Rules in VPC networks in GCP. As with GCP, rules with a lower priority value override rules with a higher priority value.
In the example we will create several rules to allow inbound ICMP, TCP and UDP traffic from our Google VPC and RDP traffic from the Internet (which we will use to logon to a VM in Azure to test private connectivity between the two clouds):
Step 5 : Create Public IP Addresses (Azure)
We need to create two Public IP Addresses (the equivalent of External IPs in GCP), which will be used for our VPN gateway and for the VM we will create:
# create public IP address for VM
$vmpip = New-AzPublicIpAddress `
-Name "vm-ip" `
-ResourceGroupName "azure-to-gcp" `
-Location "Australia Southeast" `
-AllocationMethod Dynamic # allocation method assumed; the original final line was truncated
# create public IP address for NW gateway
$ngwpip = New-AzPublicIpAddress `
-Name "ngw-ip" `
-ResourceGroupName "azure-to-gcp" `
-Location "Australia Southeast" `
-AllocationMethod Dynamic # allocation method assumed; non-zonal VPN gateway SKUs require a dynamic public IP
Step 6 : Create Virtual Network Gateway (Azure)
The Virtual Network Gateway is Azure's equivalent of a VPN gateway and will be used to create a VPN tunnel between Azure and a GCP VPN Gateway. This gateway will be placed in the Gateway subnet created previously, and one of the Public IP addresses created in the previous step will be assigned to it.
# create virtual network gateway IP configuration
$ngwipconfig = New-AzVirtualNetworkGatewayIpConfig `
-Name "ngw-ipconfig" `
-SubnetId $gatewaySubnet.Id `
-PublicIpAddressId $ngwpip.Id # association with the gateway public IP assumed; the original final line was truncated
# use the AsJob switch as this is a long running process
$job = New-AzVirtualNetworkGateway -Name "vnet-gateway" `
-ResourceGroupName "azure-to-gcp" `
-Location "Australia Southeast" `
-IpConfigurations $ngwipconfig `
-GatewayType "Vpn" `
-VpnType "RouteBased" `
-GatewaySku "VpnGw1" `
-VpnGatewayGeneration "Generation1" `
-AsJob # run as a background job (per the comment above); the original final line was truncated
$vnetgw = Get-AzVirtualNetworkGateway `
-Name "vnet-gateway" `
-ResourceGroupName "azure-to-gcp" # resource group assumed from earlier steps; the original final line was truncated
Step 7 : Create a VPC Network and Subnetwork(s) (GCP)
A VPC network and subnet need to be created in GCP; the subnet defines the address range on the GCP side of the connection. This address space must not overlap with the Azure Virtual Network CIDR. For all GCP steps it is assumed that the project is set in the client config (e.g. gcloud config set project <>) so it does not need to be specified for each operation. Private Google access should be enabled on all subnets created.
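A minimal sketch of this step using gcloud (the network name, region and CIDR range are assumptions, not values from the original script):
# create a custom-mode VPC network
gcloud compute networks create gcp-vpc --subnet-mode=custom
# create a subnet with Private Google Access enabled
gcloud compute networks subnets create gcp-subnet \
  --network=gcp-vpc \
  --region=australia-southeast1 \
  --range=10.2.0.0/24 \
  --enable-private-ip-google-access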
Now we will create the GCP side of our VPN tunnel using the Public IP Address of the Azure Virtual Network Gateway created in a previous step. As this example uses a route-based VPN, the traffic selector values need to be set to 0.0.0.0/0. A PSK (Pre-Shared Key) needs to be supplied, which will be the same key used when we configure a VPN Connection on the Azure side of the tunnel.
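A sketch of the classic (route-based) VPN setup with gcloud follows; the resource names, region and shared secret are assumptions, and $AZURE_GW_IP stands for the public IP of the Azure Virtual Network Gateway created earlier:
# create the GCP VPN gateway and reserve an external IP for it
gcloud compute target-vpn-gateways create gcp-vpn-gw \
  --network=gcp-vpc --region=australia-southeast1
gcloud compute addresses create gcp-vpn-ip --region=australia-southeast1
# forwarding rules for ESP, IKE (UDP 500) and NAT-T (UDP 4500)
# (older gcloud releases may require the literal IP rather than the address resource name)
gcloud compute forwarding-rules create fr-esp --region=australia-southeast1 \
  --address=gcp-vpn-ip --ip-protocol=ESP --target-vpn-gateway=gcp-vpn-gw
gcloud compute forwarding-rules create fr-udp500 --region=australia-southeast1 \
  --address=gcp-vpn-ip --ip-protocol=UDP --ports=500 --target-vpn-gateway=gcp-vpn-gw
gcloud compute forwarding-rules create fr-udp4500 --region=australia-southeast1 \
  --address=gcp-vpn-ip --ip-protocol=UDP --ports=4500 --target-vpn-gateway=gcp-vpn-gw
# create the tunnel to the Azure Virtual Network Gateway
gcloud compute vpn-tunnels create gcp-to-azure-tunnel --region=australia-southeast1 \
  --peer-address=$AZURE_GW_IP \
  --shared-secret="<PSK>" \
  --ike-version=2 \
  --local-traffic-selector=0.0.0.0/0 \
  --remote-traffic-selector=0.0.0.0/0 \
  --target-vpn-gateway=gcp-vpn-gw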
As we are using static routing (as opposed to dynamic routing) we will need to define all of the specific routes on the GCP side. We will need to set up routes both for outgoing traffic to the Azure network and for incoming traffic destined for the restricted Google API range (199.36.153.4/30, the range behind restricted.googleapis.com).
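A sketch of the outbound static route on the GCP side (the Azure CIDR and route name are assumptions):
# route traffic destined for the Azure Virtual Network through the VPN tunnel
gcloud compute routes create route-to-azure \
  --network=gcp-vpc \
  --destination-range=10.1.0.0/16 \
  --next-hop-vpn-tunnel=gcp-to-azure-tunnel \
  --next-hop-vpn-tunnel-region=australia-southeast1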
Now we can set up the Azure side of the VPN Connection, which is accomplished by associating the Azure Virtual Network Gateway with the Local Network Gateway. A PSK (Pre-Shared Key) needs to be supplied, which is the same key used for the GCP VPN Tunnel in step 10.
Perform an nslookup to ensure that calls to googleapis.com resolve to the restricted API range (e.g. nslookup storage.googleapis.com). You should see a response showing the A records from the googleapis.com Private DNS Zone created in step 14.
Now test connectivity to Google APIs; for example, test access to Google Cloud Storage using gsutil, or access to BigQuery using the bq command.
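For instance (a sketch; the bucket name is an assumption):
# list objects in a bucket over the private connection
gsutil ls gs://my-test-bucket/
# list BigQuery datasets in the current project
bq ls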
In this post we will look at read replicas as an additional method to achieve multi zone availability for Cloud SQL, which gives us – in turn – the ability to offload (potentially expensive) IO operations such as user created backups or read operations without adding load to the master instance.
In the previous post in this series we looked at Regional availability for PostgreSQL HA using Cloud SQL:
Recall that this option was simple to implement and worked relatively seamlessly and transparently with respect to zonal failover.
Now let’s look at read replicas in Cloud SQL as an additional measure for availability.
Deploying Read Replica(s)
Deploying read replicas is slightly more involved than simple regional (high) availability, as you will need to define each replica as a separate Cloud SQL instance which replicates from the primary (master) instance.
An example using Terraform is provided here, starting by creating the master instance:
Next you would specify one or more read replicas (typically in a zone other than the zone the master is in):
Note that several of the options supplied are omitted when creating a read replica database instance, such as the backup and maintenance options – as these operations cannot be performed on a read replica as we will see later.
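As a rough equivalent using the gcloud CLI (instance names, region, zones and machine tier are assumptions, not values from the Terraform example):
# create the master (primary) instance
gcloud sql instances create pg-master \
  --database-version=POSTGRES_9_6 \
  --tier=db-custom-2-7680 \
  --region=australia-southeast1 \
  --zone=australia-southeast1-a \
  --backup-start-time=03:00
# create a read replica in a different zone (backup and maintenance options are omitted)
gcloud sql instances create pg-replica-1 \
  --master-instance-name=pg-master \
  --tier=db-custom-2-7680 \
  --region=australia-southeast1 \
  --zone=australia-southeast1-b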
Voila! You have just set up a master instance (the primary instance your application and/or users will connect to) along with a read replica in a different zone which will be asynchronously updated as changes occur on the master instance.
Read Replicas in Action
Now that we have created a read replica, let's see it in action. After connecting to the read replica (as you would any other instance), attempt to access a table that has not been created on the master, as shown here:
Now create the table and insert some data on the master instance:
Now try the select operation on the replica instance:
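A sketch of that sequence using psql (the host addresses, database and table names are assumptions):
# on the replica: the table does not exist yet
psql -h <replica-ip> -U postgres -d postgres -c "SELECT * FROM test_table;"
# ERROR:  relation "test_table" does not exist
# on the master: create the table and insert a row
psql -h <master-ip> -U postgres -d postgres \
  -c "CREATE TABLE test_table (id int, name text); INSERT INTO test_table VALUES (1, 'test');"
# on the replica: the change has been replicated asynchronously
psql -h <replica-ip> -U postgres -d postgres -c "SELECT * FROM test_table;"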
Some Points to Note about Cloud SQL Read Replicas
Users connect to a read replica as a normal database connection (as shown above)
Google managed backups (using the console or gcloud sql backups create .. ) can NOT be performed against replica instances
Read replicas can be used to offload IO intensive operations from the master instance – such as user managed backup operations (e.g. pg_dump)
BE CAREFUL: despite their name, read replicas are NOT read-only; updates can be made to them which will NOT propagate back to the master instance – you could get yourself in an awful mess if you allow users to perform INSERT, UPDATE, DELETE, CREATE or DROP operations against replica instances.
Promoting a Read Replica
If required, a read replica can be promoted to a standalone Cloud SQL instance, which is another DR option. Keep in mind, however, that as the read replica is updated asynchronously, promoting it may result in a loss of data (hopefully not much, but a loss nonetheless). Your application's RPO will dictate whether this is acceptable or not.
Promotion of a read replica is reasonably straightforward as demonstrated here using the console:
Once you click on the Promote Replica button you will see the following warning:
This simply states that once you promote the replica instance your instance will become an independent instance with no further relationship with the master instance. Once accepted and the promotion process is complete, you can see that you now have two independent Cloud SQL instances (as advertised!):
Some of the options you would normally configure with a master instance would need to be configured on the promoted replica instance – such as high availability, maintenance and scheduled backups – but in the event of a zonal failure you would be back up and running with virtually no data loss!
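Promotion can also be scripted; a one-line sketch using gcloud (the instance name is an assumption):
gcloud sql instances promote-replica pg-replica-1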
A refresher on the data plane, and what the userspace proxy can perform:
Routing: Given a REST request for /hello from the local service instance, where should that request be sent?
Load Balancing: Once routing has done its job, to which upstream service instance should the request be sent? With what timeout? If the request fails, should it be retried?
Authorisation and Authentication: For incoming requests, can cryptographic functions determine the authenticity of the request? Is the caller allowed to invoke the requested endpoint?
Observability: Detailed logging, statistics, distributed tracing data so that operators can understand the traffic flow and debug problems as they occur
Service Discovery: What backend/upstream service instances are available?
Health Checking: Are upstream service instances healthy and ready to accept traffic?
The control plane is slightly less complex. For the data plane to act in a coordinated fashion, the control plane gives you the machinery to make that happen. This is the magical part of the service mesh; the control plane takes a set of isolated sidecar proxies and turns them into a distributed system. The control plane in turn provides an API to allow the user to modify and inspect the behaviour of the data plane.
You can see from the diagram below the proxies are right next to the service in the same node. We usually call those ‘sidecar’ containers.
The diagram above gives you a high level indication of what the service mesh would look like. What if I don’t have many services? Then the service mesh probably isn’t for you. That’s a whole lot of machinery to run a single proxy! Having said this, if your solution is running hundreds or thousands of services, then you’re going to require a whole heap of proxies.
So there you have it. The service mesh with its control and data plane. To put it simply, the goal of the control plane is to monitor and set a policy that will eventually be enacted by the data plane.
You’ve taken over a project, and the security team have mandated the use of the service mesh. You’ve never used it yourself before, and the confusion as to why we need another thing is getting you down. An additional thing next to my container that will add latency? And consume resources? And I have to maintain it?! Why would anyone need or want this?
While there are a few answers to this, the most important answer is something I alluded to in an example in part 1 of this series: this design is a great way to add additional logic into the system. Not only can you add additional logic (to containers possibly outside of your control) but you can do this uniformly across the entire mesh! The service mesh gives you features that are critical for running software that’s uniform across your whole stack
The set of features that the service mesh can provide include reliability features (Retries, timeouts etc), observability features (latencies, volume etc) and security features (mTLS, access control etc).
Let’s break it down
Server-side software relies on these critical features. If you're building any type of modern server-side software that's predicated on multiple services – think APIs and web apps – and you're continually adding features in short timeframes, then all the features listed above become critical for you. Your applications must be reliable, observable and, most importantly, secure. This is exactly what the service mesh helps you with.
One view to rule them all. The features mentioned above are language-agnostic; they don't care about your framework, who wrote it, or any part of your development life cycle. They give you, your team and your company a consistent way to deploy changes across your service landscape.
Decoupled from application code. It's important to have a single place for application and business logic, rather than the nightmare of managing it across multiple components of your system. The core stewardship of the functionality that the service mesh provides lies at the platform level – this includes maintenance, deployment and operation. The application can be updated and deployed by the developers maintaining it, and the service mesh can change without the application being involved.
Yes, while the features of the service mesh could be implemented as application code, this solution would not help in driving uniform features sets across the whole system, which is the value proposition for the service mesh.
If you're a business-logic developer, you probably don't need to worry about the service mesh. Keep pumping out that newfangled business logic that makes the software oh-so-usable.
If you’re in a platform role and most likely using Kubernetes, then you should be right on top of the service mesh! That is unless your architecture dictates a monolith. You’re going to have a lot of services talking to one another, all tied together with an overarching dependency.
If you’re in a platform role with no Kubernetes but a bunch of microservices, you should maybe care a little bit about the service mesh, but without the power of Kubernetes and the ease of deployment it brings, you’ll have to weigh up how you intend to manage all those proxies.
I hope you enjoyed this article, please feel free to reach out to me at:
In this multi-part blog we will explore the features available in Google Cloud SQL for High Availability, Backup and Recovery, Replication and Failover, and Security (at rest and in transit) for the PostgreSQL DBMS engine. Some of these features are relatively hot off the press and still in Beta – which doesn't stop them from being available for general use.
We will start by looking at the High Availability (HA) options available to you when using the PostgreSQL engine in Google Cloud SQL.
Most of you would be familiar with the concepts of High Availability, Redundancy, Fault Tolerance, etc but let’s start with a definition of HA anyway:
High availability (HA) is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.
“Higher than normal” is quite subjective; typically this is quantified as a percentage represented by a number of “9s” – for example 99.99% (which would be quoted as “four nines”) would allow for 52.60 minutes of downtime over a one-year period.
Essentially the number of 9’s required will drive your bias towards the options available to you for Cloud SQL HA.
We will start with Cloud SQL HA in its simplest form, Regional Availability.
Knowing what we know about the Google Cloud Platform, regional availability means that our application or service (in this case Cloud SQL) should be resilient to a failure of any one zone in our region. In fact, as all GCP regions have at least 3 zones – two zones could fail, and our application would still be available.
Regional availability for Cloud SQL (which Google refers to as High Availability) creates a standby instance in addition to the primary instance, and uses a regional Persistent Disk resource to store the database instance data, transaction logs and other state files, synchronously replicated to the zones in which the primary and standby instances are located.
A shared IP address (like a Virtual IP) is used to serve traffic to the healthy (normally primary) Cloud SQL instance.
An overview of Cloud SQL HA is shown here:
Implementing High Availability for Cloud SQL
Implementing Regional Availability for Cloud SQL is dead simple – it comes down to one argument:
availability_type = "REGIONAL"
Using the gcloud command line utility, this would be:
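A minimal sketch of both the creation command and a manual failover (the instance name, region and tier are assumptions, not values from the original post):
# create a regionally available (HA) PostgreSQL instance
gcloud sql instances create postgresql-ha-instance \
  --database-version=POSTGRES_9_6 \
  --tier=db-custom-2-7680 \
  --region=australia-southeast1 \
  --availability-type=REGIONAL
# manually invoke a failover to the standby instance
gcloud sql instances failover postgresql-ha-instance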
There is an --async option which will return immediately, invoking the failover operation asynchronously.
Failover can also be invoked from the Cloud Console using the Failover button shown previously. As an example I have created a connection to a regionally available Cloud SQL instance and started a command which runs a loop and prints out a counter:
Now using the gcloud command shown earlier, I have invoked a manual failover of the Cloud SQL instance.
Once the failover is initiated, the client connection is dropped (as the server is momentarily unavailable):
The connection can be re-established immediately afterwards; the state of the running query is lost, but importantly no data is lost. If your application clients have retry logic in their code and they weren't executing a long-running query, chances are no one would notice! Once reconnected, normal database activities can be resumed:
A quick check of the instance logs will show that the failover event has occurred:
Now when you return to the instance page in the console you will see a Failback button, which indicates that your instance is being served by the standby:
Note that there may be a slight delay in the availability of this option as the replica is still being synched.
It is worth noting that nothing comes for free! When you run in REGIONAL or High Availability mode – you are effectively paying double the costs as compared to running in ZONAL mode. However availability and cost have always been trade-offs against one another – you get what you pay for…
Next up we will look at read replicas (and their ability to be promoted) as another high availability alternative in Cloud SQL.
So you’ve started delivering a new project and it’s all about this “Cloud Native” or “Microservices” thing. You’re a Delivery Manager or Software Engineer at some type of company and someone has lightly peppered a meeting with a term, ‘Mesh’.
They possibly said event mesh. Or better yet, they mentioned a service mesh. As time went on you kept hearing more and more about the service mesh. You’ve attempted to read up about it, digested a whole bunch of new terms and still didn’t completely understand what the Mesh even does, why you would need it or why the hype train around this technology shows no sign of stopping. This article is an attempt to provide a focused guide to the service mesh, and why it is so interesting.
Ok, so what is this thing?
Truth be told, the service mesh is actually pretty simple. It's built around the idea of small, repeatable bits of software, in this case userspace proxies, stuck very close to your services. This is called the data plane. In addition to the userspace proxies, you also get a bunch of management processes, which is referred to as the control plane. Simply put, the data plane (userspace proxies) intercepts all calls between services and the control plane (management processes) coordinates the wholesale behaviour of those proxies. This allows you to perform sweeping changes across your service landscape via the control plane's APIs and operators, and provides the capability to measure your mesh as a whole.
Before we get into the engineering of what the proxies are, let’s go with an example.
The business has bought some software.
The engineers are tasked with deploying this software in their Kubernetes cluster.
The engineers' first task is to containerise this application, expose its functionality to downstream applications and deploy it to the cluster in a repeatable, continuous fashion.
There's a requirement in your organisation that says ‘I need all communications to this vendor's software to use TLS 1.3'. Or, ‘I would like to measure all API latency from this application'.
The engineer replies ‘I can’t make changes to a third party application! What do I do?’. Service mesh to the rescue.
Using a service mesh, you can deploy a proxy right next to your vendor container and, in effect, abstract away the complexities of measurement and data transport mechanisms, allowing the vendor software to concentrate on its business logic.
This vendor container is now part of the service mesh.
When we talk about proxies, we usually discuss things in OSI model terminology, that is to say Layers 1 through 7. Most of the time when it comes to proxies, you’re comparing Layer 4 to Layer 7. Here’s a quick run-down:
Layer 4 (L4) -> operates on the delivery of messages with no regard to the content of the messages. An L4 proxy simply forwards network packets to and from the server without inspecting any part of their contents.
Layer 7 (L7) -> this is the higher, application layer, which deals with the actual content of the message. If you were routing network traffic, you could do this at L7 in a much more sophisticated way because you can now make decisions based on the messages within the packets.
Why pick between L4 and L7? Speed.
Back to the service mesh: these userspace proxies are L7-aware TCP proxies, think NGINX or HAProxy. There are a number of different proxies in this space; Linkerd is an ultralight service mesh for Kubernetes, while the most popular proxy is Envoy, which was created by the ride-share company Lyft. I also mentioned NGINX and HAProxy above, which are also quite popular. So what differentiates NGINX proxies from the service mesh? Their focus. You would implement NGINX as an ingress proxy (handling traffic entering your network), but when it comes to proxies that focus on traffic between services, that's when the service mesh proxy comes into play.
Ok, probably time for a diagram now that we’ve explained the Data Plane.
Tune in for part 2 for when we discuss the Control Plane!
There are many posts available which map analogous services between the different cloud providers, but this post attempts to go a step further, mapping additional concepts, terms and configuration options to be the definitive thesaurus for cloud practitioners familiar with AWS who are looking to fast-track their familiarisation with GCP.
It should be noted that AWS and GCP are fundamentally different platforms; nowhere is this more apparent than in the way networking is implemented by the two providers – see:
This post is focused on the core infrastructure, networking and security services offered by the two major cloud providers; I will do a future post on higher-level services such as the ML/AI offerings from the respective providers.
Furthermore, this will be a living post which I will continue to update; I encourage comments from readers on additional mappings, which I will incorporate into the post as well.
I have broken this down into sections based upon the layout of the AWS Console.
Once you get beyond the Associate-level AWS certification exams into the Professional or Specialty track exams, the degree of difficulty rises significantly. As a veteran of the Certified Solutions Architect Professional and Big Data Specialty exams, I thought I would share my experiences, which I believe are applicable to all the certification streams and tracks in the AWS certification program.
First off let me say that I am a self-professed certification addict, having sat more than thirty technical certification exams over my thirty plus year career in technology including certification and re-certification exams. I would put the AWS professional and specialty exams right up there in terms of their level of difficulty.
The AWS Professional and Specialty exams are specifically designed to be challenging. Although they have removed the pre-requisites for these exams (much to my dismay…), you really need to be prepared for these exams otherwise you are throwing your hard-earned money away.
There are very few – if any – “easy” questions. All of the questions are scenario based and require you to design a solution to meet multiple requirements. The question and/or the correct answer will invariably involve the use of multiple AWS services (not just one). You will be tested on your reading comprehension, time management and ability to cope under pressure as well as being tested on your knowledge of the AWS platform.
The following sections provide some general tips which will help you approach the exam and give you the best chance of success on the day. This is not a brain dump or a substitute for the hard work and dedication required to ensure success on your exam day.
Needless to say, your ability to manage time is critical; on average you will have approximately 2-3 minutes to answer each question. Reading the question and answers carefully may take up 1-2 minutes on its own. If the answer is not apparent to you, you are best to mark the question and come back to it at the end of the exam.
In many cases there may be subsequent questions and answer sets which jog your memory or help you deduce the correct answers to the questions you initially passed on. For instance, you may see references in later questions which put context around services you may not be completely familiar with; this may enable you to answer flagged questions with more confidence.
Of course, you must answer all questions before completing the exam, there are no points for incomplete or unattempted answers.
Recommended Approach to each Question
Most of the questions on the Professional or Specialty certification exams fall into one of three categories:
Short-ish question, multiple long detailed answer options
Long-ish scenario question, multiple relatively short answer options
Long-ish question with multiple relatively long, detailed answers
The last of these is thankfully less common. In all cases, however, it is important to read the last sentence of the question first; this will provide indicators to help you read through the question in its entirety and all of the possible answers with a clear understanding of what is “really” being asked. For instance, the operative phrase may be “highly available” or “most cost effective”.
Try to eliminate answers based on what you know, for instance answers with erroneous instance families can be eliminated immediately. This will give you a much better statistical chance of success, even if you have to venture an educated guess in the end.
The Most Complicated Solution is Probably Not the Correct One
In many answer sets on the Professional or Specialty exams you will see some ridiculously complicated solution approaches; these are most often incorrect answers, although they may contain enough loosely relevant terminology or services to appear reasonable.
Note the following statement direct from the AWS Certified Solutions Architect Professional Exam Blueprint:
“Distractors, or incorrect answers, are response options that an examinee with incomplete knowledge or skill would likely choose. However, they are generally plausible responses that fit in the content area defined by the test objective.”
AWS wants professionals who design and implement solutions which are simple, sustainable, highly available, scalable and cost effective. One of the key Amazon Leadership Principles is “Invent and Simplify”, simplify is often the operative word.
Don’t spend time on dumps or practice exams (other than those from AWS)
The question pools for AWS exams are enormous; the chances of you getting the same questions and answer sets as someone else are slim. Furthermore, non-AWS sources may not be trustworthy. There is no substitute for AWS white papers, how-tos, and real-life application of your learnings.
Don’t focus on Service Limits or Calculations
In my experiences with AWS exams, they are not overly concerned with service limits, default values, formulas (e.g. the formula to calculate required partitions for a DynamoDB table) or syntax – so don’t waste time remembering them. You should however understand the 7 layer OSI model and be able to read and interpret CIDR notation.
Mainly, however, they want you to understand how services work together in an AWS solution to achieve an outcome for a customer.
Some Final Words of Advice
Always do what you think AWS would want you to do!
It is worthwhile having a quick look at the AWS Leadership Principles (I have already referenced one of these in this article) as these are applied religiously in every aspect of the AWS business. In particular, you should pay specific attention to the principles around simplicity and frugality.
GCP and AWS share many similarities, they both provide similar services and both leverage containerization, virtualization and software defined networking.
There are some significant differences when it comes to their respective implementations, and networking is a key example of this.
Before we compare and contrast the two different approaches to networking, it is worthwhile noting the genesis of the two major cloud providers.
Google was born to be global, Amazon became global
By no means am I suggesting that Amazon didn't have designs on going global from its beginnings, but AWS was driven (entirely at the beginning) by the needs of the Amazon eCommerce business. Amazon started in the US before expanding into other regions (such as Europe and Australia). In some cases the expansion took decades (Amazon only entered Australia as a retailer in 2018).
Google, by contrast, was providing application, search and marketing services worldwide from its very beginning. GCP, which was used as the vector to deliver these services and applications, was architected around this global model, even though its actual data centre expansion may not have been as rapid as AWS's (for example, GCP opened its Australian region five years after AWS).
Their respective networking implementations reflect how their respective companies evolved.
AWS is a leader in IaaS, GCP is a leader in PaaS
This is only an opinion and may be argued, however if you look at the chronology of the two platforms, consider this:
The first services released by AWS (simultaneously for all intents and purposes) were S3, SQS and EC2
The first service released by Google was AppEngine (a pure PaaS offering)
Google has launched and matured their IaaS offerings since as AWS has done similarly with their PaaS offerings, but they started from much different places.
With all of that said, here are the key differences when it comes to networking between the two major cloud providers:
GCP VPCs are Global by default, AWS VPCs are Regional only
This is the first fundamental difference between the two providers. Each GCP project is allocated one default VPC network with Subnets in each of the 18 GCP Regions, whereas each AWS Account is allocated one Default VPC in each AWS Region, with a Subnet in each AWS Availability Zone for that Region – that is, each account has 17 default VPCs, one in each of the 17 Regions (excluding GovCloud regions).
It is entirely possible to create custom-mode VPCs in GCP with subnets confined to a single region, but VPC networks are global by default.
This global tenancy can be advantageous in many cases but limiting in others; for instance, there is a limit of 25 peering connections to any one GCP VPC, whereas the limit in AWS is 125.
GCP Subnets are Regional, AWS Subnets are Zonal
Subnets in GCP automatically span all Zones in a Region, whereas AWS VPC Subnets are assigned to Availability Zones in a Region. This means you are abstracted from some of the networking and zonal complexity, but you have less control over the specific network placement of instances and endpoints. You can infer from this design that Zones are replicated or synchronised within a Region, making them less of a direct consideration for High Availability (or at least less of a concern than they otherwise would be).
All GCP Firewall Rules are Stateful
AWS Security Groups are stateful firewall rules – meaning they maintain connection state for inbound connections – and AWS also has Network ACLs (NACLs), which are stateless firewall rules. GCP has no direct equivalent of NACLs; however, GCP Firewall Rules are more configurable than their AWS counterparts. For instance, GCP Firewall Rules can include Deny actions, which is not an option with AWS Security Group Rules.
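For example, a minimal sketch of a deny rule using gcloud (the network name, rule name and ranges are assumptions):
# deny inbound SSH from anywhere; priority 900 takes precedence over rules with larger priority values
gcloud compute firewall-rules create deny-ssh-ingress \
  --network=my-vpc \
  --direction=INGRESS \
  --action=DENY \
  --rules=tcp:22 \
  --source-ranges=0.0.0.0/0 \
  --priority=900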
Load Balancers in GCP are layer 4 (TCP/UDP) unless they are public facing
AWS Application Load Balancers can be deployed in private VPCs with no external IPs attached to them. GCP has Application Load Balancers (Layer 7 load balancers), but only for public-facing applications; internal-facing load balancers in GCP are Network Load Balancers. This presents some challenges with application-level load balancing functionality such as stickiness. There are potential workarounds, however, such as running NGINX in GKE behind an internal (Layer 4) load balancer.
Firewall rules are at the Network Level not at the Instance or Service Level
There are simple firewall settings available at the instance level, but these are limited to allowing HTTP and HTTPS traffic to the instance and don't allow you to specify sources. Detailed Firewall Rules are set at the GCP VPC Network level and are not attached or associated with instances as they are in AWS.
Hopefully this is helpful for AWS engineers and architects being exposed to GCP for the first time!