Map Reduce is Dead, Long Live Map Reduce

Firstly, this is not another Hadoop obituary; there are enough of those out there already.

The title of this article plays on an expression used to convey the idea that something old has been replaced by something new. In the expression “the King is dead, long live the King”, the implication is that although one monarch has passed, another instantly succeeds him.

In the age of instant gratification and hype-cycle-driven ‘pump and dump’ investment, we are very quick to discard technologies that don’t realise overzealous targets for sales or market share. In our continuous attempts to find the next big thing, we are quick to throw out the last big thing and everything associated with it.

The Reports of My Death Have Been Greatly Exaggerated

A classic example of this is the notion that Map Reduce is dead, a notion largely propagated by the Hadoop obituaries, which seem to multiply by the day.

A common myth is that Google invented the Map Reduce pattern, which is incorrect. In 2004, Google described a distributed systems implementation of the Map Reduce pattern in a white paper named “MapReduce: Simplified Data Processing on Large Clusters”, which would go on to inspire the first-generation processing framework (MapReduce) in the Hadoop project. But neither Google, nor Yahoo!, nor the contributors to the Hadoop project (which include the pure play vendors) created the Map Reduce algorithm or processing pattern, and none of them has the right to kill it.

The origins of the Map Reduce pattern can be traced back to the early foundations of functional programming, from Lambda Calculus in the 1930s to LISP in the 1960s. Map Reduce remains an integral pattern in functional and distributed systems programming today. You only need to look at the support for map() and reduce() operations in some of today’s most popular languages, including Python, JavaScript and Scala, as well as many other languages that support functional programming.
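
To make the pattern concrete, here is a minimal sketch of the canonical word count example using Python’s built-in map() and functools.reduce(); the data and helper names are illustrative only.

```python
from functools import reduce

# A toy word count, the canonical Map Reduce example:
# map each word to a (word, 1) pair, then reduce the pairs into counts.
words = ["long", "live", "map", "reduce", "map"]

# Map phase: emit a key/value pair per input record
pairs = map(lambda w: (w, 1), words)

# Reduce phase: fold the pairs into a dictionary of counts
def combine(counts, pair):
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

word_counts = reduce(combine, pairs, {})
print(word_counts)  # {'long': 1, 'live': 1, 'map': 2, 'reduce': 1}
```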

As far as distributed processing frameworks go, the Map Reduce pattern and its map() and reduce() methods remain prominent as higher-order functions in APIs such as Spark, Kafka Streams, Apache Samza and Apache Flink, to name a few.
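
As an illustration, the same word count can be expressed against Spark’s RDD API. This is a minimal PySpark sketch, assuming a local Spark installation, in which reduceByKey() plays the role of the reduce phase.

```python
# Minimal PySpark word count sketch (assumes pyspark is installed locally).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("word-count").getOrCreate()
sc = spark.sparkContext

counts = (
    sc.parallelize(["long", "live", "map", "reduce", "map"])
      .map(lambda w: (w, 1))            # map phase: emit (word, 1) pairs
      .reduceByKey(lambda a, b: a + b)  # reduce phase: sum counts per word
)
print(counts.collect())  # e.g. [('long', 1), ('live', 1), ('map', 2), ('reduce', 1)]
spark.stop()
```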

While the initial Hadoop adaptation of Map Reduce has been supplanted by superior approaches, the Map Reduce processing pattern is far from dead.

On the fall of Hadoop…

There is so much hysteria around the fall of Hadoop that we need to be careful not to throw the baby out with the bath water. Hadoop played a significant role in bringing open source distributed systems from search engine providers to academia and all the way to the mainstream. It still serves an important purpose in many organizations’ data ecosystems today and will continue to do so for some time.

OK, it wasn’t a panacea, but who said it was supposed to be? The Hadoop movement was hijacked by hysteria, hype, venture capital, overambitious sales targets and financial engineering; that does not mean the technology was bad.

Hadoop spawned many significant related projects, such as Spark, Kafka and Presto. These projects paved the way for cloud integration, which is now the dominant vector for data storage, processing, and analysis.

While the quest for world domination by the Hadoop pure play vendors may be over, the Hadoop movement (and the impact it has had on the enterprise data landscape) will live on.

When Life Gives You Undrinkable Wine – Make Cognac…

Firstly, this is not another motivational talk or motherhood statement (there are plenty of those out there already), but a metaphor about resourcefulness.

We have all heard the adage “when life gives you lemons, make lemonade”. Allow me to present a slight twist (pardon the pun…) on this statement.

Cognac is a variety of brandy named after the town of Cognac, France. It is distilled from a white wine made from grapes grown in the area, a wine characterised as “virtually undrinkable”. However, in its distilled form, Cognac is considered one of the world’s most refined spirits, with bottles of high-end Cognac fetching as much as $200,000.

The area surrounding the town of Cognac has been a recognised wine-growing region since the third century. However, when it became evident that the grapes produced around the town of Cognac itself were unsuitable for winemaking, local producers developed the practice of double distillation in copper pot stills and ageing in oak barrels for at least two years. The product yielded was the spirit we now know as Cognac.

Fast forward fifteen centuries to Scotland in the 1800s, where John Walker was a humble shopkeeper. Local whisky producers would bring John different varieties of single malt whisky, many of them below an acceptable standard. Instead of selling these varieties on, he began blending them into made-to-order whiskies, tailored to meet specific customer requirements. From this idea came a product, and subsequently one of the most globally recognised brands, which has lasted over 200 years.

Interesting, but what does this have to do with technology, you might ask?

Ingenuity and resourcefulness have existed for as long as humans have, but in the realm of technology, and in particular cloud computing and open source software, they have never been more imperative. We are continually faced with challenges on each assignment or project: technology not working as planned or as advertised, or dead ends when trying to solve problems. This is particularly evident now, as software and technology are moving at such an accelerated rate.

To be successful in this era you need creativity and lateral thinking. You need the ability not just to solve problems for which there are no documented solutions, but to create new products from existing ones that are not necessarily suited to your specific objective, and in many cases to add value in the process. That is not just making lemonade from lemons (which anyone could think of), but making Cognac from sub-standard grapes, or premium blended whisky from sub-standard single malts.

One of my favourite Amazon Leadership Principles is “Invent and Simplify”:

Leaders expect and require innovation and invention from their teams and always find ways to simplify. They are externally aware, look for new ideas from everywhere, and are not limited by “not invented here”. As we do new things, we accept that we may be misunderstood for long periods of time.

https://www.amazon.jobs/en/principles

Applied to the cloud, the intent of this principle is not simply to “lift and shift” current on-premises systems and processes, but to look for ways to streamline, rationalise and simplify them along the way. This demands a combination of practicality and creativity.

The takeaway: when you are presented with a challenge, be creative and inventive; don’t just solve the problem, but look for ways to add value in the process.

Cheers!

The Cost of Future Change: What we should really be focused on (but no one is…)

In my thirty-year career in technology, I can’t think of a more profound period of change. It is not as if change never existed previously; it did. Innovation and Moore’s law have been constant features of technology. However, for decades we experienced a long period of relative stability; that is, the rate of change was somewhat fixed. This was largely because change, for the most part, was dictated almost entirely by a handful of large companies (such as Oracle, Microsoft, IBM, SAP, etc.). We, the technology community, were at the mercy of the product management teams at these companies dreaming up the next new feature, and we were subject to the tech giants’ product release cycles as to when we could use it. The rate of change was therefore artificially suppressed to a degree.

Then along came the open source software revolution. I was fortunate enough to be in a meeting with Steve Ballmer, along with a small group of Microsoft partners, in Sydney, Australia around 2002. I remember him as a larger-than-life figure with a booming voice and an intense presence, and I vividly remember him going on a long diatribe about open source software, primarily focused on Linux. For historical context, Linux was a fringe technology at best at that point; nobody was taking it seriously in the enterprise. The Apache Software Foundation wasn’t a blip on the radar, Software-as-a-Service was in its infancy, and Platform-as-a-Service and cloud were non-factors. So what was he worried about?

From the fear instilled in his rant, you would have thought this was going to be the end of humanity as we knew it. Well, it was the beginning of the end: the end of the halcyon days of the software industry as we knew it, the beginning of the end of the monopoly on change held by a select few software vendors, and the beginning of the democratisation of change.

Fast forward fifteen years, and now everyone is a (potential) software publisher. Real-world problems are solved in real companies, and in many cases software products are created, published and made (freely) available to other companies. We have seen countless examples of this pattern: Hadoop at Yahoo!, Kafka at LinkedIn, and Airflow at Airbnb, to name a few. Then there are companies created with scant capital or investment, driven by a small group of smart people focused on solving a problem or streamlining a process they have encountered in the real world. Many of these companies have grown into globally recognised names, such as Atlassian and HashiCorp.

The rate of change is no longer suppressed by a privileged few; in fact, we have an explosion of change. No longer do we have a handful of technology options to achieve a desired outcome or solve a particular problem; we have a constellation of options.

Now the bad news: the force multiplier in change created by open source communities cuts both ways. When the community is behind a project, it gets hyper-charged; conversely, when the community drops off, functionality and stability drop off just as quickly.

This means there are components, projects, modules and services we are using today that won’t be around in the same form in 2-3 years’ time. There are products that don’t exist today that will be integral (and potentially critical) to our operations in 1-2 years’ time.

This brings me to my final point: the cost of future change. We, as leaders and custodians of technology in this period, should factor in not only current run costs and the cost of change we know about (the cost of current change), but also the hidden cost of future change. We need to assume that whatever we are doing now is not what we will be doing in a year’s time; we will need to accommodate future change.

We need to be thinking about the cost of future change and what we can do to minimise it. This means not just swapping out one system for another, but thinking about how we will change that system or component again in the near future. We need to take the opportunity in front of us to replace monoliths with modular systems, to move from tightly coupled to loosely coupled components, from proprietary systems to open systems, and from rigid architectures to extensible architectures.
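
As a small illustration of designing for the cost of future change, here is a minimal sketch (the names and interfaces are hypothetical, not taken from any particular product) of loosely coupling a component to an abstract interface, so the underlying implementation can be swapped later without rewriting its callers.

```python
# Hypothetical sketch: callers depend on an abstract interface, not a backend.
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """The abstraction the rest of the system depends on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class LocalFileStore(ObjectStore):
    """Today's implementation: files on local disk."""

    def __init__(self, root: str) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        with open(f"{self.root}/{key}", "wb") as f:
            f.write(data)

    def get(self, key: str) -> bytes:
        with open(f"{self.root}/{key}", "rb") as f:
            return f.read()


# Tomorrow's implementation (cloud object storage, a new database, etc.) only
# needs to implement ObjectStore; this calling code never has to change.
def archive_report(store: ObjectStore, report_id: str, content: bytes) -> None:
    store.put(report_id, content)
```

Swapping the backend then becomes a contained change rather than a rewrite, which is exactly the kind of future change worth paying a little for up front.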

When you’re planning a change today, consider the cost of future change…