Modern cloud wars: an interview with DevOps expert Tarun Arora

This is Part I of my interview with Tarun Arora – the most efficient cloud platform engineer I have worked with. I’ve been consistently impressed with Tarun’s ability to 1) understand the capabilities and limits of the cloud platform; 2) understand the quirks of the developer requirements; and then 3) create environments that are a joy to work with (to the extent permissible) while delivering on business objectives – and also keeping performance, reliability, costs, safety, security and business continuity in mind.

So without further ado, let the cloud wars begin!

Amazon AWS vs Microsoft Azure – the good, the bad and the important

JD: Tarun, you have a lot of industry experience with cloud DevOps and I know you’ve used both Amazon AWS and Microsoft Azure extensively. Aside from Azure being the obvious choice for companies that are a “Microsoft shop”, in your view, how do these two leading cloud offerings compare, technically? Particularly from a DevOps point of view – where does each shine more than the other?

TA: Well Jordan, a wise man once said to me that those after a metric will find a metric, so before I give you my perspective I’ll offer a metric as well:

Amazon Web Services is the most popular public cloud infrastructure platform, comprising 41.5% of application workloads in the public cloud. Microsoft owns 29.5% of the market share in this segment. However, Azure has reported an annual growth of 62.3% in its cloud business while AWS has reported a drop down to 33.2%.

You can read this metric front to back or back to front and you’ll probably conclude that both Amazon Web Services and Microsoft Azure are in it to win it. But the question remains, who is the better cloud provider? An interesting approach to comparing the cloud providers is to look at their offerings in the following six verticals: security, scale, services, cost, support and automation. I know various architecture teams that have spent months generalising the differences to nail down who the better cloud provider really is.

With AWS and Azure providing comparable offerings, the question is less about which cloud provider is better and more about which provider is right for the requirements of the organization. Digging deeper into your business’ needs will highlight whether your company and its background are more aligned with Azure or AWS. For example, if an organization needs a strong Platform-as-a-Service (PaaS) provider or Windows integration, Azure would be the preferable choice, while if an enterprise is looking for Infrastructure-as-a-Service (IaaS) or a diverse set of tools, then AWS might be the best solution.

With various clients forced to carry the legacy of their sunset applications in their private datacentres, another parameter for cloud provider decision making is how much hybrid cloud integration each provider offers. Microsoft has always been strong in the hybrid cloud space with its Azure Stack offering, while Amazon has recently made a foray into this space with its AWS Outposts solution.

So while there isn’t a lot of technical difference, having used both platforms I’ll offer my personal opinion. I admire how simple it is to get started with Microsoft Azure but find the integration of open source and command line tooling a lot better with AWS. I have found AWS support more prompt than Azure support, with quicker resolution of technical queries; Azure support almost always needs to tap up its Redmond-based product teams, so it takes longer to get a resolution. AWS needs to spend some $$ on redesigning its console experience, whereas Azure has a fantastic portal with deep integration into other services and offerings, making it really easy to get more done with fewer clicks.

As a Microsoft MVP for Developer Tooling for 8 years, I can tell you that Microsoft has been in the developer tooling business for a very long time. While the focus initially was to create tools that worked really well with Windows, the focus over the last 6 years has changed to creating tools that work for any developer, on any platform, for any language. A clear example of this is Visual Studio Code, Microsoft’s free, open source editor that has become one of the most popular open source repositories on GitHub.

Amazon has DevOps tools that are somewhat limited to its cloud platform. While they work great for AWS, they score very low in use cases that involve integration into a heterogeneous multi-cloud scenario. In contrast, Microsoft has two key toolchains for DevOps, namely AzureDevOps and GitHub. A question you may ask is: why does Microsoft have two DevOps tools, and do they compete with one another? Well, the simple answer is that Microsoft had AzureDevOps and then acquired GitHub; it will take a few years to consolidate the two into a single offering. GitHub needs no introduction: it is where open source projects are born, grow and flourish. Microsoft’s acquisition and increased funding of GitHub is further testimony to how keen Microsoft is to grow its developer ecosystem, tooling and open source involvement. AzureDevOps has excellent integration into Azure and also works really well with AWS. One of the most popular extensions in the AzureDevOps marketplace is the task library for AWS cloud.

No surprise Jordan, you and I have recently worked on a project building a data platform on AWS using Python and yet using AzureDevOps we’ve been able to deploy seamlessly into the AWS cloud. In your first few days of using AzureDevOps, you said to me (in your own words), “AzureDevOps is pretty cool! That’s not what I expected from a Microsoft product”.

JD: This is so true – in the 1990s and early 2000s I grew increasingly disappointed by Microsoft software and took a somewhat stubborn decision to move my career in the (then) opposite direction of Unix / Linux and open-source tools. Naturally, when the cloud became a thing, I sided with Amazon and have been building my expertise on AWS ever since. But you are right: having now experienced some Microsoft developer tools like AzureDevOps first-hand, you can colour me impressed. At least in terms of UI and integration, the experience is honestly much better than the AWS web console. (Perhaps if you give me another decade or two, I might also consider switching my desktop to Windows, we’ll see.)

Taking next-level control of the cloud with Infrastructure as Code

JD: When we talk about Infrastructure-as-a-Service, we also have to discuss the relatively recent shift towards Infrastructure-as-Code. Tools like Terraform and AWS CDK are very, very hip right now. From your experience, what are the limitations of these tools at the moment? What could be done better?

TA: The original vision of cloud computing was automated, on-demand services that scale dynamically to meet demand. While this vision is now a reality, it doesn’t happen on its own.

Cloud automation is complex and requires specialized tools, expertise, and hard work. Infrastructure as Code (IaC) is one of the key enablers of the DevOps revolution. Together with cloud automation technology, it provides the ability to turn complex systems and environments into a few lines of code, which can be deployed at the click of a button. This enables automated pipelines which provide a rapid feedback loop for developers, and rapid deployment of new features for end-users. The software development field is in a constant state of flux as new technologies replace older methods. Responding directly to the needs of the market at any given time, these new technologies may improve upon older technologies, or they may offer a completely different approach to the development process.

Infrastructure as Code (IaC) is a relatively new approach to infrastructure automation that is more extensive and thorough than the more common, scripting-based procedural method. If you want to get the maximum mileage out of your investment in cloud, you need to have an automated provisioning process around your infrastructure. Companies that use IaC deploy more frequently and recover faster than those that use the traditional, scripting-based procedural method.

Cloud providers have launched their native SDKs that allow you to program infrastructure in a language of your choice; AWS CDK is Amazon’s manifestation of this idea. What I love about AWS CDK is the 3 levels of abstraction it offers: L1 is the lowest level and lets you interact directly with the underlying infrastructure resources, while L3 is the highest level of abstraction and lets you program a whole workflow or use case. All levels of programming can be used to contribute to one or more stacks, which can be output as a CloudFormation template or directly validated and applied to AWS with all the bells and whistles of full auditability.
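To make the layering idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the AWS CDK library: `raw_bucket`, `versioned_bucket` and `synthesize` are hypothetical helpers invented for this illustration, and the bucket names are made up. Only the resource type and property names follow the CloudFormation template format.

```python
import json


def raw_bucket(logical_id, properties):
    """Lower level: one CloudFormation resource, every property spelled out by the caller."""
    return {logical_id: {"Type": "AWS::S3::Bucket", "Properties": dict(properties)}}


def versioned_bucket(logical_id, bucket_name):
    """Higher level: a hypothetical helper that bakes in a default (versioning switched on)."""
    return raw_bucket(
        logical_id,
        {
            "BucketName": bucket_name,
            "VersioningConfiguration": {"Status": "Enabled"},
        },
    )


def synthesize(*resource_groups):
    """Merge resources built at any level into a single template dictionary."""
    resources = {}
    for group in resource_groups:
        resources.update(group)
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = synthesize(
    raw_bucket("RawLogs", {"BucketName": "example-raw-logs"}),
    versioned_bucket("Reports", "example-reports"),
)
template_json = json.dumps(template, indent=2, sort_keys=True)
```

Both helpers feed the same template, which is the point: code written at different levels of abstraction ends up in one declarative document that can be reviewed and audited before anything is deployed.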

It’s rather sad that many clients spend millions on cloud computing but shy away from investing in tooling that would enable them to “operate” infrastructure at scale. Cloud automation can be done using throwaway scripts but that doesn’t mean that’s the preferred way of doing it. The industry has realized that due to the complexity of cloud environments and the need for intricate orchestration of many day-to-day tasks, it is better to rely on a mature automation platform.

Puppet, Chef, Ansible & Terraform are a few systems commonly used to automate and orchestrate infrastructure and configuration management in the cloud. The basic needs when it comes to infrastructure configuration are for it to be predictable, scalable and easy to recover in case of failure. These tools provide idempotency and immutability, so no matter how many times you run them, they will produce the same result. Terraform takes centre stage as it’s not only idempotent, it’s the multi-cloud-capable Swiss Army knife of IaC tools. Terraform is completely cloud-agnostic and helps you tackle large infrastructure for complex distributed applications, for example, with more ease than working on a cloud-specific platform. Terraform also supports change and provisioning previews, plus it has a capable set of features for replicating deployments and individual server instances. Terraform then takes it a step further with its version control and remote states, which provide a centralized source of truth for remote teams working in collaboration.
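Idempotency and change previews are easier to picture with a toy model. The sketch below is not how Terraform or any of the other tools are implemented; `plan`, `apply` and the resource dictionaries are assumptions made up for this illustration. It compares a desired state with the actual state, lists the changes, and shows that a second run has nothing left to do.

```python
def plan(desired, actual):
    """Preview: list the changes needed to turn `actual` into `desired`."""
    changes = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            changes.append(("create", name, spec))
        elif actual[name] != spec:
            changes.append(("update", name, spec))
    for name in sorted(actual):
        if name not in desired:
            changes.append(("delete", name, None))
    return changes


def apply(desired, actual):
    """Apply the planned changes to a copy of `actual`; return the new state and the changes."""
    new_state = dict(actual)
    changes = plan(desired, actual)
    for action, name, spec in changes:
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = spec
    return new_state, changes


# Hypothetical resources, purely for illustration.
desired = {"web": {"size": "t3.small", "count": 2}, "db": {"size": "t3.medium", "count": 1}}
actual = {"web": {"size": "t3.micro", "count": 2}, "cache": {"size": "t3.micro", "count": 1}}

state, first_changes = apply(desired, actual)
state_again, second_changes = apply(desired, state)  # second run finds nothing to change
```

The first run creates `db`, updates `web` and deletes `cache`; the second run produces an empty change list, which is what makes repeated runs safe.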

Tools with endless possibilities are usually hard to master, and the case with Terraform is no different. I still think Terraform is a fantastic tool once you get to know it in more detail, but the learning curve can be very steep, especially if you don’t have a good understanding of how the underlying provider works. If you don’t understand how AWS works, Terraform will not make your life easier. Indeed, it might make it worse, because you’ll have to deal with both AWS and Terraform quirks.

AWS has regional and global services. For instance, EC2 is regional, which means that an auto-scaling group in Northern Virginia has nothing to do with one in São Paulo. Therefore, the two can have the same name (the name being the ASG identifier). IAM, in its turn, is global, which means that when you define a role, it can be used anywhere. Then there is S3. S3 is a hybrid: while it has regional scope, its namespace is global, which means you can’t have buckets with the same name, even across different regions. Terraform isn’t going to help you with that sort of thing. When you run terraform plan, it will tell you it’s all ok. Then you will happily proceed to terraform apply and, instead of running smoothly as you’d expect, it’s going to blow up in your face.
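The scoping rules above can be written down as a small lookup. This is a toy model, not an AWS or Terraform API: the service labels, the `SCOPE` table, the helper functions and the account IDs are all assumptions made for this sketch.

```python
# Which fields, besides the name itself, make a resource name unique (toy model).
SCOPE = {
    "autoscaling_group": ("account", "region"),  # regional
    "iam_role": ("account",),                    # global within an account
    "s3_bucket": (),                             # one namespace shared by everyone
}


def uniqueness_key(service, name, account, region):
    """Build the key that must be unique for a resource of this service."""
    context = {"account": account, "region": region}
    return (service, name) + tuple(context[field] for field in SCOPE[service])


def find_conflicts(resources):
    """Return resources whose uniqueness key was already taken by an earlier one."""
    seen = set()
    conflicts = []
    for resource in resources:
        key = uniqueness_key(*resource)
        if key in seen:
            conflicts.append(resource)
        seen.add(key)
    return conflicts


resources = [
    ("autoscaling_group", "web", "111111111111", "us-east-1"),
    ("autoscaling_group", "web", "111111111111", "sa-east-1"),  # fine: different region
    ("s3_bucket", "data-lake", "111111111111", "us-east-1"),
    ("s3_bucket", "data-lake", "222222222222", "sa-east-1"),    # clash: global namespace
]
conflicts = find_conflicts(resources)
```

Note that the clashing bucket belongs to a different account. A check like this only works when every existing name is known, which is exactly the information a local preview does not have, so the clash only surfaces when the provider rejects the request.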

What does it take to actually move a big operation from on-premise to cloud?

JD: Tell us about a particularly interesting project you’ve worked on.

TA: Oh! That’s a difficult question. My immediate reaction is that it’s hard to pick just one; every project comes with its challenges and phases of complexity, which is where, in my opinion, the interesting bits lie. If I had to pick just one, I would say I enjoyed leading the cloud transformation journey for Centrica. Clouds can be fluffy and fun, or dark and foreboding (reports of silver linings remain, at present, unverified). Will “The Cloud” solve your data centre infrastructure problems, or will it “lift and shift” them into a remote and confusing new paradigm? A cloud migration can be challenging, especially one at a global scale that involves petabytes of data and thousands of services, along with business teams that can’t tolerate any outages, systems that transact millions every hour and can’t ever be fully offline & wait… I didn’t even mention the people challenges that come when you move the cheese on operations teams by changing the way they operate. I enjoyed showing Centrica the art of the possible and then working with Centrica and Avanade to see through the full cloud migration; a full account of the story can be read on the Microsoft customer reference site.

JD: Thank you Tarun, this has been a very fascinating discussion so far. I know you have many more practical insights into making the best of cloud services and DevOps, and we’ve already discussed some other interesting topics – we will continue in Part II of this interview.

Tarun Arora is obsessed with high-quality working software, continuous delivery, and Agile practices. He has experience managing technical programs, implementing digital strategy, and delivering quality at scale. You can follow Tarun on Twitter: @arora_tarun and on YouTube: Azure Devops TV