We all know that when the good people at Gartner highlight a topic as a strategic technology trend for the coming year, everyone pays more attention, especially within the enterprise ecosystem. Platform engineering is no different. I first commented on this emerging trend two years ago at KubeCon EU, and now I want to share my most recent thoughts, given that this approach to building platforms is rapidly “crossing the chasm” into the enterprise.
Exploring Gartner’s model of platform engineering
The Gartner article “What is Platform Engineering” provides a great overview of the motivations behind adopting this approach to building platforms.
“Platform engineering improves developer experience and productivity by providing self-service capabilities with automated infrastructure operations. It is trending because of its promise to optimise the developer experience and accelerate product teams’ delivery of customer value.”
This paragraph nicely captures the focus on developer experience (“devex”) and also points to the “platforms as a product” approach required for self-service and automation. As we saw in the recent Platform Maturity Model from the CNCF TAG Application Delivery team, platform engineers and platform product managers need to work together to make these features a reality.
The Gartner model also identifies three fundamental layers required for modern application delivery, all of which are hinted at in this quote: application, platform, and infrastructure.
There’s no denying that the last decade of developments in the infrastructure layer, particularly infrastructure as code (IaC), has provided new opportunities for deploying and running applications. We now have clusters, containers, networking and storage on demand. No more waiting for an IT help desk ticket to be resolved before a port is opened or storage is provisioned (hopefully!). The approach to “plan, build (apply), and maintain” is relatively well accepted.
The application layer has evolved in tandem with infrastructure. We have a range of architectural choices that offer clearer tradeoffs between nonfunctional requirements such as maintainability, replaceability, and scalability. Developers can “code, ship, and run” with microservices, function-as-a-service, and even the well-structured monolith (or “modulith”).
But what about the platform layer?
Looking a little deeper into the Gartner model, this layer appears to be seen primarily as offering “X as a Service” and a developer portal as abstractions over the “digital platform”. The model clearly includes elements of designing reusable components, enabling product teams through tools and platform services, and optimising the developer experience by sharing (and codifying) knowledge.
At first glance, you could look at the Gartner model and argue that not much has changed since the original platform as a service (PaaS) offerings, such as Heroku, Cloud Foundry and OpenShift. The reality, however, is different. In 2024, product teams are under far more pressure to deliver value to customers faster, more safely, and at greater scale than they were ten years ago. The market for software solutions is larger, and the expectations are greater.
Many principles from the Heroku-inspired Twelve-Factor App are now “necessary but not sufficient”. Arguably, the move to microservices was meant to enable “agile architecture” and bridge this gap, but it, in turn, introduced a series of operational challenges.
Combining this evolution in architecture with the Cambrian explosion of infrastructure and developer tooling means that modern platforms must be customised to your application architecture, workflows, and infrastructure.
In effect, every organisation needs its own customised platform as a service. Many are already trying to build one, but they are struggling.
What are the goals of your platform?
Delivering any kind of product or system that spans your organisation is very challenging. A custom platform as a service (or internal developer platform) is no different. The first thing you need to do is think about your goals.
Your platform should help your organisation:
Go faster: Platform teams need to provide “everything as a service” to help rapidly and sustainably deliver value to end-users
Decrease risk: Teams need to automate manual processes in reusable components
Increase efficiency: You need to manage and scale your digital platform and resources as a fleet
In order to meet these goals, you must consider how to design, staff, roll out, and maintain your platform. Understanding how to stage the rollout of your platform is a topic for another blog post; there are many arguments for starting at any one of the layers we’ll discuss next.
Three essential layers: Applications, platform orchestration, and infrastructure
When an organisation begins its platform engineering journey, the teams involved have to start somewhere, and typically, they focus on the application or infrastructure layer. The leadership's experience or affinity often biases this choice; e.g. people with a developer background empathise more with the application layer, and folks with an ops-heavy background tend to lean into the infrastructure layer.
We are all familiar with the application and infrastructure layers, but I believe we often merge or conflate these with the platform layer. For example, I’ve heard people say the following things recently:
“Backstage is my platform. Developers go here to spin up a new application, deploy it, and view metrics”
Asking more questions typically reveals that the portal is primarily a facade that calls a series of infrastructure APIs.
The day 2 aspects of running applications in production are often only partly considered. Updating workflows or policies connected to either the applications or the platform can be challenging.
“Terraform is my platform. I can orchestrate all of my infrastructure via HCL and cron jobs, and the GitOps pipelines automatically deploy applications”
When I dig a little deeper, the infrastructure abstractions often leak outwards towards developers. Some of us developers are happy at the command line, but many are not. Expecting developers to learn HCL or Terraform workflows adds to their cognitive load.
As the infrastructure grows (or companies inherit it via M&A), more tools, such as Bash, Bicep, Crossplane, etc., often get added to the mix. These tools are very effective, but if each one leaks a small piece of its abstraction, developers are soon overwhelmed with details when all they really want to do is code, ship, and run.
We are starting to see technology stacks emerge that aim to deal with some of these issues, such as the Open Application Model (OAM) and the Cloud Native Operational Excellence (CNOE) stack.
We can cover these in more detail in a future blog post, but what I’m most interested in here is how these stacks handle the concept of the platform layer. They all provide some form of platform orchestration, but it is typically either application-focused (e.g. OAM uses components and traits to orchestrate applications) or infrastructure-focused (e.g. CNOE primarily deals with container orchestration and workflow orchestration for CI/CD).
I wonder if the platform layer is in danger of becoming analogous to the “wall of confusion” we saw before the DevOps movement took hold. If a Backstage rollout fails because of a lack of day 2 support, it’s tempting for developers to blame the infrastructure team for not providing enough application/platform lifecycle management functionality. If an IaC-based platform rollout fails, it’s tempting for operators to blame the developers for not codifying their policies and workflows correctly. Both of these issues relate to effectively managing the platform lifecycle.
The Thoughtworks team nicely frame the potential “missing middle” that focuses on managing the platform lifecycle in their mention of (and recommendation to assess) “platform orchestration” in their recent Technology Radar.
Mind the gap: The "missing middle" of platform orchestration
Looking again at the Gartner model of platform engineering, we can see the application, platform, and infrastructure layers. I’ve attempted to add a suffix to each layer to signal its main focus: choreography, orchestration, or composition.
Application choreography: This is the UI, CLI, or other developer-facing interface that enables developers to code, ship, and run applications. They should be able to self-serve resources and platform services and observe metrics and logs.
Platform orchestration: This layer enables platform engineers to design, enable, and optimise the platform components effectively. The goal is to reduce technical sprawl and the team’s cognitive load, and to apply business processes, policies, and workflows consistently.
Infrastructure orchestration/composition: Here, the platform or infrastructure engineers plan, build, and maintain the required infrastructure resources across various environments and hosting targets.
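To make this division of responsibilities a little more concrete, here is a minimal Python sketch. It is illustrative only: every class, function, and policy name (`Request`, `PlatformOrchestrator`, `tag_owner`, `self_serve`, and so on) is hypothetical and does not come from Gartner’s model, Kratix, or any real tool. It assumes the application layer only captures what a developer asks for, the platform orchestration layer applies policies consistently and keeps track of the fleet, and the infrastructure layer turns the result into concrete resources.

```python
from dataclasses import dataclass
from typing import Callable

# All names below are hypothetical and for illustration only; they are not
# taken from Gartner's model, Kratix, or any other real tool.


@dataclass(frozen=True)
class Request:
    """What a developer asks for via the application layer."""
    team: str
    service: str      # e.g. "postgres"
    environment: str  # e.g. "dev" or "prod"


@dataclass
class Resource:
    """A concrete resource produced by the infrastructure layer."""
    service: str
    environment: str
    config: dict


# A policy takes a request plus the config built so far and returns a new config.
Policy = Callable[[Request, dict], dict]


def tag_owner(req: Request, config: dict) -> dict:
    return {**config, "owner": req.team}


def require_backups_in_prod(req: Request, config: dict) -> dict:
    return {**config, "backups": True} if req.environment == "prod" else config


class InfrastructureComposer:
    """Infrastructure layer: plan, build and maintain concrete resources."""

    def provision(self, req: Request, config: dict) -> Resource:
        # A real platform might call Terraform, Crossplane or a cloud API here.
        return Resource(req.service, req.environment, dict(config))


class PlatformOrchestrator:
    """Platform layer: apply policies consistently, then delegate to infrastructure."""

    def __init__(self, infra: InfrastructureComposer, policies: list[Policy]):
        self.infra = infra
        self.policies = list(policies)
        self.fleet: dict[Request, Resource] = {}  # tracked for day 2 operations

    def _render(self, req: Request) -> dict:
        # Apply every policy in order to build the final configuration.
        config: dict = {}
        for policy in self.policies:
            config = policy(req, config)
        return config

    def fulfil(self, req: Request) -> Resource:
        resource = self.infra.provision(req, self._render(req))
        self.fleet[req] = resource
        return resource

    def add_policy(self, policy: Policy) -> None:
        # Day 2: a new policy is rolled out across the whole fleet at once,
        # rather than asking every team to update its own configuration.
        self.policies.append(policy)
        for req in list(self.fleet):
            self.fulfil(req)


def self_serve(platform: PlatformOrchestrator, team: str, service: str, environment: str) -> Resource:
    """Application layer: the developer states what they need, nothing more."""
    return platform.fulfil(Request(team, service, environment))


if __name__ == "__main__":
    platform = PlatformOrchestrator(InfrastructureComposer(), [tag_owner, require_backups_in_prod])
    db = self_serve(platform, "payments", "postgres", "prod")
    print(db.config)  # {'owner': 'payments', 'backups': True}
```

The design choice worth noting in this sketch is that policies live in the orchestration layer rather than in the portal or in the infrastructure code: the developer never sees them, and a new policy added on day 2 is reapplied to every tracked resource in one place.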
I’ve attempted to dive a little deeper into each of these layers in the table below and am looking for feedback on this early draft:
Stay tuned for future blog posts
Over the coming weeks, I will explore each of the layers mentioned above and highlighted in the table above. I’ll aim to highlight the challenges, potential solutions, and patterns that platform engineers need to know to succeed.
In the meantime, please join the Kratix Slack and ask any questions or highlight topics you want to explore. We've recently created a "#platform-as-a-product" channel that is generating interest!