How Different Team Topologies Influence DevOps Culture
Author: Matthew Skelton
DevOps is a movement. DevOps is a mindset. DevOps is devs and ops working together. DevOps is a way of organising. DevOps is continuous learning.
The intangibility of DevOps makes it hard for leaders to come up with a clear-cut roadmap for adopting DevOps in their organisation. The meaning of DevOps is highly contextual, and there are no popular methodologies with prescribed practices to abide by. However, healthy organisations exhibit similar patterns of behaviour, organisation and improvement efforts.
The relationship between teams, individual motivation, and effectiveness of software delivery is something I have been interested in for several years, going back to when I was team lead on a large, multi-supplier programme of work to re-build a key software system for a major financial institution in London. Since then I have worked on several software systems for organisations with differing team configurations and team cultures, and the relationship between team configuration – topology as I call it – and organisational capability has become something of a fascination for me.
In my talk at QCon London in March 2015 on Continuous Delivery, I described how we've found through working with our clients that the choice of tooling for DevOps should really be informed by Conway's Law to produce the best outcomes for organisations. Conway's Law – brilliantly described by Rachel Laycock in the book Build Quality In – is the weirdly baffling-but-believable observation by Mel Conway in 1968 that "organisations which design systems … are constrained to produce designs which are copies of the communication structures of these organisations". Given that an IT organisation itself is a kind of system, it follows from Conway's Law that the topology of that system will be shaped by the kinds of communication that we allow or encourage to take place. There is increasing evidence that Conway's Law is hard to bypass.
Since 2013 I and a few others have been collecting and documenting different team topologies used by organisations to make DevOps work for them – see http://devopstopologies.com/ for the current catalogue of team topologies. We have found these patterns to be very useful for clients grappling with the changes to teams needed to make DevOps work, because the topologies help us to reason about the kinds of responsibilities and cultures that organisations need to have to make the different team structures work.
No single 'DevOps culture'
It has become increasingly clear to me over the past few years working with many different organisations that the idea of a single, identifiable 'DevOps culture' is misplaced. What we've seen in organisations with effective or emerging DevOps practices is a variety of cultures to support DevOps. Of course, some aspects of these different cultures are essential: blameless incident post-mortems; team autonomy; respect for other people; and the desire and opportunity to improve continuously are all key components of a healthy DevOps culture.
However, in some organisations certain teams collaborate much more than other teams, and the type and purpose of communication can be different to that in other organisations. In fact, we've sometimes seen the need for some organisations to reduce the amount of collaboration between certain teams in order to maintain a separation between logically distinct parts of the whole 'computer-human system' in place. This is driven by the need to anticipate Conway's Law (the so-called 'inverse Conway manoeuvre').
In the 20 or so organisations we've worked with over the past few years to develop DevOps practices, three team patterns or topologies stand out as being the most commonly used:
- Infrastructure as a service ('Type 3' on devopstopologies.com)
- A fully-shared responsibility ('Type 2')
- SRE team ('Type 7')
Other team topologies also work well for some organisations, but these three are the patterns we have seen most frequently. Let's now explore the differences in culture between these topologies.
Infrastructure as a Service
First, say that we know the virtual environment provisioning capability needs to be consumed 'as a Service' by application Development teams. To prevent Conway's Law from driving too much coupling between Dev teams and the infrastructure team, we have recommended that communication be limited between the Dev teams and the infrastructure team.
In this case, we anticipate that the infrastructure team is somewhat deliberately isolated from the application Dev teams and would share less with them than internally amongst themselves, at least on the level of code and tooling (we'd still want shared lunchtime pizza sessions across the teams to advocate for role rotation and learn about new approaches).
The culture here would see sharing of some infrastructure-level metrics between the infrastructure team and the Dev teams, but – as with public cloud – a limited interaction between the infrastructure providers and the Devs.
'We build it, we run it'
Conversely, for organisations with end-to-end product-aligned or KPI-aligned teams that own the whole stack (including infrastructure), we see the infrastructure people in close collaboration with application developers and testers, and rightly so: anyone on the team might get woken up at 2am due to a live service incident.
Team members with different skills thus work closely together, drawing on each other's strengths for different aspects of delivery and operation. Such a 'we build it, we run it' team would need to be independent of a neighbouring team working on an unrelated subsystem or service, probably to the extent of using different tools for application code, testing, build, or provisioning (if needed). Thinking again of Conway's Law, we set up the work environment so that the cross-team collaboration and communication required are at a minimum.
SRE team
Third, for organisations that use a Site Reliability Engineering (SRE) team in the Google model (sometimes called WebOps), the culture is again different. The SRE team is willing to take on all Production responsibility (on-call, incident response, etc.) as long as stringent operational criteria are met by the software produced by the Dev team.
The collaboration with the product development team is chiefly limited to helping the Dev team meet the operational criteria and providing feedback on run-time behaviour via metrics, incident reports, etc. The Dev team is deliberately isolated from some aspects of how the software runs in Production to allow them to focus on new features, although this needs a high degree of maturity from product owners to ensure that operational criteria are properly prioritised.
The effect of team topology on DevOps culture
To the extent that in a successful DevOps-enabled organisation all teams will pull together towards a common goal, clearly there is a kind of overarching DevOps culture at work (the respect, autonomy, lack of blame, etc. mentioned earlier). However, in these three team models where DevOps flourishes – call them 'infrastructure as a service', 'we build it, we run it', and 'SRE' – the nature of the DevOps culture on the ground is somewhat different, and we are wise to understand and anticipate the different 'feel' that each of these models and cultures has. What works for one ecommerce company may not work at all for a second ecommerce company, due to differences in skills, motivation, technology, or even personalities!
Choose a team structure to fit a culture?
Some organisations are unfortunately crippled by inter-team rivalries, conflicting goals, poor leadership, and politics (most of us have worked at one or more of these places, I guess). The more perceptive of these organisations are looking to adopt practices like Continuous Delivery and DevOps in order to address their poor performance, but they typically have major difficulties in making team changes because the underlying conflicts are still in place. In one organisation we found that the project managers had a financial incentive to deliver 20 story points a month, while the IT operations team had a financial incentive to close live incident tickets within a fixed time: unsurprisingly, this led to all kinds of weird behaviours and conflicts.
For these kinds of organisations, unless there is bold leadership from the top, it makes sense to adopt a team topology for DevOps that fits most easily with the current team incentives and drivers. In effect, we get IT leaders to ask themselves "which team topology is best for my organisation given the current inter-team culture?"
For instance, if the Dev team is driven strongly by CapEx 'product' spending and the Ops team by an OpEx 'business-as-usual' budget, perhaps the SRE (Type 7) model might work: the 'contract of operability' required by the SRE team before taking on a piece of software acts as a good divide between the CapEx and OpEx worlds.
On the other hand, if a newly-recruited Head of Technology wants a 'fully embedded' DevOps model but realises that her Dev and Ops teams are too far apart in skills and focus to begin collaborating immediately, perhaps she might introduce a temporary enabling 'DevOps' team for a short period of time (12 months? 18 months?) in order to foster a culture of collaboration.
We've specifically seen organisations where the sysadmins ('Ops') have been very reluctant to adopt essential practices such as version control and test-first development of infrastructure code, and yet other companies where developers believed that monitoring & metrics or deployment were 'beneath them' and so had little interest in operational concerns. In such cases, some sort of catalyst is needed before a DevOps culture can take hold: an internal or external team can help here.
This approach of choosing a team topology to fit the current culture needs to balance a desire to achieve a new, more effective DevOps culture with a realism about where the teams currently are. Crucially, the team topology – and therefore the 'flavour' of DevOps culture – should be seen as something that evolves over time as skills, technologies, capabilities, and business needs change.
You can re-live Matthew's session at IP EXPO Manchester by reviewing his slides.