Three Ways Conway’s Law Affects API Governance

[Update 2023/07/25: Several years later, I regret contributing to the cult of personality around Elon Musk. At the time, in 2016, he still seemed like a green energy champion and space innovator. The analogy I took from one of his talks and applied to APIs below still applies. However, the last several years’ worth of actions have proven him loathsome. I’m sorry for having elevated this individual in whatever small way.]

When I presented ‘3 Ways Conway’s Law Affects API Governance’ at the 2016 API Strategy Conference, I had no idea that it would become my most requested talk. Even now, a year later, I still get the occasional email asking if the talk has been posted anywhere. What had started as a handful of observations about how organizational structure was showing up in API designs had struck a chord with those in other areas.

So why haven’t I shared the deck before now? No good reason; there just always seemed to be some other more pressing need (or want). However, people persisted and made me realize that the content wasn’t just a blip in the conference milieu; rather, it was something that needed to be shared.

So, without further ado, here’s “3 Ways Conway’s Law Affects API Governance”.

Before I start my slides, I'd like to start with someone else's presentation. Recently, Elon Musk presented his plan for how to make humans an interplanetary species. It is available online and has a number of interesting ideas. What intrigued me most, however, was how Musk illustrated why going to Mars remains prohibitively expensive.

Musk argued that "full reusability" was vital to reducing the cost, thus making a trip to Mars viable.

How many of you flew on a plane to get to this conference?

Musk’s argument was that if we built a 737 for one-time use, a seat from LA to Las Vegas would cost $500,000. Of course, a plane is reused, which lowers the price and makes air travel much more accessible for everyone.

Why is that?

Because the plane is used, day after day, year after year. The cost, prohibitively expensive when used once, is distributed over the life of the vehicle. Reusability is why a ticket from LA to Las Vegas can cost as little as $43. Reusability is a cornerstone of Musk's plan to reduce costs to the point that going to Mars becomes viable. And reusability is a compelling feature for internal APIs.
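The arithmetic behind that comparison is simple amortization. The sketch below is a deliberately naive model: it spreads the one-time cost evenly across flights and ignores fuel, crew, and maintenance. The `cost_per_seat` helper is hypothetical; only the $500,000 and $43 figures come from the talk.

```python
def cost_per_seat(single_use_seat_cost: float, flights: int) -> float:
    """Spread a one-time seat cost evenly over a number of flights."""
    return single_use_seat_cost / flights


# How many flights does it take before the $500,000 seat costs about $43?
flights_needed = 500_000 / 43

# One flight leaves the full cost on a single ticket; reuse divides it.
single_use = cost_per_seat(500_000, 1)
reused = cost_per_seat(500_000, round(flights_needed))
```

The same division applies to an API: the design and build cost is fixed, and every additional consumer lowers the cost per integration.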

Building a plane for one use is crazy. And yet enterprise organizations are filled with point-to-point software: one-time-use API interfaces that the organization pays for. Then they do that again, and again, and again.

Instead of creating consistent service architecture and demonstrating service reuse, teams inadvertently produce Just a Bunch of REST Services (JBORS): a spaghetti web of one-to-one connections between providers and consumers. As a result, an enterprise may find the REST effort doesn’t improve technical or business agility but, instead, ends up only swapping out IT toolsets, message formats, and protocols.

The job of API governance is to identify and mitigate these problems.

Web APIs promise better business agility while, simultaneously, achieving better ROI on developer time. This is why I've seen an increasing number of APIs created within our organization. With sufficient volume, patterns begin to emerge.

Capital One, broadly speaking, has five lines of business. In the card business alone we have hundreds of sprint teams developing and deploying APIs.

Our teams are API-First. They describe their API intent in a Swagger/OpenAPI description. They then submit this for collaboration. The graph shown here is of API submissions for review to my area, the API Center of Excellence, grouped by week. In the first year there were over 2600 submissions for more than 650 unique APIs.

Sure, there’s a fair amount of poor resource design or misapplication of a status code in those designs. For those API design problems, the fix is straightforward. However, when you look at this many APIs, deeper, more challenging design issues begin to emerge.

At enterprise scale, reusability can be blocked by something deeper, something more institutional, than what can be handled simply by "lexicon police".

Harvard Business Review may not be where you'd expect to seek out your software architecture insight. However, they published a fantastic study attempting to measure the duality between product and organization architectures. The conclusion was:

"[software] products tend to 'mirror' the architectures of the organizations in which they are developed. This dynamic occurs because the organization’s governance structures, problem solving routines and communication patterns constrain the space in which it searches for new solutions"

Of course, anyone that has seen a microservices presentation will have heard of this phenomenon by a different name: Conway's Law.

Informally, Conway's law has come to be known as:

"Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure."

This isn't new, revelatory linkbait dominating social media at the moment. Conway made these observations about software development in 1968. The reason we're still talking about it today is because it continues to illustrate a fundamental piece of human behavior.

As Conway, more formally, stated:

"Consider a large system S that the government wants to build. The government hires company X to build system S. Say company X has three engineering groups, E1, E2, and E3 that participate in the project. Conway's law suggests that it is likely that the resultant system will consist of 3 major subsystems (S1, S2, S3), each built by one of the engineering groups. More importantly, the resultant interfaces between the subsystems (S1-S2, S1-S3, etc) will reflect the quality and nature of the real-world interpersonal communications between the respective engineering groups (E1-E2, E1-E3, etc)."

Let's discuss how this behavior affects API design.

Fred Brooks, in his seminal work The Mythical Man-Month, observed that the more people were added to a software project, the more likely that project was to take longer. Just adding people to a project doesn't make it go faster. In fact, the increase in communication overhead is more likely to slow a project, even after the initial 'drinking from the fire hose' phase has passed. Simply put, the more people, the more communication that needs to occur.

That also applies to design. The more people contributing to a design, the more diverse the approaches, experiences, and desired outcomes a group is likely to have. The lack of cohesion results in an API design that is difficult to use.

From Conway’s Paper:

"Ways must be found to reward design managers for keeping their organizations lean and flexible. There is need for a philosophy of system design management which is not based on the assumption that adding manpower simply adds to productivity."

Businesses need to incentivize correct bounded context creations first, then apply manpower; not the other way around.

But what does that really mean? How do a company's communication patterns inadvertently affect API production? An advantage of adopting the microservice architecture is that small capsules of functionality are independently deployable. Work can be partitioned out to separate teams.

When beginning an API interface design, the correct identification of logical bounded contexts is essential. When done incorrectly, the resource association, or lack thereof, is a signal of the communication patterns of the organization behind the scenes.

Let's start with a simple example. Suppose we need to create an API that does something with a company's users. We identify that we'll need two resources - the collection ("/users") and the instances within that collection ("/users/{userId}"). Given that we want this done ASAP, a natural mistake might be to divide the labor across two different development teams; Team A will be responsible for the endpoints against "/users", and Team B will create the design for the userId instance. Adding two development teams means that we'll have the work delivered in half the time, right? Isn't this 'division of labor' in action?

Any time you have a collection resource and an instance resource, they strongly imply that they should be in a single API. However, I see these separated into different work units all the time. The teams are assigned to produce code by resource. Subsequently, when you, a potential consumer of these APIs, go to the portal site for discovery, you find multiple APIs related to a single concept - "user stuff". Moreover, if the key provisioning requires separate keys in order to access each item, you've just doubled the problem on the part of the user.

Let's suppose that the teams did deliver their contributions in the expected amount of time. Chances are, if you have two teams, you're going to get two different approaches. Take the user object on the left; this is a simple set of fields to POST to the "/users" resource to create a new instance. We have things like "name", "street", "city", etc. Simple and straightforward; no room for misunderstanding, right?

Actually, even with this simple example, there are plenty of places for discrepancies to appear. Look at the output by Team B on the left. They've taken the single "name" field and, instead, decided to represent it as two fields, "firstname" and "lastname". Rather than having a number of address elements as siblings of the other user fields, they've created an array structure. They start with "home", but the structure leaves room for alternative values, like "office" and "shipping". Even the "zip" has been modified to be "postal-code", presumably because they intend to support both US and Canadian customers.

This is a fairly trivial example. After quickly scanning the two objects, a consumer would be able to map values from one item to the other. However, imagine the kinds of internal jargon, abbreviations, and assumptions that may exist within one group that may not be obvious to another. By separating the design across multiple teams, the API consumer is forced to bridge these discrepancies, increasing their pain and sapping their productiveness.
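To make that bridging cost concrete, here is a minimal Python sketch of the glue code a consumer might end up writing. The field names follow the description above, but the exact payload shapes, the sample values, the "type" key on each address, and the `to_team_b_shape` helper are all assumptions for illustration, not the actual designs from the slides.

```python
# Hypothetical payload accepted by Team A's POST /users (flat fields).
team_a_user = {
    "name": "Ada Lovelace",
    "street": "12 Analytical Way",
    "city": "Richmond",
    "zip": "23219",
}


def to_team_b_shape(user: dict) -> dict:
    """Translate Team A's flat user into the shape Team B is assumed to return."""
    # Splitting a single "name" field is guesswork: the consumer has to decide
    # where the first name ends, because neither design says.
    first, _, last = user["name"].partition(" ")
    return {
        "firstname": first,
        "lastname": last,
        "addresses": [
            {
                "type": "home",  # assumed key; "office" or "shipping" could follow
                "street": user["street"],
                "city": user["city"],
                "postal-code": user["zip"],  # same value, renamed field
            }
        ],
    }


team_b_user = to_team_b_shape(team_a_user)
```

Every consumer that touches both resources has to write, test, and maintain some version of this function, and the guess about how to split "name" is theirs to get wrong.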

The solution for this is to define the bounded context first. Let it determine the units of work, and then assign a single team to own that. Don't start with the number of teams available and let that drive the division of labor. Otherwise, there are guaranteed to be inconsistencies in the API nuances that are invisible to teams, but friction to integrators.

Having a single team working on a common concept means that objects used in both the creation and retrieval are guaranteed to look the same, because the team has clear communication among itself (or should). Further, when consumers go looking, what they find maps to their expected model, rather than to API designs shaped by division-of-labor convenience.

The second observation is a bit of a mouthful. However, how many have dealt with "not invented here" thinking? If so, you've seen this effect in action.

If you're within the same line of business, or even the same geographical location, your development teams will be much more inclined to reach out, build a bridge, and ask a question. The faces are familiar. Those folks are "one of us".

Different line of business? Hell, different floor in the same building? The attitude is much more likely to default to "those people speak a different language - it would be easier to just do it ourselves".

Let's return to our 'users' resource example, only now the model has been expanded to include an instance of a user's preferences. Team A is put in charge of the code for this feature, including the API to retrieve and update these properties. These are common things that you would find on most accounts: language preference, avatar image, and so on. That works well and is used to populate a 'settings' UI in an app or on the web. Everyone is happy when the 1.0 ships.

A little while later, Team B is working on a new feature that allows users to customize the sort orders for their search pages. They've got the ordering down to a science, but it is now time to save the user's preference somewhere. Conceptually, it should be part of the 'preferences' resource. However, nobody on Team B knows anyone from Team A. In fact, Team A is in a different location. Team B could pick up the phone, make the introductions, relate the use case for the zillionth time, and argue with Team A over their existing backlog prioritization.

Or Team B could create a new one-off resource: "/users/{userId}/preferences/sort", and avoid all that. After all, Team B knows how to create APIs - they don't need help there. And "preferences" already kinda implies a collection, of which "sort" or "sortOrder" would be one. So what's the problem?

The problem is that the lack of communication on the part of the teams will manifest as additional "chattiness" on the part of the consumers. Yes, adding "just one more" resource doesn't seem, in the moment, like that bad of a thing. However, it is a slippery slope, as illustrated in this mockup on the left.

In this sample we see things like "out-of-office", manager, time zone, etc. If the previous strategy employed by Team B is allowed to proliferate, one could see how the resources would also explode into a host of one-off, fine-grained items. Imagine a consumer, like a mobile device, not only having to call each of these to populate a display, but also having to track its dependency on each of them.

Yuck.
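A small sketch shows how that chattiness adds up. Everything here is hypothetical: the user ID, the resource paths beyond the "sort" example, the field values, and the `FakeApi` class, which stands in for an HTTP client and simply counts round trips.

```python
# Hypothetical one-off preference resources, each added by a different team.
FINE_GRAINED = {
    "/users/42/preferences/language": {"language": "en-US"},
    "/users/42/preferences/avatar": {"avatar": "ada.png"},
    "/users/42/preferences/sort": {"sortOrder": "price-asc"},
    "/users/42/preferences/out-of-office": {"outOfOffice": False},
    "/users/42/preferences/timezone": {"timeZone": "America/New_York"},
}

# The same data behind a single resource owned by one team.
CONSOLIDATED = {
    "/users/42/preferences": {
        key: value for body in FINE_GRAINED.values() for key, value in body.items()
    },
}


class FakeApi:
    """Stand-in for an HTTP client that counts round trips."""

    def __init__(self, resources: dict):
        self.resources = resources
        self.calls = 0

    def get(self, path: str) -> dict:
        self.calls += 1
        return self.resources[path]


def load_settings_screen(api: FakeApi, paths: list) -> dict:
    """Fetch every path and merge the results into one settings view."""
    settings = {}
    for path in paths:
        settings.update(api.get(path))
    return settings


chatty = FakeApi(FINE_GRAINED)
chatty_settings = load_settings_screen(chatty, list(FINE_GRAINED))

single = FakeApi(CONSOLIDATED)
single_settings = load_settings_screen(single, ["/users/42/preferences"])
```

Both clients end up with the same settings, but the first one pays five round trips for them and has to know five paths. On a mobile connection, that difference is what the user feels.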

The solution to this second item is identifying, and enforcing, that teams own a context, not a codebase. In our example, Team A owns the user preferences context (if not the user context itself). Any additions, updates, or removals from that context should be performed by the owning team.

If not, the conceptual debt incurred in order to convenience the API producers will be, subsequently, paid on every integration.

The 3rd Conway's Law Effect is that one's internal organization may not align to external perceptions. This can be extremely problematic when attempting to convert internal APIs to external products - things simply don’t map. Conversations are impeded and business value can't be derived, because the APIs on offer are from the perspective of internal hierarchy, rather than externally presumed functionality.

Suppose we are responsible for the APIs in a global consumer goods business called "Veridian Dynamics". There's the shampoo division and, somewhere upstate, the razors folks. Across the country lies the newly acquired "Big Pumpkin" division, responsible for the glut of seasonal products that one sees each fall: things like pumpkin spice candy, toothpaste, toilet paper, etc. They are the weaponized pumpkin division.

Let's look at our first attempt to articulate what our conglomerate does. This first mapping results in a resource design organized around the various divisions. We create three high level path concepts to begin grouping those things that are similar. Under an "api.veridian.com" domain, we put an identifier for "/shampoo", "/razor", and "/pumpkin".

Each line of business has its own set of needs, which they exposed as APIs in the appropriate area. These take the form of collection and instance resources under the appropriate areas. Shampoo has its formulas. Razors have their own innovative product strategies. So, too, does the weaponized pumpkin division.

While it might make the utmost sense in the moment, this first API resource attempt that Veridian made is problematic. Development teams have been organized around specific product divisions, and the resources produced reflect those organizational structures.

That may not be a problem if the codification of internal structure only remains available internally. But let's introduce another fictional conglomerate, Buy 'n Large, or BnL. They are a national big-box store and they are looking to carry Veridian Dynamics' products in their hundreds of stores. In order to do so, they want to integrate their purchasing and fulfillment systems with Veridian Dynamics' inventory levels.

Veridian Dynamics is ecstatic; they have APIs! The integration will be easy! The executives head off to the golf course while the developers send over the documentation for how to get their inventory levels. It takes a while, as it has to be collected from each of the units and compiled into documentation for BnL.

Shortly after that, BnL begins to grumble. They want inventories; they should be able to just call for Veridian Dynamics' inventories. However, because Veridian has organized its resources by organizational division, BnL has to make multiple calls.

Further, because each division was allowed to define its own approach, none of the calls behave the same. If one wants to retrieve inventories from the shampoo division, one would call the shampoo inventory collection. That seems straightforward until BnL's engineers try to do the same thing for razors and find (surprise!) that they need a product identifier before being able to get the stock; they're now in the business of keeping lists of razor productIds, or polling other APIs to make sure they have the latest information. And the pumpkin division? The division across the country that was recently acquired? That API is more than a little different: a caller not only needs to know the 'stock keeping unit' (or sku) but have an understanding of that division's regional warehousing. What should have been a simple task for an external entity becomes a prolonged and ongoing conversation about Veridian's internal organization.
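Sketched in code, BnL's side of the integration might look something like the following. The three functions are in-memory stand-ins for the division APIs; every path, identifier, field name, and quantity is invented for illustration. Only the shape of the problem (a collection call, a per-product call, and a per-SKU, per-warehouse call) comes from the description above.

```python
def get_shampoo_inventories() -> list:
    # Stand-in for GET /shampoo/inventories: the whole collection in one call.
    return [{"product": "daily-shine", "quantity": 1200}]


RAZOR_STOCK = {"rz-100": 300, "rz-200": 75}


def get_razor_inventory(product_id: str) -> dict:
    # Stand-in for a per-product razor call: the caller must already know the ID.
    return {"productId": product_id, "inStock": RAZOR_STOCK[product_id]}


PUMPKIN_STOCK = {("east", "pk-spice-tp"): 40, ("west", "pk-spice-tp"): 10}


def get_pumpkin_stock(warehouse_region: str, sku: str) -> dict:
    # Stand-in for the pumpkin call: needs a SKU and a regional warehouse.
    units = PUMPKIN_STOCK[(warehouse_region, sku)]
    return {"sku": sku, "region": warehouse_region, "units": units}


def bnl_total_inventory(razor_ids: list, pumpkin_skus: list, regions: list) -> dict:
    """What BnL has to write: one code path and one vocabulary per division."""
    totals = {}
    for item in get_shampoo_inventories():
        totals[item["product"]] = item["quantity"]
    for product_id in razor_ids:  # a list BnL must maintain itself
        totals[product_id] = get_razor_inventory(product_id)["inStock"]
    for sku in pumpkin_skus:  # plus knowledge of Veridian's warehouse regions
        totals[sku] = sum(get_pumpkin_stock(region, sku)["units"] for region in regions)
    return totals


inventory = bnl_total_inventory(["rz-100", "rz-200"], ["pk-spice-tp"], ["east", "west"])
```

Note the three different names for the same number ("quantity", "inStock", "units") and the three different sets of arguments. None of that is BnL's business, yet all of it ends up in BnL's code.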

And that is only the immediate problem. With an API resource design aligned to the organization, what happens when new initiatives cross those boundaries? For example, what if the shampoo and pumpkin divisions join forces to create pumpkin-spiced shampoo? Under the previous model, does that go under the 'pumpkin' or 'shampoo' top-level paths? How are external entities going to know where to find the self-lubricating razor product that comes out of a shampoo and razor team-up? Where do we send the congressional inquiry when Veridian announces its razor pumpkin home defense product?

Ultimately, if there is an external perception of a central concept, then the APIs - at least the external facing ones - should express that concept. In our BnL example, if they perceive Veridian's products as a single entity, then the resource design should reflect that. If being able to call for the inventories of individual products is important, we could create an API endpoint of a GET to '/products/{productId}/inventories'. If BnL still wanted all inventories, we could maintain the path hierarchy by passing a wildcard, in this case the tilde ('~'), for the productId. There are several options. The point is, the API design would be aligned to the external expectation of the business function, not the internal organizational chart.
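One way the externally aligned endpoint could behave is sketched below. This is an in-memory illustration, not a real service: the `INVENTORY` catalog, the product IDs, the response fields, and the `get_inventories` handler are all assumptions. Only the path and the tilde wildcard come from the text.

```python
# Hypothetical product catalog spanning every division.
INVENTORY = {
    "shampoo-001": 1200,
    "razor-100": 300,
    "pumpkin-tp": 50,
}

WILDCARD = "~"  # the tilde stands in for "all products"


def get_inventories(product_id: str) -> tuple:
    """Sketch of a handler for GET /products/{productId}/inventories.

    Returns a (status code, body) pair.
    """
    if product_id == WILDCARD:
        selected = INVENTORY
    elif product_id in INVENTORY:
        selected = {product_id: INVENTORY[product_id]}
    else:
        return 404, {"error": f"unknown product {product_id}"}
    body = [{"productId": pid, "quantity": qty} for pid, qty in selected.items()]
    return 200, body


status_all, all_products = get_inventories("~")
status_one, one_product = get_inventories("razor-100")
status_missing, missing = get_inventories("no-such-product")
```

The caller uses one path, one identifier, and one response shape regardless of which division makes the product. Which division owns the data becomes an implementation detail behind the endpoint.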

#TFW Devs Realize These Problems Can't be Fixed with Tools, but by Changing People

If we continue to treat API design as something that is just for developers, then developers will attempt to fix the problems with the developer tools on hand. But, as Conway’s Law implies, no amount of automation or framework selection will solve this problem.

I've demonstrated three ways in which API design is affected by the communication patterns in a company. But how do you change those patterns?

Ultimately, changing the communication patterns means changing the culture. That may sound daunting. But there are common sense, incremental ways of getting started.

The first step is recognizing that there may be a problem. Viewing API design through the lens of Conway's law allows us to find spots where an API's design may be suboptimal due to organizational factors. But once we've identified that there is a problem, what do we do about it?

Blue Ocean Strategy was written by W. Chan Kim and Renee Mauborgne in 2005. In it the authors articulated the challenges and possible approaches for creating meaningful organizational change. Much of the latter portion of the book includes impactful guidance for making culture change possible.

The book identifies four challenges to cultural change. The first is cognitive challenges. Going back to our Veridian and BnL example, the engineers that designed the first set of resources may not realize how difficult those APIs are for an external entity.

The second challenge is motivation. Once the inconsistencies are brought to the Veridian engineers' attention, they may understand why their initial design is suboptimal, but have very little financial or intellectual motivation for making a change. There may be a lack of urgency:

"It's a pain, sure. But that is job security for their guys, am I right?"

The incentives here may be intrinsic (why aren't we the Twilio or SendGrid of retail giants?) or extrinsic (we have quantified the amount of lost sales the integration headaches are costing our business). Either way, understanding there is a problem and wanting to take action are two different things.

The next challenge to cultural change is resources (or the lack thereof). Suppose that we've shown Veridian's engineers the problem. And they are motivated to solve it because they see how the current design is limiting new initiatives. However, they may question where the budget for this new, centralized product API comes from. Or state that redesigning the API needs to be put on a backlog, the prioritization to be fought over at the next planning iteration.

The final challenge may be the most difficult: the institutional politics. Teams within each one of Veridian's divisions may acknowledge there is a problem (but it is someone else's problem). They are motivated to change (if that change supports their already existing initiatives). They might be committed to helping the company get more business (as long as it doesn't come at the expense of what the divisions are doing).

The politics don't go away because we're talking about technology. If anything, they get more complex.

(Update: 2020-03-25) Since publishing, I have read a number of additional books unpacking digital transformation and shifting corporate behavior. Two of the best are Agendashift, by Mike Burrows, and Switch, by Chip and Dan Heath.

That sounds daunting, because it is. Positive culture change within a company can be one of the most difficult professional things to do. However, not all is lost. There are ways of seeding change that don't involve wheelbarrows full of money or a vice president title.

To begin, start with disproportionate influencers. You know these people - the ones who are incredibly plugged in, always seem to get the regular raises, and who leadership turns to in meetings. Every company has these stars. Once these people are identified, determine how your agenda complements, or even furthers, theirs. Appealing to their better nature or a sense of duty will only get you so far. But if it can be demonstrated how your course of action actually gets them what they want, things will begin to happen.

Once the disproportionate influencers are onboard, shine a light on their accomplishments. To change a culture, you have to paint a picture of what the destination will look like. Not only does that reward the influencer for their alliance, but it communicates to the rest of the org what behavior gets rewarded.

One mistake that wordy people, like myself, make is assuming that a single presentation, or a well-reasoned argument, is all it takes. This may appeal to people's logic. However, action rarely takes root until the audience feels the pain of a certain situation. The realities of business-as-usual (BAU) don't become real until individuals are living it.

Finally, resources should be redistributed from "coldspots" (high need but little impact) to "hotspots" (big impact for little investment). Can ongoing, in-person training be recorded and provided in self-service fashion? Constantly evaluating where time is being spent, and the impacts of that time, is critical for any effort.

At this point, I've hopefully made the argument that an organization's structure can adversely affect API design. Any API design culture needs to:

  • Incentivize correct bounded context creation first, then apply manpower
  • Overcome resistance to reuse inherent in the org chart
  • Align bounded context for external APIs with external expectations

Thank you for your attention.


Update: