Managing Complexity

Speakers: 

Drupal 8 is big. Really big. The code base is much cleaner and more maintainable than in previous versions, but it's still big. And big complex code bases are hard to manage.

We're going to have to manage this system for some time, though; not just for Drupal 8, but in the future Drupal 9 and beyond. How can we ensure that the core team scales? Not just committers, but our social architecture itself?

There are a couple of things we can do, both technical and social, drawing on the experience of other large projects. This session will lay out the problem space of maintaining a big system, then offer some suggestions for how to manage complexity going forward (including how to reduce it).

This talk will cover both Drupal 8's maintenance lifecycle, as well as ways to plan ahead for Drupal 9.

Resources:

Slides here: http://www.palantir.net/presentations/dcamsterdam2014-managing-complexity/

Schedule info
Track: 
Core Conversations
Experience level: 
Intermediate
Drupal Version: 
Drupal 8.x
Time slot: 
Wednesday · 17:00-18:00
Room: 
G107 · Pantheon

Comments

Crell’s picture

I agree. Feel free to share/tweet/FB/whatever to get people to watch the video and, especially, read the links.

There's a lot to respond to here. Unfortunately, amongst the grains of truth I think there's a lot of misdirection and obfuscation. xjm's questions/comments after the talk handled that quite well.

On the specific question of authority and structure, two points:

Firstly, you referenced "The Tyranny of Structurelessness". While that work is seminal, more recent pamphlets have paired it with Cathy Levine's response, "The Tyranny of Tyranny". http://www.libcom.org/library/tyranny-of-tyranny-cathy-levine

While many of the political groups of the '60s and '70s (and since) with no formal structure ended up with very entrenched informal leadership, there were also formal groups that had very entrenched and completely undemocratic official leadership. Neither is a good model.

Structure means exactly that: it does not require a strict and immovable division of labour, nor does it require formal hierarchy. In other words, much of the structure that is missing from Drupal core is project management (identifying scope, work plans, communication between teams etc.) - that role is really done by a few core generalists, branch maintainers, xjm and some, but not all, self-organised subsystem teams. Bob the chair mover in your talk might have a degree in events management (or a natural talent for it previously undiscovered) for all we know - plenty of people with low wage and menial jobs are highly educated but get shut out from decision making.

Also on the question of authority, there is more than one kind of authority. This quote sums it up pretty well for me:

Does it follow that I reject all authority? Far from me such a thought. In the matter of boots, I refer to the authority of the bootmaker; concerning houses, canals, or railroads, I consult that of the architect or the engineer. For such or such special knowledge I apply to such or such a savant. But I allow neither the bootmaker nor the architect nor savant to impose his authority upon me. I listen to them freely and with all the respect merited by their intelligence, their character, their knowledge, reserving always my incontestable right of criticism and censure. I do not content myself with consulting a single authority in any special branch; I consult several; I compare their opinions, and choose that which seems to me the soundest. But I recognise no infallible authority, even in special questions; consequently, whatever respect I may have for the honesty and the sincerity of such or such individual, I have no absolute faith in any person. Such a faith would be fatal to my reason, to my liberty, and even to the success of my undertakings; it would immediately transform me into a stupid slave, an instrument of the will and interests of others.

(Bakunin, 1871)

I would like to hear catch's and xjm's views on the practical level.

xjm understandably took exception to the 'secretary' comparison, which is an (arguably offensive) exaggeration used to point up a valid concern about the distribution of power and responsibility. catch is here questioning the theoretical part of the talk. Speaking as a 'drive-by' contributor who has not contributed a single patch (though I have contributed), I have the greatest respect for the contribution of time and skills which key contributors have made. A 'do-ocracy' is natural, and even happens in national politics. However, I look in at core work from the outside (it feels that way anyway) and think that the processes are more laborious than they need to be, discouragingly so (i.e. it discourages me from doing more). A more directed approach might work better. Of course Dries gives some big-picture direction which is valuable.

I get the feeling that xjm and catch might disagree with Larry's underlying message, but as I hear and read it, whilst both have picked up some features of the talk they did not like, neither has said directly that they fundamentally disagree with the need for a more 'vertical' technical direction, or, if they do disagree, whether their view is that broadly speaking nothing needs to change.

One of the reasons the Drupal community is so special is that Dries is an open, consensual and encouraging type of leader. Were he not, things might have simply fallen apart long ago, as happened to some other CMSs. It does seem to me that given the size of the community, of the codebase, and of Drupal's commercial importance, the more hierarchical leadership structure which Crell is calling for can only be good for Drupal at this stage. Whilst xjm and catch have some major and valid reservations about what Crell said, I have not yet heard anyone expressly disagree with the 'core' message that a more vertical structure (giving more power to branch maintainers being just one possible way of implementing it) would, in spite of its downsides, be on balance good for Drupal.

xjm understandably took exception to the 'secretary' comparison, which is an (arguably offensive) exaggeration used to point up a valid concern about the distribution of power and responsibility.

I also took visceral exception to that; I just didn't have anything to add beyond xjm's comment that wouldn't be repeating it.

Also if you listen to what xjm says in detail, she debunks the idea that subsystem maintainers don't have any authority. That doesn't mean they have absolute power over their subsystems, it's just a recognition that authority and power are two different things which come from different sources. This was completely missing from the talk. I don't think it's just an exaggeration, it actually misrepresents what happens, or restricts the available data to a small subset of where things go wrong.

That was my fundamental problem with this presentation. While there are problems with core development, this talk starts out with the wrong premises on what exactly those problems are or where they stem from, and so arrives at bad conclusions too. I spent some time to refute the database patch as an example of subsystem maintainer authority being undermined, because that's such a spectacularly bad example it should raise serious flags with many of the other statements made. The talk only discussed conflicts between 'vested authority' vs. 'power of those with lots of time'. It completely glossed over 'natural authority' (see Bakunin quote above for the distinction) which is what core actually relies on at its best.

Things I think are actual problems in core:

- Core development relies on massive investment of volunteer resources from a few dozen people - while those few dozen people are fluid between releases, there has to be that large group of people committed over a long period of time to have any kind of institutional memory. People give up paid, family and sleep time, often not due to fun contributing but because of a sense of duty. This leads to burnout.

- Some of the less-frequented regions of core completely lose institutional memory if one or two people leave and people have to figure it out almost from scratch again (and sometimes people just don't know who to talk to so do that unnecessarily). If those areas aren't essential, we should be prepared to drop them; if they are essential, they need funding directed towards them (the render system has ended up getting attention from the OCTO team as part of resolving performance issues this release, because for example people like me who would otherwise work on it are, erm, a bit busy). I've been keeping a very close eye on those issues though.

- There are sometimes patches worked on in isolation from other movements in core, and they end up getting left behind/conflicting. These aren't disagreements as such, but incompatible approaches resulting from lack of communication. This ranges from just straight duplicate bug reports to people trying to refactor APIs to improve DX that are being refactored out of existence altogether elsewhere. Or sometimes people working on things in isolation without knowledge of how they fit into the wider application and missing vital details.

- When there are massive disagreements between contributors, or groups of contributors, or if someone is just behaving poorly in the issue queue on a frequent basis, we don't have great processes for dealing with this. We also don't distinguish well between technical disagreements and personal fallouts. This also leads to burnout. Also happens in contrib, but contrib has a lot more forks.

- The very, very long release cycles for major versions.

None of this is solved by giving vested authority to subsystem maintainers though.

If I look at what was missing from the talk in terms of successes, or problems that don't stem from lack of authority over components:

1. Initiatives (in the colloquial sense, whether official or not) that were worked on by self-managed groups of people relating to components. If you look at entity/field they have successfully built up institutional memory and natural authority within those areas. People naturally defer to decisions taken by those groups, or consult them when there is crossover. CMI ended up like this after a very rocky start. I think the rocky start could have been avoided if it had started out as a group rather than led by one individual (with absolutely no disrespect to heyrocker, the problem there was being one person, not which person it was).

A group being self-managed is very different to one person directing a group of people in a hierarchy. Also entities/fields in particular have a long list of dependencies (CMI, plugins etc.) and 'upstream' dependents (node/user/comment module, field types), and successfully worked to uncover and solve problems in both directions - as did the teams working in the downstream/upstream areas. Decoupling components doesn't get rid of that need - we still have dependencies.

2. Cross-subsystem initiatives like performance/scaling, accessibility, mobile, D8MI. These areas have to get involved with lots and lots of different parts of core, to ensure coherence/parity across subsystems. This kind of work (which happens to be the kinds of areas I've tended to work on as a patch author) is exactly why I get very wary when people want component maintainers to have absolute authority over their components. Performance/scaling/multilingual are exactly the kinds of things that get forgotten at the component level, and then end up falling through the cracks at the application level.

Multilingual support in CMI is a good/bad example of an issue where everyone agreed on the needs at the application level, but disagreed whether support should be within the component or not (valid technical disagreement, not turf wars). We ended up with different versions at different times, but eventually with language baked in - and this was done partly due to performance concerns with the complexity of the mechanism for adding language overrides from outside the component - which impacted CMI performance in general.

If the 'performance' team, the 'CMI' team and the D8MI team all have vested authority in their areas, then if there's a hard to solve/controversial performance + multilingual issue in CMI, which group has authority? The answer should not be "you have to do it like this because I'm in charge", it should be "I think we should be doing it like this because x, y and z".

What's important in those cases is:

- people who can straddle multiple sub-systems and have at least an overview plus some in-depth knowledge of each, to be able to communicate/balance requirements.

- requirements gathering before and during work, so that important functionality doesn't get forgotten/ignored

- co-ordination between teams that have mutual respect for each other so they avoid duplicating/conflicting work where possible

3. Core committers have been trying, especially about the very wide overarching management of the release as a whole, to distil institutional memory into actual documentation (not very successfully always, but the attempt is there). https://www.drupal.org/node/2350615, https://www.drupal.org/node/2341575, https://www.drupal.org/node/2135189 are all examples where we're trying to put in a very clear structure within which issues can be worked on. This is very different from individually saying yes you can, no you can't to individual issues (in fact I'm specifically trying not to do that arbitrarily, but put these frameworks into place first).

These are primarily communication/facilitation/structure issues, not lack of authority. The need there, and it's something we definitely need more of, is facilitation and communication, not individuals with vested authority telling others what to do.

If someone's behaving very, very badly in the issue queues, then that's an issue for the community more generally. It's not something you fix by giving people official power to ignore them, and it's also something that should never be conflated with genuine technical disagreement. Technical disagreement happens all the time and is usually healthy.

Things I think we need to do more of:

1. Properly take advantage of the new release cycle https://www.drupal.org/node/2135189 - we're just starting to see that with things like migrate being decoupled from the 8.0.0, beta/upgrade path policies etc. This will involve very high level decisions about what areas of work will be encouraged at various points in the release cycle, but it's different from direction on actual implementation.

2. More people doing the kind of work that xjm and YesCT do (very high level management/analysis of the release, mentoring of new contributors, cross-subsystem communication, this is different from directing work).

3. Tackle bad behaviour on Drupal.org, regardless of the position/reputation of the person perpetrating it. Note this is much harder to do if someone is in an official position of authority than if they aren't.

4. Sort out funding so that people aren't self-sacrificing/putting in what is essentially unpaid overtime to get the release out, this should get easier once the new release cycle actually kicks in.

Thanks for setting out your views so directly.

As a 'drive-by' contributor, the barriers to contributing are pretty high. I leave aside unavoidable issues, like setting up a local environment with drush 7, setting up a local test bot, working out IRC, wondering whether I need to put my hand in my pocket for PHPStorm or a ticket + hotel room for DrupalCon, and so on (DrupalCon is fantastic value but still a big investment for a self-funding freelancer). The more intrinsic barriers include the sheer amount of discussion of issues, which is probably ultimately good for quality but must burn a lot of time (even to read). Then there is the feeling when looking at a patch, 'at some level I can review this, or reroll that, but without a deep knowledge of core I do not know what I do not know, so my review may be worth little' (related to the problem of inter-dependencies in the architecture). My fantasy is that having a 'manager' figure would lower the barriers for new contributors. Of course, no change is worth spoiling the good community spirit which runs through Drupal.

Have you looked at core mentoring? https://www.drupal.org/core-office-hours

That's designed to help resolve exactly the challenges for new/occasional contributors you outlined.

Similarly the "novice" tag: https://www.drupal.org/novice (which really means "issue that hopefully won't require 300 comments to get solved", it's not related to ability but more familiarity and issues that shouldn't be impacted too much by changes elsewhere).

Then there's the issue summary template that tries to enforce that there's a scannable summary of the changes an issue makes, without having to read through comments from 2007 each time to figure out the current situation.

All of these add structure for new/occasional contributors. Mentoring (whether on-line or at sprints) has an element of management: matching people with tasks, leading them through the process of getting that task completed. Once again, no vested authority necessary for these things to be in place.

One thing we're missing, and which core office hours in its very earliest incarnation was hoping to help with, is a team of people that concentrate on triaging the queues. A lot of that work gets done by people like me and xjm, who are also doing plenty of other things, and it's a hard thing to introduce people to. For example many issues in the 8.x major queues are either not major or no longer relevant, and some might be critical issues that were filed wrongly. This is skilled work that requires a very broad and up-to-date knowledge of core (i.e. X issue is irrelevant because Y issue got fixed six months ago in a different component) but there's definitely plenty more people who could do it, and it directly leads to reducing the amount of duplicate/unnecessary work that gets done.

https://www.drupal.org/node/2297993 was brought up as an example of a subsystem maintainer not being consulted on a change to their subsystem. Additionally it was claimed that the change 'complected' the subsystem in a negative way.

In my view, this issue was an extremely bad example to pick if wanting to put across the idea that subsystem maintainers get ignored or undermined, because in this case that is demonstrably false.

Let's examine what actually happened in that issue, and the ones that previously touched this code, then people can decide for themselves.

First of all https://www.drupal.org/node/2297993 had been open for three months, as a critical bug, against the database system. More or less the same patch that was committed to core was posted to the issue two months before the commit. So there was plenty of time for people interested in the database system, or critical Drupal 8 bugs, to review the patch before it was committed.

Not all issues have that kind of lifespan for a one-liner, this one did.

The Unicode::truncate() call was added to fix a regression in previously working code. It was also identified as a regression soon after the patch was posted.

The regression was introduced in https://www.drupal.org/node/1167144 when Unicode::truncate()'s predecessor, truncate_utf8(), was changed to substr(). Crell, the database subsystem maintainer, posted many, many times on that issue and RTBCed one of the patches. None of the participants there noticed the regression being introduced and there were no tests to catch it. These things happen.
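To make the nature of that regression concrete, here is a minimal sketch of why a byte-based cut differs from a character-aware one. It is written in Python purely for illustration and does not reproduce Drupal's PHP code; the helper names (truncate_bytes, truncate_chars, is_valid_utf8) and the sample identifier are hypothetical.

```python
def truncate_bytes(text: str, max_bytes: int) -> bytes:
    """Cut the UTF-8 encoding at a byte offset, as a byte-oriented
    substring function would. May split a multi-byte character."""
    return text.encode("utf-8")[:max_bytes]


def truncate_chars(text: str, max_chars: int) -> str:
    """Cut at a character (code point) boundary, so no character
    is ever split in half."""
    return text[:max_chars]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical identifier; "\u00ef" (i with diaeresis) takes 2 bytes in UTF-8.
name = "na\u00efve_table"

byte_cut = truncate_bytes(name, 3)   # stops in the middle of the 2-byte character
char_cut = truncate_chars(name, 3)   # keeps the third character whole

print(is_valid_utf8(byte_cut))                  # False
print(is_valid_utf8(char_cut.encode("utf-8")))  # True
```

The byte-based cut leaves a dangling lead byte, which is the kind of breakage that a test with multi-byte input would catch.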

If we look at where the truncate_utf8() call was originally added to the database system, git blame says it was added in https://www.drupal.org/node/12201. Crell, the database subsystem maintainer, also reviewed this issue. This was a long time ago. The code lived more or less untouched in core for six years until it was unintentionally broken.

Of all the cases where you'd consider consulting a subsystem maintainer on an RTBC patch, critical one-line reverts of unintentionally introduced regressions (that they reviewed), that have had a patch posted for three months in the correct component are probably the least likely.

At no point in Drupal's history has there been working code in the MySQL driver that did not have a dependency on Drupal's unicode helpers. Reverting a regression, with tests added that were never previously there, is not breaking a design principle or anyone's hard work. No-one to my knowledge has ever submitted a core patch that does unicode safe truncation in the MySQL driver without a dependency on the unicode library.

In addition to saying that the subsystem maintainer was not consulted, Crell also said in his talk that he'd not have signed off on the patch because it 'complected' the database system.

First of all, if you look at Sam's original 'keeping it simple' talk, complection isn't a synonym for any kind of complexity, nor is complection necessarily bad.

If something is complected, from the etymological point of view it means that it can be broken down into clearly delineated constituent parts - a coupled thing, made of things which can be decoupled. Like a train and carriages for example. Or the train and the railway.

So the ability to decomplect something is great.

Something being complected may or may not be bad.

Complection then, is a useful concept to understand good and bad kinds of complexity. Using it as a dirty word makes it considerably less useful.

Where Drupal has suffered from complexity has usually been in circular dependencies between subsystems - this is really entanglement, more than complection.

The Unicode call in the MySQL driver added a dependency on Drupal's unicode library (which, if they were separate projects on GitHub, could be represented in composer.json with no issues). A dependency, even a static method call, isn't necessarily bad. In this case the only way to avoid it would be to examine Unicode::truncate() and copy and paste some of the code into the database layer, which may or may not be an improvement. The patch, or the issue, did not make anything worse - in fact it added tests that will ensure that such a later refactoring to remove the dependency won't break things again.
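The difference between a one-way dependency and the circular kind can be sketched as a small graph check. This is only an illustration in Python, not anything from core: the find_cycle helper and the subsystem_a/subsystem_b names are hypothetical, and the mysql_driver -> unicode edge simply mirrors the dependency discussed above.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes (first node
    repeated at the end), or None if the graph has no cycle."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in graph.get(node, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path: we found a cycle.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in list(graph):
        if state.get(node, UNVISITED) == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# One-way dependency: the driver uses the unicode helpers, not vice versa.
one_way = {"mysql_driver": ["unicode"], "unicode": []}

# Hypothetical entangled pair: each subsystem depends on the other.
entangled = {"subsystem_a": ["subsystem_b"], "subsystem_b": ["subsystem_a"]}

print(find_cycle(one_way))    # None
print(find_cycle(entangled))  # ['subsystem_a', 'subsystem_b', 'subsystem_a']
```

A graph like the first one can be expressed as an ordinary package requirement; only the second kind prevents the pieces from being separated.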

Actual link to Sam's talk, which was titled 'Stomp complexity', the blog post was 'Keeping it simple'.