Bruno da Silva
Impressions on the Book “Tidy First? A Personal Exercise in Empirical Software Design” by Kent Beck

First and foremost, it is good to see a well-known figure and author in our field writing a new book about something not everybody strives for every day: code quality. And it comes many years after other similar books in the area, such as Refactoring by Fowler (1st edition from 1999) and Clean Code by Robert Martin (2008). So, even though the book does not open a completely new area, it still gives practitioners motivation to look at code quality in the small with a magnifying glass, as we (at CQSE) do.

I divided this article into two parts based on the two most interesting dimensions I found in the book. In the first part, I summarize some key points from the book that relate directly to what we (at CQSE) praise daily: small moves that make developers’ lives easier. The second part, however, holds the even more interesting dimension, in my opinion. There, I summarize the temporal and economic spin Kent provides in his book. The “First?” in the title “Tidy First?” hints at what that entails: What is the right time to tidy up the code? First, before you make behavior-changing modifications? Right after them? Later? Or never?

Part I - The Foundation: Small Moves that Make Developers’ Lives Easier

If you are wondering what exactly “tidying” is and whether it differs from “refactoring”, let’s clarify. The book defines “tidying” as a small, tiny refactoring. While refactoring is defined as changes to structure that don’t modify behavior, tidyings are “the cute, fuzzy, little refactorings that nobody could possibly hate on.” Moreover, Kent observes that “‘Refactoring’ took fatal damage when folks started using it to refer to long pauses in feature development. They even eliminated the ‘don’t change behavior’ clause, so refactoring could easily break the system.” Indeed, I have heard people refer to refactoring in many different ways, matching Kent’s observation. You have probably heard that too. So, whether it sticks or not, here we are with another term: “tidying.”

And here’s the list of tidyings presented in the book. Kent doesn’t explore them in depth; some take only half a page, which is fine, since they’re not new. (A small code sketch illustrating a few of them follows the list.)

  • Guard Clauses
  • Dead Code
  • Move Declaration and Initialization Together
  • Explaining Variables
  • Explaining Constants
  • Explicit Parameters
  • Chunk Statements
  • Extract Helper
  • One Pile
  • Explaining Comments
  • Delete Redundant Comments
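
To make a few of these concrete, here is a minimal before/after sketch in TypeScript. It is my own hypothetical example, not from the book, combining Guard Clauses, an Explaining Variable, and Explaining Constants:

```typescript
// Before: nested conditionals and unexplained magic numbers.
function shippingCost(order: { total: number; country: string }): number {
  if (order.country === "US") {
    if (order.total > 100) {
      return 0;
    } else {
      return order.total * 0.07 + 5;
    }
  }
  return order.total * 0.15 + 12;
}

// After tidying: guard clauses flatten the nesting, an explaining
// variable names the free-shipping condition, and explaining constants
// name the magic numbers. Behavior is unchanged.
const FREE_SHIPPING_THRESHOLD = 100;
const US_RATE = 0.07;
const US_BASE_FEE = 5;
const INTL_RATE = 0.15;
const INTL_BASE_FEE = 12;

function shippingCostTidied(order: { total: number; country: string }): number {
  const qualifiesForFreeShipping =
    order.country === "US" && order.total > FREE_SHIPPING_THRESHOLD;

  if (qualifiesForFreeShipping) return 0;
  if (order.country === "US") return order.total * US_RATE + US_BASE_FEE;
  return order.total * INTL_RATE + INTL_BASE_FEE;
}
```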

Even though they are not new, I can see why Kent devoted time and effort to that part of the book. First, it makes the book self-contained. Second, it clearly and concretely expresses what the author means by tidyings being “cute, fuzzy, little refactorings,” rather than just pointing to what’s in Fowler’s or Martin’s books.

Now, zooming out a bit, I liked the connection built between tidyings and the duo Coupling-and-Cohesion, since pretty much any refactoring (and tidying) revolves around these two, often interconnected, design properties.

The Duo Coupling-and-Cohesion is Key

Kent refers to Ed Yourdon and Larry Constantine’s work multiple times in different places in the book. Yourdon & Constantine laid the foundation for this notion of coupling and cohesion in software design in such a pragmatic way back in the 70s.

Coupling and Cohesion are really how your brain deals with complicated systems.

Cohesion Order (and Whether to Decouple)

This tidying is about avoiding changes to widely dispersed spots in the code: reorder the code elements you need to change together so they become adjacent, not only within the same source file but also across different files and folders.
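
As a minimal sketch of this tidying (my own hypothetical TypeScript example, not from the book), imagine two parsing functions that always change together because the wire format couples them, yet unrelated code sits between them:

```typescript
// Before: parseHeader and parseBody always change together (a change to
// the wire format touches both), but formatTimestamp sits between them.
function parseHeader(raw: string): string {
  return raw.slice(0, 8);
}

function formatTimestamp(t: Date): string {
  return t.toISOString();
}

function parseBody(raw: string): string {
  return raw.slice(8);
}
```

Reordering is all the tidying does:

```typescript
// After Cohesion Order: the elements that change together are adjacent,
// so a wire-format change becomes one contiguous edit instead of two
// edits in distant spots. Behavior is unchanged.
function parseHeader(raw: string): string {
  return raw.slice(0, 8);
}

function parseBody(raw: string): string {
  return raw.slice(8);
}

function formatTimestamp(t: Date): string {
  return t.toISOString();
}
```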

Moreover, why not just eliminate the coupling that leads to those changes in widely dispersed spots? Well... first, if you know how to do it, and if you can do it, go for it!

But note the condition under which eliminating the coupling pays off:

cost(decoupling) + cost(change) < cost(coupling) + cost(change)

The two cost(change) terms are not the same, even though they read the same: the one on the left is the cost of making the change after decoupling, while the one on the right is the cost of making the change with the coupling still in place.
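
To put purely hypothetical numbers on that inequality: suppose decoupling takes 3 hours and the change afterwards takes 1 hour, while leaving the coupling in place costs nothing upfront but the change then takes 6 hours across all the coupled spots. Then 3 + 1 = 4 < 0 + 6 = 6, and decoupling pays off; flip the numbers and it wouldn’t.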

It may not be feasible, though, for various reasons:

  • Decoupling can be an intellectual stretch (you don’t know how to do it).
  • Decoupling can be a time/money stretch (you could do it, but you can’t afford to take that time just now).
  • Decoupling can be a relationship stretch (the team has taken as much change as it can handle right now).

If you’re an experienced engineer or a software quality expert, you might be thinking “I know... this is not new.” But I bet a lot of engineers out there don’t naturally think that way, for various reasons (a topic for another article).

I hope I’m not sounding overly passionate about this coupling-cohesion topic 🙂 but I have to admit I’m biased: cohesion was my Ph.D. topic many years ago, and I have read a lot about it.

Coupling and Constantine’s Equivalence

If you accept that the goal of software design is to minimize the cost of software (i.e., the cost of owning and maintaining it), then we have to understand what such a cost means. Working in this field, we all know this is not hard to make sense of. However, I really liked the way Kent frames it as “Constantine’s Equivalence,” which I summarize here.

Constantine’s Equivalence is: cost(software) ~= cost(change)

where cost(change) follows a power law distribution. What does that mean? It means a few big outliers (very high-cost changes) dominate: summed up, those few expensive changes outweigh the far more numerous cheap ones. In other words, “the most expensive behavior changes cost, together, far more than all the least expensive behavior changes put together.” Or put another way:

cost(change) ~= cost(big changes)
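
To see why a heavy tail dominates, here is a small simulation sketch in TypeScript (my own illustration, not from the book; the distribution and its parameters are assumptions). It draws change costs from a Pareto (power law) distribution and compares the total cost of the most expensive 10% of changes against the remaining 90%:

```typescript
// Draw n change costs from a Pareto distribution with shape alpha.
// Smaller alpha means a heavier tail (more extreme outliers).
function paretoSample(n: number, alpha: number): number[] {
  const costs: number[] = [];
  for (let i = 0; i < n; i++) {
    // Inverse-transform sampling: x = (1 - u)^(-1/alpha), so x >= 1.
    costs.push(Math.pow(1 - Math.random(), -1 / alpha));
  }
  return costs;
}

const costs = paretoSample(10_000, 1.2).sort((a, b) => b - a);
const cutoff = Math.floor(costs.length * 0.1);

const topTenPercent = costs.slice(0, cutoff).reduce((sum, c) => sum + c, 0);
const rest = costs.slice(cutoff).reduce((sum, c) => sum + c, 0);

// With a tail this heavy, the top 10% of changes typically cost more
// than the other 90% combined: Kent's point in a nutshell.
console.log(`top 10%: ${topTenPercent.toFixed(0)}, rest: ${rest.toFixed(0)}`);
```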

And what exactly makes such big changes so expensive? You probably know the answer: big changes are not isolated in a single cohesive element of the design; they ripple across many highly coupled spots. And that boils down to:

cost(software) ~= cost(change) ~= cost(big changes) ~= coupling

Or simply cost(software) ~= coupling

However, decoupling has its own cost, too (and besides, there will always be some degree of coupling in any software). What are the tradeoffs? It all comes down to the second dimension of the book: What is the right timing for tidying (or decoupling)? And should it be done at all? In short, it depends on whether it’s worth living with the cost of coupling now because there’s a chance of profiting from that choice soon, or whether it’s worth paying the cost of decoupling now because its benefits, even if they come later, are higher. Let’s expand on all that in Part II.
