Sunday, August 2, 2015

When TDD Doesn't Matter

https://www.facebook.com/notes/kent-beck/when-tdd-doesnt-matter/797644973601702

A group of students apologized for "not using TDD" today. It was like they were apologizing to their dentist for not flossing--they should do it but just weren't motivated to take the trouble. In that moment I realized that TDD is a small part of a large and complicated space. As long as the students mindfully chose where they wanted to be in that space, I didn't really care where they ended up.

Here's the explanation I invented. I use three dimensions to characterize getting feedback for programming decisions. I think these three are the most important ones to consider, plus visualizing in three dimensions is (just barely) possible for me. Here they are:



One dimension to choose when seeking feedback is how much scope you are interested in. If you tried to take in absolutely every potential effect of a programming decision, you would have to analyze its global economic and social impact. Scope trades off between the tactical utility of feedback and leverage. Knowing you have a syntax error means you definitely need to make changes; knowing that there might be adverse effects to society as a whole because of a decision doesn't help minute-to-minute, even if it's very important.



The second dimension characterizing programming feedback is how clear you want to be about the consequences of your decisions. Sometimes you just want to wait and see what happens; other times you want to predict up front exactly what you are going to see, and if you see anything different at all, you want to know about it unambiguously. Clarity trades off between the effort needed to specify expectations and the information produced when you have actual data to compare to the expectations.

Scope x Clarity



For TDD, I chose to get feedback at the next level above the compiler. The compiler tells you whether it's worthwhile trying to run a program. The tests (potentially) provide much more feedback about whether a program is "good". A secondary effect of writing tests is that you can double-check your programming work. If you get to the same answer two different ways, you can have a fair amount of confidence that the code behaves as expected. Feedback with larger scope takes more work to gather, so test-level feedback is a reasonable tradeoff.

On the clarity axis, for TDD I chose the full monty--binary feedback, red or green, good or trash. Not all feedback can be cast in binary terms. Reduced engagement in one part of a program can be compensated for by increased engagement in another part. Analyzing subtle tradeoffs takes time, though, and I was looking for feedback that wouldn't require much effort to analyze, to avoid interrupting programming flow. The combination of these two choices--complete clarity and test-level scope--defines what is conventionally called unit and integration tests.
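For concreteness, here is a minimal pytest-style sketch of what binary, test-level feedback looks like. The function and the expected values are purely illustrative, not anything from a real codebase:

    # A minimal sketch of test-scope, binary-clarity feedback (pytest-style).
    # The function and the expected values are illustrative only.

    def parse_amount(text: str) -> int:
        """Parse a money string like "$1,200" into cents."""
        digits = text.replace("$", "").replace(",", "")
        return int(digits) * 100

    def test_strips_currency_symbols():
        # The expectation is stated up front; the outcome is strictly binary:
        # either the assertion holds (green) or it fails (red).
        assert parse_amount("$1,200") == 120000

    def test_plain_number():
        assert parse_amount("42") == 4200

Each test either passes or fails, with no partial credit--exactly the clarity tradeoff described above.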





The final dimension along which to choose feedback is frequency. Fortunately this can be measured in a tidy linear way (even if the effects of delay are non-linear), from years to seconds (and milliseconds if you listen to Gary Bernhardt, which I do).

TDD = Frequency:seconds * Scope:tests * Clarity:binary



TDD is a little box in this big space of feedback, the combination of expressing binary expectations, expressing them as tests, and receiving feedback every few seconds about whether those expectations are being met. This seems to me to be a sweet spot in the feedback space. You don't get perfect information, but you get pretty good information at a reasonable cost and quickly enough that you can a) fix problems quickly and b) learn not to make the same mistake again.

Consequences

Expressing feedback as a space suggests experiments. What if you relaxed the frequency dimension? Would the additional scope you could cover provide enough value to make up for the reduced timeliness? What new mistakes could you catch if you reduced clarity and increased scope? How frequently should such feedback be considered? If you increase the modularity of the system, how much could you increase frequency? How long would it take for the investment in design to pay for itself (if ever)?

The feedback space illustrates one of the dangers of feedback: putting the response loop inside the measurement loop. This happens when businesses manage quarter-to-quarter even though the effects of decisions aren't known for years. Technically, if you got feedback about server load every four minutes, you would be nuts to change the server configuration every minute; you would just cause oscillation. We can label those parts of the feedback space "here abide dragons".
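A rough rule of thumb falls out of this: never respond to a signal more often than you sample it. Sketched in code--the period, threshold, and names here are invented purely for illustration:

    # Keep the response loop outside the measurement loop: act on at most one
    # decision per measurement period. All values here are illustrative.
    MEASUREMENT_PERIOD = 4 * 60      # seconds between server-load samples

    last_response_time = None

    def maybe_reconfigure(sample_time: float, load: float) -> None:
        """Change configuration at most once per measurement period."""
        global last_response_time
        if last_response_time is not None and \
                sample_time - last_response_time < MEASUREMENT_PERIOD:
            return  # reacting between samples just causes oscillation
        if load > 0.8:
            print("scale up")        # stand-in for a real configuration change
            last_response_time = sample_time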

Now I can explain when TDD doesn't matter. When the students understand the shape of feedback space, when they understand the tradeoffs involved in moving along each dimension, and when they understand the interaction of the dimensions, then I don't care if they "do" TDD or not. I'd rather focus on teaching them the principles than policing whether they are aping one particular ritual. That's when TDD doesn't matter.

