Tuesday, November 25, 2008

The Golden Rules for Software Refactoring

How do we actually go about refactoring? Below are the Golden Rules.

  1. One change at a time
  2. Small steps
  3. Test after each step
  4. Frequent builds

These four steps are the key principles which you should keep in mind when doing any refactoring. If you remember nothing else about the technical aspects of refactoring, this is the list to remember. It is the mantra you should continually chant to yourself during refactoring. It should be one of the post-its pasted to the side of your computer screen. Tie them to your hands as a reminder; write them on the doorposts of your house; wear them on your forehead... OK, now I’m getting a bit carried away.

But they are pretty important. They are the key to good refactoring.
We are only going to make one change at a time: one clearly defined improvement to the code. This single change we are going to try to make in small steps, often the smallest steps possible. Then we are going to test. If the tests pass, we are going to make another single change, again in small steps, and then we are going to test again.
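The cycle described above can be sketched in a few lines of code. This is only an illustration of the discipline, not a real tool: the helpers `apply_step`, `tests_pass` and `build_ok` are hypothetical stand-ins for your editor, your test runner and your build.

```python
# A minimal sketch of the four golden rules as a loop.
# apply_step, tests_pass and build_ok are hypothetical stand-ins
# for your editor, your test runner and your build tool.
def refactor(changes, apply_step, tests_pass, build_ok):
    for change in changes:              # rule 1: one change at a time
        for step in change:             # rule 2: made in small steps
            apply_step(step)
            if not tests_pass():        # rule 3: test after each step
                raise RuntimeError("step broke the tests; back it out")
        if not build_ok():              # rule 4: frequent builds
            raise RuntimeError("change broke the build")
```

Each change is a list of small steps, and the cycle never lets a failing test or a broken build survive past the step that caused it.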

In future posts I plan to look at each of these in more detail.

See my book The Refactoring Workout for more on refactoring.

Tuesday, November 18, 2008

Introduction to causes of software spaghetti

When talking about software refactoring, it's worth stepping back and taking a moment to think about why we need to be worried about our current software processes.

Assuming you have been in programming for any length of time – either directly working on the code or more indirectly in some sort of project management – you will have learnt or developed a certain notion of how you would really like your software to be.

You will want it to be really well structured. Well designed. Well written. Modular. It will have proper separation between the different layers: business logic, application logic, presentation. It should be easy to change. It should be efficient, easy to understand, and well documented. In brief, it should ideally be a masterpiece of modern software engineering, an exemplary demonstration of good software. It will have an internal structure and quality that simply has ‘professional’ written all over it.

And yet, if you have been in programming a little longer, you will know that, somehow, almost for reasons you can’t quite put your finger on, software gradually goes off the rails. No matter how hard you try, your software projects slowly but surely lose this dreamed-of structure and clarity and become more and more the spaghetti code you want to avoid.

Why does this happen? Is there any way to avoid it happening in the future?

These are vital questions.

Maybe you have always inherited existing projects, and they were already in a bad way when you took them on. At least there you can take no blame for the poor state they are in now, and convince yourself that if only you’d been on the case since the beginning, they would still be in that really good state you would like.

Is this realistic, or would the project still have got into a mess even with you at the helm from the beginning? And what of software which is already in a poor state: is there any hope for it, or is it really just a question of waiting impatiently for the day when it is rewritten or no longer needed? And even if it were rewritten, would the rewrite really avoid the mistakes of the past? Or are there negative factors in software development which simply cannot be tamed?

You will perhaps guess that my answer to these questions is that there are a number of reasons why software almost always gets into a bad state. In the coming posts I want to take a look at those reasons: to examine why software tends almost inexorably to deteriorate, to look at the causes that push software in a negative direction, and to explain why in many ways we should remain relatively pessimistic about being able to fully control them.

There are a whole range of causes which work against this dream of beautifully structured, well designed, flexible, well written code. Each one is perfectly natural and, in most software development, largely unavoidable: there is very little you can do about them.

And each one has a detrimental effect on code quality, on code simplicity, on code flexibility. Each one pushes the code to become more and more disorganised. Each one pushes the code to become more and more complex. And each one pushes the code to be more and more difficult to maintain and enhance.

See my book The Refactoring Workout for more on refactoring.

Tuesday, November 11, 2008

Refactoring - Room of Files analogy summary

In my previous posts I have compared software refactoring to sorting out a Room of Files.

You may think that in this example I have just been doing trickery with the numbers. It is true that in practice, with refactoring, the overall time taken will be longer. Nonetheless, even when the total time for work done with refactoring is longer, there are still many benefits, both tangible and intangible:

  • The actual time spent on any given file-finding task is objectively reduced. This corresponds to an actual reduction in the time programmers spend on any given programming task (when not counting any new refactoring).
  • It is easier for people to do the required work. It’s a lot easier for a programmer to work on a program which is in a good condition than one which is in a poor condition.
  • When files have to be found urgently, the room will be ready. This corresponds both to the need for programmers to fix critical bugs quickly, and to the need to develop new features to a tight deadline.
  • The chance of duplication and errors is reduced when everything is well ordered. One of the causes of errors in software is things being in several places instead of just one; refactoring removes this duplication, and such errors are naturally reduced or eliminated.
  • People’s morale will be much higher, as a result of contributing to a tidy and well organised room. Programmers will be much happier working on a program they are really trying to make into a quality product internally, rather than something which is just a hack.
  • People’s morale will be higher, as work will feel fast and efficient, because it is.
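The duplication point above can be seen in a tiny code sketch. The pricing functions and the 20% tax rate here are invented purely for illustration:

```python
# Before: the same tax rule is written out twice, so a change to the
# rule must be made, correctly, in two places.
def invoice_total(prices):
    return sum(p * 1.2 for p in prices)      # 20% tax, duplicated

def quote_total(prices):
    return sum(p * 1.2 for p in prices)      # 20% tax, duplicated again

# After refactoring: the rule lives in exactly one place.
def with_tax(price, rate=0.2):
    return price * (1 + rate)

def invoice_total_tidy(prices):
    return sum(with_tax(p) for p in prices)

def quote_total_tidy(prices):
    return sum(with_tax(p) for p in prices)
```

The behaviour is unchanged, but a future change to the tax rule now happens in one place instead of two, and the two-places class of error disappears.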

So, what are the important messages from this room of files example?

  • Refactoring is not something you typically do in one big blitz, but rather something you integrate as a part of normal work.
  • When considering the cost and benefits of a refactoring, the costs always appear at the time of the refactoring, and the benefits always appear after the refactoring.
  • Even though on the detail level, refactoring takes more time, when looked at from the wider perspective, it can actually save time. Even when there is no overall saving of time, there are other benefits which still make refactoring worthwhile.
  • There are many benefits including: efficiency of working, reduced bugs, faster when needed, improved morale.


See my book The Refactoring Workout for more on refactoring.

Tuesday, November 4, 2008

Refactoring - Room of files analogy (worked example)


In my last post I introduced the idea that software refactoring could be thought of as rather like dealing with a room of files.

Let’s take the Room of Files and work through how it gets tidied up (the full figures are shown in the table below). This is certainly only an illustration, and the figures have been chosen to come out particularly favourable to the refactoring side of the story – that I will admit. However, as I run through this example with you, I think you will find it very instructive, because, though we might want to argue about the precise figures, many of the principles are much harder to dispute. In any case, the likely benefits of any particular refactoring will always have to be judged on their own merits.

[Table: hours spent refactoring and hours spent finding a file on each visit to the room, with and without refactoring]
Let’s say, for argument’s sake, that before the room is tidied it typically takes a day – eight hours – to find a file. Sometimes, when you get lucky, it takes only half an hour; sometimes as much as two days; but on average one day. So one day is our current benchmark for finding files.

When we first go to hunt for a file, rather than spend a day looking, we start by spending two days (16 hours) tidying and organising (= refactoring) the room. Then we do the directly productive work we set out to do – finding the file. Of course, the room is now in a much better state than before we tidied it, so it might take us, say, 6 hours rather than the typical eight. Nonetheless, rather than taking just 8 hours to find the file, we have taken a total of 22 hours.

The next time we go to look for a file, rather than spend time directly looking, we first spend (say) another 6 hours tidying and organising the room. By now the room is really starting to improve. When we have finished tidying, we do the directly productive work we set out to do – find the file. This time, rather than the already-improved 6 hours, it is likely to take much less, say 4 hours. Yet again, the total time we have spent (6 hours of refactoring and 4 hours to find the file) is more than if we had simply come and looked for the file (8 hours).

You will see, however, that even if we leave the room in its current ‘partially refactored’ state, things have improved significantly over our initial situation. Now, each time we wish to find a file, even if we spend no more time tidying, it will typically take only 4 hours. The savings gained from the vastly improved speed of finding a file are paid for once, but benefit us every time we revisit the room.

This is exactly how it is with refactoring code. Whenever we improve the clarity of the code, and thus the speed at which future people can understand and change the code, the benefits of any improvement apply for every future occasion on which that code is worked on.

The next row of imaginary figures is very interesting. Here, on the 3rd visit, we spend 3 hours on refactoring and 3 hours finding our file. What is interesting is this: we could now find our file in an average of 4 hours, yet we still choose to spend just as much time tidying before we look for it, and, even so, we have taken less time in total (6 hours) than in the scenario where we never refactor the room (8 hours).
Think about that for a moment. In coding terms, it is as if we have decided to spend as long on refactoring as on the ‘actual’ coding, and yet – here is the marvel of it – we have spent less time in total than we would have under the scenario of never refactoring. We have spent twice as much time as the job strictly needed (6 hours: 3 refactoring, 3 finding the file), and yet we have still spent less time in total than the poor person working in the unrefactored room (8 hours).
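The arithmetic of the first three visits can be laid out explicitly, using the figures above and the 8-hour baseline of the never-tidied room:

```python
# (hours spent refactoring, hours spent finding) for visits 1-3,
# using the figures from the text.
visits = [(16, 6), (6, 4), (3, 3)]
baseline = 8                      # hours per visit in the never-tidied room

per_visit = [r + f for r, f in visits]        # [22, 10, 6]
cumulative_with = sum(per_visit)              # 38 hours over three visits
cumulative_without = baseline * len(visits)   # 24 hours over three visits
```

After three visits the refactoring scenario is still behind in total (38 hours against 24), but the 3rd visit itself already costs less than the baseline (6 hours against 8): the earlier tidying is repaid a little on every later visit.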

I can imagine many of you now scurrying to the table of figures, trying to understand this apparent smoke-and-mirrors trickery and to find the error in the logic.

There is no error in the logic. The apparent deception comes from the fact that the benefit of a refactoring is seen after the refactoring has been done, and this current visit to the room of files is really reaping the benefits of all previous times we have spent tidying the room.

What it does illustrate is that to understand the benefits of refactoring, we can never consider a single refactoring in isolation. Almost any refactoring, looked at in isolation, will probably not appear worthwhile. Sometimes the time spent refactoring is immediately repaid in the speed of making the change we want, but this is the exception rather than the rule. The real gains usually come later, in all subsequent changes.

Imagine trying to get managerial support for a particular change where you want to spend two days rather than a few hours – because you “want to do some refactoring you think is needed”. It is difficult. Support for refactoring must first be won in a global discussion of its advantages, rather than in a debate about a single refactoring: any refactoring, looked at in itself, generally does not look worth doing; it only becomes valuable when we think of the many future occasions on which the code will be worked on.
Let us skip in the table to the 7th visit to the room. Here you will notice that 1 hour is spent refactoring, and a tenth of that – just 0.1 hours – actually finding the file. What profligacy! Ten times as much time spent refactoring as doing the work? Of course we could never agree to that! You, my now learned reader, will already see the fallacy of such a claim. You will know that refactoring, even at ten times the time spent on the ‘actual’ work, will soon enough reap its rewards.

From the 8th to the 11th visit I have shown no refactoring at all. There comes a point where there is little to be gained from further refactoring.

The other thing to note on these visits is the difference in the time needed to perform our task of finding a file: 0.1 hours, versus 8 hours in the still-untidy room – 80 times quicker. Is this realistic? For the room of files, I think you can see that it is perfectly reasonable: if a room of files is in complete and total disorder it could easily take a whole day to find something, yet if the room were beautifully and logically arranged a file could be found almost instantly.

But what of our software? Can it really be that bad? Yes, I think the ratio sometimes is that bad. With badly decayed software, things really can take 100 times as long. You can see it clearly with a very simplified example like this room of files, but the reality is that it is often just as time-sappingly bad within poor code.

This huge amount of extra time needed to work on our software can apply from the smallest change to the largest. A bug will take several days to find and fix, rather than ten minutes (that’s a factor of 100, as near as makes no difference). A major change will take man-years rather than man-months.

On the 12th visit I have shown a little blip in the hours-spent-refactoring column. In the previous visits no time at all was spent refactoring, and here, all of a sudden, another half-hour is spent, without any apparent gain in subsequent file-finding times. Sometimes refactoring is like this: there is no immediate, obvious gain in time. But a refactoring is done because the person doing it perceives it as useful for bringing more order to the situation. Perhaps they hadn’t thought of a way to improve things before; that’s fine – there are usually many visits to the same piece of code, and at each one there is the opportunity to make improvements which suddenly become apparent.

Let us now look at the total time spent over all the visits. In the refactoring scenario, we have spent nearly twice as much time refactoring the room (30.5 hours) as actually finding files (17.2 hours). Yet when we compare the overall totals for the two ways of working – the room of files with refactoring, and the room of files without – the total where the room was refactored (47.7 hours) is less than half that where it was not (104 hours).
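The quoted totals can be checked directly. The 13-visit count below is not stated in the text; it is inferred from 104 hours at the 8-hour baseline:

```python
# Overall totals from the table, as quoted in the text.
refactoring_hours = 30.5
finding_hours = 17.2
total_with = refactoring_hours + finding_hours   # 47.7 hours in total
total_without = 104                              # never-refactored room
visits = total_without / 8                       # 13 visits at 8 hours each
```

So nearly twice as much time went on refactoring as on actually finding files, and yet the combined total is still less than half the never-refactored total.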

See my book The Refactoring Workout for more on refactoring.