Tuesday, October 28, 2008

Refactoring - Room of Files analogy


To help understand software refactoring, I want to introduce an analogy of a Room of Files.

Imagine you have a large room full of files, but which is a real mess. It has dozens and dozens of filing cabinets, and shelves, with hundreds and hundreds of paper files. And regularly, this is where you come to get your different files. But it’s a mess. A real nightmare to find what you want. Sure, there is some order within it. But the chaos is so bad, that...well, the order that there is within it...is very difficult, if not impossible, to spot.

That’s often what our software is like! Our software has some sort of order, but the disorder that has gradually accumulated has more and more taken over, until it becomes almost impossible to see the order through all the disorder.

Now imagine that you regularly have to come into this room and find a file or some papers that are needed. How easy is it? How much time does it take? It takes ages just to find where the thing is that you’re looking for. Doesn’t that accurately describe what things are like in our software? Doesn’t that describe how long it takes to understand all the parts that are causing the latest bug? Oh, you think that things are different in your software! Are you sure? How long does it take you to track down bugs? How often do functions get rewritten in different places with different names...just because people didn’t realise they already existed? Somewhere in the code. Somewhere the programmer just didn’t know about.

I have too often worked on code where I have almost wanted to cry because something which should be just so simple...is just so difficult. Tracking down and finding a relatively simple bug can take a day or more, when really it should only take half an hour...or less. Why? Because the code is in such disorder.
And as for adding new functionality: with code in such a state, it can be nearly impossible.

It’s difficult to prove objectively, but when I work on code which is really bad, I get the feeling that programming tasks take something like 10 times as long as they would if the code were in a good state. Yes, 10 times as long. Because the poorly structured code just makes it like that. A task which should take half an hour takes a day; with an eight-hour working day, that is 16 times as long by my reckoning.

Think of the Room of Files. If the room were extremely well-organised, how long would it take to find a file you wanted? A few seconds? Five or ten minutes max? And what about with the room in a really messed up state? A couple of hours? All day? Maybe you wouldn’t even find it after a day because you missed it first time through, and would have to go through everything all over again, but this time a lot more carefully and slowly.

And that’s often what our software is like. A job which should only take a short time takes not just twice as long, but easily five or ten times as long.

Sometimes people see refactoring as a waste of time. The train of thought has a certain logic to it. It goes like this. If I fix this bug, or add this new function, it will take me a certain amount of time, say one day. If I do refactoring as part of the task, the task will take me twice as long. Twice as long! You can imagine the non-understanding manager making you feel like you’ve asked to go on four weeks’ holiday just when the project is struggling to hit its deadline: Why on earth would you want to waste your time like that!? You can do it in a day, but you’re going to take two days to do it...why, because you want to do some refactoring?! And what will be the benefit to the customer of this refactoring? The code is a little bit prettier inside!?!

I think refactoring, done properly, will typically mean that coding takes (in the microscopic view of things) twice as long. Why? Because you will be continually finding ways to improve the code. But note well, I said in the microscopic view of things, that is, when we look at a single isolated task. Indeed, in a way, it has to be the case that when you look at a single isolated task, doing some refactoring will make the task take longer. Why? Because almost by the definition of refactoring, the benefit will come when people work on this code afterwards. Yet that future may even be directly afterwards. I have said elsewhere that you don’t do refactoring if you are the last one to ever be touching a piece of code. But in the macroscopic view of things, when you look at the overall productivity of the programmer, it increases significantly. The programmer easily becomes, say, twice as productive, but more probably tasks can be done five or ten times as quickly or even quicker.

Refactoring code is like tidying up in the Room of Files. Sure, if I just search through the mess of files for what I’m looking for, it’ll take me a certain amount of time (a day say), and if I also spend time doing some tidying up it might take me twice as long (two days say) in total to find the file I’m looking for, but the room will be a bit more organised. Sure, I’ve lost time on this first search. But, even if I haven’t completely sorted out the room, I’ve gained loads of time on the next search.

And notice how I refactor (tidy and organise) the Room of Files. I don’t set aside a week or two, or even a month or two if needed, to really sort it out. I could. But in general there will not be the management support for that sort of full-time clear-up, even if that were the best way to do it.

Some people mistakenly think that because you adopt a policy of refactoring, you are going to be spending weeks on tasks which previously would have taken a day. That is not usually how one approaches refactoring.

What we do is this: while actually doing some needed work (looking for a file in the example, fixing a bug or adding some new feature in software), we spend a proportion of our time improving our ability to do the job on all future visits.

In fact even if there were the possibility to do a full-time clear-up, there would be a good argument against it: it is actually a lot easier to do the clear-up bit by bit whilst having a very concrete task such as looking for a file, than to simply go in and clear it up, as it were, just with the aim of clearing it up. If we do clear-up as part of another task, it is easier to stay focused on what clear-up is actually useful since you are searching for a particular item, and it is easier to stay motivated since you have in your mind at all times how this is really going to help to find this particular file, and other such files in the future.

See my book The Refactoring Workout for more on refactoring.

Tuesday, October 21, 2008

What refactoring IS

Let’s now state in the positive sense what refactoring is.

Refactoring is changing code internally, so that in some way or other, the code is internally in a better state. It aims to leave the code either clearer and easier to understand, or better structured, or containing less duplication, or simply with less code.

Refactorings change the code in such a way that externally there is no difference to the way the program functions or performs. If there has been a change that you can see on the outside – for example a button behaves slightly differently, or a calculation gives a different result – then this is not a refactoring. Often, a refactoring is done as part of a larger change which does change functionality – such as making a button behave slightly differently, or fixing a bug in a calculation – but the refactoring and the change in functionality are not the same thing.

Refactorings are objective improvements to code. Refactorings are changes which any reasonable programmer would agree have improved the code. It is true that, sometimes, whether a particular change would improve the code or not can be a matter of debate, but most refactorings are not in that category; for most of them, the change is a clear and easily acknowledged improvement.

A refactoring may be a change that is small and almost trivial, such as the renaming of a variable. Or it may be a change which is very large, such as separating out all the business logic which has somehow got tangled up with all the code dealing with the presentation.

Refactoring is a disciplined process, following careful steps to minimise as far as possible any chance of creating errors. Were ad-hoc improvements to the internals of a program in the past also refactoring? Yes, I believe they were. We would no doubt have performed them with as much care and attention as we knew how to do then. Refactoring applies best practices in making the changes in a disciplined way, and today our knowledge and understanding of how to perform the refactorings in a disciplined way has greatly improved.

Refactoring is a disciplined process in the same way that flying is a disciplined process. When, in 1903, Orville Wright made what we term the first controlled, powered flight, it certainly required a large dose of discipline alongside much experimentation and daring. Nonetheless, the Wright brothers would no doubt look with great admiration and respect at the extensive and detailed modern flight checklists. But regardless of the amount of discipline involved, flying is still flying. A flight done without performing any pre-flight checks is still flying. Similarly, refactoring done without any discipline is still, strictly speaking, refactoring. Nonetheless, we should always seek to use the most recent understanding of how to perform refactorings with the most discipline possible. The person who does refactoring without being as disciplined as possible is liable to end up in as difficult a situation as a pilot who ignores all pre-flight checks.

Refactoring is an ongoing process, to be applied at all stages of software development. It is something which you start applying even from the first week of coding (where code is growing at the fastest rate, and so often needs high amounts of refactoring to keep it in peak condition).

Refactoring is rather like a continual tug of war between the forces for and against the internal quality of the software. On the one side, pulling the software into a worse and worse structure internally are things such as tight schedules, sheer quantity of new code being added, junior developers, lack of experience of the technologies and so forth. On the other side, trying to pull the internal quality into as good a state as possible is refactoring. Yes, all the other forces on software are tending to pull it in a negative direction; that is why refactoring is so desperately needed, and also why it must be an ongoing process.

That is why, if we just focus our attention on trying to improve the causes of failure, even new projects are ultimately doomed. But this is what we like to do. With any new software development, we like to believe that if only this time we can get the specs written better, or develop in a more iterative way, or control the changes better, or ensure that the developers are more experienced, or any of dozens of other things – if only we can do this – then, ah yes, then at last, this time, this time our software will be great. This time, we will do things first time right, and we will keep them first time right.

No! No! No! This is doomed. The reasons that your next project will end up with the software gradually decaying beyond help are exactly the same reasons that the previous project’s software has decayed, or will decay, beyond help. Every cause that we have looked at is something that is sure and certain. Software entropy is a fact of life caused by all the things we have mentioned. Every cause that we have looked at will certainly pull your software in a negative direction. Improving each of the causes, whether it is by better specs, or better programmers, will only help to reduce these forces of entropy pulling software in the wrong direction; it will never remove them.

There will always be factors pulling a program the wrong way, tending to make a program get worse and worse. There are many causes of software decay, pulling the program inexorably in the wrong direction; it is the natural lot of a programmer’s life. By all means, let us try and reduce the causes of software decay to a minimum. But the forces which pull the program the wrong way in quality will still exist and the program will still get worse.

There is only one thing which we can do to pull a program in the right direction, that is by a constant and continual application of refactoring. Refactoring is the one thing where we actively try to pull a program in the right direction, trying to make it better and better. The only way we can hope to maintain the long-term health of our programs is if we have some force which pulls the program in the right direction in quality; that force is refactoring and our programs cannot survive without it.

Refactoring is also like a person and exercise. Adding code is like eating food, refactoring is like taking exercise. As you eat more and more, and take no exercise, the tendency is to get fatter and fatter. When you exercise, you help the body to stay fit and healthy. Refactoring is to software as exercise is to people: it helps keep the program in a fit and healthy state. The more food the man eats (=the more code development which is going on), the more exercise the man needs to do (=the more refactoring is required). Just as a man who keeps eating food will become fatter and less flexible and more sluggish, so it is for your software: as you add code, unless you constantly refactor, it will become more and more puffed up with duplicate code, and slower and slower to work with and change and enhance.


Tuesday, October 14, 2008

What refactoring is NOT

It can sometimes be helpful to explain what something is by saying what it is not.

Refactoring is neither adding nor changing any functionality. The code should do exactly the same after any refactoring as before the refactoring. It should return exactly the same results, display exactly the same output, and react exactly the same to any input. Very often, if not almost always, refactoring is done as part of or as preparation for a change in functionality, but the refactoring itself does not change the functionality.

Refactoring is not about improving performance. Performance may or may not change due to a refactoring, but any such change in performance would be an incidental consequence rather than a direct aim of the refactoring. Of course, almost any change to the internal structure of a program will to some small degree change the performance of a program. If I divide a long method into two or three smaller methods, there will inevitably be some tiny change to the performance of the program. But any such performance changes are an inevitable side-effect of any changes to a program, including refactoring, rather than an objective, and in most cases are liable to be negligible.
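
To make the point about dividing a long method concrete, here is a minimal Python sketch. The function names (`report_before`, `report_after`, `total_of`, `item_lines`) are hypothetical and invented purely for illustration; they do not come from any real codebase.

```python
# Before: one long function that totals the prices and formats the report.
def report_before(prices):
    total = 0
    for price in prices:
        total += price
    lines = []
    for price in prices:
        lines.append(f"item: {price}")
    lines.append(f"total: {total}")
    return "\n".join(lines)


# After: the same work divided into smaller functions.
def total_of(prices):
    total = 0
    for price in prices:
        total += price
    return total


def item_lines(prices):
    lines = []
    for price in prices:
        lines.append(f"item: {price}")
    return lines


def report_after(prices):
    lines = item_lines(prices)
    lines.append(f"total: {total_of(prices)}")
    return "\n".join(lines)


# Same output for the same input; the only cost is a couple of extra calls.
assert report_before([3, 4]) == report_after([3, 4]) == "item: 3\nitem: 4\ntotal: 7"
```

The extra function calls are exactly the kind of tiny, incidental performance change described above: a side-effect of the restructuring, not its purpose.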

Refactoring is not about stopping work for a few months to get a program into good shape before carrying on. That can be done, but normally refactoring will be done as an ongoing task alongside normal development. Just as a developer constantly tests his code, so a developer doing refactoring will be refactoring constantly.

Refactoring is not about ripping everything out completely, breaking everything and then trying to rebuild things with a hope that it’ll eventually be working again, but better. No. Refactoring is about moving from the existing program, and making single changes in small deltas so that at no stage is the program ever broken.

Refactoring is not like the car mechanic who completely strips down the engine into every tiny piece and then puts it all back together again. No. Refactoring is like the car mechanic who, while he happens to be working on a part of the engine on a task involving removing the spark plugs, will take a look at each spark plug in turn, checking the spark plug gap, replacing it in the engine, and checking each time that the engine still works correctly.

Refactoring is not about changing lots of things in a program at once. It is about changing one thing and only one thing at a time.

Refactoring is not about hacking into a change in a haphazard indiscriminate gung-ho approach with giant sweeping changes all at once. No, it’s about carefully making a single specific change in a series of small steps and testing after each step to ensure nothing has broken.

Refactoring is not about one programmer changing the work of another programmer to be in a style that he/she happens to like more. It’s about making changes which have real tangible improvements which, for the most part, can be objectively shown to improve the internal design of a program.

Refactoring is not only about big changes like extracting whole classes or breaking apart complex inheritance structures. No, refactoring is about even the simplest of changes such as replacing a magic number with a symbolic constant or introducing a temporary explaining variable to help clarify a complex expression, or simply renaming a method.
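
As a rough illustration of two of these simple refactorings, the Python sketch below replaces magic numbers with symbolic constants and introduces explaining variables. The shipping rule, the constant names and the function names are all assumptions made up for the example.

```python
# Before: magic numbers and a dense condition.
def shipping_before(weight_kg, order_total):
    if weight_kg <= 20 and order_total >= 50:
        return 0.0
    return weight_kg * 1.5


# After: symbolic constants replace the magic numbers...
FREE_SHIPPING_MAX_WEIGHT_KG = 20
FREE_SHIPPING_MIN_TOTAL = 50
RATE_PER_KG = 1.5


def shipping_after(weight_kg, order_total):
    # ...and explaining variables name each part of the condition.
    is_light_enough = weight_kg <= FREE_SHIPPING_MAX_WEIGHT_KG
    is_large_order = order_total >= FREE_SHIPPING_MIN_TOTAL
    if is_light_enough and is_large_order:
        return 0.0
    return weight_kg * RATE_PER_KG


# Behaviour is unchanged on either side of each boundary.
for weight_kg, order_total in [(20, 50), (21, 50), (20, 49), (5, 10)]:
    assert shipping_before(weight_kg, order_total) == shipping_after(weight_kg, order_total)
```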

Refactoring is not only about tiny changes like renaming methods. It can be used to make giant changes to programs which truly turn upside down the way they work. Things like converting a program written in a procedural style to an object-oriented style, or restructuring a program to separate business logic from the presentation or user-interface layer.

And finally, refactoring is not about doing ‘busy’ work, with no real business value – something just to make the program look nicer or make the programmers feel good. No. Refactoring is about delivering real business value. It’s about making the program easier to change. It’s about making it easier and faster to add new functionality. It’s about reducing the likelihood of bugs and reducing the time needed to find and fix them. It’s about extending the life of the program since it is more adapted to cope with the needs of the future. And while doing all of these things, there are even other unplanned bonuses which have real business value. Such as the program being more able to be redeployed for a variety of other tasks which were never part of the originally intended purpose. And such as the positive effect on team morale when people are working on a program of which they can be truly proud, rather than one they truly wish to see the back of.


Tuesday, October 7, 2008

Refactoring Defined

What is Refactoring? Let me give my own definition:

Refactoring is a disciplined technique for improving software internally without changing its external behaviour.

Or, as Fowler defines it in his book:
“The process of changing a software system in such a way that it does not alter the external behaviour of the code yet improves its internal structure”

Or, from Fowler’s www.refactoring.com website:

“Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior. Its heart is a series of small behavior preserving transformations. Each transformation (called a 'refactoring') does little, but a sequence of transformations can produce a significant restructuring. Since each refactoring is small, it's less likely to go wrong. The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.”

For example, if I rename a variable within my code, I have (hopefully) chosen a more meaningful name, and thus the code has improved internally, but its external behaviour has (hopefully!) remained identical.
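
A minimal Python sketch of such a rename might look like this. The functions and names are hypothetical, chosen only to show the idea.

```python
# Before: the names say nothing about what the values mean.
def total_before(p, q):
    t = p * q
    return t


# After: the same computation; only the names have changed.
def total_after(unit_price, quantity):
    line_total = unit_price * quantity
    return line_total


# External behaviour is identical for the same inputs.
assert total_before(3, 4) == total_after(3, 4) == 12
```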

Another example. I have two classes which I notice share a degree of common functionality, so I create a parent class which contains the common functionality, and then in each of the two original classes I simply keep the functionality which is different. I have improved the software internally because I have reduced duplication and structured the code in a way that shows the relatedness between two previously separate classes, but yet without changing the external behaviour. The software should work exactly the same after the change as before the change.
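
Here is one way this could look in Python, as a sketch only. The classes `Shape`, `Rectangle` and `Circle` are assumptions invented for the example; imagine that `describe` was previously copied into both `Rectangle` and `Circle`.

```python
import math


class Shape:
    """Hypothetical parent class holding what the two classes had in common."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        # Common functionality, previously duplicated in both classes.
        return f"{self.name} with area {self.area():.2f}"

    def area(self):
        raise NotImplementedError("each subclass supplies its own area")


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        # Only the functionality that differs stays in the subclass.
        return self.width * self.height


class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


# Callers see exactly the same results as before the parent class existed.
assert Rectangle(2, 3).describe() == "rectangle with area 6.00"
assert Circle(1).describe() == "circle with area 3.14"
```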

Refactoring is a disciplined technique, that is, it is done systematically and according to a well-defined cycle of small step, test, small step, test, build.
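
The small step, test rhythm can be sketched in Python as below. This is only an illustration under assumed names: the discount rule, `behaves_like_original` and the step functions are hypothetical, and a real project would more likely use its existing test suite than a hand-rolled check.

```python
# Original code whose behaviour must be preserved.
def discount_original(price, is_member):
    if is_member:
        if price > 100:
            return price * 0.8
        else:
            return price * 0.9
    else:
        return price


# Record the current behaviour before touching anything.
CASES = [(50, True), (100, True), (150, True), (50, False), (150, False)]
EXPECTED = [discount_original(price, is_member) for price, is_member in CASES]


def behaves_like_original(candidate):
    """The 'test' half of the cycle: compare against the recorded results."""
    return [candidate(price, is_member) for price, is_member in CASES] == EXPECTED


# Small step 1: flatten the nesting with a guard clause.
def discount_step1(price, is_member):
    if not is_member:
        return price
    if price > 100:
        return price * 0.8
    return price * 0.9


assert behaves_like_original(discount_step1)  # test

# Small step 2: give the magic numbers names.
LARGE_ORDER_THRESHOLD = 100
LARGE_ORDER_FACTOR = 0.8
STANDARD_FACTOR = 0.9


def discount_step2(price, is_member):
    if not is_member:
        return price
    factor = LARGE_ORDER_FACTOR if price > LARGE_ORDER_THRESHOLD else STANDARD_FACTOR
    return price * factor


assert behaves_like_original(discount_step2)  # test again
```

If either assertion failed, only one small step would need to be undone, which is the practical benefit of keeping the steps small.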

So, to unpack the definition a little, refactoring is three essential things:
1. a disciplined technique
2. for improving software internally
3. without changing its external behaviour
