I've been involved with several migration projects (some mainframe migrations and some migrating from an older platform to a newer one). At the beginning of any migration project, I like to ask, "Why do you want to migrate your application instead of just rewriting it?"
One of the most common responses I hear (among the ones I mentioned in my mainframe migration introduction post) is:
We don't really have documentation; our source code is the documentation.
I've been in the software industry long enough to know that when schedules or money get tight (and they always do), documentation is the first thing to go, so I seldom expect adequate documentation, and that's fine. Documentation is nice to have, but I'm pretty good at finding out what you need without it. After all, I get plenty of practice. To hear programmers talk, one would think there is almost never enough documentation, and when there is, it's still not adequate.
I know that most projects lack documentation and that most code is legacy code (or at least it would be, according to Working Effectively with Legacy Code by Michael Feathers). This is really no big deal and doesn't differ all that much from most projects. In fact, most of my projects start with a pretty clean slate. I get to go in, help the client document processes, recommend improvements, automate everything I can, and by the end of the project the client has a nice sleek system with clean documentation and good unit test coverage.
For some reason, that process frightens clients who already have a system in place. As a result, a lot of clients facing a mainframe migration elect to migrate rather than rewrite the application.
We just don't have the time or money to start from scratch. All of our documentation is in the source code.
I have two issues with this. First, the source code is not adequate documentation because you cannot derive business logic from code. Second, if you use the source code as your documentation, the review process will consume almost the same resources as would the discovery process on a brand new project.
You Cannot Derive Business Logic from Code
I know this sounds a little hard to swallow at first. To be sure, there's a lot of code out there that you can look at and come up with a very reasonable approximation of the developer's original intent. If you assume the developer sufficiently understood the intent of the business logic, your result will also be a reasonable approximation of the actual business logic.
But no client has ever asked me for an application that is just a reasonable approximation of their requirements. The best result you can get from interpreting source code is a set of reasonable approximations of the desired business logic. Every interpretation, no matter how reasonable, will need to be clarified, or you'll risk having a dysfunctional application.
Here's an easy example. Go ask a programmer to "write a method that will verify that a person is old enough to drink." You'll probably get something like this:
function VerifyIsOfLegalDrinkingAge(age)
{
    return age >= 21;
}
Now, show that code to another programmer and ask what the original business rule was.
Chances are, you'll get, "ages 21 and older are of legal drinking age." While that's a close approximation, that's not the business rule you asked for. You said, "verify that a person is old enough to drink." You didn't say, "verify that a person is at least 21."
This is a really basic example and most problems in software engineering aren't as simple. In fact, even with simple problems, most code (especially legacy code) will not look like that. In your mainframe application, the above method would look more like this:
function CheckAge(age)
{
    // be sure that the person has appropriate identification
    // and that it is a valid picture id
    return age > MINIMUM_AGE;
}
You'll dig through thousands of lines of code just to find MINIMUM_AGE = 20 somewhere. Why is the minimum age 20 instead of 21? Because the code had a bug in it and it should've been age >= MINIMUM_AGE. Instead of fixing the method, they just decremented the age. Then, you'll have to find the code that's calling the method to figure out why you would verify that an age is greater than twenty. By the time you get this method implemented correctly, you'll have consumed far more time than it would have taken the client to say, "I can't serve beer to anyone under 21."
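Here's a minimal sketch of why the decremented constant is only an approximation of the real fix, written in the same JavaScript style as the snippets above. The names LEGACY_MINIMUM_AGE, checkAgeLegacy, and checkAgeFixed are hypothetical, and the fractional age is an assumed input that the original system may never have received.

```javascript
// Hypothetical reconstruction of the workaround: the comparison is wrong,
// so the constant was decremented to compensate.
const LEGACY_MINIMUM_AGE = 20;
function checkAgeLegacy(age) {
  return age > LEGACY_MINIMUM_AGE;
}

// What the client actually meant: nobody under 21 gets served.
const MINIMUM_AGE = 21;
function checkAgeFixed(age) {
  return age >= MINIMUM_AGE;
}

// For whole-number ages the two versions agree...
console.log(checkAgeLegacy(20), checkAgeFixed(20)); // false false
console.log(checkAgeLegacy(21), checkAgeFixed(21)); // true true

// ...but they diverge as soon as an age is not a whole number.
console.log(checkAgeLegacy(20.5), checkAgeFixed(20.5)); // true false
```

Whether ages can ever be fractional is exactly the kind of question the code can't answer for you.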
Now, consider the fact that developers who know their languages well know what the language will handle for them. For example, C# initializes integer fields to 0 by default. If you were to ask a C# developer to write a method that returns true if someone is too young to drink, you might get something like this:
public static bool IsTooYoungForBooze(Person drinker){
    return drinker.Age < MINIMUM_AGE;
}
If you then ask someone to convert that to JavaScript, you'll probably get something like this:
function IsTooYoungForBooze(drinker)
{
    return drinker.Age < MINIMUM_AGE;
}
Seems like a reasonable approximation? In C#, if someone fails to set Person.Age, it defaults to 0, which is less than MINIMUM_AGE, so the person is too young. In JavaScript, however, an unset Age is undefined, and undefined is not less than MINIMUM_AGE, so the function returns false and a person with no recorded age is treated as old enough. This kind of bug tends to rear its head late in the game. Is this the kind of mistake you're willing to make because you had a "reasonable approximation"?
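To make the difference concrete, here's a small sketch that runs the literal JavaScript port next to a guarded one. IsTooYoungForBoozeGuarded is a hypothetical name, and treating an unknown age as "too young" is an assumption that simply mirrors the C# default; what should really happen is a question for the client.

```javascript
const MINIMUM_AGE = 21;

// Literal port of the C# method.
function IsTooYoungForBooze(drinker) {
  return drinker.Age < MINIMUM_AGE;
}

// Hypothetical guarded port. Treating a missing age as "too young" is an
// assumption that mirrors the C# behavior, not a confirmed business rule.
function IsTooYoungForBoozeGuarded(drinker) {
  const age = drinker.Age;
  if (typeof age !== "number" || Number.isNaN(age)) {
    return true;
  }
  return age < MINIMUM_AGE;
}

const noAgeSet = {}; // nobody ever set Age

console.log(IsTooYoungForBooze(noAgeSet));           // false: undefined < 21 is false
console.log(IsTooYoungForBoozeGuarded(noAgeSet));    // true
console.log(IsTooYoungForBoozeGuarded({ Age: 30 })); // false
```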
Reviewing Source Code Is Resource Intensive
I know it's tempting to expect a programmer to look at your source code on one screen and write brand new code on the other screen. It's just not going to work that way. If you don't believe me, pick one of your smallest methods and ask a programmer to document the business logic.
Chances are, you'll get a dozen questions.
Why does this code never look at index 0 in this array? Where does this value get set? Is this always going to be 4 characters? What happens if this number is negative? What is this method trying to do? Who's calling it?
"Who's calling it?" is a very serious question. If you're prepared to list every place in the source code where that method is being called, be prepared for more questions about static or global variables, protection levels, database constraints, etc. If you finally get through that, be prepared for even more questions because now you're getting into the meat of the issue: Now that I know what it did then, what do you want it to do now?
Imagine how much better off you would be had you told your developer what the business logic was supposed to be, instead of asking to have it inferred from a source that, at best, yields a reasonable approximation.
What's the Solution?
One of my biggest pet peeves is when someone presents a problem but hasn't considered possible solutions. So, as not to break my own rules, I do have a solution to this particular problem. Rather than looking at your source code as the only source of business logic documentation, look at it as a guide and recognize that you simply don't have documentation.
I know it's rough to admit something like that, but if it makes you feel any better, nobody else has documentation either. Besides, admitting you have a problem is the first step to recovery, right? So now that we've acknowledged that we lack adequate documentation we can move on to bigger and better things. We can use the source code as a guide to get us (and keep us) moving in the right direction. We can write a new application in short iterations with open communication and good unit test coverage while we document business rules.
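As a rough sketch of what "unit test coverage while we document business rules" can look like, each rule the client states becomes a named case that the code has to satisfy. Everything here is hypothetical: isOldEnoughToDrink, the cases table, and especially the "no upper limit" rule, which is an assumption you'd confirm with the client rather than read out of the old code.

```javascript
// The rule as the client stated it: "I can't serve beer to anyone under 21."
const MINIMUM_AGE = 21;
function isOldEnoughToDrink(age) {
  return age >= MINIMUM_AGE;
}

// Each case doubles as a line of business-rule documentation.
const cases = [
  { age: 20, expected: false, why: "nobody under 21 is served" },
  { age: 21, expected: true, why: "21 is the first legal age" },
  { age: 65, expected: true, why: "there is no upper limit (assumed)" },
];

// Collect the rules that the current implementation violates.
function findBrokenRules() {
  return cases
    .filter((c) => isOldEnoughToDrink(c.age) !== c.expected)
    .map((c) => c.why);
}

const broken = findBrokenRules();
console.log(broken.length === 0 ? "all rules hold" : broken);
```

When a rule changes, the table changes first, and the failing case tells you exactly which piece of business logic the code no longer matches.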
In my experience, a rewrite saves time and money in the short run. In the long run, as you consider feature changes and maintenance costs, it's no contest that a rewrite beats a migration every time. Keep in mind that your application is just an application. I know it seems like there's a lot going on in there, but it's just another application. Every developer I've ever talked to about mainframe migrations has said,
We shouldn't be migrating this. It would be so much easier just to rewrite from scratch and then it'd be a good program too. I mean, really . . . it's just a ____________.