Friday, June 25, 2010

Mainframe Migrations: Extracting Business Logic from Source Code

I've been involved with several migration projects (some mainframe migrations and some migrating from an older platform to a newer one). At the beginning of any migration project, I like to ask, "Why do you want to migrate your application instead of just rewriting it?"

One of the most common responses I hear (among the ones I mentioned in my mainframe migration introduction post) is
We don't really have documentation; our source code is the documentation.

I've been in the software industry long enough to know that when schedules or money get tight (and they always do), documentation is the first thing to go, so I seldom expect adequate documentation, and that's fine. Documentation is nice to have, but I'm pretty good at finding out what you need without it. After all, I get plenty of practice. By the way programmers talk, one would think there is almost never enough documentation, and when there is, it's still not adequate.

I know that most projects lack documentation and that most code is legacy code (at least it is by the definition Michael Feathers gives in Working Effectively with Legacy Code). This is really no big deal, and it doesn't make your project all that different from most others. In fact, most of my projects start with a pretty clean slate. I get to go in, help the client document processes, recommend improvements, automate everything I can, and by the end of the project the client has a nice, sleek system with clean documentation and good unit test coverage.

For some reason, that process frightens clients who already have a system in place. As a result, a lot of clients facing a mainframe migration elect to migrate rather than rewrite the application.
We just don't have the time or money to start from scratch. All of our documentation is in the source code.
I have two issues with this. First, the source code is not adequate documentation, because you cannot derive business logic from code. Second, if you use the source code as your documentation, the review process will consume almost as many resources as the discovery process on a brand-new project would.

You Cannot Derive Business Logic from Code

I know this sounds a little hard to swallow at first. To be sure, there's a lot of code out there that you can look at and come up with a very reasonable approximation of the developer's original intent. If you assume the developer sufficiently understood the business logic in the first place, your result will also be a reasonable approximation of the actual business logic.

But no client has ever asked me for an application that is just a reasonable approximation of their requirements. What you'll find is that the best result you can get from interpreting source code is a set of reasonable approximations of the desired business logic. Every interpretation, no matter how reasonable, will need to be clarified, or you'll risk ending up with a dysfunctional application.

Here's an easy example. Go ask a programmer to "write a method that will verify that a person is old enough to drink." You'll probably get something like this:
function VerifyIsOfLegalDrinkingAge(age)
{
  return age >= 21;
}
Now, show that code to another programmer and ask what the original business rule was.

Chances are, you'll get, "ages 21 and older are of legal drinking age." While that's a close approximation, that's not the business rule you asked for. You said, "verify that a person is old enough to drink." You didn't say, "verify that a person is at least 21."

This is a really basic example and most problems in software engineering aren't as simple. In fact, even with simple problems, most code (especially legacy code) will not look like that. In your mainframe application, the above method would look more like this:
function CheckAge(age)
{
  // be sure that the person has appropriate identification
  // and that it is a valid picture id
  return age > MINIMUM_AGE;
}

You'll dig through thousands of lines of code just to find MINIMUM_AGE = 20 somewhere. Why is the minimum age 20 instead of 21? Because the code had a bug in it: the comparison should have been age >= MINIMUM_AGE. Instead of fixing the method, someone just decremented the constant. Then you'll have to track down the code that calls the method to figure out why anyone would verify that an age is greater than twenty. By the time you get this method implemented correctly, you'll have consumed far more time than it would have taken the client to say, "I can't serve beer to anyone under 21."
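For contrast, here is a minimal sketch of what the corrected rule might look like once somebody simply states it (written in the C# of a target system; the names are mine, not from any real codebase):

private const int MINIMUM_AGE = 21;

// the rule as the client states it: no beer for anyone under 21
public static bool VerifyIsOfLegalDrinkingAge(int age)
{
    return age >= MINIMUM_AGE;
}

Five minutes of conversation replaces hours of code archaeology.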

Now, consider the fact that developers who know their languages well know what the language will handle for them. For example, in C#, an integer field that nobody sets defaults to 0. If you were to ask a C# developer to write a method that returns true if someone is too young to drink, you might get something like this:
public static bool IsTooYoungForBooze(Person drinker)
{
    return drinker.Age < MINIMUM_AGE;
}
If you then ask someone to convert that to JavaScript, you'll probably get something like this:
function IsTooYoungForBooze(drinker)
{
  return drinker.Age < MINIMUM_AGE;
}
Seems like a reasonable approximation? In C#, if someone fails to set Person.Age, it defaults to 0, which is less than MINIMUM_AGE; in JavaScript, however, an Age that nobody sets is undefined, and undefined is not less than MINIMUM_AGE (the comparison coerces undefined to NaN, and every comparison involving NaN is false), so the JavaScript version happily reports that a person with no age at all is old enough to drink. This kind of bug tends to rear its head late in the game. Is this the kind of mistake you're willing to make because you had a "reasonable approximation"?
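If you're stuck making a port like that anyway, one way to keep this from biting you is to make the "never set" case explicit, so a translation can't silently reinterpret it. Here's a sketch in C#, under the assumption (mine, for illustration) that Person.Age is declared as a nullable int? and reusing the MINIMUM_AGE constant from above:

public static bool IsTooYoungForBooze(Person drinker)
{
    // an unset age now fails loudly instead of quietly defaulting to 0
    if (!drinker.Age.HasValue)
        throw new ArgumentException("Age was never set");
    return drinker.Age.Value < MINIMUM_AGE;
}

The point isn't this particular guard clause; it's that implicit language behavior is exactly the kind of business-logic detail that never survives a straight translation.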

Reviewing Source Code Is Resource Intensive

I know it's tempting to expect a programmer to look at your source code on one screen and write brand new code on the other screen. It's just not going to work that way. If you don't believe me, pick one of your smallest methods and ask a programmer to document the business logic. Chances are, you'll get a dozen questions.
Why does this code never look at index 0 in this array? Where does this value get set? Is this always going to be 4 characters? What happens if this number is negative? What is this method trying to do? Who's calling it?
"Who's calling it?" is a very serious question. If you're prepared to list every place in the source code where that method is being called, be prepared for more questions about static or global variables, protection levels, database constraints, etc. If you finally get through that, be prepared for even more questions because now you're getting into the meat of the issue: Now that I know what it did then, what do you want it to do now?

Imagine how much better your situation would be had you told your developer what the business logic was supposed to be, instead of asking to have it inferred from a source that offers, at best, a reasonable approximation.

What's the Solution?

One of my biggest pet peeves is when someone presents a problem without having considered possible solutions. So, in order not to break my own rule, I do have a solution to this particular problem. Rather than looking at your source code as the only source of business logic documentation, look at it as a guide and recognize that you simply don't have documentation.

I know it's rough to admit something like that, but if it makes you feel any better, nobody else has documentation either. Besides, admitting you have a problem is the first step to recovery, right? So now that we've acknowledged that we lack adequate documentation, we can move on to bigger and better things. We can use the source code as a guide to get us (and keep us) moving in the right direction. We can write a new application in short iterations with open communication and good unit test coverage while we document business rules.
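To sketch what documenting business rules as you go can look like (NUnit-style tests here; the AgeRules class and the test names are hypothetical, not from any client project), every rule the client clarifies gets pinned down as an executable test the moment you learn it:

using NUnit.Framework;

[TestFixture]
public class DrinkingAgeRules
{
    // the rule as the client stated it: "I can't serve beer to anyone under 21"
    [Test]
    public void TwentyOneYearOldIsOfLegalDrinkingAge()
    {
        Assert.IsTrue(AgeRules.VerifyIsOfLegalDrinkingAge(21));
    }

    [Test]
    public void TwentyYearOldIsNotOfLegalDrinkingAge()
    {
        Assert.IsFalse(AgeRules.VerifyIsOfLegalDrinkingAge(20));
    }
}

The tests become the documentation the client never had, and they don't go stale the way a binder of specs does.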

In my experience, a rewrite saves time and money even in the short run. In the long run, once you factor in feature changes and maintenance costs, it's no contest: a rewrite beats a migration every time. Keep in mind that your application is just an application. I know it seems like there's a lot going on in there, but it's just another application. Every developer I've ever talked to about mainframe migrations has said,
We shouldn't be migrating this. It would be so much easier just to rewrite from scratch and then it'd be a good program too. I mean, really . . . it's just a ____________.

Tuesday, June 22, 2010

Mainframe Migrations: Migrating Data and Data Structures

It's hard to resist the urge to export data from a mainframe database, import them into the new database engine, and call the repository "finished." It is easy, on the other hand, to say "This has been working for 25 years; why would we change it now?"

Well, kind of a lot has changed in the last 25 years. For example, Adabas is a post-relational, inverted-list database; Sql Server, on the other hand, is a relational database built on heaps and b-trees. I'm not saying that one is better than the other; to be sure, each has its merits. What I am saying is that they're different.

They store and retrieve data differently. They have different rules. They are optimized for different normal forms (again, with a background in both OLTP and OLAP, I'm not asserting that one is better than the other). With all of the ways that DBMSs differ, especially current vs. mainframe DBMSs, you just can't "lift and shift" and expect the same functionality or the same performance.

Continuing with the Adabas vs. Sql Server example (because that's what I have experience with): if you just pull the data right out of Adabas, you'll find that it is in need of serious refactoring. While it makes perfect sense to work with the Adabas data structure in Natural, it doesn't make sense when you're working in Sql Server and C#. The denormalized data will complicate your CRUD operations and clutter your database with empty rows.

Then, you'll either have to filter your results or support CRUD operations that populate the empty rows so you'll get expected results back. In my experience, you can deal with a lot of these hassles by importing the data into a staging database and running an ETL process to get them into your application database. During the ETL process, you can eliminate empty rows, convert default values to NULLs, and normalize your data.
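As an illustration, here is a minimal sketch of the kind of transform that lives in that ETL step. Everything in it is invented for illustration (the shapes, the "00000000" date sentinel, the names), and a real implementation might live in SSIS packages or set-based SQL rather than C#:

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

class StagingOrder
{
    public string OrderNumber;
    public string ShipDate;          // "yyyyMMdd", or "00000000" as a default
    public bool IsEmptyPlaceholder;  // padding row carried over from the mainframe
}

class Order
{
    public string OrderNumber;
    public DateTime? ShipDate;       // a real NULL instead of a sentinel
}

static class OrderEtl
{
    // one-time cleanup on the way from staging to the application database
    public static IEnumerable<Order> Transform(IEnumerable<StagingOrder> staging)
    {
        return staging
            .Where(o => !o.IsEmptyPlaceholder)   // eliminate empty rows
            .Select(o => new Order
            {
                OrderNumber = o.OrderNumber,
                // convert the legacy default value to a NULL
                ShipDate = o.ShipDate == "00000000"
                    ? (DateTime?)null
                    : DateTime.ParseExact(o.ShipDate, "yyyyMMdd",
                                          CultureInfo.InvariantCulture)
            });
    }
}

Run it once per load, and the application database never sees a sentinel or a padding row.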

If you take the time to rebuild your database to support your application, it's really easy to keep your ETL process up to date to get the data into your database. On the other hand, if you build your application around the existing data structure, you'll spend countless hours pulling your hair out, thinking to yourself, "Who stores data this way?" You'll probably even find yourself cursing Adabas, Software AG, and mainframes in general. Meanwhile, it's your own fault!

You should have rebuilt the database and transformed your data to fit into it (which is easy), rather than leaving the database structure alone and forcing your programmers to compensate for the fact that the framework (and current software standards) won't support the intentional misuse of the new DBMS.

To put this another way, transforming data is repeatable and easy. It takes very little effort to extract data from a staging database and make them fit into a database designed to work efficiently with your application. Designing the application database isn't that hard, either. It will certainly take more time to design the database and the ETL process than it would to just dump the data in as-is, but I assert that you will make up the difference the moment your programmers start trying to fit a very large square peg into a very small round hole.

Just to make sure I'm not alone, I put a very informal survey to several of my colleagues who have work experience in mainframe migrations. I asked them to consider a situation where they were migrating an application from Adabas to Sql Server. The source data are obviously the same, and the end result was also held constant: the entities presented from the DAL to the BLL should be clean, the code should be clean, and the data access should be equally performant. How much effort do you have to spend building the new database, writing the ETL process, and constructing the data access layer if you're "lifting and shifting" vs. rebuilding the database from scratch?

Here are the results:
[Bar graph: Rebuild vs. Lift and Shift approach to mainframe database migration]

To my colleagues, the choice seems clear. The fact is that you're going to have to transform your data at some point. Either you do it in the ETL process and store the data in a cleaner format, or you do it in your DAL. If you do it in your DAL, the DAL will be harder to write because of the amount of transmogrification you'll have to do between your BLL and your repository.
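To make that concrete, here's the flip side of the ETL sketch above, reusing the same invented StagingOrder and Order shapes: if the legacy structure stays, every write in your DAL has to manufacture the old conventions all over again, and every read has to strip them back out:

static class OrderRepository
{
    // hypothetical: each save folds the NULL back into the legacy sentinel
    public static StagingOrder ToLegacyRow(Order order)
    {
        return new StagingOrder
        {
            OrderNumber = order.OrderNumber,
            ShipDate = order.ShipDate.HasValue
                ? order.ShipDate.Value.ToString("yyyyMMdd")
                : "00000000",
            IsEmptyPlaceholder = false
        };
    }
}

Multiply that by every entity, every query, and every report, and the "cheap" lift and shift stops looking cheap.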

And let's just hope you never have to change anything. Storing your data in an unnatural format will make it very difficult to add features down the road, your DAL will be difficult to maintain (at best), and your entire application will most likely be considerably less performant than if you had opted to spend a little time and money up front and refactor your database from the beginning.

Mainframe Migrations: Tips, Tricks, and How-To's

Yesterday, I wrote a blog post about mainframe migrations. The top brass in my company took a look at it and told me that the world is just not ready for my thoughts on mainframe migrations yet.

I unpublished that post and will republish it at the end of this series. Leading up to my thesis, I will post a series of articles detailing the things I've learned leading my last few mainframe migrations.


Be on the lookout for more updates. As I write these posts, each item will become a link to the article.