Tuesday, June 22, 2010

Mainframe Migrations: Migrating Data and Data Structures

It's hard to resist the urge to export data from a mainframe database, import them into the new database engine, and call the repository "finished." It is easy, on the other hand, to say "This has been working for 25 years; why would we change it now?"

Well, quite a lot has changed in the last 25 years. For example, Adabas is a post-relational, inverted-list database. SQL Server, on the other hand, is a relational database built on heaps and b-trees. I'm not saying that one is better than the other; to be sure, each has its merits. What I am saying is that they're different.

They store and retrieve data differently. They have different rules. They are optimized for different normal forms (again, with a background in both OLTP and OLAP, I'm not asserting that one is better than the other). With all of the ways that DBMSs differ, especially modern vs. mainframe DBMSs, you just can't "lift and shift" and expect the same functionality or the same performance.

Continuing with the Adabas vs. SQL Server example (because that's what I have experience with), if you just pull the data right out of Adabas, you'll find that they're in serious need of refactoring. While it makes perfect sense to work with the Adabas data structure in Natural, it doesn't make sense when you're using SQL Server and C#. The denormalized data will complicate your CRUD operations and clutter your database with empty rows.
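To make that concrete, here's a sketch of what a naive export might produce. All of the table and column names here are invented for illustration; imagine an Adabas customer file whose multi-value (MU) phone field gets flattened into repeating columns:

    -- Hypothetical result of a straight export: an Adabas multi-value (MU)
    -- field lands in SQL Server as repeating columns, with empty strings
    -- standing in for "no value." All names are illustrative.
    CREATE TABLE dbo.Customer_Staging
    (
        CustomerId   INT         NOT NULL,
        CustomerName VARCHAR(50) NOT NULL DEFAULT '',
        Region       VARCHAR(30) NOT NULL DEFAULT '',
        Phone1       VARCHAR(20) NOT NULL DEFAULT '',  -- occurrence 1
        Phone2       VARCHAR(20) NOT NULL DEFAULT '',  -- occurrence 2
        Phone3       VARCHAR(20) NOT NULL DEFAULT ''   -- occurrence 3
    );

Every query against that table has to know that an empty string means "no phone," and every insert or update has to maintain that convention.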

Then, you'll either have to filter your results or support CRUD operations that populate the empty rows so you get the results you expect. In my experience, you can deal with a lot of these hassles by importing the data into a staging database and running an ETL process to get them into your application database. During the ETL process, you can eliminate empty rows, convert default values to NULLs, and normalize your data.
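As a sketch of that cleanup, still assuming the hypothetical staging table above, the ETL step that loads the customer rows might look something like this:

    -- Load the application's customer table from staging: convert the
    -- empty-string sentinels to NULLs and skip rows that carry no data.
    CREATE TABLE dbo.Customer
    (
        CustomerId   INT         NOT NULL PRIMARY KEY,
        CustomerName VARCHAR(50) NOT NULL,
        Region       VARCHAR(30) NULL     -- a real NULL, not a '' sentinel
    );

    INSERT INTO dbo.Customer (CustomerId, CustomerName, Region)
    SELECT CustomerId,
           CustomerName,
           NULLIF(Region, '')             -- default value becomes NULL
    FROM   dbo.Customer_Staging
    WHERE  CustomerName <> '';            -- eliminate empty rows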

If you take the time to rebuild your database to support your application, it's easy to keep your ETL process up to date and get the data into your database. On the other hand, if you build your application around the existing data structure, you'll spend countless hours pulling your hair out, thinking to yourself, "Who stores data this way?" You'll probably even find yourself cursing Adabas, Software AG, and mainframes in general. Meanwhile, it's your own fault!

You should have rebuilt the database and transformed your data to fit into it (which is easy), rather than leaving the database structure alone and forcing your programmers to compensate for the fact that the framework (and current software standards) won't support the intentional misuse of the new DBMS.

To put this another way, transforming data is repeatable and easy. It takes very little effort to extract data from a staging database and make them fit into a database designed to work efficiently with your application. Designing the application database isn't that hard, either. It will certainly take more time to design the database and the ETL process than it will to just dump the data into a database, but I assert that you will make up the difference when your programmers start trying to fit a very large square peg into a very small round hole.
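For instance, normalizing the repeating phone columns from the staging sketch above into a proper child table is a single, repeatable statement (again, all names are hypothetical):

    -- Unpivot the repeating columns into one row per actual phone number.
    CREATE TABLE dbo.CustomerPhone
    (
        CustomerId  INT         NOT NULL REFERENCES dbo.Customer (CustomerId),
        PhoneNumber VARCHAR(20) NOT NULL
    );

    INSERT INTO dbo.CustomerPhone (CustomerId, PhoneNumber)
    SELECT p.CustomerId, p.PhoneNumber
    FROM   dbo.Customer_Staging
           UNPIVOT (PhoneNumber FOR Occurrence IN (Phone1, Phone2, Phone3)) AS p
    WHERE  p.PhoneNumber <> ''                    -- drop empty occurrences
      AND  p.CustomerId IN (SELECT CustomerId     -- only customers that
                            FROM dbo.Customer);   -- survived the cleanup

Rerun the whole script whenever the staging database is refreshed; that's what makes the transformation repeatable.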

Just to make sure I'm not alone, I ran a very informal survey of several colleagues who have experience with mainframe migrations. I asked them to consider a situation where they were migrating an application from Adabas to SQL Server. The source data are obviously the same, and the result was to be held constant as well. That is to say, the entities presented to the BLL from the DAL should be clean, the code should be clean, and the data access should be equally performant. How much effort do you have to spend building the new database, writing the ETL process, and constructing the data access layer if you're "lifting and shifting" vs. rebuilding the database from scratch?

Here are the results:
[Bar graph: Rebuild vs. Lift and Shift Approach to Mainframe Database Migration]

To my colleagues, the choice seems clear. The fact is that you're going to have to transform your data at some point. You either do it in the ETL process and store the data in a cleaner format, or you do it in your DAL. If you do it in your DAL, the DAL becomes much harder to write because of the amount of transmogrification you'll have to do between your BLL and your repository.
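Compare the query a DAL has to issue against the lifted-and-shifted schema with the one it issues against the rebuilt schema (using the same hypothetical tables as above):

    -- Against the lift-and-shift schema: every read repeats the cleanup.
    SELECT CustomerId,
           CustomerName,
           NULLIF(Region, '') AS Region,
           NULLIF(Phone1, '') AS Phone1   -- ...and again for Phone2, Phone3
    FROM   dbo.Customer_Staging
    WHERE  CustomerName <> '';

    -- Against the rebuilt schema: no compensation required.
    SELECT c.CustomerId, c.CustomerName, c.Region, p.PhoneNumber
    FROM   dbo.Customer AS c
           LEFT JOIN dbo.CustomerPhone AS p ON p.CustomerId = c.CustomerId;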

And let's just hope you never have to change anything. Storing your data in an unnatural format will make it very difficult to add features down the road, your DAL will be difficult to maintain (at best), and your entire application will most likely be considerably less performant than if you had opted to spend a little time and money up front and refactor your database from the beginning.
