Tuesday, November 16, 2010

Improved UpTo Extension Method

Back in May of '09, I wrote a post about an extension method to mimic Ruby's upto.

Today, I was hanging out at Visual Studio Live and had another idea. In Ruby, the upto method takes an end value and executes a code block. In C#, I interpreted this as an extension method that takes an end value and an action. The syntax ends up being pretty similar, and it works fine.

Lately, I've been using an Each extension method for IEnumerables. I know an Each extension method seems like a waste, but I like the fluency of it for basic actions. So, given that I have the Each method on IEnumerable<T> anyway, I decided UpTo should really return an IEnumerable as well.

Here's my new UpTo extension method:
public static IEnumerable<int> UpTo(this int start, int end)
{
    while (start <= end)
        yield return start++;
}
Now, instead of passing your action to the UpTo method like Ruby's upto, you get your enumerable and pass your delegate to your Each extension method. Here's what it looks like:
static void Main()
{
    //for (int i = 5; i <= 10; i++)
    //    Console.WriteLine(i);

    5.UpTo(10).Each(i => Console.WriteLine(i));

    // output
    // 5
    // 6
    // 7
    // 8
    // 9
    // 10
}
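If you want to play with the same idea outside of C#, here's a rough JavaScript analogue. It's a sketch, not a port of my C# code: upTo is a generator function standing in for the iterator method, and each is a hypothetical helper standing in for my Each extension method.

```javascript
// Generator that lazily yields start, start + 1, ..., end (inclusive),
// playing the role of the C# iterator method UpTo.
function* upTo(start, end) {
  while (start <= end) {
    yield start++;
  }
}

// Hypothetical stand-in for the Each extension method:
// calls action once for every item of any iterable.
function each(iterable, action) {
  for (const item of iterable) {
    action(item);
  }
}

const seen = [];
each(upTo(5, 10), i => seen.push(i));
console.log(seen.join(' ')); // prints "5 6 7 8 9 10"
```

Because upTo is lazy, nothing is computed until each (or a plain for...of loop) pulls values from it, which mirrors the deferred execution you get from yield return.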
If you're using .NET 3.5 or newer, LINQ ships with the Enumerable class, which has a static Range method, so your code could look more like this:
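using System;
using System.Collections.Generic;
using System.Linq;

public static class NumericExtensions
{
    public static IEnumerable<int> UpTo(this int start, int end)
    {
        // note: Enumerable.Range throws ArgumentOutOfRangeException
        // when the count is negative (end < start - 1)
        return Enumerable.Range(start, end - start + 1);
    }
}

class Program
{
    static void Main()
    {
        //foreach (var i in Enumerable.Range(5, 6))
        //    Console.WriteLine(i);

        5.UpTo(10).Each(i => Console.WriteLine(i));

        // output
        // 5
        // 6
        // 7
        // 8
        // 9
        // 10
    }
}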
Another reason to make UpTo return an IEnumerable instead of making it a void action executor is that sometimes you don't want to pass a large anonymous method to the Each method. You can use the same UpTo method in a plain foreach loop like this:
static void Main()
{
    //foreach (var i in Enumerable.Range(5, 6))
    //{
    //    // Do a lot of stuff!!
    //}

    //5.UpTo(10).Each(i =>
    //    {
    //        // Do a lot of stuff!!
    //    });

    foreach (var i in 5.UpTo(10))
    {
        // Do a lot of stuff!!
    }
}

Thursday, October 28, 2010

Results of the "What Really Makes a Good Programmer" Survey

Back in September I wrote a blog post called What Really Makes a Good Programmer? My goal was to ask various members of the development community what traits they thought contributed to the quality of a programmer. If you haven't taken the survey yet, I'd recommend you do that before reading the results. Wouldn't want to bias your opinions, right?

If you have taken the survey, but haven't told everyone you know to take it too, go ahead and do that; I'll wait.

Now that we've covered that, let's get on to the analysis. As you recall from taking the survey, there were 17 traits, and you rated each one as Very Important, Important, A Little Important, Negligible, or Completely Unimportant. To do any analysis, I had to convert these ratings to numbers.

I decided that A Little Important was the baseline and basically represented a lack of opinion on the topic. These traits are the ones most people feel are nice to have but aren't requirements. Basically, a good programmer often has these traits, but not having them doesn't mean you're not a good programmer. I recoded these values as 0 points.

Very Important and Completely Unimportant are the ratings people used when they felt strongly about an item. Very Important means the trait is absolutely necessary to being a good programmer; Completely Unimportant means the trait has no bearing whatsoever on the quality of programmer you are (or is perhaps even a slightly negative indicator). These values were recoded as 10 and -10, respectively.

The central values of Important and Negligible are what I call the non-committal values. You feel that they make a difference but they're not quite important enough to bank on them. I gave these a 3 and -3 respectively.
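Here's the recoding scheme as a small JavaScript sketch. RATING_SCORES holds the point values described above; the averageScore helper and the sample responses are made up for illustration and are not survey data.

```javascript
// Point values for each rating, as described above.
const RATING_SCORES = {
  'Very Important': 10,
  'Important': 3,
  'A Little Important': 0,
  'Negligible': -3,
  'Completely Unimportant': -10,
};

// Hypothetical helper: average recoded score for one trait's responses.
function averageScore(ratings) {
  const total = ratings.reduce((sum, rating) => {
    if (!Object.prototype.hasOwnProperty.call(RATING_SCORES, rating)) {
      throw new Error('Unknown rating: ' + rating);
    }
    return sum + RATING_SCORES[rating];
  }, 0);
  return total / ratings.length;
}

// Made-up responses for a single trait: (10 + 3 - 3 + 0) / 4
console.log(averageScore(['Very Important', 'Important', 'Negligible', 'A Little Important']));
// prints 2.5
```

A positive average leans toward "this trait matters," a negative one toward "this trait doesn't," and values near 0 sit at the A Little Important baseline.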

For example, "Has good problem solving skills" was rated Very Important and "Cheap" was rated Completely Unimportant. One could say, "someone with poor problem solving skills would make a poor programmer." On the other hand, you cannot say, "someone who is not cheap is a poor programmer." Of course, you could also state the contrapositive of the first assertion: "someone who is a good programmer is also a good problem solver." The contrapositive of the second, "someone who is a good programmer is also cheap," is one the community considers largely invalid.

"Fast" and "Co-located" appear nearest to the baseline. This may be due to the fact that many of the respondents didn't know what I meant by co-located. In any case, one might say, "a good programmer is a good programmer whether or not she's in the same building," or that "just because you're fast doesn't mean you're good and just because you're slow doesn't mean you're bad."

"Communicates effectively" and "Interested in helping teammates" are moderately rated in favor of contributing to being a good programmer while the two "college degree" items are rated moderately against contributing to being a good programmer. For example, you might know a lot of programmers who are great developers but who lack the social skills or interest to become good communicators or team players. As a result, you might say that good programmers often communicate effectively and help teammates, but some good programmers cannot.

You might also say, "Many people without college degrees, let alone a computer science degree, are great programmers. Thus, while having a degree is helpful, you can learn to be a great programmer without it."

Now, I feel it is worth noting that "unimportant" does not mean "negative." Just because most of the development community feels that having a CS degree or other certifications is unimportant to being a good programmer, it doesn't mean that the degree itself is unimportant. It just means that having the degree won't make you a good programmer; you still have a lot of learning to do.

With notes on the analysis out of the way, what do you say we take a look at some data? Currently, the new Google charts won't let me force the display of the categories, so you'll have to hover over the dots you're interested in. Here's a legacy chart you can look at if the JavaScript version below is unsatisfactory. The legacy chart has current data; it's just not as fancy as this one:
I'm sure you noticed there are two series in this graph. All Responses represents the average response across all respondents; however, as you may have noticed in the respondent histogram, I have a dramatically unbalanced sample, with programmers outnumbering all other groups combined by almost 3 to 1. To get a more accurate representation of the "general feel" of the community, I included the Group Average measure. This is the average of the averages: each group's mean counts once, regardless of how many people are in the group. It's sort of the electoral college of informal research.
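To make the difference between the two series concrete, here's a sketch with made-up numbers (not the survey data). The responses array is a hypothetical list of { group, score } records whose scores have already been recoded; allResponsesAverage weights every respondent equally, while groupAverage averages within each group first and then averages those group means.

```javascript
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Every respondent counts once, so the largest group dominates.
function allResponsesAverage(responses) {
  return mean(responses.map(r => r.score));
}

// Average within each group, then average the group means,
// so every group counts once regardless of its size.
function groupAverage(responses) {
  const byGroup = new Map();
  for (const { group, score } of responses) {
    if (!byGroup.has(group)) byGroup.set(group, []);
    byGroup.get(group).push(score);
  }
  return mean([...byGroup.values()].map(mean));
}

// Made-up example: three programmers and one manager.
const responses = [
  { group: 'Programmer', score: 10 },
  { group: 'Programmer', score: 10 },
  { group: 'Programmer', score: 10 },
  { group: 'Manager', score: -10 },
];
console.log(allResponsesAverage(responses)); // prints 5
console.log(groupAverage(responses));        // prints 0
```

With a 3-to-1 imbalance, the programmers pull All Responses toward their opinion, while Group Average gives the single manager the same weight as the whole programmer group.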

This chart demonstrates the average responses for each group:

Improving Search Performance when using SQL Server 2008 Encrypted Columns

This is a story of courage, honor, and data encryption. In addition to being something of a tribute to one of my favorite games of all time, I do feel I need to preface this post with a bit of a disclaimer.

Generally, when I describe this problem to someone, we go through pretty much the same conversation.
Why didn't you try SQL Server 2008 Transparent Data Encryption? Why don't you just update your ACL? Why don't you try this? Or that?

Well, sometimes the environment, either technologically or politically, precludes some of your better options. Ultimately, you have to go with the best permissible solution to the problem at hand. That being said, let's say you've been charged with securing your organization's PII, and your only option is to encrypt the column with SQL Server's column encryption.

You start looking at the impact this is going to have on your current and future applications. If you're like me, one of the first concerns you're going to have is performance. For example, let's say you need to search on the encrypted field. A good example of an encrypted field that you'll likely have to search is the Social Security Number.

In my original benchmarks, I found the following results:

Knowing that I was going to have to improve this performance, I started playing with some searching alternatives. First of all, it's obviously much faster to search an indexed column (especially one that is clustered), so my first goal was to generate an indexed column.

Ryan McGarty (my best friend and a damn fine programmer) and I discussed and quickly ruled out a basic hash. Sure, it'd be fast to search an indexed column containing the hash value, but that opens you up to a relatively simple plain text attack (especially with a known set of possible values). I decided that you could reduce the threat and still speed up the search by using hash buckets a la hashtables instead.

I concocted a hash function that produced a reasonable distribution across the buckets. My good friend and esteemed colleague David Govek pointed out that an MD5 hash would produce a pretty effective distribution between buckets (with high-cardinality data like the Social Security number). David was right, and I ended up with this hash function:
CREATE FUNCTION [dbo].[GroupData]
(
  @String VARCHAR(MAX),
  @Divisor INT
)
RETURNS INT
AS
BEGIN
  DECLARE @Result INT;
  SET @Result = HASHBYTES('MD5', @String);
  IF (@Divisor > 0)
    SET @Result = @Result % @Divisor;
  RETURN @Result;
END

I created the test data just as I had before, except that this time I also added the hash bucket. I created a clustered index on the bucket and I changed my select statement a little:
-- original select statement
SELECT *
FROM PersonData
WHERE
  CONVERT(CHAR(9), DECRYPTBYKEY(SocialSecurityNumberEncrypted)) = '457555462'

-- hash key select statement
-- @Divisor is the divisor for the modulus operation in the hash function
SELECT *
FROM PersonData
WHERE
  SocialSecurityNumberGroup = dbo.GroupData('457555462', @Divisor) AND
  CONVERT(CHAR(9), DECRYPTBYKEY(SocialSecurityNumberEncrypted)) = '457555462'

Here are my results:

So, what of the known plain text attack then?

I don't want to gloss over the plain text attack issue. Given the set of possible socials, the hashes would be very unlikely to collide. Thus, for most people, you'd be able to get their socials easily by hashing every possible social and joining to that table. By modulo dividing the hash value, I'm able to distribute social security numbers evenly among a known set of buckets. That means I can control the approximate number of socials in each bucket given my set of values. I generally aim for about 1,000 socials per bucket.
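Here's a toy version of that attack in Node.js, to show why an unbucketed hash is so easy to reverse. To keep it fast, it only hashes a made-up candidate space of 10,000 nine-digit values sharing the hypothetical prefix '45755'; an attacker would enumerate the full space the same way, just with more time. md5Hex and lookup are illustrative names.

```javascript
const crypto = require('crypto');

function md5Hex(text) {
  return crypto.createHash('md5').update(text).digest('hex');
}

// Build a lookup table from hash back to plaintext for every candidate
// in the toy space '457550000' through '457559999'.
const lookup = new Map();
for (let n = 0; n < 10000; n++) {
  const candidate = '45755' + String(n).padStart(4, '0');
  lookup.set(md5Hex(candidate), candidate);
}

// A stolen (unbucketed) hash is reversed with a single lookup, the
// in-memory equivalent of joining the hash column to the table.
const stolenHash = md5Hex('457555462');
console.log(lookup.get(stolenHash)); // prints "457555462"
```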

For example, with a signed integer hash, MD5 % 100 has a possible set of values from -99 to +99 (the remainder takes the sign of the hash). That's 199 buckets, so if you have about 2,000 rows of data to hash, you'll have about 10 rows per bucket. The benefit is that you now only have to decrypt about 10 rows to find the exact row you're looking for. The detriment is that you've narrowed your possible result set. Within your own data set, you'd have narrowed it to about 10 possible plaintext values; however, given that these values are unknown, you then have to look at the set of possible values.
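Here's a sketch of the GroupData bucketing in JavaScript, handy for checking the range arithmetic. groupData is a hypothetical mirror of the T-SQL function, and it rests on an assumption: that SQL Server keeps the rightmost four bytes of the 16-byte MD5 hash when assigning it to an INT and reads them as a signed big-endian integer. If that assumption is off, the individual bucket numbers will differ, but the range won't. JavaScript's % operator, like T-SQL's, gives the remainder the sign of the dividend, so a signed hash % 100 can only land between -99 and 99.

```javascript
const crypto = require('crypto');

// Hypothetical JavaScript mirror of dbo.GroupData.
function groupData(text, divisor) {
  const digest = crypto.createHash('md5').update(text).digest(); // 16-byte Buffer
  // Assumption: keep the last 4 bytes as a signed big-endian 32-bit integer.
  let result = digest.readInt32BE(12);
  if (divisor > 0) {
    result = result % divisor; // sign follows the dividend, as in T-SQL
  }
  return result;
}

// Bucket 20,000 sample nine-digit strings with a divisor of 100.
const buckets = new Set();
for (let n = 0; n < 20000; n++) {
  buckets.add(groupData(String(n).padStart(9, '0'), 100));
}
// smallest bucket, largest bucket, number of distinct buckets seen
console.log(Math.min(...buckets), Math.max(...buckets), buckets.size);
```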

Social security numbers have fewer than 1,000,000,000 possible values. It's hard to say exactly how many of those are actually in play, so let's say we're using only those socials currently in use by living Americans. The population of the United States at the time of writing was about 312,000,000. As I said, I usually aim for about 1,000 records per bucket. If I have 1,000,000 rows of data, I would modulo divide by 500 (999 buckets). If you knew my set of values, then you'd only have about 1,000 possible values to reduce. Given that you know when and where I was born, you could probably narrow it to 20 or so possible socials.
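The bucket-sizing arithmetic from this paragraph and the next fits in a few lines. bucketCount and perBucket are hypothetical helper names; the row count and population figures are the ones used in the text.

```javascript
// A signed hash % divisor yields values from -(divisor - 1) to divisor - 1.
function bucketCount(divisor) {
  return 2 * divisor - 1;
}

// Average number of items that land in each bucket.
function perBucket(items, divisor) {
  return items / bucketCount(divisor);
}

console.log(bucketCount(500));                      // prints 999
console.log(Math.round(perBucket(1000000, 500)));   // prints 1001 (my rows per bucket)
console.log(Math.round(perBucket(312000000, 500))); // prints 312312 (possible socials per bucket)
```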

But you don't know my set of values, so you really have about 312,000 values to choose from. Even if you did know the first 5 digits of my social security number (based on my state of issuance and my date of birth), your set of possible options would be so large that you'd probably be better off just pulling my credit report and getting my social that way.

Thus, while it seems to introduce a weakness to plain text attacks (and with low cardinality data it would be an issue), in the case of social security numbers, I don't believe it to be a reasonable attack.

Column Encryption in SQL Server 2008 with Symmetric Keys

Most of the time when I write blog posts, I do it to share ideas with my fellow developers. Sometimes I do it just so I can have a place to reference when I forget the syntax for something. This is one of those reference posts.

Recently, I was charged with encrypting some personally identifiable information at the column level. This post isn't intended to discuss the merits of column-level encryption; rather, as I said, it's a place to put a few code snippets that I can reference later. If you should find yourself in a column-level encryption predicament in a SQL Server 2008 environment, you may find these useful as well.

First things first. Get the database ready for column-level encryption by creating a master key:
-- if there is no master key, create one
IF NOT EXISTS (SELECT *
  FROM sys.symmetric_keys
  WHERE symmetric_key_id = 101)
CREATE MASTER KEY ENCRYPTION BY
  PASSWORD = 'This is where you would put a really long key for creating a symmetric key.';

Now, you'll need a certificate or a set of certificates with which you will encrypt your symmetric key or keys:
-- if the certificate doesn't exist, create it now
IF NOT EXISTS (SELECT *
  FROM sys.certificates
  WHERE name = 'PrivateDataCertificate')
CREATE CERTIFICATE PrivateDataCertificate
   WITH SUBJECT = 'For encrypting private data';

Once you have your certificates, you can create your key or keys:
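-- if the key doesn't exist, create it too
IF NOT EXISTS (SELECT *
  FROM sys.symmetric_keys
  WHERE name = 'PrivateDataKey')
CREATE SYMMETRIC KEY PrivateDataKey
  WITH ALGORITHM = AES_256 -- assumed algorithm; pick the one your policy requires
  ENCRYPTION BY CERTIFICATE PrivateDataCertificate;
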
Before you can use your symmetric key, you have to open it. I recommend that you get in the habit of closing it when you're finished with it, because the symmetric key remains open for the life of the session. Let's say that you have a stored procedure in which you open the symmetric key to decrypt some private data that your stored procedure uses internally. Someone who has access to the stored procedure can run it and will then have the key open for decrypting private data. My point: close the key before you leave the procedure. Here's how you open and close keys:
-- open the symmetric key with which to encrypt the data.
OPEN SYMMETRIC KEY PrivateDataKey
   DECRYPTION BY CERTIFICATE PrivateDataCertificate;

-- close the symmetric key
CLOSE SYMMETRIC KEY PrivateDataKey;

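Here's a little test script I wrote to demonstrate a few points: first, the syntax for encrypting and decrypting; second, the fact that the ciphertext changes each time you encrypt the same value. Because identical plaintexts don't produce identical ciphertexts, an attacker can't match values by comparing ciphertexts, which is what prevents a plain text attack.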
-- open the symmetric key with which to encrypt the data.
OPEN SYMMETRIC KEY PrivateDataKey
   DECRYPTION BY CERTIFICATE PrivateDataCertificate;

-- somewhere to put the data
DECLARE @TestEncryption TABLE
(
  PlainText VARCHAR(100),
  Cipher1 VARBINARY(100),
  Cipher2 VARBINARY(100)
);

-- some test data
INSERT INTO @TestEncryption (PlainText)
SELECT 'Boogers'
UNION ALL SELECT 'Foobar'
UNION ALL SELECT '457-55-5462'; -- ignoranus

-- encrypt twice
UPDATE @TestEncryption
SET
  Cipher1 = ENCRYPTBYKEY(KEY_GUID('PrivateDataKey'), PlainText),
  Cipher2 = ENCRYPTBYKEY(KEY_GUID('PrivateDataKey'), PlainText);

-- decrypt and display results
SELECT
  PlainText,
  CiphersDiffer = CASE WHEN Cipher1 <> Cipher2 THEN 'TRUE' ELSE 'FALSE' END,
  Decrypted1 = CONVERT(VARCHAR(100), DECRYPTBYKEY(Cipher1)),
  Decrypted2 = CONVERT(VARCHAR(100), DECRYPTBYKEY(Cipher2))
FROM @TestEncryption;

-- close the symmetric key
CLOSE SYMMETRIC KEY PrivateDataKey;
Tuesday, September 21, 2010

Overdraft Fee Survey

OK, I know I've been big on the surveys lately, but this one is really important to me.

As you probably know, there's been a lot of hoopla about the new overdraft fee regulations. Banks are no longer allowed to automatically enroll customers in what they call "overdraft protection." We common folk generally call it "overdraft fees" or "allowing you to spend money you don't have so we can screw you out of more money we know you don't have."

I started thinking about my history with overdraft fees and some things I learned from Dave Ramsey. I wondered, who really pays all of these overdraft fees that account for 38 billion dollars per year in revenue?

This survey intends to find out. At the time of writing, I had 84 responses and I need hundreds more. Please, take my anonymous survey and ask all of your friends to do the same. Here is the Overdraft Fee Survey:

Array Function to Recode Data in Google Apps Scripts

I went on my honeymoon with my beautiful wife last week and the week before. Having a little time off of work gave me the opportunity to get some work done :). I've been wanting for a while to develop a survey to find out what makes a good programmer.

I've been working with Google Docs and Google Forms to see what they're capable of. This spreadsheet posed a few difficulties. The primary problem was that I had a set of text values which needed to be recoded to numerical values from another range.

I wrote this array function in Google Apps Script to recode spreadsheet values based on a lookup array of values.

Here's an example spreadsheet demonstrating the Array Data Block Recode Function.

Here's what the function looks like with a few tests:
function recode(data, values, valueColumnIndex) {
  var valueHash = {};
  // if the values are in an array, make a hash table
  // keyed on the first column
  if (values.constructor == Array) {
    for (var i = 0; i < values.length; i++)
      valueHash[values[i][0]] = values[i][valueColumnIndex];
  }
  // otherwise, a recursive call already handed us the hash table
  else {
    valueHash = values;
  }

  var ret = [];
  // if the data are in an array, recursively recode them
  if (data.constructor == Array) {
    for (var i = 0; i < data.length; i++)
      ret.push(recode(data[i], valueHash, valueColumnIndex));
  }
  // otherwise, look the value up; unknown values pass through unchanged
  else {
    ret = valueHash[data] != undefined ? valueHash[data] : data;
  }

  return ret;
}

var values = [['a', '1', 'I'], ['b', '2', 'II']];

console.log(recode('a', values, 1));
console.log(recode(['a', 'b', 'c'], values, 1));
console.log(recode([['a', 'b'], ['b', 'c']], values, 1));
console.log(recode(['a', ['a', 'b'], [['a', 'b', 'c']]], values, 2));

// results
// '1'
// ['1', '2', 'c']
// [['1', '2'], ['2', 'c']]
// ['I', ['I', 'II'], [['I', 'II', 'c']]]

Google Apps Syntax:
=Recode(A1:B3, D1:F2, 1)
=Recode(A1:B3, D1:F2, 2)

Regular Expression Search Bookmarklet

I have an old website where I keep most of my bookmarklets. I'm planning on deprecating that site and just putting up some personal stuff (since I don't do what that site says I do anymore).

This bookmarklet is probably my most used bookmarklet. Basically, you enter a regular expression and each match in the page will be highlighted. It cycles through 16 color schemes to change the highlight color.
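The color cycling is compact enough to be confusing, so here's the same calculation pulled out into a standalone function. highlightColors is a hypothetical name; the logic follows the bookmarklet's code below: the low three bits of the search count choose, per RGB channel, a dark shade (3 or 6) for the border and a matching light shade (c or f) for the background, and searches 8 through 15 of every 16 swap the two.

```javascript
// Hypothetical helper mirroring the bookmarklet's color logic.
function highlightColors(searches) {
  // (searches + 8) always has at least 4 binary digits, so the last
  // three are the low three bits of the search count.
  var bits = (searches + 8).toString(2).slice(-3);
  var border = '#' + bits.replace(/0/g, '3').replace(/1/g, '6');
  var back = border.replace(/3/g, 'c').replace(/6/g, 'f');
  // searches 8-15 of every 16 swap border and background
  if (searches % 16 >= 8) {
    return { border: back, back: border };
  }
  return { border: border, back: back };
}

console.log(highlightColors(0)); // border '#333', back '#ccc'
console.log(highlightColors(8)); // border '#ccc', back '#333'
```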

If you just want to install the bookmarklet, grab this link and drag it onto your browser's bookmarks toolbar.
Regex Search

If you want to use it on your mobile device, you can use my Mobile Bookmarklet Installer Bookmarklet (which, by the way, will install itself too).

I'm pretty sure it doesn't work in IE, though I haven't tested it there in a really long time. If you'd like to see what it would do in IE if IE didn't suck so badly, just click it.

If you're interested in the code, here it is!
// check to see if the variable searches has been defined.
// if not, create it.  this variable is used to cycle through
// highlight colors.
if (typeof(searches) == 'undefined')
  var searches = 0;

(function () {
    // just some variables
    var count = 0, text, regexp;

    // prompt for the regex to search for
    text = prompt('Search regexp:', '');

    // if no text entered, exit bookmarklet
    if (text == null || text.length == 0)
      return;

    // try to create the regex object.  if it fails
    // just exit the bookmarklet and explain why.
    try {
      regexp = new RegExp(text, 'i');
    }
    catch (er) {
      alert('Unable to create regular expression using text \'' + text + '\'.\n\n' + er);
      return;
    }

    // this is the function that does the searching.  it returns 1
    // after wrapping a match so the caller skips the new node.
    function searchWithinNode(node, re) {
      // more variables
      var match, pos, skip, acronym, middlebit, endbit, middleclone;
      skip = 0;

      // be sure the target node is a text node
      if (node.nodeType == 3) {
        // find the first match
        match = node.data.match(re);

        // if there's a non-empty match . . .
        // (empty matches are skipped to avoid an endless loop)
        if (match && match[0].length > 0) {
          pos = match.index;

          // create the acronym node.
          acronym = document.createElement('ACRONYM');
          acronym.title = 'Search ' + (searches + 1) + ': ' + re.toString();
          acronym.style.backgroundColor = backColor;
          acronym.style.borderTop = '1px solid ' + borderColor;
          acronym.style.borderBottom = '1px solid ' + borderColor;
          acronym.style.fontWeight = 'bold';
          acronym.style.color = borderColor;

          // get the last half of the node and cut the match
          // out.  then, clone the middle part, put the clone
          // inside the acronym, and swap the acronym in.
          middlebit = node.splitText(pos);
          endbit = middlebit.splitText(match[0].length);
          middleclone = middlebit.cloneNode(true);
          acronym.appendChild(middleclone);
          middlebit.parentNode.replaceChild(acronym, middlebit);
          count++;
          skip = 1;
        }
      }
      // if the node is not a text node and is not
      // a script or a style tag then search the children
      else if (
        node.nodeType == 1
        && node.childNodes
        && node.tagName.toUpperCase() != 'SCRIPT'
        && node.tagName.toUpperCase() != 'STYLE'
      ) {
        for (var child = 0; child < node.childNodes.length; ++child)
          child = child + searchWithinNode(node.childNodes[child], re);
      }

      return skip;
    }

    // use the search count to get the colors.
    var borderColor = '#'
      + (searches + 8).toString(2).substr(-3)
      .replace(/0/g, '3')
      .replace(/1/g, '6');
    var backColor = borderColor
      .replace(/3/g, 'c')
      .replace(/6/g, 'f');

    // for the last half of every 16 searches, invert the
    // colors.  this just adds more variation between
    // searches.
    if (searches % 16 / 8 >= 1) {
      var tempColor = borderColor;
      borderColor = backColor;
      backColor = tempColor;
    }

    searchWithinNode(document.body, regexp);
    window.status = 'Found ' + count + ' match'
      + (count == 1 ? '' : 'es')
      + ' for ' + regexp + '.';

    // if we made any matches, increment the search count
    if (count > 0)
      searches++;
})();

Friday, September 3, 2010

What Really Makes a Good Programmer?

I've been working on a series of articles about mainframe migrations. My next post will talk about source code translation tools. It got me wondering: what is a good programmer?

I decided that the best way to figure out what makes a good programmer is to ask those who have a history of working with programmers.

I've been playing around with Google Docs and this seemed to be another solid opportunity to leverage some new Google Docs features, specifically Google Forms. I know this particular one is a little annoying but Google Forms didn't really give me many options for entering these data so I apologize for that.

I really appreciate your taking the time to read my blog and to help me with my research. So, without further ado, here is the What Makes a Good Programmer survey.

OK, maybe a little ado. Be sure to read and follow the instructions. They're pretty easy and, I think, relatively clever little recursive instructions to get a lot of responses. Thanks again for your help with my informal study.

If you've already taken the survey, I've published the results. Please don't look at the results of the "What Really Makes a Good Programmer" survey until you've completed it.

Friday, June 25, 2010

Mainframe Migrations: Extracting Business Logic from Source Code

I've been involved with several migration projects (some mainframe migrations and some migrating from an older platform to a newer one). At the beginning of any migration project, I like to ask, "Why do you want to migrate your application instead of just rewriting it?"

One of the most common responses I hear (among the ones I mentioned in my mainframe migration introduction post) is
We don't really have documentation; our source code is the documentation.

I've been in the software industry long enough to know that when schedules or money get tight (and they always do), documentation is the first thing to go, so I seldom expect adequate documentation, and that's fine. Documentation is nice to have, but I'm pretty good at finding out what you need without it. After all, I get plenty of practice. To hear programmers talk, one would think there is almost never enough documentation, and if there is, it's still not adequate.

I know that most projects lack documentation and that most code is legacy code (at least by the definition in Michael Feathers' Working Effectively with Legacy Code, where legacy code is simply code without tests). This is really no big deal and doesn't differ all that much from most projects. In fact, most of my projects start with a pretty clean slate. I get to go in, help the client document processes, recommend improvements, automate everything I can, and by the end of the project the client has a nice sleek system with clean documentation and good unit test coverage.

For some reason, that process frightens clients who already have a system in place. As a result, a lot of clients facing a mainframe migration elect to migrate rather than rewrite the application.
We just don't have the time or money to start from scratch. All of our documentation is in the source code.
I have two issues with this. First, the source code is not adequate documentation because you cannot derive business logic from code. Second, if you use the source code as your documentation, the review process will consume almost the same resources as would the discovery process on a brand new project.

You Cannot Derive Business Logic from Code

I know this sounds a little hard to swallow at first. To be sure, there's a lot of code out there that you can look at and come up with a very reasonable approximation of the developer's original intent. If you assume the developer sufficiently understood the intended business logic, your result will also be a reasonable approximation of the actual business logic.

But, no client has ever asked me for an application that is just a reasonable approximation of the requirements they have. What you'll find is that the best result you get from interpreting source code is a set of reasonable approximations to desired business logic. Every interpretation, no matter how reasonable, will need to be clarified or you'll risk having a dysfunctional application.

Here's an easy example. Go ask a programmer to "write a method that will verify that a person is old enough to drink." You'll probably get something like this:
function VerifyIsOfLegalDrinkingAge(age) {
  return age >= 21;
}
Now, show that code to another programmer and ask what the original business rule was.

Chances are, you'll get, "ages 21 and older are of legal drinking age." While that's a close approximation, that's not the business rule you asked for. You said, "verify that a person is old enough to drink." You didn't say, "verify that a person is at least 21."

This is a really basic example and most problems in software engineering aren't as simple. In fact, even with simple problems, most code (especially legacy code) will not look like that. In your mainframe application, the above method would look more like this:
function CheckAge(age) {
  // be sure that the person has appropriate identification
  // and that it is a valid picture id
  return age > MINIMUM_AGE;
}

You'll dig through thousands of lines of code just to find MINIMUM_AGE = 20 somewhere. Why is the minimum age 20 instead of 21? Because the code had a bug in it and it should've been age >= MINIMUM_AGE. Instead of fixing the method, they just decremented the age. Then, you'll have to find the code that's calling the method to figure out why you would verify that an age is greater than twenty. By the time you get this method implemented correctly, you'll have consumed far more time than it would have taken the client to say, "I can't serve beer to anyone under 21."

Now, consider the fact that developers who know their languages well know what the language will handle for them. For example, C# initializes integer fields to 0 by default. If you were to ask a C# developer to write a method that returns true if someone is too young to drink, you might get something like this:
public static bool IsTooYoungForBooze(Person drinker){
    return drinker.Age < MINIMUM_AGE;
}
If you then ask someone to convert that to JavaScript, you'll probably get something like this:
function IsTooYoungForBooze(drinker) {
  return drinker.Age < MINIMUM_AGE;
}
Seems like a reasonable approximation? In C#, if someone fails to set Person.Age, 0 will be less than MINIMUM_AGE; however, in JavaScript, undefined is not less than MINIMUM_AGE. This kind of bug tends to rear its head late in the game. Is this the kind of mistake you're willing to make because you had a "reasonable approximation?"
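You can check the JavaScript half of this in any console. MINIMUM_AGE = 21 and the guarded variant below are illustrative, not code from a real system; the guard is just one way to make a missing age fail loudly instead of silently passing.

```javascript
const MINIMUM_AGE = 21;

// Straight port of the C# method.
function IsTooYoungForBooze(drinker) {
  return drinker.Age < MINIMUM_AGE;
}

// Hypothetical guarded version: refuse to guess when Age is missing.
function isTooYoungForBoozeChecked(drinker) {
  if (typeof drinker.Age !== 'number' || Number.isNaN(drinker.Age)) {
    throw new TypeError('Age is not set');
  }
  return drinker.Age < MINIMUM_AGE;
}

console.log(IsTooYoungForBooze({ Age: 16 })); // prints true
console.log(IsTooYoungForBooze({}));          // prints false: undefined < 21 is false
```

In C#, the same unset Person would have an Age of 0 and be reported as too young, so the two "equivalent" methods disagree on exactly the case nobody thought to test.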

Reviewing Source Code Is Resource Intensive

I know it's tempting to expect a programmer to look at your source code on one screen and write brand new code on the other screen. It's just not going to work that way. If you don't believe me, pick one of your smallest methods and ask a programmer to document the business logic. Chances are, you'll get a dozen questions.
Why does this code never look at index 0 in this array? Where does this value get set? Is this always going to be 4 characters? What happens if this number is negative? What is this method trying to do? Who's calling it?
"Who's calling it?" is a very serious question. If you're prepared to list every place in the source code where that method is being called, be prepared for more questions about static or global variables, protection levels, database constraints, etc. If you finally get through that, be prepared for even more questions because now you're getting into the meat of the issue: Now that I know what it did then, what do you want it to do now?

Imagine how much better off you'd be if you had told your developer what the business logic was supposed to be, instead of asking them to infer it from a source that, in the best case, provides a reasonable approximation.

What's the Solution?

One of my biggest pet peeves is when someone presents a problem but hasn't considered possible solutions. So that I don't break my own rules, I do have a solution to this particular problem. Rather than treating your source code as the only source of business logic documentation, treat it as a guide and recognize that you simply don't have documentation.

I know it's rough to admit something like that, but if it makes you feel any better, nobody else has documentation either. Besides, admitting you have a problem is the first step to recovery, right? So now that we've acknowledged that we lack adequate documentation we can move on to bigger and better things. We can use the source code as a guide to get us (and keep us) moving in the right direction. We can write a new application in short iterations with open communication and good unit test coverage while we document business rules.

In my experience, a rewrite saves time and money in the short run. In the long run, as you consider feature changes and maintenance costs, it's no contest that a rewrite beats a migration every time. Keep in mind that your application is just an application. I know it seems like there's a lot going on in there, but it's just another application. Every developer I've ever talked to about mainframe migrations has said,
We shouldn't be migrating this. It would be so much easier just to rewrite from scratch and then it'd be a good program too. I mean, really . . . it's just a ____________.

Tuesday, June 22, 2010

Mainframe Migrations: Migrating Data and Data Structures

It's hard to resist the urge to export data from a mainframe database, import them into the new database engine, and call the repository "finished." It's equally easy to say, "This has been working for 25 years; why would we change it now?"

Well, kind of a lot has changed in the last 25 years. For example, Adabas is a post-relational inverted list database. Sql Server on the other hand is a relational database with heaps and b-trees. I'm not saying that one is better than the other; to be sure, each has its merits. What I am saying is that they're different.

They store and retrieve data differently. They have different rules. They are optimized for different normal forms (again, with a background in both OLTP and OLAP, I'm not asserting that one is better than the other). With all of the ways that DBMSs differ, especially current vs. mainframe DBMSs, you just can't "lift and shift" and expect the same functionality or the same performance.

Continuing with the Adabas vs. Sql Server example (because that's what I have experience with), if you just pull the data right out of Adabas, you'll find that it is in need of serious refactoring. While it makes perfect sense to work with the Adabas data structure in Natural, it doesn't make sense in Sql Server and C#. The denormalized data will complicate your CRUD operations and clutter your database with empty rows.

Then, you'll either have to filter your results or support CRUD operations that populate the empty rows so you'll get expected results back. In my experience, you can deal with a lot of these hassles by importing the data into a staging database and running an ETL process to get them into your application database. During the ETL process, you can eliminate empty rows, convert default values to NULLs, and normalize your data.
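To make the staging-database ETL step a little more concrete, here's a minimal sketch of the transform that moves rows from staging into the application database. Everything specific in it is an assumption for illustration: the field names (`custNo`, `name`, `openDate`, `phone1` through `phone3`), the "no value" defaults (blank strings and a zero-filled date), and the idea that a repeating group was flattened into numbered columns. It shows the three operations described above: eliminating empty rows, converting default values to NULLs, and normalizing a repeating group into child rows.

```javascript
// Hypothetical "no value" defaults a mainframe export might use
// (blank strings, zero-filled dates). Adjust these to your actual data.
const DEFAULT_SENTINELS = new Set(["", "00000000"]);

// Sentinel defaults (and missing fields) become null; other values pass through.
function toNullable(value) {
  return value === undefined || DEFAULT_SENTINELS.has(value) ? null : value;
}

// Split denormalized staging records into a parent table and a child table.
// The field names are illustrative only.
function transformStaging(records) {
  const customers = [];
  const phones = [];

  for (const rec of records) {
    const name = toNullable(rec.name);
    const openDate = toNullable(rec.openDate);

    // Normalize the flattened repeating group into child rows,
    // skipping empty slots instead of storing them.
    const recPhones = [];
    for (let i = 1; i <= 3; i++) {
      const phone = toNullable(rec["phone" + i]);
      if (phone !== null) {
        recPhones.push({ custNo: rec.custNo, seq: recPhones.length + 1, phone });
      }
    }

    // Eliminate rows that carry no data beyond their key.
    if (name === null && openDate === null && recPhones.length === 0) {
      continue;
    }

    customers.push({ custNo: rec.custNo, name, openDate });
    phones.push(...recPhones);
  }

  return { customers, phones };
}

const staged = [
  { custNo: "1001", name: "ACME", openDate: "19850301",
    phone1: "555-0100", phone2: "", phone3: "555-0199" },
  { custNo: "1002", name: "", openDate: "00000000",
    phone1: "", phone2: "", phone3: "" },
];

const { customers, phones } = transformStaging(staged);
console.log(customers); // one customer: the all-default row was dropped
console.log(phones);    // two phone rows: the blank slot was not stored
```

Because the transform is a plain function of the staging data, you can rerun it every time you refresh staging, which is what makes this approach repeatable.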

If you take the time to rebuild your database to support your application, it's really easy to keep your ETL process up to date to get the data into your database. On the other hand, if you build your application around the existing data structure, you'll spend countless hours pulling your hair out thinking to yourself, "who stores data this way?" You'll probably even find yourself cursing Adabas, Software AG, and mainframes in general. Meanwhile, it's your own fault!

You should have rebuilt the database and transformed your data to fit it (which is easy), rather than leaving the database structure alone and forcing your programmers to compensate for the fact that the framework (and current software standards) won't support the intentional misuse of the new DBMS.

To put this another way, transforming data is repeatable and easy. It takes very little effort to extract data from a staging database and make them fit into a database designed to be efficient with your application. It's not that hard to design the application database, either. It will certainly take more time to design the database and the ETL process than it will to just dump the data into a database, but I assert that you will make up for that difference when your programmers start trying to fit a very large square peg into a very small round hole.

Just to make sure I'm not alone, I asked several of my colleagues who have worked on mainframe migrations to take a very informal survey. I asked them to consider a situation where they were migrating an application from Adabas to Sql Server. The source data were the same in both cases, and so was the target result: the entities presented to the BLL from the DAL should be clean, the code should be clean, and the data access should be equally performant. How much effort would you have to spend building the new database, writing the ETL process, and constructing the data access layer if you're "lifting and shifting" vs. rebuilding the database from scratch?

Here are the results:
Bar Graph: Rebuild vs. Lift and Shift Approach to Mainframe Database Migration

To my colleagues, the choice seems clear. The fact is that you're going to have to transform your data at some point. You either have to do it in the ETL process and store the data in a cleaner format or you're going to have to do it in your DAL. If you do it in your DAL, it'll be harder to write the DAL because of the amount of transmogrification you'll have to do between your BLL and your repository.

And, let's just hope you never have to change anything. Storing your data in an unnatural format will make it very difficult to add features down the road, your DAL will be difficult to maintain (at best), and your entire application will most likely be considerably less performant than had you opted to save a little time and money and just refactor your database from the beginning.

Mainframe Migrations: Tips, Tricks, and How-To's

Yesterday, I wrote a blog post about mainframe migrations. The top brass in my company took a look at it and told me that the world is just not ready for my thoughts on mainframe migrations yet.

I unpublished that post and will republish it at the end of this series. Leading up to my thesis, I will post a series of articles detailing the things I've learned while leading my last few mainframe migrations.

Be on the lookout for more updates. As I write these posts, each item will become a link to the article.

Thursday, March 11, 2010

Custom jQuery Selector for External and Internal Links

I was working on a website for my fiancee and her church group called The Diocese of Atlanta Young Adults. They were hosting an event they call the Young Adult Summit.

One of the requirements I had was to warn users before they left the page after clicking an external link. Using jQuery, I was able to bind to the click event with easy cross-browser compatibility, and I used jQuery UI to open a modal dialog box. Pretty basic stuff.

The one thing I regretted was that I had to use a class name to determine which links were external and which were internal. A few days ago, I discovered that jQuery supports custom selectors and I decided to write a pair of custom selectors for identifying internal and external links.
jQuery.extend(jQuery.expr[":"], {
  /*
    /:\/\// is simply looking for a protocol definition.
    technically it would be better to check the domain
    name of the link, but i always use relative links
    for internal links.
  */

  external: function(obj, index, meta, stack) {
    return /:\/\//.test($(obj).attr("href"));
  },

  internal: function(obj, index, meta, stack) {
    return !/:\/\//.test($(obj).attr("href"));
  }
});
I do have a few items of note. First, you'll notice that I'm not actually looking for the current domain name in the links. That's because I use relative links for all of my internal links these days, so I know that if there's no protocol definition, it's an internal link. If you need to identify internal links by domain as well, you can pull the domain out of top.location.href and check for it too.
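If you do want the domain check, one way to sketch it (an assumption on my part, not what the selectors above do) is to resolve each href against the page URL with the standard `URL` constructor and compare hostnames. `isExternalHref` is a hypothetical helper name; the page URL is a parameter so the logic can run outside a browser, and in the page you would pass `top.location.href`.

```javascript
// Decide whether an href points off-site by resolving it against the
// page URL and comparing hostnames.
function isExternalHref(href, pageUrl) {
  if (!href) {
    return false; // no href at all: not an external link
  }
  let resolved;
  try {
    resolved = new URL(href, pageUrl);
  } catch (e) {
    return false; // unparseable href: leave it alone
  }
  // mailto:, javascript:, etc. have no meaningful host; like the
  // original /:\/\// test, don't flag them as external.
  if (resolved.protocol !== "http:" && resolved.protocol !== "https:") {
    return false;
  }
  return resolved.hostname !== new URL(pageUrl).hostname;
}

const page = "http://example.com/events/summit";
console.log(isExternalHref("/register", page));               // false
console.log(isExternalHref("http://example.com/faq", page));  // false
console.log(isExternalHref("https://www.google.com/", page)); // true

// Hooking it into the custom selector (requires jQuery in the page):
// jQuery.extend(jQuery.expr[":"], {
//   external: function(obj) {
//     return isExternalHref($(obj).attr("href"), top.location.href);
//   }
// });
```

Note that this compares hostnames exactly, so `www.example.com` and `example.com` count as different sites; loosen the comparison if your site answers on both.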

Second, you could technically use these selectors on any DOM element. There are several ways to guard against this. One way would be to verify that the element is an anchor tag. Another would be to verify that the href attribute exists. I just plan on using common sense.
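If you'd rather not rely on common sense, the two guards mentioned above could look something like this minimal sketch. The helper names (`linkHref`, `isExternalLink`, `isInternalLink`) are hypothetical; they take a DOM element, which is what jQuery hands a custom selector as its first argument, so inside the selector you would write `external: function(obj) { return isExternalLink(obj); }`. A tiny stand-in element lets the logic run outside a browser.

```javascript
// Return the href of a real anchor, or null for anything else
// (non-anchors, anchors with no href, anchors with an empty href).
function linkHref(el) {
  if (!el || typeof el.tagName !== "string" || el.tagName.toUpperCase() !== "A") {
    return null;
  }
  const href = el.getAttribute("href");
  return href === null || href === "" ? null : href;
}

// Same protocol test as the original selectors.
const PROTOCOL = /:\/\//;

function isExternalLink(el) {
  const href = linkHref(el);
  return href !== null && PROTOCOL.test(href);
}

function isInternalLink(el) {
  const href = linkHref(el);
  return href !== null && !PROTOCOL.test(href);
}

// Minimal stand-in for a DOM element so the sketch runs without a browser.
function fakeElement(tagName, href) {
  return {
    tagName,
    getAttribute: (name) => (name === "href" && href !== undefined ? href : null),
  };
}

console.log(isExternalLink(fakeElement("A", "http://example.org/"))); // true
console.log(isInternalLink(fakeElement("A", "/about")));             // true
console.log(isInternalLink(fakeElement("DIV", "/about")));           // false
console.log(isInternalLink(fakeElement("A")));                       // false
```

With the guard in place, an element matches neither `:internal` nor `:external` unless it's an anchor with an href, which is a bit stricter than the original, where any element without a protocol in its href counted as internal.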

Here's a quick usage example:
$("a:internal").css("font-weight", "bold");

The Accidental Programmer

I've been in the software field for some time now, and over the years I have worn many hats. I have been the sole developer in a psychopharmacological research lab; I have been a private contractor and security analyst; I have developed human resources software and data warehouses; I've been a programmer, a tech lead, a project manager, a VP, and a partner. I've hired and fired developers, laid off friends (and a fiancee), and interviewed at least a hundred candidates locally and overseas.

In all of my experience, I've discovered an unavoidable and intolerable fact: most programmers can't program. Just so we're clear, by "most programmers," I mean roughly 90 to 95 percent. Now, I know this isn't an original sentiment. Jeff Atwood talked about this in 2007 in an article which has since been quoted by dozens of people in the development community, including folks like Phil Haack. So, if this horse has already been beaten to death, why am I writing about it again?

Well, frankly, I was wondering why there are so many bad programmers out there who evidently seem to be doing so well. Why are there so many crappy programmers getting work and making bank? Look at how many large companies are learning hard lessons from off-shoring experiences, yet everybody still wants to send work overseas! It's like there's a big chronic "WTF barrier" between business people thinking they could save a few bucks and programmers telling them how much it'll cost them in the long run.

Why doesn't anybody notice? Why doesn't everybody realize that these people don't know what they're doing? Well, it's because they produce programs that partially work. Jeff's right that most programmers can't even write a single line of code, so how is it that they manage to produce work that's functional enough to convince the world that they're capable developers? Well, I figured it out. They do it by accident; I call them accidental programmers.

By combining the forces of the internet, feature rich IDEs, code templating, and auto completion, an accidental programmer has all the tools he or she needs to accidentally write a semi-functional bad program without ever having to write a real line of code.

Now, I am sure my readers wouldn't let me get away with making such claims without providing empirical evidence, so here are some code snippets I believe you couldn't write on purpose:
protected void btnLogin_Click(object sender, EventArgs e)
{
    SqlClientUtilities sqlData = new SqlClientUtilities();
    SqlDataReader drPassinfo = null;

    string sLoginSQL = "select a.agentid,u.userName,a.FirstName,a.Lastname from dbo.person a, dbo.users u where  ";
    sLoginSQL +=  "  a.userid = u.userid and u.username='" + txtUserName.Text.Trim() + "'";
    sLoginSQL += " AND personid = '" + txtPassword.Text.Trim() + "'";

    drPassinfo = sqlData.SqlClientExecuteDataReader(sLoginSQL);

    if (drPassinfo.HasRows)
    {
        AppSupportUtils.WriteError("Records found");

        //valid login - redirect
        Session["userlogin"] = txtUserName.Text.Trim();
    }
}
/// <summary>
/// This class just returns an object which holds a date
/// </summary>
public class Date
{
    public int Year = 0;
    public int Month = 0;
    public int Day = 0;

    public Date()
    {
        Year = 1901;
        Month = 1;
        Day = 1;
    }

    public Date(System.DateTime dt)
        : this()
    {
        Year = dt.Year;
        Month = dt.Month;
        Day = dt.Day;
    }

    public Date(string dt)
        : this()
    {
        // en-US     M/d/yyyy
        CultureInfo MyCultureInfo = new CultureInfo("en-US");
        try
        {
            DateTime MyDateTime = DateTime.Parse(dt, MyCultureInfo);
            Year = MyDateTime.Year;
            Month = MyDateTime.Month;
            Day = MyDateTime.Day;
        }
        catch { ;}
    }

    public override string ToString()
    {
        return Month.ToString() + "/" + Day.ToString() + "/" + Year.ToString();
    }

    public override int GetHashCode()
    {
        return (int)(Year * 12) + (Month * 30) + Day;
    }
}
function invertBool(bool) {
  if (bool == false) return true;
  return false;
}

function sendSecureVote(index) {
  var checksum = Math.round(getFormattedDate() * 57 / 33 - 147 + 2009);
  startAjaxRequest("http://www.theserverhasbeenanonymized.com/voteCounter.php?checksum=" + checksum + "&voteFor=" + index);
}

function getFormattedDate() {
  var date = new Date();
  return date.getMonth() + date.getDate() + date.getHours();
}

var ssnValidator = /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/;
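In case the problems with the last couple of exhibits aren't obvious at a glance, here's a small sketch contrasting them with what the language already provides. The names `invert`, `looseSsn`, and `strictSsn` are mine, for illustration; the anchored regex checks only the nine-digit format, not whether a number could actually be a valid SSN.

```javascript
// The exhibit as posted (braces restored).
function invertBool(bool) {
  if (bool == false) return true;
  return false;
}

// The language already has this: logical NOT. The two agree for true and
// false; for other values they can differ, because invertBool relies on
// loose equality (e.g. invertBool("0") is true, but !"0" is false).
const invert = (value) => !value;

// The exhibit's pattern has no anchors, so it matches any string that
// merely contains nine digits in a row somewhere inside it.
const looseSsn = /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/;

// Anchored: the whole string must be exactly nine digits.
const strictSsn = /^[0-9]{9}$/;

console.log(looseSsn.test("1234567890123"));  // true (13 digits)
console.log(strictSsn.test("1234567890123")); // false
console.log(strictSsn.test("123456789"));     // true
```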

Wednesday, February 10, 2010

My Mission Work in Central Tanganyika

A few years ago, I started looking for ways to use my time and skills to contribute more to the world around me. One way I found to help my community was by joining my local fire department as a volunteer member of the Alpharetta Fire Corps. I've really enjoyed my time with the Alpharetta Fire Department, and it has awakened something in me that I can't really explain.

The first time I was called in to help on a structure fire, I realized what it means to give. It didn't really click at first because everything was so hectic. There were 4 fire engines, 6 police cars, and 2 ambulances. There was no time to realize what was really happening as I ran around performing my duties as a Fire Corps team member. After the house fire was suppressed, I had a moment to stop and reflect on what just happened.

A family not only lost their home, but everything in it. I started thinking about the things I own and how devastated I would be if I lost them. I mean, possessions can be replaced, but it's what can't be replaced that's heartbreaking: the memories, the favorite blankets, the family photos.

That's when one of the firemen walked out the front door with a wet, singed, but intact photo album he found while extinguishing hot spots. I watched as the fireman handed the photo album to the homeowner. The homeowner tearfully gave thanks as the neighbors welcomed him and his family into their homes.

Witnessing that selflessness made me want to find a way to do more. My fiancee works for the Episcopal Diocese of Atlanta, and the Diocese of Atlanta Young Adults are going on a mission trip to Dodoma, Tanzania. I told her I wanted to go and she eagerly arranged for me to meet with her and Bishop Alexander.

I couldn't hide my excitement as Bishop Alexander told me about the ways I could help the people in Tanzania. It turns out that the places we are going have pretty good technology, but have nobody to help them maintain it. They have many computers which are in disrepair.

Not only will I have an opportunity to fix their computers, but I'll be able to help teach them some basic computer repair. I'll also get to teach them how to use their computers and technologies better. I'm thinking about setting up a VPN endpoint down there so that volunteers here will be able to provide tech support without having to be on location.

I feel like I'll be able to do some good and I'm really excited about it. Lauren and I will be leaving for Central Tanganyika with the rest of the group on May 2nd and will be returning May 11th. The cost will be right around $3000 for each of us.

Lauren and I are hoping to defray our cost by asking our friends and family to help. I get about 85 readers on my blog per day so I thought I'd post this here too. If you have a few bucks to help us, we would really appreciate it.

Feel free to snail mail us:
Lauren Woody and Patrick Caldwell
295 Crab Orchard Way
Roswell, GA 30076

My email address is dpatrickcaldwell and I use gmail.

I'd really appreciate it if you included a note with your donation. I only have a few weeks in Tanzania, and I'd love your input on what I should focus on. Any ideas for maximizing their technological capability? Anything in particular you think I should teach them? Also, please let me know if it's okay for me to blog about you, your company, or your support.

I'd like to be able to thank our supporters on my blog, but if you would like to remain anonymous, that's no problem. Rest assured, we will be eternally grateful.

Thursday, January 28, 2010

The No Fun at Work Rule

I've got a pretty large government client at the moment. They feel better when they get to see me hangin' around the office (and can you blame them?). Problem is, hangin' around their office takes away some of the luxuries of my office. For example, my team and I like to go out to throw the football around from time to time.

Sometimes we get programmer's block, or we need to talk about something, or we just need some sunshine, so we mosey out to the parking lot for some ball. The other day, I asked one of my partners if it'd be okay if I brought a football to the client office. He looked at me like I was impaired.

Later that day, I was hanging out with another partner waiting to start a meeting. I said, "Hey Chuck, I'm gonna start smoking. I'm planning on smoking 6 cigarettes a day and I estimate that it'll take 10 minutes to smoke each one. Is that okay?"

Chuck said, "Well, Patrick, other than the fact that it's really odd that you are planning to start smoking, I don't really have a problem with it."

Then, I asked him, "Well, what if instead of smoking for an hour a day, Ryan and I spend 20 minutes a day throwing the football?"

Chuck leaned back in his chair and sighed. "Well, I'm going to have to think about that. I'm not sure that's going to look good to the client."

My point is, something is ass-backwards. If I want to spend an hour a day (on the clock) smoking cigarettes, nobody cares; but if I want to take a 20-minute break (off the clock) to throw the football, have a work-related discussion, and enjoy a brief reprieve from the stress of the workday, everybody thinks I'm going to look like I'm slacking off.

Don't get me wrong, Chuck knows that we enjoy our outdoor time back at the Emerald office. He knows there's benefit to it. Not only does it keep my team happy, but it helps keep the team cohesive and energetic. He's just concerned about how it will look to the client.

So why is there this "no fun at work" rule? I've decided that, for the rest of my career, I will spend a portion of my time focused on making sure that my employees are happy, comfortable, and healthy.

Monday, January 11, 2010

Your Field -- How to Land a Job as a Software Engineer

Well. This is it. My last post in the How to Land a Job as a Software Engineer series. I spent a lot of time deciding whether this would be the first post of the series or the last.

I thought about making it the first post because it's really the first step to becoming a software engineer, but I saved it for last because I wanted to make sure I gave these thoughts plenty of time to ripen. If you've been following the series, you probably know how this series came about in the first place.

Emerald Software Group has been interviewing programmers in search of new development talent. I've used a lot of examples (both good and bad) from real life interviews I've conducted in the past few months. This particular story is one of the most important lessons I've learned in my career, so it's really important to me to tell it correctly.

I know that my style and sense of humor can be pretty sardonic and I know that many would-be readers find that off-putting, but I will do my best to convey my sincerity in this post.

A few months ago, I interviewed a candidate who had a master's degree in computer science from a relatively prestigious technical university in Georgia. I expected great things from this developer and was excited about the interview.

I went through my standard series of programming questions and was disappointed by his lack of understanding of the technologies he had spent the last 6 years studying. After a while, I started feeling bad for the kid and decided to move on with the interview. After all, like I've said before, just because you don't know the information doesn't mean I'm going to assume you're incapable of learning it.

I decided to ask him a little about himself. I closed my notebook, leaned back in my chair, and asked, "what kinds of things do you program in your spare time?" He said, "I don't really program in my spare time."

I was pretty surprised, so I asked, "Do you read programming blogs or books? How do you learn about new technologies and techniques?"

"I don't really bother learning it until I need it."

I was fundamentally confused. I didn't understand how he had spent 6 years in school studying computer science and was looking to begin a career in the field, but that he neither knew anything about it nor had any interest in learning it. I thought for a minute and I asked him, "Why are you here?"

He said, "I'm looking for a job."

"No, I mean, why do you want to be a programmer?"

"Because it pays well."

"Do you like it?"

"Not really."

I was dumbfounded. I asked him a few more questions so that I didn't have to end the interview abruptly, I told him we'd be in touch, and I walked him to the door. If you want to write software for a living, you need to love it. It's not just a job; it's a craft. If you're not doing it for the love of the field, then you're committing an injustice not only against yourself but also against your employer, your teammates, and the customers.

To be sure, I don't just feel this way about software engineering, but about all careers. Whatever it is you do, do it for the love of your field. If you don't like what you do, let alone love it, you'll be miserable for most of your life and there is no amount of money that will make up for a lifetime of misery.

Now, I know that there are plenty of people out there who would be perfectly happy spending 5 days a week doing something they don't care about so that they can have more fun the other 2 days of the week. That's their choice, but you cannot be great at what you do if you don't care enough to try, and I will not tolerate mediocrity, especially intentional mediocrity.

So choose your field carefully. Don't get a degree in software engineering if you don't like it, and certainly don't get two. Think about the things that you love to do. What would you like spending your weekends doing? Take those things and find a way to make money doing them. If you make your living doing what you love, you'll always love what you do, you'll always be proud of your work, and you'll be at the top of your field.

Monday, January 4, 2010

Your Communication Skills -- How to Land a Job as a Software Engineer

At long last (due to some intense deadlines), I'm getting around to the fourth installment of the How to Land a Job as a Software Engineer series. This concludes the job application sequence starting with your resume and ending with your interview.

This post is about maintaining good communication skills throughout the process. A very close friend of mine named Ryan LaFevre, who is a Ramblin' Wreck from Georgia Tech and a hell of an engineer, once mused to me, "I don't write good; I are a engineer." He said it in jest, obviously, but I bring it up because I know a lot of programmers who feel that even a cursory grasp of language arts is completely unnecessary for anybody in a technical field. My former English teachers would probably be pleased to see me say that communication skills are of fundamental importance.

Do you ever watch those courtroom shows? If so, you've probably seen scenes where one attorney makes a damning argument or the witness breaks down in an emotional display, at which point the other attorney objects. The judge says, "sustained," instructs the jury to ignore the preceding outburst, and has the text stricken from the record. Now, imagine being on that jury. Could you possibly ignore that? Even if you don't base your verdict on said event, could you possibly remain unbiased by it?

It's the same thing with your communication (both verbal and non-verbal). While I'd like to claim that I focus primarily on the skills you actually need to do your job, and while I can overlook your communication inefficiencies, I cannot ignore the feelings I had when I communicated with you.

For example, I'm not going to refuse to hire a candidate just because he's 25 years old and still can't discern the difference between they're, there, and their; however, there's a pretty good chance I will assume he's not all that bright. Thus, when I sit down with the resumes and my notes from my three favorite candidates to compare them side by side for selection, ceteris paribus, he'll get eliminated first because I won't feel like he has as much potential as the other candidates.

Here are some actual excerpts I've received from job hopefuls:
Hello Mr. Caldwell, I just wanted to touch base with you to see how your doing.
What about my doing? You leave my doing out of this and I won't say anything about your dumb.
The sample project I sent you is very simple on the front end, but it's back end is where all the magic happens.
I just read your email and its dumb is showing; it is back end is where you are not getting a job. Thanks.
I know what you mean about taking your work home with you. I've spent a lot of time with my computer in my underwear coding until the middle of the night.
First of all, gross. Second of all, your computer codes by itself? Third of all, what was your computer doing in your underwear?!?!?
Thank you very much for meeting with me today -- I really enjoyed talking with you -- If there's anything else that you need from me, please feel free to ask for it -- I'm available by phone most of the day and I'll get back to you as soon as possible
WTF? The period exists for a reason; please use it. If, by some chance, your period key is broken and you can't find a working keyboard with which to send your email, please hold alt and hit 0046 in lieu of the double hyphen.
i'll be available tuesday afternoon around 1:30 or so. is there anything i need to bring to the interview other than my resume? i'm really excited about the opportunity and i think emerald software group will be a great fit for me
i hate you (alt + 0046)

Also, sometimes it's not what you say, but how you say it (or, how often). Don't nag me please. I will get back to you; I promise. If you email me every day asking about the status of the job, on your third try (or second if I'm in a bad mood), I will simply let you know that you have been selected out of the applicant pool.

Written communication isn't the only place you can make silly mistakes. Oral communication is replete with its own difficulties. Sure, you're less likely to misspell things orally than in writing, but there are some things you're unlikely to get away with. For example, I really don't want to hear your stance on the adult entertainment industry, I don't want to hear a story about a time you were drunk, and I really don't want to know what crimes you've gotten away with. Keep it professional, please!

Here are some other hints for oral communication:
  • Don't swear a lot; I'll just think you have an insufficient vocabulary.
  • Don't tell me a dirty joke you heard a few days ago. Let's get to know each other a little first.
  • Don't opine on politics or religion. Chances are you won't offend me, but on the other hand, chances are I know someone at the office you would offend.
  • Don't tell me my school sucks. For that matter, don't tell me any school sucks. I don't care if you're just kidding and I don't care if you went to a rival school. If you say something disparaging about any school, I'm either going to think you weren't smart enough to get in or you're pompous. Either way, I don't want you on my team because I may later want to hire someone from that school.
Finally, it is worth mentioning that not all communication is verbal. There are plenty of ways you can non-verbally tell me you're not ready for the position in question. For example, don't be late. Don't smell bad. Don't wear your favorite pair of tattered jeans and raggedy flip-flops. Don't be over-confident. Don't be under-confident.

By all means, be yourself and try to have some fun. If you have fun, people are more likely to enjoy talking with you. If you have fun, the worst that can happen is you have a good time, meet some good folks, and learn some new skills.