D. Patrick Caldwell on Software Engineering: 2008

Wednesday, December 17, 2008

Using the Proxy Pattern to Write to Multiple TextWriters

I was working on a data synchronization application the other day. I needed to write to a file for the export, to a StringBuilder for logging and analysis, and to the console for debugging. Since I often need to write to more than one text stream at the same time, I figured I could write a quick proxy class that writes to a collection of TextWriters. Please comment on this post and let me know how this TextWriterProxy article helped you. Here's what I came up with:

    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    namespace ESG.Utilities
    {
        public class TextWriterProxy : TextWriter
        {
            // store TextWriters here
            private readonly List<TextWriter> _writers = new List<TextWriter>();

            #region Properties

            /// <summary>
            /// This property returns Encoding.Default.  The TextWriters in the
            /// TextWriterProxy collection can have any encoding.  However, this
            /// property is required.
            /// </summary>
            public override Encoding Encoding { get { return Encoding.Default; } }

            /// <summary>
            /// Gets or sets the line terminator string used by the TextWriters in
            /// the TextWriterProxy collection.
            /// </summary>
            public override string NewLine
            {
                get
                {
                    return base.NewLine;
                }
                set
                {
                    foreach (TextWriter tw in _writers)
                        tw.NewLine = value;

                    base.NewLine = value;
                }
            }

            #endregion

            #region Methods

            /// <summary>
            /// Add a new TextWriter to the TextWriterProxy collection.  Setting properties
            /// or calling methods on the TextWriterProxy will perform the same action on
            /// each TextWriter in the collection.
            /// </summary>
            /// <param name="writer">The TextWriter to add to the collection</param>
            public void Add(TextWriter writer)
            {
                // don't add a TextWriter that's already in the collection
                if (!_writers.Contains(writer))
                    _writers.Add(writer);
            }

            /// <summary>
            /// Remove a TextWriter from the TextWriterProxy collection.
            /// </summary>
            /// <param name="writer">The TextWriter to remove from the collection</param>
            /// <returns>True if the TextWriter was found and removed; False if not.</returns>
            public bool Remove(TextWriter writer)
            {
                return _writers.Remove(writer);
            }

            // this is the only Write method that needs to be overridden
            // because all of the Write methods in a TextWriter ultimately
            // end up calling Write(char)

            /// <summary>
            /// Write a character to the text stream of each TextWriter in the
            /// TextWriterProxy collection.
            /// </summary>
            /// <param name="value">The char to write</param>
            public override void Write(char value)
            {
                foreach (TextWriter tw in _writers)
                    tw.Write(value);

                base.Write(value);
            }

            /// <summary>
            /// Closes the TextWriters in the TextWriterProxy as well as the
            /// TextWriterProxy instance and releases any system resources
            /// associated with them.
            /// </summary>
            public override void Close()
            {
                foreach (TextWriter tw in _writers)
                    tw.Close();

                base.Close();
            }

            /// <summary>
            /// Releases all resources used by the TextWriterProxy and by the
            /// TextWriters in the TextWriterProxy collection.
            /// </summary>
            /// <param name="disposing">Pertains only to the TextWriterProxy instance:
            /// true to release both managed and unmanaged resources; false to release
            /// only unmanaged resources.</param>
            protected override void Dispose(bool disposing)
            {
                // only touch the other (managed) writers when disposing explicitly,
                // never when called from a finalizer
                if (disposing)
                {
                    foreach (TextWriter tw in _writers)
                        tw.Dispose();
                }

                base.Dispose(disposing);
            }

            /// <summary>
            /// Clears all buffers for each TextWriter in the TextWriterProxy
            /// collection and causes all buffered data to be written
            /// to the underlying device.
            /// </summary>
            public override void Flush()
            {
                foreach (TextWriter tw in _writers)
                    tw.Flush();

                base.Flush();
            }

            #endregion
        }
    }

So far, it works great. It cleans up a lot of my code and gives me the option to write to any number of TextWriters with only one call. Further, if you are calling a method that takes a TextWriter as a parameter, you can pass the TextWriterProxy to it because it extends the TextWriter class. Here's what the usage syntax looks like:

    // create a TextWriterProxy instance
    TextWriterProxy proxy = new TextWriterProxy();

    // add the Console.Out TextWriter
    proxy.Add(Console.Out);

    // you can still write directly to console
    Console.WriteLine(string.Empty.PadRight(80, '='));

    // add a StreamWriter for a FileStream
    FileStream fs = new FileStream("C:\\TestExportFileAutoGen.abx", FileMode.Create);
    StreamWriter resultWriter = new StreamWriter(fs);
    proxy.Add(resultWriter);

    // add a StringWriter for a StringBuilder
    StringBuilder sb = new StringBuilder();
    StringWriter resultStringWriter = new StringWriter(sb);
    proxy.Add(resultStringWriter);

    // call a method that takes a TextWriter
    ClientSync.GenerateSessionDataExport("Sync.ServerExport", proxy);

    // write directly to the TextWriterProxy
    proxy.WriteLine("Export Complete!");

    // close all of my writers
    proxy.Close();

And there you have it. A TextWriterProxy class to write to multiple TextWriters at once.

Thursday, December 4, 2008

An Online Image Thumbnailer

Hey folks,

I've published a small online image thumbnail utility. If it proves useful (without bogging down our servers), I'll leave it up for public consumption. If you're interested in reading a little more about it, you can find the story below. If you'd just like to see the utility, visit D. Patrick Caldwell's Image Thumbnailer.

So, there I was, trying to get a background off of my Picasa Web Albums to put on my iPhone. In the iPhone Safari browser, you can save images by holding your finger on the image until a save dialog box comes up. Problem is, Picasa has somehow (somewhy) disabled it. So, I figured a bookmarklet would help me out. I could then link directly to the image and all of the Picasa scripts would be gone.

I found a few bookmarklets that displayed all of the images in the page, but they all open in the current window and I wanted a new window (and some nice formatting wouldn't hurt either). So, I looked for an online image thumbnailer and couldn't find one. About 30 minutes later, I had a thumbnailer. An hour after that, I had my bookmarklets (and a new background incidentally). I added a little error handling and some logging and had a fully functional service in about 2 hours.

Then I spent about 8 hours styling the welcome page :).

In any event, here's what my Thumbnailer does. I can take any image anywhere on the net (well, most anywhere . . . I have to have access to the image), and produce a thumbnail with one or two constraints: maximum width and maximum height. Here's an example of the same image scaled to 4 different heights:

The same thing works for widths:

Finally, you can specify both width and height and it'll use the most restrictive parameter.
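Here's a minimal sketch of how that "most restrictive parameter" logic could work. This isn't the thumbnailer's actual code: the ThumbnailSizer class, the Fit method, and the decision never to enlarge an image are all assumptions made for illustration.

```csharp
using System;

public static class ThumbnailSizer
{
    // Returns the scaled size for an image, preserving its aspect ratio.
    // A null constraint means "no limit on that dimension" (hypothetical API).
    public static (int Width, int Height) Fit(int width, int height, int? maxWidth, int? maxHeight)
    {
        // start at 1.0 so the image is never enlarged (an assumption)
        double scale = 1.0;

        // each constraint that is present can only shrink the scale,
        // so the smallest ratio (the most restrictive one) wins
        if (maxWidth.HasValue)
            scale = Math.Min(scale, (double)maxWidth.Value / width);
        if (maxHeight.HasValue)
            scale = Math.Min(scale, (double)maxHeight.Value / height);

        int w = Math.Max(1, (int)Math.Round(width * scale));
        int h = Math.Max(1, (int)Math.Round(height * scale));
        return (w, h);
    }
}
```

With an 800 by 600 source, a maximum width of 200 and a maximum height of 300 give scale factors of 0.25 and 0.5; the smaller one applies, so the thumbnail comes out at 200 by 150.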

Thursday, October 23, 2008

Health Care, Elections, and Media -- Oh My!

I was trying to learn more about the McCain health care plan today and I ran across this article. Now, I'm not a professional writer like Paul Krugman by any stretch of the imagination, but I did stay in a Holiday Inn last night, and I have had several articles published in peer-reviewed professional and academic publications. As such, most of the writing I have done requires a certain factual basis even to be considered for publication. Data must be presented within certain statistical confidence or your manuscript will be tabled and you'll be sent back to the drawing board. That's why I am so offended by the type of writing Paul Krugman managed to publish in the New York Times on April 6th of 2008.

The whole thing starts out like this:

Elizabeth Edwards has cancer. John McCain has had cancer in the past. Last weekend, Mrs. Edwards bluntly pointed out that neither of them would be able to get insurance under Mr. McCain's health care plan.

Now, while Elizabeth Edwards is a very smart lady, I've yet to find any evidence of her expertise in the fields of health care and insurance. So, my interest piqued by these strong claims, I read on looking for the evidence that Mrs. Edwards was correct.

It's about time someone said that and, more generally, made the case that Mr. McCain's approach to health care is based on voodoo economics -- not the supply-side voodoo that claims that cutting taxes increases revenues (though Mr. McCain says that, too), but the equally foolish claim, refuted by all available evidence, that the magic of the marketplace can produce cheap health care for everyone.

Well, beyond the fact that Mr. McCain is actually Senator McCain and beyond the obvious argumentum ad hominem, I still see no evidence to support Mrs. Edwards's claim yet. Furthermore, at this point, I am also looking forward to any "available evidence" that a free market won't produce cheaper products for the largest number of people. You probably noticed that I changed the quote a little (at least you should have; I did mark my changes in bold). First, I'm not looking for "all available evidence," because I'm certain Paul didn't really mean "all"; that would be absurd. I'm also looking for positive results for the greatest number, because looking for results for "everyone" would also be absurd. Finally, I replaced "health care" with "products" because the economic principles which have undergone extensive academic investigation for decades apply to most products including insurance and health care.

As Mrs. Edwards pointed out, the McCain health plan would do nothing to prevent insurance companies from denying coverage to those, like her and Mr. McCain, who have pre-existing medical conditions.

The McCain Campaign believes that "no American should be denied access to quality and affordable coverage simply because of a pre-existing condition," and they have a plan to address it:

As President, John McCain will work with governors to develop a best practice model that states can follow – a Guaranteed Access Plan or GAP – that would reflect the best experience of the states to ensure these patients have access to health coverage. There would be reasonable limits on premiums, and assistance would be available for Americans below a certain income level.

At this point, evidently Paul turned to the McCain campaign for answers.

The McCain campaign's response was condescending and dismissive -- a statement that Mrs. Edwards doesn't understand the comprehensive nature of the senator's approach, which would harness "the power of competition to produce greater coverage for Americans," reducing costs so that even people with pre-existing conditions could afford care.

Mrs. Edwards almost certainly doesn't understand the comprehensive nature of the senator's approach. That's not condescending; it's a reasonable assumption. Further, if Mrs. Edwards did understand the comprehensive nature of the senator's approach, she probably wouldn't be claiming that she would be unable to get insurance. That's not to mention the fact that it's almost wholly unreasonable to suggest that Senator McCain would enact a policy that precludes himself from getting insurance as well.

It also doesn't strike me as particularly dismissive. Paul did, in fact, manage to get a valid and reasonable quote. I suppose, perhaps, whoever at the McCain campaign responded to Paul's questions may have recommended that Paul visit the campaign website (http://www.johnmccain.com) for answers to his specific questions about pre-existing conditions, but I hardly see that as dismissive; all of the information is there and is freely available.

This, however, is condescending and dismissive; well done Paul:

This is nonsense on multiple levels.

And it goes on . . .

For one thing, even if you buy the premise that competition would reduce health care costs, the idea that it could cut costs enough to make insurance affordable for Americans with a history of cancer or other major diseases is sheer fantasy.

Again, Paul, begging the question. Your premises are as follows:

Elizabeth Edwards and John McCain could not get health care coverage due to pre-existing conditions

Other people with pre-existing conditions could not get health care coverage

If people with pre-existing conditions can get health coverage, it would be cost prohibitive

You are trying to argue that Senator McCain's health care plan won't work because these premises are true, but your only argument to support these premises is that Senator McCain's health care plan won't work. It's almost textbook petitio principii.

Beyond that, there's no reason to believe in these alleged cost reductions. Insurance companies do try to hold down "medical losses" -- the industry's term for what happens when an insurer actually ends up having to honor its promises by paying a client's medical bills. But they don't do this by promoting cost-effective medical care.

Paul, don't attack the term "medical losses" please. They have to have some kind of succinct description to put on their income statements. They also call them "medical costs" sometimes. In either case, it does a decent job of differentiating them from, say, operating costs. On top of that, don't imply that insurance companies try not to honor their commitments as though the entire industry is designed to cheat people and line pockets. Think of the countless lives which have been saved and improved with thanks due to the excellent medical care paid for by insurance companies. Sure, insurance companies can be troubling to deal with, but that's because they also feel like it's unreasonable to pay 200 dollars for 2 Tylenol at the local hospital. They're fighting to keep costs down too, and rest assured that medical providers do cut prices to accommodate insurance companies. I may be wrong, but it sounds a lot like promoting cost-effective medical care.

So, Paul, how are they trying to keep costs down?

Instead, they hold down costs by only covering healthy people, screening out those who need coverage the most -- which was exactly the point Mrs. Edwards was making. They also deny as many claims as possible, forcing doctors and hospitals to spend large sums fighting to get paid.

I know a lot of unhealthy people who have insurance Paul. I don't consider unhealthy people to be healthy. Therefore, I am forced to reject your hypothesis that insurance companies only cover healthy people. As such, I must also reject your hypothesis that this is the way insurance companies cut costs. Further, I object to your claim that insurance companies deny as many claims as possible citing a lack of industry expertise. I would, however, be interested in reviewing your data regarding the large sums spent collecting debt. I wonder if these large sums would be smaller sums if medical care prices were more reasonable. I don't suppose you found anything about that in your research.

And the international evidence on health care costs is overwhelming: the United States has the most privatized system, with the most market competition -- and it also has by far the highest health care costs in the world.

Out of curiosity Paul, in your research, did you come across any data as to the quality of health care around the world? Waiting periods for appointments? Effectiveness of treatments? Life expectancy? Et cetera? Our health care costs are high and we do get what we pay for.

Yet the McCain health plan -- actually a set of bullet points on the campaign's Web site -- is entirely based on blind faith that competition among private insurers will solve all problems.

Paul, it is very interesting (and telling) that you believe that the McCain health plan is a set of bullet points on the campaign web site. Most people would look at such a thing and presume that the bullet points on the campaign web site are actually a succinct summary of the health plan rather than the plan itself. It's also interesting that you believe that the entire McCain staff hasn't considered any research in the design of their health care plan. There are a lot of politicians I don't care much for and there are even more with whom I have fundamental political and social disagreements, but there are very few who I believe are stupid. Not one of the candidates in this election would dare decide such a critical issue on "blind faith" and it is appalling to think that there are people out there who believe they would, let alone who would publish this belief.

You say potato, I say potahto. You say tomato, I say tomahto. You say "blind faith," I say "decades of economic research." Let's call the whole thing off!

I'd like to single out one of these bullet points in particular -- the first substantive proposal Mr. McCain offers (the preceding entries are nothing but feel-good boilerplate).

That's Senator McCain Paul. You may disagree with him, but he is still a United States Senator and a patriot. Try to be at least a little respectful.

As I've mentioned in past columns, the Veterans Health Administration is one of the few clear American success stories in the struggle to contain health care costs. Since it was reformed during the Clinton years, the V.A. has used the fact that it's an integrated system -- a system that takes long-term responsibility for its clients' health -- to deliver an impressive combination of high-quality care and low costs. It has also taken the lead in the use of information technology, which has both saved money and reduced medical errors.

This is absolutely true. The VA Hospital system has made great strides to catch up to the rest of the health care industry. The VA Hospital is on the cutting edge of technology for a government institution. In fact, the VA is almost as good as privatized health care organizations. Thanks are due not only to the Clinton administration, but also to the Bush administration for "nearly doubling the VA budget, expanding community grants for homeless veterans, signing concurrent receipt legislation and investing millions of dollars in traumatic brain injury and psychological disorder research" according to the VFW newsletter.

Sure enough, Mr. McCain wants to privatize and, in effect, dismantle the V.A. Naturally, this destructive agenda comes wrapped in the flag: "America's veterans have fought for our freedom," says the McCain Web site. "We should give them freedom to choose to carry their V.A. dollars to a provider that gives them the timely care at high quality and in the best location."

Senator McCain has voted throughout his career to ensure that Veterans Affairs health care programs get funding. Did we look at the same web site? The one I looked at had a pretty long list of things Senator McCain would like to do to support the VA. While I agree that the "freedom fighters should be free to choose" thing is a bit cheesy, I don't see what's wrong with telling our veterans, "hey, go to the VA if you'd like . . . especially for really big and important things, but feel free to go anywhere you wanna go and we'll still pay for it." Not only does Senator McCain intend to leave the VA intact, he would also like to extend VA privileges to retired veterans who aren't even eligible for VA health care. I rest assured that the VA will still be "mantled" when Senator McCain is president, though I encourage you to find out for yourself.

That's a recipe for having healthy veterans drop out of the system, undermining its integrated nature and draining away resources.

It's going to cost money if healthy veterans don't go to the doctor when a doctor isn't needed? I don't follow.

Mr. McCain, then, is offering a completely wrongheaded approach to health care. But the way the campaign for the Democratic nomination has unfolded raises questions about how effective his eventual opponent will be in making that point.

It's Senator . . . ah, forget it. You probably don't use official titles for anybody I bet. I guess I'll quit harping on you for that one Paul.

Indeed, while Mrs. Edwards focused her criticism on Mr. McCain, she also made it clear that she prefers Hillary Clinton's approach -- "Sen. Clinton's plan is a great plan" -- to Barack Obama's. The Clinton plan closely resembles the plan for universal coverage that John Edwards laid out more than a year ago. By contrast, Mr. Obama offers a watered-down plan that falls short of universality, and it would have higher costs per person covered.

I know I promised I'd stop, but it's Senator Obama. Let's try it together: Senator Clinton, Senator McCain, Senator Obama.

Worse yet, Mr. Obama attacked his Democratic rivals' health plans using conservative talking points about choice and the evil of having the government tell you what to do. That's going to make it hard -- if he is the nominee -- to refute Mr. McCain when he makes similar arguments on behalf of such things as privatizing veterans' care.

Yeah, and how absurd to believe that most people are clever enough to make their own choices. I'd have finished this post hours ago had I not spent so many hours laughing aloud at the thought of allowing people to have choices. Next thing you know, they'll want to invest their own money or, ha, could you imagine . . . vote on who should be president! Oh, oh, here's a good one . . . what if I chose to never read the New York Times again because they publish drivel? Choice. Pftt. Could never happen.

Still, health care ought to be a major issue in this campaign. I wonder if we'll have time to discuss it after we deal with more important subjects, like bowling and basketball.

I'll admit . . . I don't get the reference, but I sincerely hope it is a reference to something meaningful and controversial because it sounds trite. I've spent an entire blog post admonishing your lack of factual foundation in your writing so perhaps, Paul, I'll give you the benefit of the doubt on this one. The last sentence here isn't a mere potshot because someone was talking about bowling and basketball when you felt like they weren't paying quite enough attention to you.

Friday, September 12, 2008

Statistical Method for Estimating Software Projects

As the Vice President for Research and Development at Emerald Software Group, a large part of my job comprises managing software projects from conception to completion. As a programmer in a management position, I've discovered a few things. First, I am very good at estimating how many human hours of programming time it will take to write an application. Second, I am not very good at estimating how many man hours it will take to complete the entire project life cycle. So, with a background in psychology, experience with statistics for social sciences, and a knack for inquisitive observation, I set out to generate a formula for providing accurate and timely estimates by isolating the variable that I am best at estimating . . . development hours.

I started by observing my work with one of our larger clients because, after all, they were the impetus for my little side-project anyhow. I took notes on all things client related. How much time did we spend talking to the client? How many people were in conference calls? How much time did it usually take to deploy in the client's environment? How much time did it take to test an application before deployment? I gathered as much data as I could and once I felt like I had a reasonable sample, I looked at the trends in the data. I noticed that four separate components stood out: coding time, testing time, deployment time, and conference calls. I've renamed these factors, which I call the 4 Ds, and they are the basis for my formula: develop, debug, deploy, and discuss.

So, my records showed that for this particular client, if I spent 10 hours coding, I would have to spend 15 hours testing. If I spent 5 hours coding, I would spend 7.5 hours testing. I generally do most of my testing as I go along, so it was a little difficult to arrive at these numbers, but I found that I reliably spend 1.5 times as much time testing as I do programming. Thus, for every n hours I spend coding, I will spend 1.5n hours testing. I also noted that we spent an average of 1 hour deploying our applications to this client regardless of the amount of time spent programming so I add 1 hour for deployment regardless of the project size.

Finally, conference calls. Conference calls were kind of difficult to factor out of the raw data. It turns out that for this client, we always have at least 1 conference call and the average conference call lasts about 45 minutes. I also noted that in addition to the one conference call we have for every project, there's also one conference call for every six hours of development. Thus, to figure out how much time we spent discussing a project, I calculated (⌈n/6⌉ + 1) and multiplied that by 45 minutes or 3/4 hour. The problem was that this only lined up for projects with low complexity.

I reviewed the data again to find the source of the discrepancy. The time spent in conference calls after factoring out development time had a particularly high standard deviation. I decided there must be another factor. I thought about the many meetings we've had discussing projects for this client and I realized what was causing the within-measure variability: the more hours we spent developing, the more people had to be involved with the conference call. I took that factor out and identified an average of 1.5 people per call and the estimates became a little more accurate.

The problem was that there was still some variability, not only in conference calls, but throughout the entire formula as well. To keep a long explanation from getting longer, I found that the overall variability wasn't a factor within the formula itself per se, rather a complexity factor that generally increased all of the estimates throughout the formula. Thus, my entire formula needed to factor in complexity anywhere the formula referenced development time.

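My finished formula, where n is my estimate of development hours and c is the complexity factor, was:
h ≈ develop + debug + deploy + discuss
h ≈ cn + 1.5cn + 1 + 3 (1.5 * (⌈cn / 6⌉ + 1)) / 4

After simplifying (and approximating ⌈cn / 6⌉ with cn / 6):
h ≈ 2.6875cn + 2.125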
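To make the arithmetic concrete, here's a small sketch that computes both forms of the estimate. The ProjectEstimator class and its method names are made up for illustration; the constants (1.5 for testing, 1 hour to deploy, 45-minute calls, 1.5 people per call, one extra call per six hours) are the ones I measured for this particular client and wouldn't carry over to another one unchanged.

```csharp
using System;

public static class ProjectEstimator
{
    // Full form: develop + debug + deploy + discuss, keeping the ceiling.
    // devHours is n and complexity is c in the formula.
    public static double EstimateHours(double devHours, double complexity)
    {
        double cn = complexity * devHours;

        double develop = cn;
        double debug = 1.5 * cn;                    // 1.5 hours of testing per coding hour
        double deploy = 1.0;                        // flat hour per project
        double calls = Math.Ceiling(cn / 6.0) + 1;  // one call per 6 hours, plus one
        double discuss = 0.75 * 1.5 * calls;        // 45 minutes x 1.5 people per call

        return develop + debug + deploy + discuss;
    }

    // Simplified form: treats ceil(cn / 6) as cn / 6, so it can come out
    // slightly lower than the full form when cn is not a multiple of 6.
    public static double EstimateHoursSimplified(double devHours, double complexity)
    {
        return 2.6875 * complexity * devHours + 2.125;
    }
}
```

For a 12-hour job at complexity 1, both forms give 34.375 hours. For a 10-hour job, the full form gives 29.375 and the simplified form gives 29, which is the cost of dropping the ceiling.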

Now, any time I get a new project for this client, I estimate the number of hours and I guess the unfortunately subjective complexity. I apply the above formula and the estimates are accurate within a 95% confidence interval when compared with actual times.

Wednesday, September 10, 2008

Providing both Authentication and Anonymity

I was reading a little tgdaily today and I found an article about a new iPhone app that may be showing up in the app store in the near future. The new application is called Trapster and it's a "social-networking speed trap warning website."

I know what you're asking. Well, I don't actually know what you're asking, but what you should be asking is, "Why are you blogging about this Patrick? You spend most of your time writing Human Resources Software for paperless onboarding and business process automation." Well, you're right, but I'm always fascinated by new ideas, new technology, and of course, social networking. I watched the tgdaily video, I read the article, and it got me to thinking . . . and blogging.

Trapster allows users to track their current location, to see where speed traps and cameras are located, and to support the community of "moving violationally" challenged people in their local area by reporting these pesky traffic control devices. It's a neat and clever idea. I started developing a similar app for Windows Mobile a few years ago, but abandoned the project, mostly because at the time there just weren't that many mobile devices with built-in GPS receivers. One feature I considered for ensuring the validity of the speed trap data was to collect statistics on the frequency with which users reported a speed trap in the same location and the estimated duration that the speed trap was in place, but this would obviously require a large user base.

Pete Tenereillo, the maker of Trapster, addressed the issue by allowing users to rate the validity of reports and by using these ratings to calculate a historical "trustworthiness" of any particular user and his or her reports. In my application, I didn't associate reports with users, so I could only calculate a trustworthiness factor on a time-based historical basis, not on a per-user basis. I opted to use time and frequency instead of user ratings because I was concerned about the privacy and security of my users. While unlikely, it is conceivable that an irritated government that finds itself losing revenue from ticketing fees and which sees an increase in brazen drivers may decide that it would like to outlaw the reporting of traffic devices. Not only that, but it could also decide to address the interference with officers' duties by requesting access to Trapster's user data.

At first, you would think that your user information and driving history would be unavailable to an interested third party, but in the case of the government, you'd quickly find that this is a difficult battle to win. The law calls these data "regularly kept records" and they are subject to subpoena and seizure. Even search engine giant Google has suffered with this. Google provided 12TB of YouTube user data to Viacom in one case and another dataset to the Brazilian government in another case.

The problem is, I really like Tenereillo's brilliant idea of having the community rate contributor data. For one thing, people interested in creating false positives by posting fake reports will quickly have discounted authority in the system. Furthermore, people who consistently try to create false negatives by disagreeing with other raters will also have reduced consideration by the system (at least, that's how I presume it will work 'cause that's what I would do). So, how do you calculate inter-observer reliability if you don't keep user data around? Furthermore, if you do keep user records, how can you keep track of user submissions without being able to relate users to their submissions? So, my question is this: is it possible to provide both authentication and anonymity with the same system?

I've put some thought into the problem and I'm off to a start with a potential pattern to solve it. The problem is demonstrated in the scenario below:

User registers with username and password

System assigns user id

System hashes and stores password

User logs in

User reports speed trap

System records trap report with user id

Most systems will work with a design that approximates this one. In this configuration, the system is aware of the relationship between reports and users, the system can provide historical report data, and a third party could subpoena historical user reports. An obvious alternative would be to save speed trap reports without a user id. As discussed before, the system would then be unaware of the relationship between users and reports and a third party couldn't subpoena these data, but there would be no way to relate reports to each other. There is, however, a third alternative. Imagine the above scenario modified as follows:

User registers with username and password

System assigns user id

System hashes and stores password

System hashes password + username and stores it with hash id

User logs in

User reports speed trap

System records trap report with hash id

With this pattern, you can provide historical report data even though the system is unaware of the relationship between users and reports, because the reports can be associated with other reports from the same user. In fact, the user can even view his or her own history. When the user logs in, he or she enters the password, which is then hashed with the username and stored in short-term memory rather than in the application database. Passing this hash to procedures in the database will allow the user to retrieve historical post data and will allow the system to calculate "user trustworthiness" statistics even though it cannot associate specific users with their posts.

For the sake of deeper explanation, here's how the system provides both authentication and anonymity. Hashing the password and storing it in the user table allows you to securely keep authentication information because a hash cannot be reversed, thus providing the authentication function. The other hash of the username concatenated with the password provides a unique identifier for the user that also cannot be reversed. This way, once the application has authenticated and validated the user, it can then use the second hash for retrieving and posting data. Hashing data which are not stored in the database (i.e., the password) means that the database alone cannot be used to associate the users with historical report data, thus providing anonymity.

One concern I haven't yet addressed (though it is truly unlikely to be an issue) is that by hashing the same password twice, you make it slightly easier to brute-force the password.
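
The registration and login steps above can be sketched in a few lines of C#. This is only a minimal illustration, not the code from my application: the class and method names are made up for the example, and SHA-256 is an assumed stand-in for whatever hash function the real system would use.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; all names here are illustrative only.
public static class AnonymousReporting
{
    // Returns the lowercase hex SHA-256 digest of a UTF-8 string.
    // Assumption: SHA-256 stands in for the real system's hash function.
    private static string Sha256Hex(string input)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(input));
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }

    // Stored in the user table and used only to authenticate.
    public static string PasswordHash(string password)
    {
        return Sha256Hex(password);
    }

    // Stored with each report; never stored next to the username.
    public static string HashId(string username, string password)
    {
        return Sha256Hex(password + username);
    }

    public static void Main()
    {
        // Registration: the user table keeps the username and this hash.
        string storedPasswordHash = PasswordHash("s3cret");

        // Login: authenticate, then keep the hash id in session memory only.
        bool authenticated = PasswordHash("s3cret") == storedPasswordHash;
        string sessionHashId = HashId("pat", "s3cret");

        Console.WriteLine(authenticated);                             // True
        Console.WriteLine(sessionHashId == HashId("pat", "s3cret"));  // True
        Console.WriteLine(sessionHashId == storedPasswordHash);       // False
    }
}
```

The hash id is the same every time the same user logs in, so reports can be grouped by it, but it cannot be recomputed from the user table alone because the password is never stored. A plain, unsalted hash keeps the sketch short; a salted, deliberately slow key derivation function would likely be a better fit for the stored password hash, and it would also help with the brute-force concern.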

Monday, August 11, 2008

A Table Valued Function to Split Strings


-- the function
CREATE FUNCTION SplitString
(
      @TargetString NVARCHAR(MAX),
      @Delimeter NVARCHAR(MAX)
)
 
-- the part repository
RETURNS @Parts TABLE
(
      PartId INT IDENTITY(1, 1),
      Part NVARCHAR(MAX) -- match the NVARCHAR input so Unicode text is preserved
)
AS BEGIN
 
      -- just some variables to keep track of things
      DECLARE 
            @CurrentIndex INT,
            @DelimeterIndex INT,
            @PartLength INT;
 
      -- initialize the loop
      SELECT
            @CurrentIndex = 0,
            @DelimeterIndex =CHARINDEX(@Delimeter, @TargetString, 0),
            @PartLength = @DelimeterIndex - @CurrentIndex;
 
      -- if the delimeter exists, continue the loop
      WHILE (@DelimeterIndex > 0) BEGIN
 
            -- add the part to the part repository
            INSERT INTO @Parts VALUES (SUBSTRING(@TargetString, @CurrentIndex, @PartLength));
 
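            -- update the indexing information
            -- (DATALENGTH / 2 counts trailing spaces in the NVARCHAR delimiter;
            -- LEN ignores them, so a ' ' delimiter would loop forever)
            SELECT
                  @CurrentIndex = @CurrentIndex + @PartLength + DATALENGTH(@Delimeter) / 2,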
                  @DelimeterIndex = CHARINDEX(@Delimeter, @TargetString, @CurrentIndex),
                  @PartLength = @DelimeterIndex - @CurrentIndex;
 
      END
 
      -- add whatever comes after the last delimeter
      INSERT INTO @Parts VALUES (SUBSTRING(@TargetString, @CurrentIndex, LEN(@TargetString) - @CurrentIndex + 1))
 
      RETURN
END

Saturday, August 9, 2008

HR-XML Integration in Human Resources Software

Most of the programming and architecture work I do is in the human resources space. I write software for paperless onboarding and acculturation, personnel change management, background checks and verification services, new employee requisitioning, and the like. As you can imagine, I spend a great deal of time and effort integrating with human resources software systems. I work with applicant tracking systems, payroll systems, human resources management systems, other onboarding systems, background check providers, and a number of other applications and services that all have to be able to talk to each other.

Some of the software, services, and systems I integrate with aren’t too bad. Some of them, however, are obscene. I work with one HRMS and payroll system, for example, that has a database with more than 1300 tables comprising an absurd 34,000 columns. Most of the tables have 8 to 10 character names; 3 of these characters are consumed by a 3 letter prefix which is carried over into the column names in the table. The tables may or may not have an identity field which is buried somewhere in the table structure, but you don’t need to know what the identity column is because it’s not used as a primary key. Instead, the primary key is derived from a time based calculation with a 1 second granularity. Furthermore, there are thousands of columns on most of the tables that start with UDF and end with a two digit number. In case you haven’t seen this abhorrent pattern before, UDF is short for “User Defined Field,” and it’s pretty darned tacky.

If you’re not yet convinced that this is indeed the ultimate in terrible database design, just go dig up E. F. Codd and you’ll most likely find him spinning in his grave. But, I’m not really writing this entry about bad databases; I mostly wanted to describe why it is so hard for multiple vendors to work together in the same environment. Well, to our rescue comes the HR-XML consortium, “a non-profit organization dedicated to the development and promotion of a standard suite of XML specifications to enable e-business and the automation of human resources-related data exchanges.”

So, what does that really mean? Well, if your imports and your exports follow the HR-XML schema, you know you can always talk between systems. Our suite of human resources software is fully capable of communicating via HR-XML data. For us internally, it means that any piece of our system can be added or removed based on client needs. It also means that we fit in any stage of an HR process between applicant tracking and termination of an employee.

We can easily integrate with HRMS and payroll systems, background check vendors, and even the Department of Homeland Security E-Verify system. The only real problem is that HR-XML is not well accepted. Most other vendors don’t provide HR-XML integration support. In fact, some companies even lie about HR-XML integration to trick potential customers into believing they are getting an easily extensible system. I even know of one company that not only lies about its HR-XML integration, but also sits on the board of the HR-XML consortium. To me, this seems like the ultimate software insult.

So, if you work in HR or in HR software development, what can you do? Well, tell your vendors you are interested in seeing HR-XML support. Interact with the HR-XML consortium. Make your solutions HR-XML compatible. There are numerous ways to get involved and I hope this article whets your HR-XML appetite. I plan on posting a series of HR-XML articles on my blog so keep checking back. In the meantime, I’d love to hear about your integration nightmares. Please post a comment on this post if you have experienced similar issues.

© 2008, D. Patrick Caldwell, Vice President for Research and Development, Emerald Software Group, LLC

Wednesday, July 30, 2008

Recursively Searching for Classes of Specified Type from an Assembly or Type

Sometimes I find myself implementing a plugin architecture and I need to find a list of classes in an assembly that qualify as plugins for a given project. I wrote the following class to help me do this:

using System;
using System.Collections.Generic;
using System.Reflection;

public static class TypeFinder
{
    public static List<Type> GetTypesFromAssembly(Assembly assembly, params Type[] assignableTypes)
    {
        List<Type> types = new List<Type>();
 
        if (assembly != null)
            foreach (Type type in assembly.GetTypes())
                foreach (Type baseType in GetBaseTypesFromType(type, assignableTypes))
                    if (!types.Contains(baseType)) types.Add(baseType);
 
        return types;
    }
 
    public static List<Type> GetBaseTypesFromType(Type type, params Type[] assignableTypes)
    {
        List<Type> types = new List<Type>();
 
        foreach (Type assignableType in assignableTypes)
        {
            if (assignableType != type && assignableType.IsAssignableFrom(type))
            {
                types.Add(type);
                break;
            }
        }
 
        if (type.BaseType != null)
           types.AddRange(GetBaseTypesFromType(type.BaseType, assignableTypes));
 
        return types;
    }
}

Here's what the call looks like:

List<Type> plugins = TypeFinder.GetTypesFromAssembly(assembly, typeof(IMyPlugin), typeof(MyAbstractClass), typeof(MyBaseClass));
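
The filter in TypeFinder leans entirely on Type.IsAssignableFrom, which is easy to read backwards. Here is a small sketch of how it behaves. ConcretePlugin is a made-up class for the example, and IMyPlugin and MyAbstractClass are given empty placeholder definitions so the snippet stands on its own.

```csharp
using System;

// Placeholder types for illustration only.
public interface IMyPlugin { }
public abstract class MyAbstractClass : IMyPlugin { }
public class ConcretePlugin : MyAbstractClass { }

public static class AssignabilityDemo
{
    public static void Main()
    {
        // a.IsAssignableFrom(b) asks: can a value of type b be stored
        // in a variable of type a?
        Console.WriteLine(typeof(IMyPlugin).IsAssignableFrom(typeof(ConcretePlugin)));   // True
        Console.WriteLine(typeof(IMyPlugin).IsAssignableFrom(typeof(MyAbstractClass)));  // True

        // A type is always assignable from itself, which is why TypeFinder
        // also checks assignableType != type.
        Console.WriteLine(typeof(IMyPlugin).IsAssignableFrom(typeof(IMyPlugin)));        // True

        // The reverse direction does not hold.
        Console.WriteLine(typeof(ConcretePlugin).IsAssignableFrom(typeof(IMyPlugin)));   // False
    }
}
```

Note that abstract classes pass the check too, so a call like the one above would return MyAbstractClass along with any concrete plugins. If you plan to instantiate the results, you will probably want to skip types where IsAbstract is true.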

Monday, July 28, 2008

Writing Anonymous Methods with Lambda Expressions

I like using Lambda expressions for anonymous methods. I don't really have a good reason for it, but I like it, so here's the difference. Let's say you have a class exposing the following delegate:

public delegate bool CheckExpirationDelegate();
public CheckExpirationDelegate CheckExpiration;

The normal anonymous delegate would look like this:
di.CheckExpiration += delegate() { return true; };

But, you can also do it like this with lambda expressions:
di.CheckExpiration += () => true;

Here's another anonymous method, but you can use this one without declaring your own delegate type:

Func<string, char, string> GreetTheWorld =
    (greeting, punctuation) =>
        string.Format("{0} World{1}", greeting, punctuation);
Console.WriteLine(GreetTheWorld("Hello", '!'));

There are many other great uses for lambda expressions . . . fodder for future posts.

Sunday, July 13, 2008

Silence of the Lambs

if (!it.skin.apply(TheLotion))
{
    it.HoseCount++;
};

To be or not to be?

string TheQuestion = "2B || !2B";

All work and no play . . .

if (WorkQuantity == "All" && PlayQuantity == "None")
{
    Jack.BoyType = BoyTypes.Dull;
};

Bottles of Beer on the Wall

BottlesOfBeerOnTheWall(new Wall(99));

public int BottlesOfBeerOnTheWall(Wall TheWall)
{
    TheWall.BottlesOfBeer.TakeOneDown().PassItAround();

    if (TheWall.BottlesOfBeer.Count > 0)
    {
        return BottlesOfBeerOnTheWall(TheWall);
    }

    return 0;
}

I only allow authenticated rodents . . . sorry.

2007/09/18: "Unauthenticated user, Please check if virtual directory of <name changed to protect guilty> Server anonymouse access is disabled."

Well, that pretty much narrows it down.

2007/08/09: "Syntax error, permission violation, or other nonspecific error."

So that pretty much rules out 0 then I guess?

2007/10/17: "One or more component failed validation."

Thursday, July 10, 2008

Hello world!

So this was my auto-generated first post. I decided not to delete the post because it is aptly titled, "Hello world!" As it turns out, my blog (YATB) is going to be yet another techie blog.

I'm a software engineer and stuff so most of my posts will be about just that . . . software. I am also on the management team with my company so there will be various observations about business and leadership as well.

With that, here is my contribution to the interwebs . . . Patrick Caldwell's Weblog