Saturday, February 28, 2009

An RSS Feed Script for Your Blogger Template

I posted my list of social bookmarking scripts for blogger several days ago and I have made an update! The list now includes an RSS Feed link. Check it out.

Friday, February 27, 2009

Completely Automated Public Turing Test to Piss Humans Off

Okay, so it probably seems like I do nothing but gripe on my blog, but that's not true. I gripe on my blog most of the time, but only about legitimate concerns :). This one is no different. I really get frustrated by CAPTCHAs sometimes. I mean, I think it's a great idea to use them and they're very good at keeping computers from being able to submit forms, but sometimes they're good at keeping humans from submitting forms too.

So, I've devised a list of things CAPTCHA developers should keep in mind:
  1. There's no reason to have a case sensitive CAPTCHA
  2. If you insist on case sensitive captchas, letters which look the same both upper and lower case should be excluded or substitutable (e.g., o and O, x and X, m and M, s and S, etc.)
  3. You shouldn't compress your letters so much that you can't tell the difference between olo, ob, and do
  4. Letters which are indistinguishable should be excluded or substitutable (e.g., 1 and l, 0 and O, S and 5, etc.)
  5. Don't make them language specific (i.e., you should be able to embed one captcha on your website and it should work for any visitor in any language)
If we follow these simple ground rules, CAPTCHAs will likely retain their low false negative rate (i.e., mistaking computers for humans) and lower their false positive rate (i.e., mistaking humans for computers).
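
Just to illustrate what I mean by "substitutable," here's a rough sketch (hypothetical, in C#, not from any real CAPTCHA library) of normalizing both the expected text and the user's input before comparing them:

static class CaptchaChecker
{
    // Map easily-confused glyphs onto a single canonical character
    // before comparing, and ignore case entirely.
    static string Normalize(string text)
    {
        return text
            .ToLowerInvariant()
            .Replace('1', 'l')   // 1 vs l
            .Replace('0', 'o')   // 0 vs O
            .Replace('5', 's');  // 5 vs S
    }

    public static bool Matches(string expected, string userInput)
    {
        return Normalize(expected) == Normalize(userInput);
    }
}

With something like that in place, a user who reads "S" where the image showed "5" still gets through, and the CAPTCHA loses nothing against bots.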

Here's a list of CAPTCHAs that usually aren't that bad:
Here's a list of CAPTCHAs I think are ridiculous:

Password Limitations May Mean Your Password is Unsafe

I wrote a post a while back complaining about the lack of security provided by websites which claim to be secure. A few days ago, a friend of mine griped because the website for his bank wouldn't allow him enough characters to use his standard high-security password (you know, the one reserved for financial websites, SQL Server sysadmin, TrueCrypt, or your personal certificates).

I said, "that really sucks man. Now anybody who has access to that database can look at your password and therefore knows the password for your other accounts." He looked incredulous for a minute, started to ask why, and came to the same realization I had. The only really good reason to limit the length of your password (or the valid character set for that matter), would be if you're storing passwords in plain text. The only good reason to say, "your password cannot be longer than 12 characters" is if the password field in your user table is only 12 characters.

In our applications, you can have any length password and it can include any character that can be transmitted via HTTP POST. Why don't we care what you use for a password? Well, that's because our data field is going to be a CHAR(32) or a CHAR(40) and no matter what you send us, we're just gonna salt it, hash it, and store it. There's no need to disallow special characters (like ! or #, which aren't even all that special) and there's no reason to limit the length of the password.
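
Just to illustrate, here's a rough sketch in C# of what I mean (SHA-1 here is only an example; it's the choice that happens to give you 40 hex characters for a CHAR(40) column):

using System;
using System.Security.Cryptography;
using System.Text;

static class PasswordHasher
{
    // Hash a password of any length into a fixed-size hex string.
    // SHA-1 yields 20 bytes = 40 hex characters, so the column can be CHAR(40)
    // no matter what the user typed.
    public static string Hash(string password, string salt)
    {
        using (SHA1 sha = SHA1.Create())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(salt + password);
            byte[] hash = sha.ComputeHash(bytes);

            StringBuilder hex = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                hex.Append(b.ToString("x2"));
            return hex.ToString();
        }
    }
}

The point is that a 200-character password full of punctuation hashes to exactly the same column width as "password1", so there's nothing to limit.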

So, if some website wants you to limit your password (obviously I'm not talking about rejecting passwords that are too short or too easily attacked), then there's a good chance that they're storing your password in plain text and you would be fair and just in being upset about it.

Wednesday, February 25, 2009

File Download Resumer for HTTP

In my last post, I was complaining that my browsers of choice (namely, firefox and chrome) don't have good (if any) support for resuming failed or interrupted file downloads.

Now, there are very few things that irk me more than someone who complains but never offers a solution, so this post is proof that there is indeed a solution (and indeed a very simple one) to this problem.

To test it, I downloaded this file, hashed it, and got an MD5 of 0D01ADB7275BB516AED8DC274505D1F5. I downloaded about half the file, paused it in firefox, renamed the .pdf.part file to .pdf, resumed the download, hashed it, and got 0D01ADB7275BB516AED8DC274505D1F5. The file resumed exactly as I expected it to.

I threw together a quick download resumer and I've posted the project if you want the whole thing. Otherwise, these 4 lines contain the one line where the "magic" happens:
// create the request
HttpWebRequest request = HttpWebRequest.Create(Source) as HttpWebRequest;
request.Method = "GET";

// set the range
request.AddRange(Convert.ToInt32(Downloaded));

// get the response . . . a.k.a., the "magic"
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
Calling request.AddRange creates the header "Range: bytes=n-" where n is the number of bytes already downloaded (i.e., the offset where the download should pick back up). Using this, a browser could append the remaining bytes to the abandoned file, starting at exactly the position it left off. Browsers could support this natively, without plugins, and let you "attempt to resume and hope for the best."
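
To make the whole flow concrete, here's a minimal, self-contained sketch of the same idea (the URL and file name are hypothetical placeholders, not part of the posted project):

using System;
using System.IO;
using System.Net;

class ResumeDownload
{
    static void Main()
    {
        // hypothetical placeholders -- substitute a real URL and file name
        string source = "http://example.com/bigfile.pdf";
        string destination = "bigfile.pdf";

        // how many bytes of the file we already have on disk
        long downloaded = File.Exists(destination) ? new FileInfo(destination).Length : 0;

        HttpWebRequest request = WebRequest.Create(source) as HttpWebRequest;
        request.Method = "GET";

        // ask the server for everything from byte 'downloaded' onward
        // (this emits the header "Range: bytes=n-")
        if (downloaded > 0)
            request.AddRange(Convert.ToInt32(downloaded));

        using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
        using (Stream body = response.GetResponseStream())
        using (FileStream file = new FileStream(destination, FileMode.Append, FileAccess.Write))
        {
            // a server that honors the range replies 206 Partial Content;
            // a plain 200 means it ignored the range and we'd have to start over
            if (downloaded > 0 && response.StatusCode != HttpStatusCode.PartialContent)
                throw new InvalidOperationException("server ignored the range request");

            // append the remaining bytes to the partial file
            byte[] buffer = new byte[8192];
            int read;
            while ((read = body.Read(buffer, 0, buffer.Length)) > 0)
                file.Write(buffer, 0, read);
        }
    }
}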

Tuesday, February 24, 2009

Why Don't Browsers Support Download Resumption

Yesterday, I was downloading a very large file from a co-worker's server. I forgot about the download when I shut my computer down, so I only got 6/7ths of the file. Having realized my error, I tried to download the file again today and something went awry. I right-clicked on the file, selected "Save Link As...," selected the file containing the first 6/7ths of my download, and hit save. A dialog box came up explaining that the file already existed and asked if I'd like to overwrite it. I clicked no, and much to my dismay, the dialog box went away and nothing else happened!

I'm concerned that my browser thinks that the only two reasonable options in this case are to overwrite the file or to not download at all. Why can't I just resume?

So, here's the thing . . . I can only guess that the reason they don't resume downloads is because they can't match the first part of the file to the file you're trying to download and therefore, you could end up corrupting the file if you try to resume it from the wrong source or at the wrong point. I say, "so what?" That should be my prerogative. If I'm convinced that I can resume the file, then who is my browser to try and stop me? After all, I'm sure as hell not gonna make a partial download any worse!

If you look at RFC 2616, the HTTP/1.1 spec (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35), you'll see HTTP already supports the Range request header. The iPhone uses this all the time to stream media efficiently and it works great. IMHO, browsers have no excuse for shoddy (or missing) support for resuming file downloads. If the browser would just start downloading at the byte after the last byte I got, at least I could try to get that last 1/7th and save myself 6/7ths of the time.
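
For instance, a resumed request and its response would look something like this on the wire (the file name and byte counts are made up):

GET /files/big-download.pdf HTTP/1.1
Host: example.com
Range: bytes=600000-

HTTP/1.1 206 Partial Content
Content-Range: bytes 600000-699999/700000
Content-Length: 100000

The server sends back only the last 100,000 bytes and the client just tacks them onto the end of the partial file.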

Monday, February 23, 2009

Social Linking on Blogger with Digg, StumbleUpon, Delicious, DotNetKicks, et Cetera

I've been doing more and more blogging lately, and I've noticed my visibility really picking up. When I added Digg (even though very few of my readers seem to Digg me . . . hint hint), I noticed my daily visits took a jump. I added Dot Net Kicks, and again . . . a jump. I've decided to add more social bookmarking links like StumbleUpon and Del.icio.us to see what happens. The only one that was difficult was the Delicious link because I couldn't find an example online. I decided that I'd post all of my bookmarking links here largely as a way to keep track of them myself, but you're all welcome to use them too.

Nota bene, I'm pretty sure I encoded them correctly so that if you copy and paste them into your template, you'll get what you're looking for. If they don't work (and the blogger template editor will usually warn you about an invalid html entity or something like that), then please leave me a comment and let me know and I'll fix it as soon as possible.

So, without further ado, here's how I got my Digg link:
<script type='text/javascript'>
digg_url = &#39;<data:post.url/>&#39;;
digg_title = &#39;<data:post.title/>&#39;;
digg_bgcolor = &#39;transparent&#39;;
</script>
<script src='http://digg.com/tools/diggthis.js' type='text/javascript'/>


Here's how I got my Reddit link:
<script>reddit_url=&#39;<data:post.url/>&#39;</script>
<script>reddit_title=&#39;<data:post.title/>&#39;</script>
<script language='javascript' src='http://reddit.com/button.js?t=2'/>


And my stumble upon link:
<a class='timestamp-link' expr:href='&quot;http://www.stumbleupon.com/submit?url=&quot; + data:post.url + &quot;&amp;title=&quot; + data:post.title' style=''>
<img align='' alt='Stumble Upon Toolbar' border='0' src='http://www.stumbleupon.com/images/su_micro.gif'/>
</a>


Dot Net Kicks:

<a expr:href='&quot;http://www.dotnetkicks.com/kick/?url=&quot; + data:post.url + &quot;&amp;title=&quot; + data:post.title'>
<img border='0' expr:src='&quot;http://www.dotnetkicks.com/Services/Images/KickItImageGenerator.ashx?url=&quot; + data:post.url'/>
</a>


And the difficult and time consuming del.icio.us:
<a 
expr:onclick='&quot;window.open(\&quot;http://delicious.com/save?v=5&amp;noui&amp;jump=close&amp;url=\&quot;+encodeURIComponent(\&quot;&quot; + data:post.url + &quot;\&quot;)+\&quot;&amp;title=\&quot;+encodeURIComponent(\&quot;&quot; + data:post.title + &quot;\&quot;), \&quot;delicious\&quot;,\&quot;toolbar=no,width=550,height=550\&quot;); return false;&quot;'
expr:href='&quot;http://delicious.com/save?v=5&amp;noui&amp;jump=close&amp;url=&quot; + data:post.url + &quot;&amp;title=&quot; + data:post.title'>
<img alt='Delicious' height='10' src='http://static.delicious.com/img/delicious.gif' width='10'/> del.icio.us
</a>


The considerably less difficult RSS feed link:
<a expr:href='data:blog.homepageUrl + &quot;feeds/posts/default?alt=rss&quot;'>
<img alt='rss feed' src='http://www.benjaminobdyke.com/images/rss_button2.gif'/>
</a>


This post will also be a living document, so when I update it I'll post an update notice so it'll come up in your RSS feed.

Thursday, February 19, 2009

Memoizer Attribute Using PostSharp

I was doing a lot of research yesterday and I came across the memoization technique. I've written a cache proxy that provides lazy loading, event-driven refreshes, validity durations, and several other features. I still like the cache proxy, but there could be times where that's just overkill and a little bit of caching will go a long way.

My co-worker (Jamus) and I were talking about memoization in Ruby on Rails and the implementation is so clean. Basically, it replaces the original method with a call to the memoizer which checks for a cached value which it either returns or replaces. I wanted something like this in C# (seeing as how I am a C# developer).

The blog I was reading that introduced me to the memoizer had a generic memoization method that didn't work, but with a little tweaking, it did just fine. You basically passed your method call through this generic method and it would handle first checking for your cached value before passing the call through to your original method.
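
My rough reconstruction of that generic method (not the original author's code) looks something like this: you hand it a delegate and an argument, and it only invokes the delegate on a cache miss:

using System;
using System.Collections.Generic;

static class SimpleMemoizer
{
    private static readonly Dictionary<string, object> cache = new Dictionary<string, object>();

    // Wrap any single-argument call: check the cache first,
    // and only fall through to the real function on a miss.
    public static TResult Memoize<TArg, TResult>(Func<TArg, TResult> func, TArg arg)
    {
        string key = func.Method.Name + ":" + arg;
        if (!cache.ContainsKey(key))
            cache[key] = func(arg);
        return (TResult)cache[key];
    }
}

// usage: int result = SimpleMemoizer.Memoize<int, int>(factorial, 9);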

The problem was, what if you had a recursive method? I was playing around with a factorial method (of course) that looked like this:
static int factorial(int n)
{
    return n < 2 ? 1 : n * factorial(n - 1);
}

// test cases
//factorial(7);
//factorial(9);
//factorial(9);

The first test made 7 iterations through the factorial method, the second 9, and the third 0. I thought about it and it seemed like I'd be better off if each recursive call went through the memoizer rather than straight to the original method. That way, my first test would make 7 iterations, my second test 2, and the third test 0. The problem was that, with the generic memoizer method, my factorial method would have to be aware of the memoizer, and I didn't want that.

That's when Jamus introduced me to PostSharp. PostSharp is a library that modifies your code at compile time, providing Aspect Oriented Programming capabilities to .NET. This was almost exactly what I was looking for . . . a way to cleanly separate the concerns of my application from the memoization concern. PostSharp allowed me to take the memoization aspect out of the implementation of my methods and put it nicely into a method attribute. This is just a first-version proof of concept so I haven't optimized it much, but I think the idea has a great deal of potential, so here it is:
using System;
using System.Collections.Generic;
using System.Text;
using PostSharp.Laos; // assuming PostSharp 1.x, where OnMethodInvocationAspect lives

public static class Memoizer
{
    // private field to store memos
    private static Dictionary<string, object> memos = new Dictionary<string, object>();

    // PostSharp needs this to be serializable
    [Serializable]
    public class Memoized : OnMethodInvocationAspect
    {
        // intercept the method invocation
        public override void OnInvocation(MethodInvocationEventArgs eventArgs)
        {
            // get the arguments that were passed to the method
            object[] args = eventArgs.GetArgumentArray();

            // start building a key based on the method name
            // because it wouldn't help to return the same value
            // every time "lulu" was passed to any method
            StringBuilder keyBuilder = new StringBuilder(eventArgs.Delegate.Method.Name);

            // append the hashcode of each arg to the key
            // this limits us to value types (and strings)
            // i need a better way to do this (and preferably
            // a faster one)
            for (int i = 0; i < args.Length; i++)
                keyBuilder.Append(args[i].GetHashCode());

            string key = keyBuilder.ToString();

            // if the key doesn't exist, invoke the original method
            // passing the original arguments and store the result
            if (!memos.ContainsKey(key))
                memos[key] = eventArgs.Delegate.DynamicInvoke(args);

            // return the memo
            eventArgs.ReturnValue = memos[key];
        }
    }
}

That's all there is to it. I modified my factorial method to look like this:
[Memoizer.Memoized]
static int factorial(int n)
{
    return n < 2 ? 1 : n * factorial(n - 1);
}

Using this technique, I achieved the results I was looking for. The first time I called factorial(9), it checked the cache, didn't find an entry, created one, recursed until it got to factorial(7) where it found a memo, returned it, and stopped the recursion.

This method is pretty expensive and could get pretty memory intensive, but I'm on the lookout for a long process (especially a long recursive process) where I can test this out to see if it produces savings over the long run.

Wednesday, February 18, 2009

Tips for Jailbroken iPhones

If you're going to jailbreak or you have already jailbroken your iPhone, here're some tips you may find useful. I presume I'll be discovering new tips and tricks, but I haven't decided how I'll post those. I'll either edit this post or I'll add new posts as they come to me. If I decide to make the current post a "living" document, I'll be sure to post update notices so that they'll appear in my RSS feed.

Tip 1) This is the most important tip I have (or will ever have) for anyone with a jailbroken iPhone. Unless you are on the iPhone Dev Team, do not act like you've hacked your iPhone or that you have somehow done something that makes you anything more than a script kiddie. Anyone can follow the QuickPwn instructions.

Tip 2) Photos you take on your iPhone are stored in /private/var/mobile/Media/DCIM/100APPLE

Tip 3) If you legally download installers with Installous, they're stored in /private/var/mobile/Library/Downloads

Tip 4) Your call history, your address book, your SMS messages, and many other things are stored in SQLite databases in /private/var/mobile/Library

Tuesday, February 17, 2009

Stored Procedure to Flip Staging Tables

We have several applications that are relatively data intensive. For the most part, the applications run just fine when executing simple queries against indexed tables. The problem comes when we need to aggregate data for reporting or perform queries against cross-joined or unioned tables. As a result, sometimes we create data tables that are periodically populated with the results of the slow cross-join or union and are subsequently indexed to optimize performance for future queries.


That's all well and good, but you can't really fill that table while the application is running queries against it, so what do you do? Well, you create a staging table, fill it, and then point your application at the new table, right?


Something I picked up from partitioning very large tables during my data warehousing days is that I can point a view at a table, treat the view like the table, and it will be almost as performant as querying the table directly with its indexes. Thus, if I want to have a staging table and a production table, I don't have to put any intelligence in my application; rather, I can leave the staging and swapping to the database.


For example, if I have a very large table of invoices (which I would, of course, love to have), then I could create two tables called Invoice1 and Invoice2. I could then point the view Invoice at Invoice1 and InvoiceStaging at Invoice2. Thus, any procedure or application that needs to build my staging table can always do so by referencing InvoiceStaging, and my production table will always be referenced as Invoice.


The only difficulty really is switching the view to have it point to the new table and to point the staging view to the old table. I wrote a stored procedure that would look into a settings table, determine the current production and staging tables, swap them, and write the new settings back to the settings table. It was fine really and I didn't have any trouble with it, but one day I had an idea. I wrote this stored procedure to handle my switching for me and it does it all with nothing more than some of the system tables.
CREATE PROCEDURE [dbo].[FlipStagingTables]
(
@TableName VARCHAR(255)
)
AS

DECLARE @Active VARCHAR(255);
DECLARE @Staging VARCHAR(255);

SELECT
@Active = MAX(CASE v.name WHEN @TableName + 'Staging' THEN t.name ELSE NULL END),
@Staging = MAX(CASE v.name WHEN @TableName THEN t.name ELSE NULL END)
FROM sysdepends d
INNER JOIN sysobjects v
ON d.id = v.id
INNER JOIN sysobjects t
ON d.depid = t.id
WHERE v.name IN (@TableName, @TableName + 'Staging');

IF @Active IS NULL OR @Staging IS NULL
SELECT @Active = @TableName + '1', @Staging = @TableName + '2';

IF OBJECT_ID(@TableName) IS NOT NULL
EXEC('ALTER VIEW ' + @TableName + ' AS SELECT * FROM ' + @Active);
ELSE
EXEC('CREATE VIEW ' + @TableName + ' AS SELECT * FROM ' + @Active);

IF OBJECT_ID(@TableName + 'Staging') IS NOT NULL
EXEC('ALTER VIEW ' + @TableName + 'Staging AS SELECT * FROM ' + @Staging);
ELSE
EXEC('CREATE VIEW ' + @TableName + 'Staging AS SELECT * FROM ' + @Staging);

EXEC('TRUNCATE TABLE ' + @Staging);
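
And just to show how it gets used, here's a hypothetical load job for the Invoice example above (the column names and source tables are made up):

-- Fill the table currently behind the InvoiceStaging view
-- (the application keeps reading from the Invoice view the whole time).
INSERT INTO InvoiceStaging (InvoiceId, CustomerId, Amount)
SELECT i.InvoiceId, i.CustomerId, SUM(l.Amount)
FROM RawInvoice i
INNER JOIN RawInvoiceLine l ON l.InvoiceId = i.InvoiceId
GROUP BY i.InvoiceId, i.CustomerId;

-- Swap the views: Invoice now points at the freshly loaded table, and
-- InvoiceStaging points at the old one (which gets truncated for next time).
EXEC dbo.FlipStagingTables 'Invoice';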

Friday, February 13, 2009

Should an Astronaut Keep a Pistol for Propulsion

I was trying to find some group matrix ranking and estimation stuff I ran across one time in graduate school. I remember working on the project that led to the discovery; it was sort of a group assignment, and I had done similar exercises 4 or 5 times throughout my education. So, I started searching for the documentation and ran across several examples.

The one I did in graduate school was about some NASA astronauts who were stranded on the moon with 25 items. We were to rank the items individually and then rank them as a group so that we could see whether our individual rankings or our group rankings were more similar to the expert rankings. The experts, of course, were NASA astronauts.

Today, I found another example about surviving in the Canadian wilderness. In this case, the expert was a US Army Survival School instructor.

In both cases, there is "a loaded .45 caliber handgun."

Now, here's my beef (there must be a beef . . . otherwise, what would I be blogging about?). I don't believe that the people who identified the "gold standard" rankings were actually experts. I think they're somewhat clever, but not NASA astronauts and not US Army survival instructors. In fairness, I don't believe that the official versions of these tasks make the claim that there were ever any experts involved, but most versions offer some fictitious evaluator to give clout to the "correct" answers.

So, here's why I doubt the expert claims. I don't believe that a US Army survival instructor would list a pistol as the 9th out of 12 items in terms of importance and certainly not because "although a pistol could be used in hunting, it would take an expert marksman to kill an animal with it. Then the animal would have to be transported to the crash site, which could prove difficult to impossible depending on its size."

Let's keep in mind that an army survival instructor has probably gone through at least the S part of S.E.R.E. During this training, they learn to kill animals with sticks and rocks. To be sure, if you can club a critter with a stick, you can shoot the little guy and eat him there or at camp.

Second, I doubt the army instructor would agree that "the pistol also has some serious disadvantages. Anger, frustration, impatience, irritability, and lapses of rationality may increase as the group awaits rescue. The availability of a lethal weapon is a danger to the group under these conditions."

I mean, here's a guy who has probably spent a fair amount of time in really crappy situations with a bunch of people who had a bunch of guns. If someone is going to flip his lid and start killin' folks, he's not really going to need the pistol to do so. Just bring it.

Of course, I disagree with a lot of the valuation assessments on most of the items, but I'm harping on the pistol thing for a good reason. Not because I like guns and not because I'm a proud gun toting American, but because I was really irritated when I ranked a pistol among the least important things in space but my group chastised me when they ranked it in the top 5!

Now, first of all, I realize that a gun can be fired in space, but I also realize that there's a low probability of having a target at which to fire the gun. The reason my group (and NASA . . . purportedly) wanted to keep the gun around was as a means of propulsion. After finding this again today, I've done a little math. I'll admit that my physics is a little rusty these days, but I'm pretty sure the numbers are pretty close.

So, pretty much the biggest bullet I could find for a .45 was 230 grain. It'll leave a Glock at 880 ft/s. 1 grain = 1/7000 lbs. Momentum = mass * velocity. Thus, the momentum of the bullet leaving the gun, shot by a guy standing on the moon, would be 28.9 lb-ft/s. If a guy my size (180 lbs) was in a space suit (180 lbs according to NASA . . . no, really) that was designed for walking on the moon (the floating-in-space suit is much heavier), said guy would weigh 360 lbs, plus 2 lbs worth of gun, for a total of 362 lbs.

According to Newton's third law of motion, the gun firing the bullet pushes back on the shooter with an equal and opposite momentum (id est, 28.9 lb-ft/s). If we divide that by the weight of the shooter and his junk, we get about .08 ft/s, or 4.8 feet per minute, or .054 miles per hour. Any way you look at it, it'll take a long time to get around the moon with a handgun.
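
If you want to check my math, here's the whole calculation as a quick sketch (using the figures quoted above):

using System;

class RecoilCalc
{
    static void Main()
    {
        double bulletWeightLbs = 230.0 / 7000.0;   // 230 grains, 7000 grains per pound
        double muzzleVelocity = 880.0;             // ft/s
        double shooterWeightLbs = 180 + 180 + 2;   // astronaut + suit + pistol

        // conservation of momentum: the shooter picks up the bullet's momentum in reverse
        double bulletMomentum = bulletWeightLbs * muzzleVelocity;      // ~28.9 lb-ft/s
        double shooterVelocity = bulletMomentum / shooterWeightLbs;    // ft/s

        Console.WriteLine("{0:F1} lb-ft/s of momentum", bulletMomentum);
        Console.WriteLine("{0:F3} ft/s = {1:F1} ft/min = {2:F3} mph",
            shooterVelocity, shooterVelocity * 60, shooterVelocity * 3600 / 5280);
    }
}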

Wednesday, February 11, 2009

What We Can Learn From Michael Phelps

Now, I don't smoke pot, but this seems to be a relevant point. Michael Phelps has lost numerous endorsement contracts and has been suspended from competitive swimming after an incriminating picture surfaced of him hitting a bong at a party (and I suppose we just presume there's marijuana in it . . . but whatever).

Now, people all over the world are looking at Michael Phelps like they did when that Dell kid (you know . . . dude, you're getting a Dell) got in trouble. People are like, "man, Michael, why would you ruin your life like that?"

What if we're looking at it wrong? Maybe the lesson isn't that Phelps is a fuck up; maybe it's that he isn't. What if all that stuff they've always told us about marijuana isn't really true? What if there are more functioning potheads than there are functioning alcoholics? What if you can smoke pot and still be the most decorated Olympian of all time? Perhaps a little pot from time to time doesn't actually ruin your life and turn you into an absentminded vegetable.

Who'd've thunk it? A true Olympic endorsement for marijuana. And honestly, you can't tell me that you didn't know the Dell kid was a pothead? I mean, that's why everybody liked him! He acted like a pothead. He acted like a pothead at his audition and Dell hired him because he seemed cool and hip and people would relate. Then, when we found out he actually did smoke, we fired him!

And for my final point . . . marijuana is great for the snack food industry.