Hash Functions are Great
June 27, 2008
Posted by Ufuk Kayserilioglu in: Uncategorized
We recently had a project that needed to collect email addresses automatically over the phone. I am sure everybody can appreciate how hard that is. To begin with, the user cannot enter her email address using the keypad on the phone. On top of that, even if you integrate a speech recognition engine, it is almost impossible to capture exactly the email address that the user has spoken. This is already very hard with a human on the other end of the line. I am sure you have had the experience of trying to give your email address to someone over the phone and finding it almost impossible to convey properly.
The problem stems from two distinct causes, in my opinion:
- The first problem is that the number of different characters an email address might contain is very large. An email address can contain an underscore (“_”), a dash (“-”), a dot (“.”), a plus sign (“+”), and so on, and it definitely has to contain an at sign (“@”) somewhere. You need a full keyboard to enter these characters, and speaking them is very hard since (a) they don’t have unique names (is it a dash or a minus sign?) and (b) their spoken names can be mistaken for literal text. To illustrate the second point, imagine someone said “foo plus” to you: would you write that as “fooplus” or as “foo+”?
- That last concern brings us to the second cause of confusion: email addresses are not opaque. They look like pronounceable, intelligible words but may not be, and this causes havoc for a human listener. You can have your name as part of your email address, but you might also include “x123y” and follow that with “+friends”. And this is just the username part.
So, what should one do? How can one transmit email addresses over the phone?
My proposal is to use a reversible hash function (I know that, strictly speaking, a hash function is not reversible, but that is immaterial for this discussion; we just need something that is reversible). Suppose all email clients (desktop and web based) had a feature where you could look up the unique numeric (or even alphanumeric) hash code for your email address. When you had to communicate your email address, all you would have to do is communicate this (preferably numeric) string to the other side. They would then do a reverse lookup on the hash you supplied to obtain your email address. Thus, conveying your email address becomes no harder than conveying your credit card number. On top of that, the process can be totally automated. The whole scheme could be implemented using cryptography where everyone knows the secret. We would be using cryptography just to turn email addresses into opaque objects with a limited character set. The secret key could even be distributed to everyone through the DNS TXT records of a domain. So how about it, any takers?
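To make the idea concrete, here is a minimal sketch of a reversible, digits-only encoding. It is only an illustration, not a worked-out scheme: the function names `email_to_digits` and `digits_to_email` are made up for this example, and it skips the shared-key cipher step entirely, simply reading the bytes of the address as one big integer.

```python
def email_to_digits(email: str) -> str:
    """Turn an email address into a string of decimal digits (reversible)."""
    data = email.encode("utf-8")
    # Prefix a 0x01 byte so that no leading bytes are lost when the
    # data is treated as a single big integer. (Assumption of this sketch.)
    return str(int.from_bytes(b"\x01" + data, "big"))


def digits_to_email(digits: str) -> str:
    """Recover the email address from the digit string."""
    n = int(digits)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # Drop the 0x01 marker byte added by email_to_digits.
    return raw[1:].decode("utf-8")


code = email_to_digits("foo+friends@example.com")
print(code)                   # digits only, safe to read out or key in
print(digits_to_email(code))  # foo+friends@example.com
```

The result uses only the characters 0 to 9, so it can be spoken or typed on a phone keypad. Note that this plain encoding costs roughly 2.4 digits per character of the address, so a real scheme would probably want something more compact.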
On a similar note, I recently started to get bugged by the fact that credit card receipts print the last 4 digits of my credit card number. The idea is, I guess, to let you validate that the receipt is actually yours without exposing your full card number. Well, we already know a better way of doing that without exposing any part of the card number. Yes, you guessed it right: hash functions (non-reversible, true hash functions this time).
Suppose all credit cards had a hash code (the SHA-1 of your name and card number, for example) printed somewhere on the back, and that all receipts printed the same hash. It would be trivial for you to match the receipt against the code on the back of your card, and almost impossible for someone who obtained that hash to extract your card number from it. So why are we not using such things? It is not exactly rocket science, is it?
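As a rough sketch of what the card issuer and the receipt printer would both compute, using Python's standard `hashlib`: the function name `receipt_hash`, the way the name and number are normalized and joined, and the truncation to 8 hex characters are all assumptions of this example, and the card number is a commonly used test number, not a real one.

```python
import hashlib


def receipt_hash(name: str, card_number: str, length: int = 8) -> str:
    """Short SHA-1 based code derived from cardholder name and card number."""
    # Normalize so that spacing and letter case do not change the result.
    # (The "|" separator and this normalization are assumptions of the sketch.)
    normalized = f"{name.strip().upper()}|{card_number.replace(' ', '')}"
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    # Keep only the first few hex characters so the code fits on a receipt.
    return digest[:length]


on_card = receipt_hash("Jane Doe", "4111 1111 1111 1111")
on_receipt = receipt_hash("jane doe", "4111111111111111")
print(on_card == on_receipt)  # True
```

Matching the receipt to the card is then just a comparison of two short strings, the same check you do today with the last 4 digits.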

Comments»
I know it won’t help you but here’s some trivia: in Israel they call @ a “strudel”. There is also the Turkish for @, “kuyruklu a” which seems to be used by Acık Radyo only.
Emin, thanks for the feedback. Now I have an example that can throw English speakers, Israelis and Acik Radyo people off balance: “Email me at strudel.dot.at@kuyruklua.nokta.com”