Hash Functions are Great June 27, 2008
Posted by Ufuk Kayserilioglu in: Uncategorized, 2 comments
We recently had a project that needed to collect email addresses automatically over the phone. I am sure everybody can appreciate how hard that is. To begin with, the user cannot enter her email address using the keypad on the phone. On top of that, even if you integrate a speech recognition engine, it is almost impossible to capture exactly the email address that the user has spoken. This is already very hard with a human on the other end of the line. I am sure you have had the experience of trying to give your email address to someone over the phone and finding it almost impossible to convey properly.
The problem stems from two distinct causes, in my opinion:
- The first problem is that the number of distinct characters an email address might contain is very large. An email address can contain an underscore (“_”), a dash (“-”), a dot (“.”), a plus sign (“+”), and so on, and it definitely has to contain an at sign (“@”) somewhere. You need a full keyboard to enter these characters, and speaking them is very hard since (a) they don’t have unique names (is it a dash or a minus sign?) and (b) their spoken names can be mistaken for literal words. To illustrate the second point, imagine someone said “foo plus” to you: would you write that as “fooplus” or as “foo+”?
- That last concern brings us to the second cause of confusion: email addresses are not opaque. They look like pronounceable, intelligible words but may not be, and this causes havoc for a human listener. You can have your name as part of your email address, but you might also include “x123y” and append “+friends” to it. And this is just the username part.
So, what should one do? How can one transmit email addresses over the phone?
My proposal is to use a reversible hash function (I know that, strictly speaking, a hash function is not reversible, but that is immaterial for this discussion; we just need something that is reversible). Suppose all email clients (both desktop and web based) had a feature where you could look up the unique numeric (or even alphanumeric) code for your email address. When you had to communicate your email address, all you would have to do is read this (preferably numeric) string to the other side. They would then reverse the code you gave them to obtain your email address. Conveying your email address would thus become no harder than conveying your credit card number. On top of that, the process could be totally automated. The whole scheme could be implemented using cryptography where everyone knows the secret. We would be using cryptography just to turn email addresses into opaque objects with a limited character set. The secret key could even be distributed to everyone through the DNS TXT records of a domain. So how about it, any takers?
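To make the idea concrete, here is a minimal Python sketch of the simplest reversible mapping I can think of: treat the bytes of the address as one big integer and read it out in decimal. The function names `email_to_digits` and `digits_to_email` are made up for illustration, and there is no shared secret here at all; a real scheme along the lines described above would presumably apply a keyed cipher before producing the digits.

```python
def email_to_digits(email: str) -> str:
    """Encode an email address as a string of decimal digits (illustrative only)."""
    data = email.encode("utf-8")
    # Prefix a 0x01 byte so the encoding is unambiguous and never loses
    # leading zero bytes when converted to an integer.
    return str(int.from_bytes(b"\x01" + data, "big"))


def digits_to_email(digits: str) -> str:
    """Reverse email_to_digits; raises ValueError on input it did not produce."""
    n = int(digits)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if not raw or raw[0] != 1:
        raise ValueError("not a valid encoded address")
    return raw[1:].decode("utf-8")


if __name__ == "__main__":
    code = email_to_digits("foo+friends@example.com")
    print(code)                   # digits only, safe to read over the phone
    print(digits_to_email(code))  # the original address
```

The digit string grows with the length of the address (roughly 2.4 digits per character), so a long address would be noticeably longer than a credit card number. A lookup service that hands out short codes would fix that, at the cost of needing a central registry.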
On a similar note, I have recently started to get bugged by the fact that credit card receipts print the last 4 digits of my credit card number. The idea is, I guess, to let you verify that the receipt is actually yours without exposing your full card number. Well, we already know a better way of doing that without exposing any part of the card number. Yes, you guessed it right: hash functions (non-reversible, true hash functions this time).
Suppose all credit cards had a hash code (the SHA-1 of your name and card number, for example) printed somewhere on the back, and all receipts printed that same hash. It would be trivial for you to match the receipt against the code on the back of your card, and almost impossible for someone who obtained that hash to extract your card number from it. So why are we not using such things? It is not exactly rocket science, is it?
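As a rough sketch of what the card issuer and the receipt printer would both compute, here is the SHA-1 idea in Python using the standard `hashlib` module. The function name `receipt_hash`, the `|` separator, the normalization of the inputs, and the truncation to 8 characters are all my own assumptions for illustration; the text above only says "SHA-1 of your name and card number".

```python
import hashlib


def receipt_hash(name: str, card_number: str, length: int = 8) -> str:
    """Return a short code derived from SHA-1(name, card number) (illustrative)."""
    # Normalize so the card and the receipt agree regardless of formatting.
    digits = card_number.replace(" ", "").replace("-", "")
    material = f"{name.strip().upper()}|{digits}".encode("utf-8")
    # Truncating the 40-character hex digest keeps the code easy to compare
    # by eye; the truncation length is an assumption, not part of the idea.
    return hashlib.sha1(material).hexdigest()[:length].upper()


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a well-known test card number, not a real card.
    print(receipt_hash("Jane Doe", "4111 1111 1111 1111"))
```

The same function would run once when the card is printed and again on every receipt, and the cardholder only has to compare the two short codes.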
On Twitter, and the Bigger Picture of Next Generation Mail Clients June 6, 2008
Posted by Ufuk Kayserilioglu in: Uncategorized, 1 comment so far
The other day, Orkun, a colleague from the office, posted a blog entry in which he claims that I cannot be linked to. I guess he didn’t do his research thoroughly, because here it is: a place on the Web that I call home. Granted, I have not posted anything for ages, and yes, the content leaves a lot to be desired, but nevertheless it exists.
Anyway, the reason my name came up in that blog post was that I had likened Twitter to basically no more than an IRC system on the Web. I have to confess that this is not my own observation. Very recently, Scott Hanselman posted “RFC: OpenTweets - Why is Microblogging centralized?”, and in that article and the comments that followed the same analogy was made; I picked it up from there.
In essence, Scott’s post discusses a more open and distributed framework as an alternative to the centralized service that Twitter supplies today. While the whole Twitter concept has not really made a large impression on me, I am all for creating open, distributed systems that improve people’s communication and productivity.
My approach, however, is a little more old-fashioned. For example, I read my RSS feeds using Thunderbird, but I do not use the built-in RSS feed support or the Forumzilla extension, because I like to manage my feeds the same way I manage my emails. So I have developed a short PHP script which parses an XML list of feeds, fetches them, and pushes the entries to an IMAP server/folder of my choice. This way:
- I have a single place where all my RSS feed entries are stored
- The body of each email in those IMAP folders loads the actual page in the message pane
- I can track read/unread status from all my clients (at work, at home, on my laptop, etc.)
- I can tag any feed entry using the tagging support of my mail client (and server, of course.)
- I can move any feed entry around
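For the curious, the core of such a script can be sketched in a few lines. My actual script is in PHP; the version below is a Python approximation using only the standard library, and every name in it (`entries_to_messages`, `push_to_imap`, the `X-Feed-Link` header, the placeholder sender address, the `Feeds` folder) is hypothetical. It assumes plain RSS 2.0 input and simply puts the link and description in the message body rather than loading the page itself.

```python
import imaplib
import time
import xml.etree.ElementTree as ET
from email.message import EmailMessage
from email.utils import formataddr


def entries_to_messages(rss_xml: str, sender: str = "feeds@example.invalid"):
    """Turn each <item> of an RSS 2.0 document into an email message."""
    root = ET.fromstring(rss_xml)
    feed_title = root.findtext("channel/title", default="(untitled feed)")
    messages = []
    for item in root.iterfind("channel/item"):
        title = item.findtext("title", default="(no title)")
        link = item.findtext("link", default="").strip()
        msg = EmailMessage()
        msg["From"] = formataddr((feed_title, sender))
        msg["Subject"] = " ".join(title.split())  # collapse stray newlines
        msg["X-Feed-Link"] = link
        msg.set_content(f"{link}\n\n{item.findtext('description', default='')}")
        messages.append(msg)
    return messages


def push_to_imap(messages, host, user, password, folder="Feeds"):
    """Append the messages to an IMAP folder (needs a real server to run)."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        for msg in messages:
            stamp = imaplib.Time2Internaldate(time.time())
            imap.append(folder, "", stamp, msg.as_bytes())
```

Fetching the feeds themselves (for instance with `urllib.request.urlopen`) and remembering which entries have already been pushed are left out of the sketch.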
Similarly, my approach to other services like Facebook mini-feeds/status updates or Twitter tweets is very client oriented. If I could manage these services using my mail client, so that posting a tweet was as simple as sending an email and responding to someone’s tweet was as simple as replying to it, I would be much happier.
Of course, all this leads to the question of what the next generation of mail clients is going to be. I would like them to manage email perfectly (without that it is a definite no-go) and also manage the whole torrent of news, snippets, status updates, feeds, blog posts, requests, invitations, notes, todo items, and so on. Of course, by then it won’t be called a mail client anymore. And, of course, it is unimaginable (for me at least) that all this data will be stored on the client, so we also need newer, unified protocols, and servers that support them, to achieve this. Right now, we have IMAP (love it) for email, WebDAV/CalDAV for calendaring, SyncML (too disconnected) for contacts/calendar/todo/notes, and nothing for the remaining bits. I say we need a single protocol to handle all this. I know that there already is some software out there that tries to do something along these lines, e.g. Chandler, but I think we need to think bigger than that. We need to build a future-proof client/server system that can handle all the things we have now and whatever might be thought of by the next generation of Web applications and programmers.
Implementing such a thing would be an Exchange Killer-Squared (Exzillasq anyone?).
Anyone interested?
