Thursday, May 6, 2010

The quest for the perfect permalink

For SUSE Studio we are looking into adding nice permalinks to appliances. This turns out to be a surprisingly difficult problem. The implementation is not too hard, but getting the scheme of the links right poses some interesting challenges.

So what do I actually mean by permalink? A permalink is a nice and convenient way to point to objects on a web site from outside of the web site itself. In our case these would be links pointing to appliances on SUSE Studio. To be nice and convenient, the link needs to have a couple of attributes:
  • Permanent. The link should not change or depend on the state of the site or attributes of the user session. If you publish the link on another web site, e.g. in your blog, it should not break after a while or for other users.
  • Pretty. As the permalink is meant to be suitable for publication, it should have a pretty format, so that you can integrate it into text without completely destroying formatting and flow.
  • Expressive. When you see a permalink, it should be clear where it points, so you don't have to click it to find out what it is about.
  • Short. For sharing, it's nice if the link is short. This is especially important where the length of a link is limited, for example when sharing it via Twitter.
Meeting all these requirements is not easy, but there are also some additional challenges:
  • Handling change. The objects permalinks point to are being worked on, so they change in various ways. For example, the name of an object could change. Permalinks have to handle this conflict between permanence and change in some way.
  • Namespacing. A site might handle different types of objects, so they need to be addressed in a way which doesn't cause conflicts. As we are talking about user-provided content here, conflicts can also arise when different users try to use the same names. So we need some namespacing to handle these conflicts.
  • Potential abuse. The permalinks are pointing to user-provided content. So depending on how much influence the user has on the link, there might be some potential for abuse by users who try to create links which misrepresent the site.
  • Non-ASCII characters. If you base permalinks on names, you have to deal with characters which are natural in names, but not in URLs. This can make it hard to create permalinks.
Let's use a fictional example to illustrate the requirements and challenges:

John Doe likes baking cakes. He also likes to share, so he publishes his recipes on example.com. His favorite recipe is his aunt Tilly's chocolate cake. So he publishes it and the site creates the permalink example.com/chocolate_cake. John tweets the link. His friends get it, bake the cake, and everybody is happy. This permalink is short and pretty. It conflicts with all other recipes named chocolate cake, though, so the first to publish a recipe under a given name wins. This is good for John, but bad for other users, so it's not a perfect solution.

One way to avoid the problem of conflicting names would be to add a namespace for the user, so the permalink would become example.com/jdoe/chocolate_cake. This makes it longer, though, and there is still potential for conflicts in the user name. So when John's sister Jane Doe joins example.com, she won't be able to use her favorite user name, which is also jdoe, and has to choose something else. Still not perfect.

Now aunt Tilly is a modern lady. She reads the tweet and sends an email to John: "Hi John, I gave you this recipe. Be a good boy and mention this on your web site. All the best, Tilly". John is a good boy and changes the name of the recipe to "Aunt Tilly's Chocolate cake". The web site creates the permalink example.com/aunt_tillys_chocolate_cake. This makes aunt Tilly happy, and the new link is still expressive, pretty and relatively short, but the change breaks the link in John's tweet. So the site has to redirect the old link to the new one, or at the very least prevent the old URL from being reused for something different. Either way, the nicer URL example.com/chocolate_cake becomes unavailable for other recipes. This is good for permanence, but bad for pretty and short links.
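The redirect-and-reserve behavior described above can be sketched as a small registry. This is a minimal in-memory sketch assuming a single flat namespace (as in example.com/chocolate_cake); `PermalinkRegistry` and its methods are hypothetical names, and a real site would keep this state in a database. Old slugs are never released, which is exactly why example.com/chocolate_cake stays unavailable to other recipes.

```python
class PermalinkRegistry:
    """Hypothetical in-memory registry: live slugs plus retired slugs kept for redirects."""

    def __init__(self):
        self.current = {}  # live slug -> object id
        self.retired = {}  # old slug -> object id, kept reserved forever
        self.slug_of = {}  # object id -> its current slug

    def is_taken(self, slug):
        # Retired slugs count as taken, so an old link can never point elsewhere.
        return slug in self.current or slug in self.retired

    def publish(self, obj_id, slug):
        if self.is_taken(slug):
            raise ValueError(f"slug {slug!r} is already taken")
        self.current[slug] = obj_id
        self.slug_of[obj_id] = slug

    def rename(self, obj_id, new_slug):
        old_slug = self.slug_of[obj_id]
        if new_slug == old_slug:
            return
        if self.retired.get(new_slug) == obj_id:
            # An object may reclaim one of its own old slugs.
            del self.retired[new_slug]
        elif self.is_taken(new_slug):
            raise ValueError(f"slug {new_slug!r} is already taken")
        del self.current[old_slug]
        self.retired[old_slug] = obj_id  # keep the old slug for redirects
        self.current[new_slug] = obj_id
        self.slug_of[obj_id] = new_slug

    def resolve(self, slug):
        """Return (HTTP status, payload): 200 + object id, 301 + current slug, or 404 + None."""
        if slug in self.current:
            return 200, self.current[slug]
        if slug in self.retired:
            return 301, self.slug_of[self.retired[slug]]
        return 404, None


reg = PermalinkRegistry()
reg.publish(1, "chocolate_cake")
reg.rename(1, "aunt_tillys_chocolate_cake")
print(reg.resolve("chocolate_cake"))  # (301, 'aunt_tillys_chocolate_cake')
```

The design choice here is to treat permanence as more important than prettiness: a retired slug only ever redirects to its original object, at the cost of the namespace slowly filling up.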

Another problem illustrated by the name change is the handling of special characters. The apostrophe is hard to handle in a URL, so the site just removes it for the link. You can come up with all kinds of rules to handle such characters, but they will eventually fail to generate pretty URLs, e.g. when somebody uses a Japanese name. That leaves three options: let users edit the link, which introduces many more opportunities for change and the problems associated with it; give up on pretty URLs in at least some cases; or let users deal with special characters encoded in URLs and the issues that come with them, e.g. tools which don't handle such URLs properly. Another stumbling block on our quest for the perfect permalink.
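To make the failure mode of such rules concrete, here is a naive slug generator. The rules (drop apostrophes, strip accents, keep only lowercase ASCII letters and digits, join words with underscores) are assumptions chosen to reproduce the aunt_tillys_chocolate_cake example, and `slugify` is a hypothetical name, not what example.com or SUSE Studio actually does.

```python
import re
import unicodedata


def slugify(name):
    """Naive, hypothetical slug rules: lowercase ASCII letters and digits joined by underscores."""
    # Drop apostrophes (straight and typographic) so "Tilly's" becomes "tillys".
    name = name.replace("'", "").replace("\u2019", "")
    # Split accented letters into base letter + accent, then discard everything non-ASCII.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Collapse each run of remaining non-alphanumeric characters into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")


print(slugify("Aunt Tilly's Chocolate cake"))  # aunt_tillys_chocolate_cake
print(slugify("Crème brûlée"))                  # creme_brulee
print(repr(slugify("抹茶ケーキ")))                # '' (nothing survives)
```

Accented Latin letters degrade gracefully, but the Japanese name produces an empty slug, so the site needs some fallback, with all the trade-offs discussed above.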

An easy solution to avoid most of these problems is to generate random permalinks. This also removes all complexity around user-editable content and changes, as the link is independent of the content. So aunt Tilly's yummy chocolate cake would be referenced by example.com/3hd63lbdxz. This is short and permanent, but neither pretty nor expressive.
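Generating such IDs is simple. The sketch below assumes an alphabet of lowercase letters and digits and a length of 10, matching the style of 3hd63lbdxz; `random_permalink` is a hypothetical name, and the `taken` set stands in for what would be a uniqueness check in a database. With 36 symbols and 10 characters there are 36^10, roughly 3.7 × 10^15, possible IDs, so collisions are rare, but the check is still needed.

```python
import secrets
import string

# Assumed alphabet: lowercase letters and digits, like 3hd63lbdxz.
ALPHABET = string.ascii_lowercase + string.digits


def random_permalink(taken, length=10):
    """Return a random ID of `length` characters that is not in `taken`."""
    while True:
        # secrets uses a cryptographically strong source, so IDs are hard to guess.
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in taken:
            return candidate


existing = {"3hd63lbdxz"}
new_id = random_permalink(existing)
existing.add(new_id)
print(f"example.com/{new_id}")
```

Because the ID carries no information about the content, renames never affect it; the price is that nobody can tell from the link that it points to a chocolate cake.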

You can think of various variations and combinations of these schemes, but meeting all requirements really is very hard. It seems there is no perfect permalink. But let's look at some real-world examples.

Real-world examples

Wikipedia provides short and pretty permalinks. They have the advantage that the set of terms represented in links is limited, pretty well-defined and not completely up to users. They still have to deal with conflicts, which they do with disambiguation pages. They take on the challenge of encoding special characters, which is nice.
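Encoding special characters instead of stripping them can be sketched as follows. This assumes the common convention of turning spaces into underscores and percent-encoding everything else as UTF-8 bytes; Wikipedia's exact rules differ in details, and `title_to_path` and `path_to_title` are hypothetical names. Python's `urllib.parse.quote` never encodes ASCII letters, digits and `_.-~`, and `safe=""` makes it encode everything else, including `/`.

```python
from urllib.parse import quote, unquote


def title_to_path(title):
    """Hypothetical helper: spaces become underscores, everything outside
    ASCII letters, digits and _.-~ is percent-encoded as UTF-8 bytes."""
    return quote(title.replace(" ", "_"), safe="")


def path_to_title(path):
    """Inverse of title_to_path for titles that contain no literal underscores."""
    return unquote(path).replace("_", " ")


print(title_to_path("Aunt Tilly's Chocolate cake"))  # Aunt_Tilly%27s_Chocolate_cake
print(path_to_title(title_to_path("抹茶ケーキ")))      # 抹茶ケーキ
```

Nothing is lost, so any name gets a permalink, but a Japanese title turns into a long run of %XX sequences wherever a tool shows the raw URL.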

Gitorious lets users choose the permalink (or slug, as they call it). They forbid special characters. The permalink becomes a top-level path, which is nice. You can't name your project "login", though. You can change your slug, but this breaks old URLs.

GitHub takes a slightly different path. They prefix all projects with the user name. This is nice as it avoids conflicts and also stresses the social aspect of the site. Some URLs become a bit ugly, though, e.g. github.com/rails/rails. They seem to clean up their users and projects from time to time, as a nasty URL which used to exist was gone when I looked for it today.

s.opensu.se is a site providing short links to various openSUSE resources. You can, for example, reference repositories by short links like s.opensu.se/r?network:utilities. This is nice, short and reasonably expressive. Whether it's pretty is somewhat a matter of taste. It doesn't address changing links.

Markmail is an example of random permalinks. This is probably the only way to cope with the number of objects they manage as a mailing list archive, and the links are still short and relatively pretty: markmail.org/message/vyjutm3jkecxprzj. Change is not an issue for them, as the objects are static.

There are tons of other examples out there. If you know of one which solves the problems of permalinks in a particularly good or interesting way, please let me know.

What do you think?

My preliminary conclusion is that it's probably impossible to come up with a perfect scheme for permalinks, and we need to make some compromises. I like including user names in the URLs, as it solves some of the conflict issues, is actually useful information, and makes for expressive URLs. But of course there are other solutions as well.

What do you think? What would you like permalinks to SUSE Studio appliances to look like?