Address unknown
I’ve said it before: the failure of links is a catastrophic failure in a web site. It happens sometimes, to everyone. With internal links, because of a silly mistake in moving a file (or files); with external links (much more common) because of changes in sites over which you have no control.
However, if there is one reason which should not be the cause of link failure it is getting the URL wrong. I don’t mean making a mistake about what the URL is, I mean getting the URL syntactically wrong. And yet recently I have seen a number of sites, including some belonging to professional Web designers, which have failed for precisely that reason.
Now, no one is perfect, and everyone does design pages which fail sooner or later (usually sooner) in one or more browsers, but not understanding how a URL is written is rather like expecting to win the Booker Prize without knowing how to spell “the”.
There is nothing difficult about this. Look at any URL, and you see this structure:
protocol://server_address/directory/document.name
The protocol is now almost always the HyperText Transfer Protocol, so much so that modern browsers will often assume this is the protocol if a URL is entered without a protocol being specified. On the other hand, it is not absolutely always the protocol used to determine how files are transmitted, so it is good practice to always include http:// in any URLs quoted in your business cards, brochures, etc.
The next part of the address is the server address, which indicates the machine where the desired file can be found. Sometimes the server name contains full stops or “dots” — this is not something to bother about beyond getting the placement of the dots in the name right!
This may be followed by information about which directory on that machine contains the file. If the file is anywhere other than in the root directory, there may be several slashes separating directory levels before the final slash.
Usually, the server address begins with ‘www’, which stands for the World Wide Web. So this address:
http://www.windsword.co.uk/index.html
means that you are going to be using the HyperText Transfer Protocol to retrieve index.html from WindSword’s server, which is connected to the Web.
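To see those parts pulled apart mechanically, here is a minimal sketch using Python's standard urllib.parse module. The function name describe_url and the dictionary keys are my own labels for illustration, chosen to match the structure shown above; they are not part of any standard.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts discussed above (illustrative helper)."""
    parts = urlsplit(url)
    # Everything before the final slash is the directory; what follows is the file.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "server": parts.hostname,
        "directory": directory + "/",
        "filename": filename,
    }


info = describe_url("http://www.windsword.co.uk/index.html")
print(info["protocol"])   # http
print(info["server"])     # www.windsword.co.uk
print(info["directory"])  # /
print(info["filename"])   # index.html
```

A URL with no filename after the final slash simply comes back with an empty "filename" entry.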
Not content with leaving off the http://, some people and companies drop the www. This is much worse and should never be done, because there is an increasing number of sites whose URL does not begin with www. These three letters are an important part of the address, and users need to know if they are present or not. (It cannot always be assumed that http://www.company.com/ and http://company.com/ will take you to the same place.)
The final piece of the address after the final slash is the actual name of the file to be retrieved. Usually this ends in .html or .htm signifying that it is an HTML file, though now there are more and more pages with other extensions, sometimes signifying they have been dynamically generated.
Remember that if the file you are looking for is index.html (or on some servers default.html) you need not enter the filename after the final slash, since the server will automatically supply its default file (usually index.html) in the absence of a specified filename. WindSword’s URL could, therefore, also be correctly written:
http://www.windsword.co.uk/
The rules for filenames on the Web are simple. They should contain only alphanumeric characters, plus a full stop (dot) to separate the main part of the filename from the extension (one dot only in the filename). If you want a space within the name, the underscore character, “_”, should be used. (Note that there are people out there who detest the underscore. Hmm.)
Apart from underscore and the dot, the only special characters allowed are $ - + ! * ().
Other characters, particularly actual space characters, should not be used. If such unsafe characters are used, some browsers will replace each one with a percent sign followed by its ASCII value in hexadecimal (but other browsers will just not cope). A space character, for example, becomes %20, and a question mark becomes %3F. It might seem to be a good idea to name a file ‘how can we design your site?.html’, but:
how%20can%20we%20design%20your%20site%3F.html
looks a mess. Always remember that many surfers use the URL as a navigational aid. A filename like that helps no one, and that includes you trying to find it in a directory listing.
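If you want to see what a given filename turns into before you upload it, the encoding can be reproduced with the quote function from Python's standard urllib.parse module. This is only a sketch of the encoding step; what any particular browser actually does with an unsafe name may differ.

```python
from urllib.parse import quote, unquote

name = "how can we design your site?.html"

# quote() replaces each unsafe character with % followed by its hex value.
encoded = quote(name)
print(encoded)  # how%20can%20we%20design%20your%20site%3F.html

# unquote() reverses the mapping and recovers the original name.
print(unquote(encoded) == name)  # True
```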
The filename extension is essential. Some browsers on some platforms can take a file with no extension, work out what it is and display it accordingly. Most can’t. If they manage to display the page, which is questionable, you will probably find that any internal hyperlinks in the page do not work properly.
Needing three or four letters in the filename to tell programs what to do with a file may be dumb, but that’s the way it is. There is no difference in functionality between .htm and .html, but you must be aware that file.htm and file.html are not the same file, and could happily co-exist in the same directory. Pick whichever extension you prefer and stick to it to avoid confusion (the general preference seems to be for .html).
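The filename rules above are easy to check automatically before a site goes live. The following is a rough sketch, assuming the rules exactly as I have stated them (alphanumerics plus _ $ - + ! * ( ) in the name, one dot, and an alphanumeric extension). The name is_safe_filename is hypothetical, and the check is deliberately stricter than what servers will actually accept.

```python
import re

# Characters allowed in the main part of the name, per the rules above.
STEM = re.compile(r"[A-Za-z0-9_$+!*()-]+")
# The extension is assumed to be plain letters and digits.
EXTENSION = re.compile(r"[A-Za-z0-9]+")


def is_safe_filename(name):
    """Return True if name follows the filename rules described above."""
    stem, dot, extension = name.partition(".")
    # Exactly one dot is required: none means no extension, two is one too many.
    if not dot or "." in extension:
        return False
    return bool(STEM.fullmatch(stem)) and bool(EXTENSION.fullmatch(extension))


print(is_safe_filename("index.html"))                         # True
print(is_safe_filename("how_can_we_design_your_site.html"))   # True
print(is_safe_filename("how can we design your site?.html"))  # False
print(is_safe_filename("index"))                              # False
```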
This all seems simple, and it is simple. Why does anyone get it wrong? There are two points — still very simple points — which trip up people who should know better (yes, I do mean Web designers):
- URLs are case-sensitive. INDEX.HTML is not the same as index.html is not the same as InDeX.htmL. Uploading a file with an uppercase filename to a site where all the links address it using lowercase names is a sure way to get broken links (and puzzled designers).
- The directory separator in URLs is the forward slash, not the backslash used on Windows machines. There is nothing optional about this.
Web site addresses use UNIX conventions, hence both the forward slash and the case-sensitivity. If there is a problem with case it often seems to affect image files — saved by software as, for example, image.GIF while the author inserts links to image.gif; however, the most common cause of malformed URLs seems to be use of the backslash.
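Case problems of the image.GIF kind can be caught before upload by comparing the names used in links with the names of the files themselves. Here is a minimal sketch; the function name case_mismatches and the two input lists are assumptions for illustration, and in practice you would gather them from your own pages and directories.

```python
def case_mismatches(links, files_on_server):
    """Return (link, actual_name) pairs that differ only in letter case."""
    exact = set(files_on_server)
    # Index the real filenames by their lowercase form.
    by_lower = {}
    for name in files_on_server:
        by_lower.setdefault(name.lower(), name)

    problems = []
    for link in links:
        if link in exact:
            continue  # exact match, nothing wrong
        actual = by_lower.get(link.lower())
        if actual is not None:
            problems.append((link, actual))
    return problems


links = ["image.gif", "index.html"]
files = ["image.GIF", "index.html"]
print(case_mismatches(links, files))  # [('image.gif', 'image.GIF')]
```

Links that match no file at all, even ignoring case, are a different problem and are not reported here.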
I will admit to quite a bit of Schadenfreude when I come across a designer’s site which breaks down because someone fondly imagines that this sort of thing is a valid URL:
http://www.some.ones.com/site\thats\broken.html
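Spotting and repairing this particular mistake takes very little. The sketch below assumes that every backslash in the link was meant as a directory separator, which is almost always the case but is still an assumption; the function names are my own.

```python
def has_backslash(url):
    """Return True if the URL contains the Windows-style separator."""
    return "\\" in url


def fix_backslashes(url):
    """Replace every backslash with the forward slash that URLs require."""
    return url.replace("\\", "/")


# A raw string keeps the backslashes exactly as typed.
broken = r"http://www.some.ones.com/site\thats\broken.html"
print(has_backslash(broken))    # True
print(fix_backslashes(broken))  # http://www.some.ones.com/site/thats/broken.html
```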
There is an amusing story about why MS-DOS used the backslash — namely, that Microsoft forgot to allow for a directory separator and when it did occur to them the slash had already been used for something else! Hence DOS’s unique separator, and the one Windows machines now have to use.
Using the backslash on the Web, though, leads to links which don’t work and makes everyone think you really don’t know what you are doing.
Looking at sites like that can amuse; there are few pleasures like Schadenfreude. On the whole, though, I would happily forgo that if it meant not staring at a white screen instead of the page I’m looking for.
© DC 2000. All rights reserved.


