
File Extensions or Type Creator?
How about something new?
     By: David K. Every
Kind: Article
Created: 2002-04-11 07:39:01
Size: 12 KB
 
Prerequisites: You should really understand what a multi-forked filing system is before reading this article.



There's a debate going on over whether we should use type/creator (the Mac way) or file extensions (the DOS / UNIX way). Both have some strengths and weaknesses.

File extensions are basically a way to easily manipulate the type of a file. If the file is named "sample.text", you know that the file is called sample, and the type of the file is text. If the file was "sample.doc" or "sample.word", then you know the file is a Word document.

This method made a lot of sense when people were using a command line as their primary interface. When using a command line it is very easy to create a file of a certain type, or filter on a certain type - because the metadata (data describing the file) is actually embedded in the name data.

This behavior is anything but transparent - users see the file type, must manage the file type, and must be very careful not to change the file type when they are changing the name. This is known as "bad UI" - the computer is making the user maintain things for it, instead of the other way around (the computer doing the right thing by default). Far better UI is to split the type from the filename, and put the user in complete control of the name. Only an explicit action to change the type should change the file's type (not just renaming the file). Most of the time, users shouldn't ever have to manipulate "type" and it should be left to programs.

Apple decided that since the interface should separate the name from the type (extension), then obviously the implementation should as well. So they broke the extension out of the filename - they created a separate metadata attribute to allow the type to be stored somewhere else. This meant that users could change the name to their heart's content, and never screw up the file's type by doing so.

On top of that, Apple figured that since they were giving files their own type, they should also allow files to track their own "creator". If I had a text file that I created in Word, the file knew what created it (Word). So whenever a user opened that file, the system knew that by default it should open it as a text file, in Word. This was certainly more useful.
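A minimal Python sketch of the idea, not Apple's actual implementation: the name, type, and creator live in separate fields, so renaming can never disturb the type. The class name `MacStyleFile` is made up for illustration, and the four-character codes are example values.

```python
from dataclasses import dataclass


@dataclass
class MacStyleFile:
    name: str      # entirely under the user's control
    type: str      # four-character type code (example value below)
    creator: str   # four-character creator code (example value below)

    def rename(self, new_name: str) -> None:
        # Renaming touches only the name; type and creator are left alone.
        self.name = new_name


doc = MacStyleFile(name="sample", type="TEXT", creator="MSWD")
doc.rename("sample.final.gif")   # periods in the name carry no meaning here
print(doc)
```

However the user renames the file, `doc.type` and `doc.creator` stay what the creating program set them to.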

Apple isn't the only one thinking like this. Digital Equipment (makers of the VAX, the Alpha and the VMS operating system), among others, also didn't implement things quite the same way. Even though they had command lines, it only looked like the extension was part of the name - they really kept the extension and file version in separate attributes/indexes, so that they were a little more separate from the name. It was better, and had some potential (and more power/versatility), but they were still command-line centric, and so they never fully exploited the power that this solution gave them.

Some people (Microsoft) chose to hack the solution. They didn't want to fix the problem (extensions in the filename) -- that would take time and money. They chose to just try to hide the problem. So everywhere that they deal with a file, they just hide the extension - unless you have some setting that says you are a power user and want to see it. But that means that the problem is lurking right under the surface - and it dramatically increases the work for programmers. Everywhere that anyone deals with the file, they have to know of all the issues with the extension and handle them in exactly the same way, or the metaphor breaks down.

On top of that, there is another problem with embedding an extension in a filename - and that is that you have to steal a character as a "beginning of extension" delimiter (marker). If you were smart, you'd use a special character that had no other purpose but to mark the start of file extensions (and a special way to type that character). No one was that smart - they decided to just use '.' instead. Which of course means that you can't freely use periods for other things in filenames - which is just silly. Of course in a perfect world, users should be able to type anything they want as part of the filename. But this whole delimiter (special character) mess gets ugly. What do you do if a user starts adding things like ".gif" as part of their filename? Then the filename is "picture.gif.gif" or worse, "picture.gif.jpg" if they are wrong about the type. That is just ugly and wrong.
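To make the delimiter problem concrete, here is a small Python sketch using the standard library's `os.path.splitext`, which treats whatever follows the last period as the extension. The helper name `apparent_type` and the file name "notes v1.2" are made up for illustration.

```python
import os.path


def apparent_type(filename: str) -> str:
    """Return the type a name-based system infers: the text after the last period."""
    _, ext = os.path.splitext(filename)
    return ext.lstrip(".").lower()


# The user typed ".gif" as part of the name; the system appended the real type.
print(apparent_type("picture.gif.gif"))
# The user guessed the type wrong, so the name now makes two conflicting claims.
print(apparent_type("picture.gif.jpg"))
# A period used as ordinary punctuation is mistaken for a type marker.
print(apparent_type("notes v1.2"))
```

In the last case the "type" comes out as "2", simply because the user wrote a version number into the name.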



All of the solutions have problems.

The Mac way was the best for a user, on a single machine, since it put the user in the most control of the filename. But it has drawbacks.

The Mac interface never fully exploited the capabilities that it had. There should have been a utility (or better utilities) so that users could easily change the type and creator. Basically, a user should be able to easily change who would open a file - either just overriding it for that one file, for all files of that creator, or for all files of that type. The Mac never finished that easy management component - though there were programmer utilities, and partial fixes for some cases.

More than that - the Mac implementation was never designed to handle multiple users. A file really needs a creator for each user, since user1 might want to edit that file in Word, while user2 wants to edit it in BBEdit (and might not even have Word available to them based on privileges).

Also there's still a gaping hole. Let's say that I have a file that is a stylized text file (like RTF), that is a C language programming file, and that I want the default editor to be BBEdit for, for me only. That's more than two attributes. I count a few domains (levels).

1) We have the file core-type - it is a text file.
2) We have a subtype/modifier - it is stylized text (or the type of text file)
3) We have what the content is - its content is C programming language source. And a tool needs to know that, because it can treat a C text file differently from an HTML text file, and do the right thing for me.
4) We have the creator of the file -- we still want to pick what app opens the file.
5) We have a creator override (per user) - we want to be able to change what app opens the file on a per user basis, and still remember what the default one is overall.
6) Hell, while we're at it, let's do things right, and have a file also handle versioning. (Allow another dimension to a file, which is time - so that you can go back to previous revisions of a file, or see how the file has evolved over time.)

So to do things right - we really can't use either the Mac way, or the DOS/UNIX way. We need to do something new. We need to create a hierarchical domain that can handle core types, attributes/subtypes, content descriptors, creators, and overrides (on a per user basis) - and that has real versioning/revisions. None of this is that hard to do, nor is it that hard to present to users. All of these pieces have been done at some level (though the creator and user overrides are pretty much my own creation); it is just that no one has yet put them together.
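The six domains listed above can be sketched as a single record. This is only an illustration in Python; the class and field names (`FileRecord`, `creator_overrides`, and so on) and the sample values, including Word as the overall default, are assumptions, not part of any real filing system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Revision:
    saved_at: datetime
    data: bytes


@dataclass
class FileRecord:
    core_type: str                    # 1) core type, e.g. "text"
    subtypes: List[str]               # 2) subtype/modifiers, e.g. ["rtf"]
    content: str                      # 3) what the content is, e.g. "c-source"
    creator: str                      # 4) default application for everyone
    creator_overrides: Dict[str, str] = field(default_factory=dict)  # 5) user -> application
    revisions: List[Revision] = field(default_factory=list)          # 6) oldest first

    def opener_for(self, user: str) -> str:
        # A per-user override wins; otherwise fall back to the overall default.
        return self.creator_overrides.get(user, self.creator)


source = FileRecord(core_type="text", subtypes=["rtf"], content="c-source", creator="Word")
source.creator_overrides["me"] = "BBEdit"
source.revisions.append(
    Revision(datetime(2002, 4, 11, 7, 39, 1), b"int main(void) { return 0; }\n")
)
print(source.opener_for("me"), source.opener_for("someone else"))
```

The default creator is still remembered after the override is set, which is the point of keeping items 4 and 5 separate.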



How would I do it? People like to say, "oh, sure, Mr. Bigmouth talks about the problems - but how do you fix them?" Well Mr. Bigmouth is also Mr. Know-it-all - or in this case, I at least know-enough to know that this is solvable in quite a few ways.

The best way (in HFS+/Mac or NTFS/Windows) would be to add attributes to the metadata to handle each of the first 5 items. The subtype/attributes would have to be an array of elements, since you could have more than one attribute for a type. The creator override would also have to be an array, since you need to keep one override per user. One area where the Mac fell down is that you also need to create a public interface to these items - sort of like the new GetInfo in OS X - to allow those attributes to be viewed (and managed if you have permission). In fact, I'd probably make creator override behave exactly like permissions (with owner, group, and everyone - just to be consistent).
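One way the permissions-style creator override might resolve, sketched in Python. The lookup order (owner, then group, then everyone, then the file's default creator) is an assumption modeled on how UNIX permission classes are usually checked; the names `CreatorOverrides` and `resolve_opener` and the application names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class CreatorOverrides:
    owner: Optional[str] = None      # app chosen by the file's owner
    group: Optional[str] = None      # app chosen for members of the file's group
    everyone: Optional[str] = None   # app chosen for all other users


def resolve_opener(user: str, user_groups: Set[str], file_owner: str,
                   file_group: str, default_creator: str,
                   overrides: CreatorOverrides) -> str:
    """Pick the app that opens the file: the most specific override that is set wins."""
    if user == file_owner and overrides.owner:
        return overrides.owner
    if file_group in user_groups and overrides.group:
        return overrides.group
    if overrides.everyone:
        return overrides.everyone
    return default_creator


overrides = CreatorOverrides(owner="BBEdit")
print(resolve_opener("user2", {"staff"}, "user2", "staff", "Word", overrides))
print(resolve_opener("user1", {"staff"}, "user2", "staff", "Word", overrides))
```

Here the owner (user2) gets BBEdit, while user1 falls through to the default creator, Word.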

Item 6 (versioning) is actually one of the easier things to implement (it only sounds hard). With HFS+ you could have multiple versions of a file, each in their own fork. Opening would open the right (data) fork - unless you went through the interface to tell it otherwise. Each time you saved a file, you'd rename the old fork by date/time stamp, and add a new fork called data. That would be the Mac way.
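A toy in-memory model of that fork scheme, in Python. Real HFS+ forks are not exposed this way; the `ForkedFile` class is hypothetical, and stamping the old fork with the time of the new save (rather than the time it was originally written) is an assumption.

```python
from datetime import datetime
from typing import Dict


class ForkedFile:
    """A file whose current contents live in the 'data' fork and whose
    older versions live in forks named by timestamp."""

    def __init__(self) -> None:
        self.forks: Dict[str, bytes] = {}

    def save(self, content: bytes, now: datetime) -> None:
        if "data" in self.forks:
            # Rename the old data fork to a date/time stamp.
            stamp = now.strftime("%Y-%m-%dT%H:%M:%S")
            self.forks[stamp] = self.forks.pop("data")
        # Add a new fork called "data" holding the latest contents.
        self.forks["data"] = content

    def open(self, version: str = "data") -> bytes:
        # Opening without asking for a version gets the current data fork.
        return self.forks[version]


f = ForkedFile()
f.save(b"first draft", datetime(2002, 4, 11, 7, 39, 1))
f.save(b"second draft", datetime(2002, 4, 11, 8, 0, 0))
print(f.open())
print(f.open("2002-04-11T08:00:00"))
```

Two saves within the same second would collide in this sketch; a real implementation would need a finer-grained or sequence-numbered fork name.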

Another way to implement item 6 would be to use a real journaling filing system (one in which a file is made up of all the changes to it - so everything you do is just recorded as a change). This means you could easily eliminate the save button altogether - everything would be saved as part of the sequence of things you'd done to that file. But this has some flaws as well. I prefer a mix of journaling and forking. You journal each version until someone saves (snapshots) the version. Then they can get to the version/variant that they snapshotted much more quickly. (The save/snapshot is a marker, and starts a new clean journal.)
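The journal-plus-snapshot mix can be sketched in a few lines of Python. To keep the example short, every edit is modeled as appending text, which is a simplification; the `JournaledFile` class and its method names are hypothetical.

```python
from typing import List


class JournaledFile:
    """Every edit is appended to a journal; a snapshot folds the journal
    into a stored version and starts a new, clean journal."""

    def __init__(self) -> None:
        self.snapshots: List[str] = [""]   # snapshot 0 is the empty file
        self.journal: List[str] = []       # edits made since the last snapshot

    def edit(self, text: str) -> None:
        # No save button: the change is recorded the moment it happens.
        self.journal.append(text)

    def current(self) -> str:
        # Replay the journal on top of the most recent snapshot.
        return self.snapshots[-1] + "".join(self.journal)

    def snapshot(self) -> int:
        # Mark this state so it can be reached without replaying the journal.
        self.snapshots.append(self.current())
        self.journal = []
        return len(self.snapshots) - 1


f = JournaledFile()
f.edit("Hello")
f.edit(", world")
version = f.snapshot()
f.edit("!")
print(f.current())
print(f.snapshots[version])
```

Reaching a snapshotted version is a single lookup, while unsnapshotted work is still recoverable from the journal.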

So that fixes it under HFS+, if you wanted to go that path. But what if you hate multi-forked filing systems? (We can argue the merits or weaknesses of that separately.)

Well, it is OK. I have a fix for UNIX / flat filing systems as well. Every document (file) in Mac OS X should really be a folder. That folder (let's call it a doc-folder) should just be named with a UNIQUE KEY / identifier in the system. All of the attributes (metadata and data) for that doc-folder would be contained inside that folder. You could easily add all the attributes that I described. Even the name for that folder would be in a little file called "filename" - that could be something like a tag-delimited (XML) file, which had variants for different languages/language systems, or even per user (so a user could rename a file for themselves, but other users would still see the original name). It could also have a history of names, so that you could find files that were ever named something.
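A rough Python sketch of such a doc-folder on an ordinary flat file system. Everything about the layout is an assumption made for illustration: the folder is named with a random UUID, the contents go in a file called "data", and the "filename" file uses a made-up XML shape with `scope` and `user` attributes.

```python
import tempfile
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional


def create_doc_folder(root: Path, display_name: str, data: bytes) -> Path:
    """Create a doc-folder named by a unique key; the visible name lives inside it."""
    folder = root / uuid.uuid4().hex
    folder.mkdir()
    (folder / "data").write_bytes(data)
    names = ET.Element("names")
    ET.SubElement(names, "name", {"scope": "default"}).text = display_name
    ET.ElementTree(names).write(str(folder / "filename"), encoding="utf-8")
    return folder


def rename_for_user(folder: Path, user: str, new_name: str) -> None:
    """Record a per-user name without touching the name other users see."""
    tree = ET.parse(str(folder / "filename"))
    for el in tree.getroot().findall("name"):
        if el.get("scope") == "user" and el.get("user") == user:
            el.text = new_name   # update this user's existing entry
            break
    else:
        ET.SubElement(tree.getroot(), "name", {"scope": "user", "user": user}).text = new_name
    tree.write(str(folder / "filename"), encoding="utf-8")


def display_name(folder: Path, user: Optional[str] = None) -> str:
    """Return the user's own name for the document, or the default name."""
    root = ET.parse(str(folder / "filename")).getroot()
    for el in root.findall("name"):
        if el.get("scope") == "user" and el.get("user") == user:
            return el.text
    return root.find("name[@scope='default']").text


with tempfile.TemporaryDirectory() as tmp:
    doc = create_doc_folder(Path(tmp), "budget: draft/2 .gif?", b"hello")
    rename_for_user(doc, "user2", "my budget")
    print(doc.name)                      # the key the system uses
    print(display_name(doc))             # what most users see
    print(display_name(doc, "user2"))    # what user2 sees
```

Because the visible name is just data inside the folder, it can contain slashes, periods, or anything else without affecting how the system finds the document.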

The really nice thing about this solution is that the unique key means that to reference any file on the system, you just need the key (not the path). This is similar to how HFS or many databases work (at the lowest level), and it is what makes the approach cool. So all the problems with renaming files or directories go away - the computer and programs deal with the key (which is persistent), and users deal with the attributes. Users could name the file anything (any mix of any special characters), and the computer could never lose track or have to worry about what the users see, because there would be 100% perfect separation between the two. The key can easily be large enough that you could have a 100% unique key for every file in the entire world (perfect uniquing) - so you'd never have a collision. In fact, that key could have some nice domain spaces of its own, to allow the system to have parts that are unique and some that are common, for added capabilities that are beyond the scope of this article. But even if you didn't want to go that far, and you just had keys unique to a drive (like HFS), you could easily fake greater uniqueness: when a document changes drives, the original key (and source) is added to a history file in the file's folder - thus you have a way to find all copies of that file across other drives, or across a network.
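A small in-memory Python sketch of key-based references and of recording a document's origin when it is copied to another drive. The `Doc` and `Drive` classes, the `origins` list, and the drive labels are all hypothetical; random UUIDs stand in for the unique key.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Doc:
    key: str
    name: str
    # (drive label, key) of every document this one was copied from, oldest first.
    origins: List[Tuple[str, str]] = field(default_factory=list)


class Drive:
    def __init__(self, label: str) -> None:
        self.label = label
        self.docs: Dict[str, Doc] = {}   # programs look documents up by key

    def create(self, name: str) -> Doc:
        doc = Doc(key=uuid.uuid4().hex, name=name)
        self.docs[doc.key] = doc
        return doc

    def copy_in(self, source: "Drive", key: str) -> Doc:
        src = source.docs[key]
        # The copy gets its own key but remembers where it came from.
        copy = Doc(key=uuid.uuid4().hex, name=src.name,
                   origins=src.origins + [(source.label, src.key)])
        self.docs[copy.key] = copy
        return copy


disk_a, disk_b = Drive("A"), Drive("B")
report = disk_a.create("report")
saved_reference = report.key             # what a program would store
report.name = "final.report v2.1 (really)"
print(disk_a.docs[saved_reference].name) # the reference survives the rename
backup = disk_b.copy_in(disk_a, saved_reference)
print(backup.origins)
```

Searching every drive for documents whose `origins` mention a given key is then enough to find all copies of the original.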



The sad thing is that OS X's way, Windows' way, and other systems that layer a fix on top are all failed attempts to meld two incompatible systems that don't mesh well (an impedance mismatch). Covering the problem up, or trying to, only creates many new problems that are often worse than the original.

The Mac still handles user control of the name domain better, and handling of ownership. But it falls apart in the multi-user domain. Extensions just suck on all levels. Mac OS X will never work well from a user point of view, unless they address the underlying problems, and stop trying to cover up the symptoms. Covering the symptoms almost always fails - look at Windows. Too many people need to know too much about extensions, and then there are security holes because of them. (Like users getting files that are "virus.jpg.vbs" or "Trojan.gif.exe").
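The security hole is easy to reproduce in a short Python sketch. The set of "known" extensions and the two helper names are hypothetical; the point is only that the interface hides the last extension while the system still acts on it.

```python
import os.path

# Hypothetical list of extensions the interface knows about and therefore hides.
KNOWN_EXTENSIONS = {".vbs", ".exe", ".jpg", ".gif"}


def displayed_name(filename: str) -> str:
    """What the user sees when known extensions are hidden."""
    stem, ext = os.path.splitext(filename)
    return stem if ext.lower() in KNOWN_EXTENSIONS else filename


def acted_on_type(filename: str) -> str:
    """What the system uses to decide how to open the file."""
    return os.path.splitext(filename)[1].lower()


for name in ("virus.jpg.vbs", "Trojan.gif.exe"):
    print(displayed_name(name), "is really", acted_on_type(name))
```

The user sees what looks like a picture ("virus.jpg"), and the system runs a script.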

It is time to address the root of the problem, and separate the name domain from the program/programmer attributes (like type). But as long as we're fixing that, let's do a little engineering, think forward and solve some problems for the future as well.


Copyright 2003 DKE • All rights reserved • www.iGeek.com