Open source software security

Relational Filesystems

I just began a project that I think might hold a lot of promise for me. I’m working on an online document storage application. In a nutshell, documents can be uploaded into a database and classified for sharing with other users. It’s a fairly straightforward use of a web-based database application, but it got me thinking.

Finding documents in such an application is much easier than finding them on my hard disk. In fact, the more documents accumulate on my actual computer, the more difficult it becomes to find them. Even if you take the time to create carefully structured directory trees and classify each document, at a certain point the directory tree becomes so large that it isn’t very usable anymore. Even though the information is carefully organized, it can be hidden deep within a convoluted structure and difficult to retrieve.

One of the problems with this situation is that traditional file structures are based on physical models of documents and folders. Anyone who has a messy office knows that no matter how many folders and filing cabinets you use, at a certain volume the organization is no longer very helpful. This is precisely why many people store things in piles or leave things scattered around their desks. People tend to have good spatial memories, and it’s often easier to remember that a document is “near the edge of the desk” than it is to remember that it is in the “work” cabinet, under the “November” folder, in the “receipts” division.

Many organizations with large or complex file structures turn to databases to solve their information organization needs. By putting documents in a relational database and tagging them with metadata, it becomes much easier to search for and retrieve those documents. The primary reason for this is nothing very magical. In a filesystem, a document is stored in exactly one location. Sure, you can link to a document from other locations, but that is a manual process, and not a very effective one. In a relational database, documents are stored based on taxonomy rather than physical location. A document could be in a ‘tax’ category and simultaneously in a ‘personal’ category, without having to link the document. Simply specifying good metadata allows the document to be reached by following many paths of relationships through the database.
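Here is a minimal sketch of that idea using Python’s built-in sqlite3 module. The table names, file names, and helper functions are all hypothetical, made up purely for illustration, and a real system would need far more than this. The point is only that one document can sit in several categories at once through a join table, with no linking required.

```python
import sqlite3

# In-memory database; the schema below is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
    -- Join table: one row per (document, category) relationship.
    CREATE TABLE document_categories (
        document_id INTEGER NOT NULL REFERENCES documents(id),
        category_id INTEGER NOT NULL REFERENCES categories(id),
        PRIMARY KEY (document_id, category_id)
    );
""")

def add_document(name, labels):
    """Store a document once and attach it to any number of categories."""
    doc_id = conn.execute(
        "INSERT INTO documents (name) VALUES (?)", (name,)
    ).lastrowid
    for label in labels:
        # Create the category only if it does not exist yet.
        conn.execute("INSERT OR IGNORE INTO categories (label) VALUES (?)", (label,))
        cat_id = conn.execute(
            "SELECT id FROM categories WHERE label = ?", (label,)
        ).fetchone()[0]
        conn.execute("INSERT INTO document_categories VALUES (?, ?)", (doc_id, cat_id))
    return doc_id

def documents_in(label):
    """Return the names of all documents filed under one category."""
    rows = conn.execute(
        """
        SELECT d.name
        FROM documents d
        JOIN document_categories dc ON dc.document_id = d.id
        JOIN categories c ON c.id = dc.category_id
        WHERE c.label = ?
        ORDER BY d.name
        """,
        (label,),
    )
    return [row[0] for row in rows]

add_document("2006-return.pdf", ["tax", "personal"])
add_document("invoice-114.pdf", ["tax", "work"])
add_document("holiday-photos.zip", ["personal"])

print(documents_in("tax"))       # ['2006-return.pdf', 'invoice-114.pdf']
print(documents_in("personal"))  # ['2006-return.pdf', 'holiday-photos.zip']
```

The tax return is stored once but turns up under both ‘tax’ and ‘personal’, which is exactly what a single directory path can’t give you.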

This got me wondering why there are no popular relational filesystems. Many operating systems have attempted to address this problem with flashy features such as photo organizing software or advanced desktop searches, but these don’t solve the underlying problem. When I go to the ‘file’ menu in my favorite word processor and click ‘open’, I’m presented with a standard filesystem interface. Even if I do manage to have my document cleverly classified in a database or application elsewhere, that organization is invisible through the most basic interface of the operating system. The user, in essence, is limited to the organizational application, and without it the careful classification is useless. Because information has to be shared across the entire operating system and the applications that run on it, it would be much simpler to restructure the filesystem itself to reflect a relational model.

How wonderful would it be if you could open up your documents folder and click through a couple of folders to find a document that could just as easily be found through several other relations? Although a typed search is often the quickest way to find documents (as evidenced by the success of Google), you could easily represent taxonomies via graphical icons. After all, your computer should be aware of all the taxonomies with which the user has classified documents, so only those taxonomies need to be presented.
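To make the “click through a couple of folders” idea a little more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the DOCUMENTS dictionary stands in for whatever store the filesystem would really use, and browse is a hypothetical helper, not part of any existing system. Given the categories clicked so far, it returns the matching documents plus only those categories that would narrow the results further.

```python
# Hypothetical tagged documents: name -> set of taxonomy labels.
DOCUMENTS = {
    "2006-return.pdf": {"tax", "personal"},
    "invoice-114.pdf": {"tax", "work"},
    "holiday-photos.zip": {"personal", "photos"},
    "quarterly-report.doc": {"work"},
}

def browse(selected):
    """Return (matching documents, categories worth showing next).

    `selected` is the collection of categories the user has clicked so far.
    A document matches when it carries every selected category.
    """
    selected = set(selected)
    matches = sorted(
        name for name, tags in DOCUMENTS.items() if selected <= tags
    )
    # Only offer categories that appear on the remaining documents
    # and have not been chosen already.
    next_tags = set()
    for name in matches:
        next_tags |= DOCUMENTS[name] - selected
    return matches, sorted(next_tags)

print(browse(["tax"]))
# (['2006-return.pdf', 'invoice-114.pdf'], ['personal', 'work'])

# The order of the clicks doesn't matter: both paths reach the same document.
print(browse(["tax", "personal"]))   # (['2006-return.pdf'], [])
print(browse(["personal", "tax"]))   # (['2006-return.pdf'], [])
```

Each “folder” here is really just a filter, so the same document can be reached by whichever route the user happens to remember.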

I think we may still be a long way from a true relational filesystem, but if I were advising any operating system manufacturers I would tell them that one of the big reasons desktops are losing out to the web is simply that the web is better organized. There’s no excuse for users not being able to find their own documents, and it is truly remarkable that for most users it is easier to find information online than it is to find the same information on their own hard disks.

An even greater coup would be to integrate this sort of filesystem storage with the myriad other kinds of information that users interact with on a daily basis. Emails, text messages, and other information typically aren’t even stored on the filesystem in any meaningful way. Integrating all of a user’s information into a universal relational data store could save users time and frustration. Because a single project is often spread across many different forms of information that share common themes, a relational store would be the ideal way to classify and categorize cross-media information. Integrating such information with the filesystem would minimize the number of applications a user would need to open in order to locate what they want. Another coup of the web is that information is stored in universal formats. Even non-HTML content stored online is displayed through browser plug-ins, meaning that even if the media isn’t hypertext, the user never has to switch applications and can view everything in a web browser. Even this simple functionality is currently unmatched by the filesystem.

Perhaps the traditional filesystem model suffers from issues of legacy. So much backwards compatibility would have to be broken to implement a new filesystem that it might not be worth the hassle. It may be a pie-in-the-sky dream, but it seems like one of the best ways to have a meaningful impact on a computer user’s desktop experience. Because the web has leapt so far ahead of the desktop in terms of usability, most people treat the computer as synonymous with the network connection, and see a computer without the network as being as useless as an appliance without power.