9. Implementation

9.1. Database (Metadata)

Each document has a single record containing all of its metadata, which consists of the following fields:

Table 2.3. DMS Metadata

Name         SQL Type  Owner        Required  Description
fileid       integer   System       Yes       The system identifier (primary key) uniquely defining a document within the system.
size         integer   System       Yes       The size of the document.
ctime        integer   System       Yes       The epoch time the file was created.
mtime        integer   System       Yes       The epoch time that the file was last modified.
hash         text      System       Yes       The SHA1 hash of the document.
guid         text      System       Yes       An optional UUID to associate with the document.
name         text      Application  No        The name of the document. This name is not unique within the system. It is up to the application to provide naming consistency.
description  text      Application  No        The file's description.


Fields owned by the application are not used in any way by the system. Conversely, all system fields are managed exclusively by the system and are read-only to the application. In essence, the system concerns itself only with a single primary key that identifies a file and with the corresponding basic OS file attributes. The application concerns itself with what these files actually are, who they belong to, and what they contain. The application provides the meaning of the content, while DMS simply provides the storage facility. It's worth noting that nothing prevents the application from adding its own metadata. Since it has access to the fileid, it can simply create new tables containing whatever additional information it desires, using foreign keys to relate the information.
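As a rough illustration of application-owned metadata, the sketch below relates an extra table to the file table through fileid. It uses Python's built-in sqlite3 module purely so that it runs anywhere; DMS itself uses PostgreSQL. The file_owner table, its owner column, and the sample rows are hypothetical and not part of DMS.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption made
# only so the sketch is runnable without a database server).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# A cut-down version of the DMS metadata table (default name: file).
conn.execute("""
    CREATE TABLE file (
        fileid      INTEGER PRIMARY KEY,
        name        TEXT,
        description TEXT
    )
""")

# Hypothetical application table: records who owns each document,
# related to the DMS record through a foreign key on fileid.
conn.execute("""
    CREATE TABLE file_owner (
        fileid INTEGER NOT NULL REFERENCES file(fileid),
        owner  TEXT    NOT NULL
    )
""")

conn.execute("INSERT INTO file (fileid, name) VALUES (1, 'report.pdf')")
conn.execute("INSERT INTO file_owner (fileid, owner) VALUES (1, 'alice')")

# Join the application's metadata back to the DMS record.
rows = conn.execute("""
    SELECT f.name, o.owner
    FROM file f
    JOIN file_owner o ON o.fileid = f.fileid
""").fetchall()
print(rows)
```

The system never reads file_owner; it only guarantees that fileid keeps identifying the same document.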

Also, it doesn't matter what the table is named. By default, the Repository class assumes that the table is named file and that its primary key is fileid. You can change this by passing the table name as the fourth argument to its constructor. It will then use that name and assume that the primary key is of the form {name}id. It also assumes that the key's sequence is of the form {name}_{name}id_seq (which is PostgreSQL's default when you declare a SERIAL type for the primary key).
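The naming convention can be sketched as a small helper. This is not the Repository class itself; the function derive_names and the TableNames type are hypothetical names used only to show how the primary key and sequence names follow from the table name.

```python
from typing import NamedTuple


class TableNames(NamedTuple):
    table: str
    primary_key: str
    sequence: str


def derive_names(table: str = "file") -> TableNames:
    """Derive the primary key and sequence names from a table name."""
    primary_key = f"{table}id"               # {name}id
    sequence = f"{table}_{primary_key}_seq"  # {name}_{name}id_seq
    return TableNames(table, primary_key, sequence)


print(derive_names())
print(derive_names("document"))
```

With the default table name this yields fileid and file_fileid_seq; with a table named document it yields documentid and document_documentid_seq.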

9.2. Filesystem (Data)

The raw filesystem layout is an implementation detail that the application need not concern itself with. DMS provides the raw file objects for the application to work with; their location is therefore immaterial. Nevertheless, there are a few things worth mentioning.

Because files are ordered sequentially by fileid, they are laid out numerically on disk as well. To keep directories from becoming huge, files are laid out using a scalable mapping scheme so that there are no more than 1000 files per directory. Documents are nested according to each three-digit sequence in the document's ID. Thus, a document with ID 124452178 would be filed as:

  /{root}
   124/
     452/
       178

The actual file name would be 178.dat, since all files have a .dat extension. Again, this is of no direct concern to the application, but it does keep the size of the directories in the file system manageable.
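The mapping from a fileid to a path can be sketched as follows. The function name document_path and the root /srv/dms are hypothetical. The text does not say how IDs whose digit count is not a multiple of three are handled, so the left zero-padding below is an assumption, not documented DMS behaviour.

```python
from pathlib import PurePosixPath


def document_path(root: str, fileid: int) -> PurePosixPath:
    """Map a fileid to a nested path of three-digit groups."""
    if fileid < 0:
        raise ValueError("fileid must be non-negative")
    digits = str(fileid)
    # Assumption: pad on the left with zeros to a multiple of three digits.
    width = -(-len(digits) // 3) * 3  # round up to a multiple of 3
    digits = digits.zfill(width)
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    # All groups but the last are directories; the last names the file.
    *dirs, leaf = groups
    return PurePosixPath(root, *dirs, f"{leaf}.dat")


print(document_path("/srv/dms", 124452178))
```

For the ID used above, this produces /srv/dms/124/452/178.dat. Because each path component is three digits, no directory can hold more than 1000 entries of either kind.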

9.3. Locations

One final point to address is where the file system and the database can reside relative to the application. The location of the database is simply an administrative issue. As with any application, it can reside on the same machine or on another; it's simply a matter of configuring which PostgreSQL server to connect to.

DMS uses PostgreSQL advisory locks, not filesystem locks, to manage access to files. That is, concurrency is managed through the database. That said, the file system still needs to be accessible as if it were local to the system, because DMS has to access the actual files directly. It is perfectly acceptable, however, to use distributed file systems such as NFS. Large applications can consist of multiple web servers that NFS-mount the same document root.
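To make the locking idea concrete, here is a minimal sketch of guarding file access with a session-level advisory lock. It is not the DMS implementation: the helper file_lock is hypothetical, using the fileid as the lock key is an assumption, and conn is assumed to be a DB-API connection to PostgreSQL whose driver uses %s placeholders (as psycopg2 does). pg_advisory_lock and pg_advisory_unlock are standard PostgreSQL functions.

```python
from contextlib import contextmanager


@contextmanager
def file_lock(conn, fileid: int):
    """Hold a PostgreSQL advisory lock keyed on fileid for the block's duration."""
    cur = conn.cursor()
    # Blocks until the lock is granted. Assumption: the lock key is the fileid.
    cur.execute("SELECT pg_advisory_lock(%s)", (fileid,))
    try:
        yield
    finally:
        # Release explicitly; session-level locks are otherwise held
        # until the database session ends.
        cur.execute("SELECT pg_advisory_unlock(%s)", (fileid,))
        cur.close()


# Usage sketch (requires a live PostgreSQL connection):
#
#     with file_lock(conn, 124452178):
#         ...  # read or write the document's .dat file
```

Because the lock lives in the database rather than in the file system, every web server that mounts the same document root coordinates through the same PostgreSQL instance.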

Since both components can be hosted on other machines, it is fair to say that DMS is in fact a distributed system.