Each document has a single record containing all of its metadata, which consists of the following fields:
Table 2.3. DMS Metadata
Name | SQL Type | Owner | Required | Description |
---|---|---|---|---|
fileid | integer | System | Yes | The system identifier (primary key) uniquely defining a document within the system. |
size | integer | System | Yes | The size of the document. |
ctime | integer | System | Yes | The epoch time the file was created. |
mtime | integer | System | Yes | The epoch time that the file was last modified. |
hash | text | System | Yes | The SHA1 hash of the document. |
guid | text | System | Yes | An optional UUID to associate with the document. |
name | text | Application | No | The name of the document. This name is not unique within the system. It is up to the application to provide naming consistency. |
description | text | Application | No | The file's description. |
Fields owned by the application are not used in any way by the system.
Conversely, all system fields are managed exclusively by the system and are
read-only to the application. In short, the system concerns itself only with a
single primary key with which to identify a file, and the corresponding basic OS
file attributes. The application concerns itself with what these files actually
are, who they belong to, and what they contain. The application provides the
meaning of the content, while DMS simply provides the storage facility. It's
worth noting that nothing prevents the application from adding its own metadata.
Since it has access to the fileid, it can simply create new tables containing
whatever additional information it desires, using foreign keys to relate the
information.
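As a rough illustration of application-owned metadata, the sketch below creates a side table keyed on fileid. It is only a sketch under stated assumptions: an in-memory SQLite database stands in for PostgreSQL so the example runs anywhere, the file table is reduced to three columns, and the file_owner table and its columns are hypothetical names, not part of DMS.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

# Reduced stand-in for the DMS metadata table (see Table 2.3).
conn.execute(
    "CREATE TABLE file (fileid INTEGER PRIMARY KEY, name TEXT, description TEXT)"
)

# Hypothetical application-owned table, related to the document by fileid.
conn.execute(
    """
    CREATE TABLE file_owner (
        fileid     INTEGER PRIMARY KEY REFERENCES file (fileid) ON DELETE CASCADE,
        owner      TEXT NOT NULL,
        department TEXT
    )
    """
)

conn.execute("INSERT INTO file (fileid, name) VALUES (1, 'report.pdf')")
conn.execute(
    "INSERT INTO file_owner (fileid, owner, department) VALUES (1, 'alice', 'finance')"
)

# Join the application's metadata back to the document record.
rows = conn.execute(
    """
    SELECT f.fileid, f.name, o.owner
    FROM file AS f
    JOIN file_owner AS o ON o.fileid = f.fileid
    """
).fetchall()
```

Because the side table only references fileid, the application can add or drop such tables without touching anything the system owns.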
Also, it doesn't matter what the table is named. By default, the Repository
class assumes that the table is named file and that its primary key is fileid.
You can change this by passing the table name as the fourth argument to its
constructor. It will then use that name and assume that the primary key is of
the form {name}id. It also assumes that the key's sequence is of the form
{name}_{name}id_seq (which is PostgreSQL's default when you declare a SERIAL
type for the primary key).
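The naming convention can be spelled out in a few lines. The helper below is hypothetical (derive_names is not a DMS function); it only restates the {name}id and {name}_{name}id_seq patterns described above.

```python
def derive_names(table="file"):
    """Return the (primary key, sequence) names implied by a table name.

    Hypothetical helper that mirrors the convention described in the text:
    the key is {name}id and the sequence is {name}_{name}id_seq.
    """
    key = f"{table}id"
    sequence = f"{table}_{key}_seq"
    return key, sequence


default_names = derive_names()           # the default table name, file
custom_names = derive_names("document")  # a hypothetical custom table name
```

With the default table name this gives fileid and file_fileid_seq; a table named document would be expected to use documentid and document_documentid_seq.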
The raw filesystem layout is an implementation detail that the application need not concern itself with. DMS provides the raw file objects for the application to work with, so their location is immaterial. Nevertheless, there are a few things worth mentioning.
Because files are numbered sequentially by fileid, they are laid out
numerically on disk as well. To keep directories from becoming huge, files are
laid out using a scalable mapping scheme so that there are no more than 1000
files per directory. Documents are nested according to each three-digit
sequence in a document's ID. Thus, a document with ID 124452178 would be filed
as:
/{root}/124/452/178
The actual file name would be 178.dat. Again, this is of no direct concern to
the application, but it does provide a way of keeping the size of the
directories in the file system manageable. All files have a .dat extension.
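The mapping from an ID to a path can be sketched as follows. This is an approximation, not the DMS implementation: the function name document_path and the root /srv/dms are hypothetical, and the handling of IDs whose digit count is not a multiple of three (grouping from the right, with no padding of the leading group) is an assumption the text does not confirm.

```python
from pathlib import PurePosixPath


def document_path(root, fileid):
    """Map a document ID to a nested path, three digits per level.

    Assumption: digits are grouped from the right, so a shorter leading
    group is possible. The text only shows a nine-digit example.
    """
    if fileid < 0:
        raise ValueError("fileid must be non-negative")
    digits = str(fileid)
    groups = []
    while digits:
        groups.append(digits[-3:])  # take the last three digits
        digits = digits[:-3]
    groups.reverse()
    *directories, leaf = groups
    # Every group but the last becomes a directory; the last names the file.
    return PurePosixPath(root, *directories, leaf + ".dat")


path = document_path("/srv/dms", 124452178)
```

For the ID used in the text, this yields /srv/dms/124/452/178.dat. Since each level is keyed on three decimal digits, no directory can hold more than 1000 entries of one kind.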
One final point to address is where the file system and database can reside relative to the application. The location of the database is simply an administrative issue. As with any application, it can reside on the same machine or another; it's simply a matter of configuring which PostgreSQL server to connect to.
DMS uses PostgreSQL advisory locks to manage access to files, not filesystem locks. That is, concurrency is managed through the database. That said, the file system still needs to be accessible as if it were local to the system, as it has to be able to directly access the actual files. It is perfectly acceptable to use distributed file systems such as NFS, however. Large applications can consist of multiple web servers that NFS-mount the same document root.
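For readers unfamiliar with advisory locks, the general pattern looks roughly like the sketch below. It is not taken from DMS: the text does not say which lock functions DMS calls or how it derives the lock key, so using the fileid directly as the key is an assumption, and advisory_lock is a hypothetical helper. It expects a DB-API connection whose driver uses the %s parameter style (as psycopg2 does) and relies on PostgreSQL's pg_advisory_lock and pg_advisory_unlock functions.

```python
from contextlib import contextmanager


@contextmanager
def advisory_lock(conn, fileid):
    """Hold a session-level PostgreSQL advisory lock while the body runs.

    Assumption: the lock key is simply the document's fileid.
    """
    cur = conn.cursor()
    # Blocks until the lock on this key is available.
    cur.execute("SELECT pg_advisory_lock(%s)", (fileid,))
    try:
        yield
    finally:
        # Release the lock even if the body raised an exception.
        cur.execute("SELECT pg_advisory_unlock(%s)", (fileid,))
        cur.close()
```

An application would wrap its file access in `with advisory_lock(conn, fileid):`. Because the lock lives in the database rather than on the file system, it behaves the same whether the document root is local or mounted over NFS.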
Since both components can be hosted on other machines, it is fair to say that DMS is in fact a distributed system.