One of the features of a document management system is that you should be able
to submit documents to it somehow and have it process them automatically. For
example, you might set up a Samba share and connect it to a copy machine so
that the machine transfers every file it scans into the share. It would be nice
if your document management system could automatically detect this and
process the document as soon as it arrives. This is where a Linux feature called
inotify comes in handy. It is a kernel feature that provides a
simple API that allows you to monitor changes to files or directories in real
time. DMS includes a simple Ruby extension that encapsulates this API. Here's a
simple example to show how it works. Say our Samba share is on the /tmp/scans
directory and we want to have a Ruby program monitor it and import files as
soon as they come in. This is how we would do it:
    #!/usr/bin/ruby1.9.1

    require 'jw/dms'

    # Create inotify instance
    inotify = JW::DMS::INotify.new()

    # Listen
    inotify.addWatch('/tmp/scans', IN_CLOSE_WRITE)

    # Loop forever
    while true
      # Listen for file close events. Blocks until something happens.
      inotify.getEvents() do |wd, mask, path|
        # path is relative to watched folder. Compute the full path of file.
        full_path = File.join(inotify.getPath(wd), path)

        # Make sure it is a regular file
        if File.file? full_path
          import_into_dms full_path
        end
      end
    end

For the most part, this example is pretty intuitive, but the call to
inotify.getPath() might not make sense, so here's how it works. Each Inotify
instance can in fact monitor not one but multiple directories (and files). Each
time you call addWatch(), you create a new "watch descriptor" which represents
the file or directory you are watching. Internally, the Inotify instance keeps
a map from each watch descriptor to the path that descriptor refers to. As
events happen, Linux queues them up in the main inotify queue, which sits
waiting for you to read it. The getEvents() method reads all pending events on
the queue and yields them to the Ruby block. There are three arguments to the
block:
- The watch descriptor (wd in the example). This refers to the watch the event
  is associated with. You need it when you watch multiple files/directories;
  otherwise you wouldn't know where the event happened.
- The path (path in the example). This is the path of the file or directory
  relative to the watched directory. When the event occurs on the watched
  directory itself (as opposed to a file or subdirectory within it), this
  argument is nil.
- The event mask (mask in the example). This is a bitmask of all of the events
  that occurred on the target file/directory. These events are defined by a set
  of constants listed in Table 2.2, “Inotify Constants”. The same kind of mask
  is passed to the addWatch() method, where it specifies the events you want to
  monitor.
With all this information, you have everything you need to know what
happened where, and the ability to watch as much as you want all at once. The
remaining question is how you get the actual path name from the watch
descriptor. This is where the Inotify#getPath() method comes in. You pass it a
watch descriptor and it passes back a string containing the full path of the
file/directory that descriptor refers to.
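The bookkeeping described above can be modelled in plain Ruby. The following is only a sketch, not the extension's implementation: the class WatchTableSketch and its methods (add_watch, get_path, enqueue, get_events, full_path) are hypothetical names, the second directory /tmp/faxes is made up for illustration, and nothing here talks to the kernel. It shows a descriptor-to-path map, a queue drained into a block, and one way to handle a nil path when building the full path.

```ruby
# Hypothetical stand-in for the watch bookkeeping; it does not use inotify.
class WatchTableSketch
  def initialize
    @paths = {}    # watch descriptor => watched path
    @queue = []    # pending [wd, mask, path] events
    @next_wd = 1
  end

  # Register a path and hand back a new watch descriptor.
  def add_watch(path)
    wd = @next_wd
    @next_wd += 1
    @paths[wd] = path
    wd
  end

  # Look up the path a watch descriptor refers to (nil if unknown).
  def get_path(wd)
    @paths[wd]
  end

  # Simulate the kernel queuing an event for a watch.
  def enqueue(wd, mask, path)
    @queue << [wd, mask, path]
  end

  # Drain every pending event, yielding each one to the block.
  def get_events
    until @queue.empty?
      wd, mask, path = @queue.shift
      yield wd, mask, path
    end
  end

  # Full path of the event target. The relative path is nil when the
  # event happened on the watched file or directory itself.
  def full_path(wd, path)
    base = get_path(wd)
    path.nil? ? base : File.join(base, path)
  end
end

watches = WatchTableSketch.new
scans = watches.add_watch('/tmp/scans')
faxes = watches.add_watch('/tmp/faxes')      # hypothetical second watch

watches.enqueue(scans, 0x8, 'invoice.pdf')   # event on a file inside the watch
watches.enqueue(faxes, 0x8, nil)             # event on the watch itself

watches.get_events do |wd, _mask, path|
  puts watches.full_path(wd, path)
end
```

Because every event carries its watch descriptor, a single loop can serve any number of watched directories.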
Table 2.2. Inotify Constants
Name | Description |
---|---|
IN_ACCESS | File was read from. |
IN_MODIFY | File was written to. |
IN_ATTRIB | File's metadata (inode or xattr) was changed. |
IN_CLOSE_WRITE | File was closed (and was open for writing). |
IN_CLOSE_NOWRITE | File was closed (and was not open for writing). |
IN_OPEN | File was opened. |
IN_MOVED_FROM | File was moved away from watch. |
IN_MOVED_TO | File was moved to watch. |
IN_DELETE | File was deleted. |
IN_DELETE_SELF | The watch itself was deleted. |
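Since both addWatch() and the event block work with bitmasks, it can help to see how such masks are combined and taken apart. The sketch below uses the bit values from the Linux header linux/inotify.h; it is an assumption that the DMS extension exposes its constants with the same values, and the INOTIFY_EVENTS hash and the helpers build_mask and decode_mask are hypothetical names, not part of the extension.

```ruby
# Bit values as defined in <linux/inotify.h>. Assumption: the extension's
# constants of the same names carry these values.
INOTIFY_EVENTS = {
  IN_ACCESS:        0x001,
  IN_MODIFY:        0x002,
  IN_ATTRIB:        0x004,
  IN_CLOSE_WRITE:   0x008,
  IN_CLOSE_NOWRITE: 0x010,
  IN_OPEN:          0x020,
  IN_MOVED_FROM:    0x040,
  IN_MOVED_TO:      0x080,
  IN_DELETE:        0x200,
  IN_DELETE_SELF:   0x400
}.freeze

# Combine several event names into one mask with bitwise OR.
def build_mask(*names)
  names.reduce(0) { |mask, name| mask | INOTIFY_EVENTS.fetch(name) }
end

# List the names of all events whose bit is set in the mask.
def decode_mask(mask)
  INOTIFY_EVENTS.select { |_name, bit| (mask & bit) != 0 }.keys
end

# Watch for files that were written and closed, or moved into the directory.
watch_mask = build_mask(:IN_CLOSE_WRITE, :IN_MOVED_TO)
puts decode_mask(watch_mask).inspect
```

Watching IN_MOVED_TO alongside IN_CLOSE_WRITE is one possible choice for an import folder, since a file that is moved into the directory is never opened for writing there.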
That's basically it. The inotify interface is a powerful asynchronous
mechanism for monitoring files and directories in real time. This Ruby
extension simply lets you tap into that power from Ruby, extending the general
capabilities of DMS.