8. The Inotify Module

One of the features of a document management system is that you should be able to submit documents to it and have it process them automatically. For example, you might set up a Samba share and configure a copy machine to transfer every file it scans into the share. It would then be nice if your document management system could detect each new file and process it as soon as it arrives.

This is where a Linux kernel feature called inotify comes in handy. Inotify provides a simple API for monitoring changes to files and directories in real time, and DMS includes a small Ruby extension that encapsulates this API. Here's a simple example of how it works. Say our Samba share lives in the /tmp/scans directory and we want a Ruby program to monitor it and import files as soon as they come in. This is how we would do it:


#!/usr/bin/ruby1.9.1

require 'jw/dms'

# Create inotify instance
inotify = JW::DMS::INotify.new()

# Listen
inotify.addWatch('/tmp/scans', IN_CLOSE_WRITE)

# Loop forever
while true

  # Listen for file close events. Blocks until something happens.
  inotify.getEvents() do |wd, mask, path|

    # path is relative to watched folder. Compute the full path of file.
    full_path = File.join(inotify.getPath(wd), path)

    # Make sure it is a regular file. import_into_dms is a placeholder
    # for whatever routine you use to import a document into DMS.
    if File.file? full_path
      import_into_dms full_path
    end
  end
end

For the most part, this example is pretty intuitive, but the call to inotify.getPath() may not make sense at first. Here is how it works. Each Inotify instance can monitor not just one but multiple directories (and files). Each call to addWatch() creates a new "watch descriptor" that represents the file or directory being watched. Internally, the Inotify instance keeps a map from each watch descriptor to the path it refers to. As events happen, Linux queues them up in the instance's inotify queue, where they wait until you read them. The getEvents() method reads all pending events on the queue and yields each one to the Ruby block. The block receives three arguments:

wd: the watch descriptor of the watch on which the event occurred.
mask: the event type; compare it against the constants in Table 2.2.
path: the name of the file the event concerns, relative to the watched directory.

With all this information, you know exactly what happened and where, and a single instance can watch as many files and directories as you like. The remaining question is how you get the actual path name from a watch descriptor. This is where the Inotify#getPath() method comes in: pass it a watch descriptor and it returns a string containing the full path of the file or directory that descriptor refers to.
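To make this bookkeeping concrete, here is a pure-Ruby model of it: a Hash mapping watch descriptors to paths, plus a first-in, first-out queue that is drained and yielded to a block. It is only an illustration under stated assumptions, not the extension's actual code: the class WatchTable and its methods add_watch, get_path, push_event and get_events are hypothetical, push_event stands in for the kernel queueing an event, and descriptors are simply numbered from 1.

```ruby
# Hypothetical model of the descriptor map and event queue described above.
# This is NOT the JW::DMS::INotify implementation.
class WatchTable
  def initialize
    @paths   = {}  # watch descriptor (Integer) => watched path (String)
    @next_wd = 1   # descriptors numbered from 1 here for simplicity
    @queue   = []  # pending [wd, mask, name] events, oldest first
  end

  # Register a path and return the newly allocated watch descriptor.
  def add_watch(path)
    wd = @next_wd
    @next_wd += 1
    @paths[wd] = path
    wd
  end

  # Map a watch descriptor back to the path it refers to.
  def get_path(wd)
    @paths.fetch(wd)
  end

  # Stand-in for the kernel appending an event to the queue.
  def push_event(wd, mask, name)
    @queue << [wd, mask, name]
  end

  # Drain every pending event, yielding (wd, mask, name) in arrival order.
  def get_events
    until @queue.empty?
      wd, mask, name = @queue.shift
      yield wd, mask, name
    end
  end
end

# Usage: one table, two watched directories.
table = WatchTable.new
scans = table.add_watch('/tmp/scans')
faxes = table.add_watch('/tmp/faxes')
table.push_event(faxes, 0x8, 'fax1.pdf')   # 0x8 is IN_CLOSE_WRITE in <sys/inotify.h>
table.push_event(scans, 0x8, 'scan1.pdf')

table.get_events do |wd, mask, name|
  puts File.join(table.get_path(wd), name)
end
# prints /tmp/faxes/fax1.pdf, then /tmp/scans/scan1.pdf
```

A single instance serves both directories, and get_path turns each descriptor back into a directory that File.join can combine with the relative name, the same role inotify.getPath() plays in the earlier example. One simplification: the real kernel returns the existing descriptor when you add a watch for an object that is already being watched, whereas this model always allocates a new one.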

Table 2.2. Inotify Constants

Name              Description
IN_ACCESS         File was read from.
IN_MODIFY         File was written to.
IN_ATTRIB         File's metadata (inode or xattr) was changed.
IN_CLOSE_WRITE    File was closed (and was open for writing).
IN_CLOSE_NOWRITE  File was closed (and was not open for writing).
IN_OPEN           File was opened.
IN_MOVED_FROM     File was moved out of the watched directory.
IN_MOVED_TO       File was moved into the watched directory.
IN_DELETE         File was deleted.
IN_DELETE_SELF    The watched file or directory itself was deleted.
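The mask argument is a bitmask, so it is tested with a bitwise AND rather than compared with ==. The sketch below turns a mask into the names from Table 2.2. It assumes the extension's IN_* constants carry the same bit values as the kernel's <sys/inotify.h> (check this before relying on it); INOTIFY_FLAGS and describe_mask are hypothetical names, not part of the module.

```ruby
# Kernel bit values from <sys/inotify.h>. ASSUMPTION: the DMS extension's
# IN_* constants use these same values.
INOTIFY_FLAGS = {
  IN_ACCESS:        0x001,
  IN_MODIFY:        0x002,
  IN_ATTRIB:        0x004,
  IN_CLOSE_WRITE:   0x008,
  IN_CLOSE_NOWRITE: 0x010,
  IN_OPEN:          0x020,
  IN_MOVED_FROM:    0x040,
  IN_MOVED_TO:      0x080,
  IN_DELETE:        0x200,
  IN_DELETE_SELF:   0x400
}.freeze

# Return the names of every flag set in +mask+, in table order.
def describe_mask(mask)
  INOTIFY_FLAGS.select { |_name, bit| (mask & bit) != 0 }.keys
end

p describe_mask(0x008)           # => [:IN_CLOSE_WRITE]
p describe_mask(0x008 | 0x080)   # => [:IN_CLOSE_WRITE, :IN_MOVED_TO]
```

The kernel's inotify_add_watch() accepts such an OR-ed mask, which is useful for a scan share because files renamed into it raise IN_MOVED_TO rather than IN_CLOSE_WRITE; whether the extension's addWatch() passes a combined mask through unchanged is an assumption to verify.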


That's basically it. The inotify interface is a powerful asynchronous mechanism for monitoring files and directories in real time, and this Ruby extension simply lets you tap into that power from Ruby, extending DMS's general capabilities.