A Fedora in a Pairtree

The California Digital Library (CDL) has released a number of exciting micro-services specifications for digital libraries. The Fedora repository from DuraSpace takes the opposite approach: it is a monolithic application composed of a number of modules. Because of that modular design, it should be possible to slip micro-services under the hood of Fedora fairly easily. Here is a first attempt at implementing the Pairtree filesystem hierarchy for Fedora:

package fedora.server.storage.lowlevel;

import java.io.File;
import java.util.Map;

import fedora.server.errors.LowlevelStorageException;

/**
 * @author Chris Beer
 */
class PairtreePathAlgorithm
        extends PathAlgorithm {

    private final String storeBase;

    private static final String SEP = File.separator;

    public PairtreePathAlgorithm(Map configuration) {
        super(configuration);
        storeBase = (String) configuration.get("storeBase");
    }

    @Override
    public final String get(String pid) throws LowlevelStorageException {
        return format(pid);
    }

    public String format(String pid) throws LowlevelStorageException {
        String pt = to_pairtree(pid);
        return storeBase + pt + "obj" + SEP + pid;
    }

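    private String to_pairtree(String s) {
        StringBuilder pt = new StringBuilder(SEP);
        String src = escape(s);

        // Emit one directory per complete two-character pair. Stopping at
        // i + 1 keeps substring() in bounds for odd-length identifiers.
        int i = 0;
        while (i + 1 < src.length()) {
            pt.append(src.substring(i, i + 2)).append(SEP);
            i += 2;
        }

        // An odd-length identifier leaves one character over; give it its
        // own directory so that "obj" is not glued onto it.
        if (i < src.length()) {
            pt.append(src.substring(i)).append(SEP);
        }

        return pt.toString();
    }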

    private String escape(String s) {
        /*
         Fedora PIDs do not support non-visible ASCII or the characters below,
         so we skip hex encoding:
         "   hex 22           <   hex 3c           ?   hex 3f
         *   hex 2a           =   hex 3d           ^   hex 5e
         +   hex 2b           >   hex 3e           |   hex 7c
         ,   hex 2c
         That leaves the single-character substitutions from the Pairtree spec:
         /  becomes  =        :  becomes  +        .  becomes  ,
         */
        return s.replace("/", "=").replace(":", "+").replace(".", ",");
    }
}
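
To make the mapping concrete, here is a standalone sketch of the same PID-to-path logic with no Fedora classes involved. The class name PairtreeDemo, the "/data/objects" store base, and the fixed "/" separator are assumptions made for illustration. The sketch puts an odd trailing character in its own one-character directory, as the Pairtree specification describes.

```java
// Standalone sketch: hypothetical names, no Fedora dependencies.
class PairtreeDemo {

    // Fixed separator so the output is identical on every platform;
    // the Fedora class uses File.separator instead.
    private static final String SEP = "/";

    // Single-character substitutions from the Pairtree spec.
    static String escape(String s) {
        return s.replace("/", "=").replace(":", "+").replace(".", ",");
    }

    // Split the escaped identifier into two-character directory names;
    // an odd trailing character becomes a one-character directory.
    static String toPairtree(String s) {
        String src = escape(s);
        StringBuilder pt = new StringBuilder(SEP);
        for (int i = 0; i < src.length(); i += 2) {
            pt.append(src, i, Math.min(i + 2, src.length())).append(SEP);
        }
        return pt.toString();
    }

    // Full path: store base, pairtree path, "obj" directory, then the PID.
    static String format(String storeBase, String pid) {
        return storeBase + toPairtree(pid) + "obj" + SEP + pid;
    }

    public static void main(String[] args) {
        // prints /data/objects/de/mo/+1/obj/demo:1
        System.out.println(format("/data/objects", "demo:1"));
        // prints /data/objects/de/mo/+1/2/obj/demo:12
        System.out.println(format("/data/objects", "demo:12"));
    }
}
```

The "obj" directory is three characters long, so it cannot be mistaken for another two-character level of the tree.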

See also: http://gist.github.com/280020

This basic service replaces the Timestamp path algorithm for FOXML storage and creates a minimally compliant Pairtree. A better implementation could:

Split Fedora datastreams into individual files on the filesystem. A first step would be to implement an appropriate managed content mapper.
Add the identifier cleaning specified in §3 of the Pairtree specification. Much of this was omitted here, on the assumption that the repository core handles identifier validation.
Support Pairtree initialization (§4). The current assumption is that the repository maintainer pre-establishes a Pairtree hierarchy for Fedora to populate. To do this properly, I think one would need to override the DefaultLowlevelStorageModule to add an initialization step.
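
As a rough sketch of the last two items, the helper below shows what full identifier cleaning and a one-time initialization step might look like. Everything here is hypothetical: the class PairtreeSupport and its methods are not part of Fedora, the set of hex-encoded characters and the names pairtree_root and pairtree_version0_1 are taken from my reading of the Pairtree specification, and the text written to the version file is an assumption that should be checked against the spec. Wiring this into DefaultLowlevelStorageModule is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Fedora.
class PairtreeSupport {

    // Visible ASCII characters that get hex-encoded as ^hh
    // (assumed from the table in the Pairtree spec).
    private static final String HEX_ENCODED = "\"*+,<=>?\\^|";

    // Hex-encode unsafe bytes, then swap '/', ':' and '.' for '=', '+' and ','.
    // One pass is enough because any literal '=', '+' or ',' in the input
    // has already been hex-encoded by the first branch.
    static String clean(String id) {
        StringBuilder out = new StringBuilder();
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            if (c < 0x21 || c > 0x7e || HEX_ENCODED.indexOf(c) >= 0) {
                out.append('^').append(String.format("%02x", c));
            } else if (c == '/') {
                out.append('=');
            } else if (c == ':') {
                out.append('+');
            } else if (c == '.') {
                out.append(',');
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    // Create pairtree_root and the version marker file if they are missing.
    static Path initialize(Path storeBase) throws IOException {
        Path root = storeBase.resolve("pairtree_root");
        Files.createDirectories(root);
        Path version = storeBase.resolve("pairtree_version0_1");
        if (!Files.exists(version)) {
            // Marker text is an assumption; verify the wording against the spec.
            String text = "This directory conforms to Pairtree Version 0.1.\n";
            Files.write(version, text.getBytes(StandardCharsets.UTF_8));
        }
        return root;
    }

    public static void main(String[] args) throws IOException {
        // prints ark+=13030=xt12t3
        System.out.println(clean("ark:/13030/xt12t3"));
        Path root = initialize(Files.createTempDirectory("pairtree-demo"));
        System.out.println(Files.isDirectory(root));
    }
}
```

With an initialization step like this, storeBase would presumably need to point at the pairtree_root directory rather than at its parent, so that the object paths land inside the tree.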