A Fedora in a Pairtree
The California Digital Library (CDL) has released a number of exciting micro-service specifications for digital libraries. The Fedora repository from DuraSpace takes the opposite approach: it is a monolithic application composed of a number of modules. Because of that modularity, it should be possible to slip micro-services under the hood of Fedora easily.
Here is a first attempt at implementing the Pairtree filesystem hierarchy for Fedora:
```java
package fedora.server.storage.lowlevel;

import java.io.File;
import java.util.Map;

import fedora.server.errors.LowlevelStorageException;

/**
 * Maps Fedora PIDs onto a Pairtree directory hierarchy.
 *
 * @author Chris Beer
 */
class PairtreePathAlgorithm extends PathAlgorithm {

    private final String storeBase;

    private static final String SEP = File.separator;

    public PairtreePathAlgorithm(Map<String, ?> configuration) {
        super(configuration);
        storeBase = (String) configuration.get("storeBase");
    }

    @Override
    public final String get(String pid) throws LowlevelStorageException {
        return format(pid);
    }

    public String format(String pid) throws LowlevelStorageException {
        String pt = to_pairtree(pid);
        // "obj" is longer than a shorty, so it terminates the pairtree path;
        // the FOXML file is stored inside it.
        return storeBase + pt + "obj" + SEP + pid;
    }

    private String to_pairtree(String s) {
        StringBuilder pt = new StringBuilder(SEP);
        String src = escape(s);
        // One directory per two-character "shorty". When the length is odd,
        // the final shorty is a single character and still gets its own
        // directory (the old loop overran the string in that case).
        for (int i = 0; i < src.length(); i += 2) {
            pt.append(src, i, Math.min(i + 2, src.length())).append(SEP);
        }
        return pt.toString();
    }

    private String escape(String s) {
        /* Fedora PIDs do not support non-visible ASCII or the characters
         * below, so we skip hex encoding:
         *   " hex 22   < hex 3c   ? hex 3f   * hex 2a
         *   = hex 3d   ^ hex 5e   + hex 2b   > hex 3e
         *   | hex 7c   , hex 2c   \ hex 5c
         * The Pairtree single-character mapping is / -> =, : -> +, . -> ,
         */
        return s.replace("/", "=").replace(":", "+").replace(".", ",");
    }
}
```

See also: http://gist.github.com/280020

This basic service replaces the Timestamp path algorithm for FOXML storage and creates a minimally compliant Pairtree. A better implementation could add:
- Splitting Fedora datastreams into individual files on the filesystem. A first step would be to implement an appropriate managed content mapper.
- The identifier cleaning specified in §3 of the Pairtree specification. Much of it is omitted from this implementation, on the assumption that the repository core handles identifier validation.
- Support for pairtree initialization (§4). Currently the repository maintainer is expected to pre-establish a pairtree hierarchy for Fedora to populate. To do this properly, I think one would need to override the DefaultLowlevelStorageModule to add an initialization step.
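Two of the improvements listed above, full identifier cleaning and initialization, can be tried outside Fedora with a small standalone class. This is a hypothetical sketch, not Fedora code and not the original implementation: the class `PairtreeSketch` and its methods `clean`, `toPpath` and `init` are illustrative names, and the text written to the `pairtree_version0_1` marker file is a placeholder rather than wording taken from the specification. `clean` follows §3 (hex-encode each UTF-8 octet that is non-visible ASCII or one of `" * + , < = > ? \ ^ |` as `^xx`, then map `/` to `=`, `:` to `+` and `.` to `,`). `toPpath` splits the cleaned string into two-character shorties, ending with a one-character directory when the length is odd. `init` assumes the §4 layout of a `pairtree_root` directory with the version marker file beside it. Datastream splitting is not covered.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical standalone sketch, not part of Fedora: Pairtree identifier
 * cleaning (§3), identifier-to-path mapping, and root initialization (§4).
 */
final class PairtreeSketch {

    // Visible ASCII characters that §3 requires to be hex-encoded as ^xx.
    private static final String HEX_CHARS = "\"*+,<=>?\\^|";

    /** §3: hex-encode special and non-visible octets, then map / : . */
    static String clean(String id) {
        StringBuilder sb = new StringBuilder();
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            if (c < 0x21 || c > 0x7e || HEX_CHARS.indexOf(c) >= 0) {
                sb.append('^').append(String.format("%02x", c));
            } else {
                sb.append((char) c);
            }
        }
        // Single-character substitutions are applied after hex encoding.
        return sb.toString().replace('/', '=').replace(':', '+').replace('.', ',');
    }

    /** Splits the cleaned id into shorties, one directory per shorty. */
    static String toPpath(String id) {
        String s = clean(id);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i += 2) {
            // The last shorty is one character long when the length is odd.
            sb.append(s, i, Math.min(i + 2, s.length())).append('/');
        }
        return sb.toString();
    }

    /**
     * Assumed §4 layout: base/pairtree_root plus base/pairtree_version0_1.
     * The marker text is a placeholder, not copied from the spec.
     */
    static Path init(Path base) throws IOException {
        Path root = base.resolve("pairtree_root");
        Files.createDirectories(root);
        Path version = base.resolve("pairtree_version0_1");
        if (Files.notExists(version)) {
            Files.write(version, "This directory conforms to Pairtree Version 0.1.\n"
                    .getBytes(StandardCharsets.UTF_8));
        }
        return root;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(toPpath("demo:1"));            // de/mo/+1/
        System.out.println(toPpath("ark:/13030/xt12t3")); // ar/k+/=1/30/30/=x/t1/2t/3/
        Path root = init(Files.createTempDirectory("pt"));
        System.out.println(Files.isDirectory(root));      // true
    }
}
```

Hex encoding runs before the single-character substitutions so that the `=`, `+` and `,` those substitutions introduce are not themselves re-encoded; for example, `what-the-*@?#!^!?` cleans to `what-the-^2a@^3f#!^5e!^3f`.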