Repositories: What are they and what are they good for?
Semi-controlled-folksonomic-tagging-vocabulary: Encouraging Useful Metadata Contributions. New England Code4Lib, December 2008.
Beer, C., and Michael, C. Semi-controlled-folksonomic-tagging-vocabulary: Encouraging Useful Metadata Contributions. New England Code4Lib, December 2008.
jQuery, OpenSearch and Autocomplete
Here's a quick code snippet for making jQuery's autocomplete UI element consume an OpenSearch resource:
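jQuery('#term').autocomplete('/proxy/opensearch', {parse: opensearch});

// Convert an OpenSearch suggestions response into autocomplete rows.
function opensearch(data) {
  // Parse the response as JSON rather than eval()-ing it.
  data = (typeof data === 'string') ? JSON.parse(data) : data;
  var parsed = [];
  for (var i = 0; i < data[1].length; i++) {
    var row = jQuery.trim(data[1][i]);
    if (row) {
      parsed.push({ data: [row], value: row, result: row });
    }
  }
  return parsed;
}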
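To make the data shapes concrete, here is a small standalone sketch of the same parsing step run against a sample response. It assumes the proxy returns the OpenSearch Suggestions JSON layout (the query string followed by an array of completions), which is what `data[1]` in the snippet above implies. The sample terms are made up, `parseSuggestions` is a hypothetical name, and `String.prototype.trim` stands in for `jQuery.trim` so the sketch runs without jQuery.

```javascript
// Standalone version of the parse step; no jQuery required.
function parseSuggestions(body) {
  // Accept either a raw JSON string or an already-decoded array.
  const data = typeof body === 'string' ? JSON.parse(body) : body;
  const completions = Array.isArray(data[1]) ? data[1] : [];
  const parsed = [];
  for (const item of completions) {
    const row = String(item).trim();
    if (row) {
      // Same row shape the snippet above hands to the autocomplete plugin.
      parsed.push({ data: [row], value: row, result: row });
    }
  }
  return parsed;
}

// Hypothetical response body for the query "fed".
const sample = '["fed", ["fedora", " federated search ", ""]]';
const rows = parseSuggestions(sample);
console.log(rows.map((r) => r.value));
```

Empty and whitespace-only completions are dropped, and the remaining terms are trimmed before they reach the widget.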
Federated/distributed digital repositories
For the bVault project I am developing, one of our secondary goals is to create a replicable model for other digital media repositories. One way we are pursuing this is by laying the foundations for an interface to a federated/distributed repository shared with other public broadcasters. This takes advantage of an architectural feature of public broadcasting in the US: the public broadcasting network is really a federation of individual stations that subscribe and contribute to a particular programming distribution service (PBS and NPR, among others).
A federated repository ultimately needs three things:
- A common API among the participating repositories,
- A search index that covers all the repositories, and
- A resolver to translate a search result back to the originating repository
Common API
For bVault, the common API is the set of web services exposed by Fedora, and the metadata translation dissemination service behind that, which allows a client to receive a particular metadata format, regardless of the underlying schema. This is an important feature, because it allows individual repositories to use whichever metadata format is most natural to their needs, while seamlessly generating interoperable metadata.
Search index
The exact method used to generate a spanning search index is essentially arbitrary. Solr provides some distributed/sharded search capabilities, but the index could also operate on a pub/sub model, where repositories push content out to a master search index, or be built by a search-engine-like crawler that harvests each repository's OAI-PMH endpoint. Because the search index is loosely coupled to the rest of the system, the choice is ultimately an architectural decision rather than a technical one.
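As one illustration of the pub/sub option, the sketch below has each repository push records into a single in-memory index keyed by PID. Everything in it is hypothetical (the `MasterIndex` class, the record fields, the `stationA:` and `stationB:` namespaces); a real deployment would sit on Solr or a similar engine rather than a `Map`.

```javascript
// Hypothetical master index that repositories publish records to.
class MasterIndex {
  constructor() {
    this.records = new Map(); // pid -> { pid, title }
  }

  // Called by a repository whenever an object is added or updated.
  publish(record) {
    this.records.set(record.pid, record);
  }

  // Naive case-insensitive title search across every repository.
  search(term) {
    const needle = term.toLowerCase();
    return [...this.records.values()].filter((r) =>
      r.title.toLowerCase().includes(needle)
    );
  }
}

const index = new MasterIndex();
index.publish({ pid: 'stationA:101', title: 'Evening News, 1974' });
index.publish({ pid: 'stationB:7', title: 'Morning News Hour' });
index.publish({ pid: 'stationB:8', title: 'Jazz Sessions' });

const hits = index.search('news');
console.log(hits.map((r) => r.pid));
```

Each hit carries only a PID, so the interface still needs a way to get from that PID back to the repository that holds the object.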
Distributed Resolver
Now that we have a way to discover items within a repository, the interface needs a way to extract the content from the origin. For this, we need a way to resolve a unique resource identifier (URI!) back to its source. Again, the method is somewhat arbitrary, but for this project, we elected to require unique namespaces for each repository (quite reasonable, considering the application).
To do this, I’ve slipped a namespace resolver into the client’s API call to allow the interface to act independently from the source of the content. For a simple API call, like listDatastreams, we have:
public function listDatastreams($pid, $asOfDateTime = null)
{
    return Fedora_Repository::get('API-A', $pid)->listDatastreams(array(
        'pid' => $pid,
        'asOfDateTime' => $asOfDateTime
    ));
}
This requests the API-A binding appropriate to the current persistent identifier (pid):
/**
 * Retrieves a SOAP client for a Fedora Repository that can provide the
 * $type endpoint for the PID/prefix $prefix
 *
 * @param string $type
 * @param string $prefix
 * @return SoapClient|false
 */
static public function get($type, $prefix = '')
{
    global $objManager;
    $arrRepository = $objManager->resolve($prefix);
    $objClient = new stdClass;

    if (count($arrRepository) == 1) {
        $objClient = $arrRepository[0]->getSoapClient($type);
    } else {
        // Several repositories serve this prefix: use the first that responds.
        $arrKey = array_rand($arrRepository, count($arrRepository));
        foreach ($arrKey as $key) {
            $objClient = $arrRepository[$key]->getSoapClient($type);
            if ($objClient !== false) {
                break;
            }
        }
    }

    if ($objClient instanceof SoapClient) {
        return $objClient;
    } else {
        return false;
    }
}
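The same idea, reduced to its essentials: split the namespace off the PID, look up the repositories registered for it, and take the first one that yields a client. This is a simplified JavaScript sketch, not the library's code. The `registry` map, the `resolveClient` function, the `connect` callbacks, and the example.org URLs are all hypothetical, and candidates are tried in registration order rather than picked with `array_rand`.

```javascript
// Hypothetical registry: PID namespace -> list of repositories (mirrors).
const registry = new Map([
  ['stationA', [
    { url: 'http://a1.example.org/fedora', connect: () => null }, // simulated outage
    { url: 'http://a2.example.org/fedora', connect: () => ({ via: 'a2' }) },
  ]],
  ['stationB', [
    { url: 'http://b.example.org/fedora', connect: () => ({ via: 'b' }) },
  ]],
]);

// Return a client for a repository that owns `pid`, or null if none responds.
function resolveClient(pid) {
  const namespace = pid.split(':')[0];
  const candidates = registry.get(namespace) || [];
  for (const repo of candidates) {
    const client = repo.connect(); // null signals an unavailable endpoint
    if (client !== null) {
      return client;
    }
  }
  return null;
}

console.log(resolveClient('stationA:101'));
console.log(resolveClient('unknown:1'));
```

Because the caller only ever passes a PID, adding a mirror or a new station is a change to the registry, not to the interface code.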
Creating a distributed repository doesn’t cost much now, and if you design it right, you can benefit from the potential for redundancy and mirroring immediately, even before there is a federated network to tap into.
The full source is available from the bVault Fedora PHP library.
Zend_Cache for Web Services
My current project involves a number of SOAP Web Services requests to retrieve information from our Fedora repository. To help minimize overhead from HTTP requests, I’m using Zend Framework’s Zend_Cache_Frontend_Class to wrap the whole Fedora/PHP interface class. Zend_Cache allows me to implement this style of caching with only a single line of code.
Our web services consumer provides a couple of access methods that can be safely cached:
class Fedora_Object
{
    /* .... */

    public function getDissemination($pid, $sDefPid, $methodName, $parameters, $asOfDateTime = null)
    {
        try {
            return Fedora_Repository::get('API-A', $pid)->getDissemination(array(
                'pid' => $pid,
                'serviceDefinitionPid' => $sDefPid,
                'methodName' => $methodName,
                'parameters' => $parameters,
                'asOfDateTime' => $asOfDateTime
            ));
        } catch (SoapFault $s) {
            return $s;
        }
    }

    /* .... */
}
In the bootstrap file, instead of initializing the Fedora_Object class, I wrap it in a Zend_Cache instance:
$fedora = Zend_Cache::factory('Class', 'File', array(
    'cached_entity' => new Fedora_Object(),
    'cached_methods' => array('getObjectXML', 'getDatastreamDissemination', 'getDissemination'),
    'cache_by_default' => false
));
This code tells Zend_Cache to cache only the specified cached_methods and pass everything else through. Easy.
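For readers who have not used the `Class` frontend, the behaviour described above (cache the listed methods, pass everything else through) can be imitated in a few lines. The sketch below is JavaScript and is not how Zend_Cache is implemented. `cacheMethods` and `FakeFedora` are hypothetical, the cache is an in-memory `Map` rather than the `File` backend, and entries never expire.

```javascript
// Wrap `target` so that only the named methods are memoised by their arguments.
function cacheMethods(target, cachedMethods) {
  const cache = new Map();
  return new Proxy(target, {
    get(obj, prop) {
      const value = obj[prop];
      if (typeof value !== 'function') return value;
      // Methods not on the list are passed straight through.
      if (!cachedMethods.includes(prop)) return value.bind(obj);
      return (...args) => {
        const key = prop + ':' + JSON.stringify(args);
        if (!cache.has(key)) cache.set(key, value.apply(obj, args));
        return cache.get(key);
      };
    },
  });
}

// Hypothetical stand-in for the web services consumer; counts remote calls.
class FakeFedora {
  constructor() { this.calls = 0; }
  getObjectXML(pid) { this.calls += 1; return '<object pid="' + pid + '"/>'; }
  purgeObject(pid) { this.calls += 1; return true; }
}

const backend = new FakeFedora();
const fedora = cacheMethods(backend, ['getObjectXML']);

fedora.getObjectXML('demo:1');
fedora.getObjectXML('demo:1'); // served from the cache
fedora.purgeObject('demo:1');  // not on the list, so it always reaches the backend
fedora.purgeObject('demo:1');
console.log(backend.calls);
```

Listing the cacheable methods explicitly, as `cache_by_default => false` does, keeps write operations from ever being answered out of the cache.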