- Created a "microservices" organization on GitHub to hold the community-driven source code repositories. Previously, the projects were held under a personal account alongside a diversity of other projects in various states of use and support. By creating a topic-driven organization, we hope to attract contributors and make these projects easier to discover.
- Created a mailing list to record decisions, answer questions, and collaborate.
- Agreed to a set of standards and practices for microservices projects to ensure consistency and quality across these projects:
- Basic "meta" files -- like README, TODO, LICENSE, etc. -- should be present and contain enough information to help people get started using and contributing to the projects.
- Clarified the source code licenses, standardizing on the Apache License 2.0 for each project.
- Vastly improved test and documentation coverage, standardizing on RSpec and YARD. Projects are now subject to continuous integration to ensure tests pass, documentation builds, and test coverage remains high.
Access

To address the need to expose archival content in a sustainable manner, for a variety of audiences, and to encourage innovation within media archives, WGBH created Open Vault, which provides a digital access portal into a cross-section of material from the WGBH Media Library and Archives. Although designed as an access portal, a secondary objective in creating Open Vault was to explore the potential for the system to fit within the multifaceted content management ecosystem for both access and preservation use.

WGBH Open Vault is built using Blacklight, Solr, and the Fedora repository. Beyond the Open Vault user interface, we exposed a number of APIs, either for internal use or to support existing data-exchange projects, including Atom/RSS feeds, unAPI, oEmbed, and OAI-PMH. By taking advantage of existing open-source solutions as much as possible, we were able to focus our efforts on domain-relevant issues. This has proven a reliable platform, and we have since deployed similar technology for a couple of cross-institutional, data-intensive projects.

In 2006, WGBH launched Open Vault, an access repository based on CWIS. This site combined clips of media assets from four different series (three of which had separate finding-aid websites created earlier). In 2008/9, WGBH MLA and Interactive completed an Andrew W. Mellon Foundation-funded project in which we worked closely with humanities scholars, researching their needs and habits in using digital media in their work. We developed a prototype, dubbed "Open Vault Research", using Fedora and a PHP front-end. One discovery was that scholars lack tools for working with media, while traditional scholarship remains focused on citing textual resources.
To address this, we created a number of tools for working with media material:

- aligned transcripts, which allow the user to rapidly scan the transcript of an interview and seek immediately to a section of interest;
- annotations and tags, which allow the user to segment and describe media fragments and refer back to those notes later;
- fragment addressing, which allows the savvy user to deep-link to a particular point in an object.

Taking these user needs into account, we developed Open Vault v2 using Blacklight and the Fedora repository. Finally, we are about to deploy a new iteration of Open Vault using Blacklight 3.0 (and, as a footnote, although our application has significantly different behavior, the customizations amount to only about 3500 lines of code, more than half of which are HTML templates). Although the Hydra framework has matured significantly since the beginning of the project, because the management of the media and metadata is still performed in external systems, we continue to access the Fedora APIs directly.

In this redesign, we looked at usage patterns over the collection and re-organized and re-prioritized elements of the user experience:

- The majority of our users entered the website at a record page from an external search engine (with about a 50% bounce rate). However, if a user stayed and watched a video, they would often navigate the website to "related content" (exposed using Solr's MoreLikeThis).
- Subject browse was used more frequently than expected to give an overview of the materials in the collection.
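The fragment-addressing tool mentioned above can be sketched in a few lines. The actual Open Vault URL scheme isn't specified here; this sketch assumes W3C Media Fragments-style temporal offsets (`#t=start,end`, in seconds) purely for illustration.

```ruby
# Sketch: parse a Media Fragments-style temporal offset from a deep-link URL,
# e.g. "#t=90" (seek to 90s) or "#t=90,120" (a 90s-120s segment).
# Open Vault's real addressing scheme may differ; this is an assumption.
require 'uri'

def parse_temporal_fragment(url)
  fragment = URI.parse(url).fragment
  return nil unless fragment&.start_with?('t=')

  start_s, end_s = fragment.delete_prefix('t=').split(',', 2)
  { start: Float(start_s), end: end_s && Float(end_s) }
end

parse_temporal_fragment('http://example.org/catalog/123#t=90,120')
# => { start: 90.0, end: 120.0 }
```

A player can then use the parsed offsets to seek directly into an hour-long interview rather than forcing the user to scrub through it.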
Technology

For our media player environment, we needed a technology that meets several requirements:
- the ability to jump to any point in an item, which is especially important when serving hour-long raw interviews (this rules out standard progressive delivery of the content, over HTTP or otherwise),
- an open source, or low cost, delivery platform,
OAI-PMH

To support "traditional" aggregation, like the Digital Commonwealth project, we have an OAI-PMH endpoint (see also "why OAI-PMH should die").
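An OAI-PMH harvest is just a well-defined set of URL query arguments. As a sketch, this builds a `ListRecords` request against a hypothetical base URL (the verb and argument names come from the OAI-PMH 2.0 specification; the endpoint and set name are assumptions):

```ruby
# Sketch: construct an OAI-PMH ListRecords request URL.
# Base URL and set name are hypothetical; verb/argument names per OAI-PMH 2.0.
require 'uri'

def oai_list_records(base_url, metadata_prefix: 'oai_dc', set: nil)
  params = { verb: 'ListRecords', metadataPrefix: metadata_prefix }
  params[:set] = set if set
  uri = URI.parse(base_url)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

oai_list_records('http://example.org/oai', set: 'example_set')
# => "http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=example_set"
```

A harvester like Digital Commonwealth would fetch this URL, then follow the `resumptionToken` returned in each response page until the list is exhausted.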
OpenSearch (Blacklight)

For other aggregation efforts, we provide an OpenSearch endpoint that allows simple machine-to-machine discovery in a standard way.
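Machine-to-machine discovery works because clients can fetch a small OpenSearch description document and substitute their query into its URL template. A minimal description document of the sort Blacklight can expose might look like this (names and URLs are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Open Vault</ShortName>
  <Description>Search the WGBH Open Vault collection</Description>
  <!-- a client replaces {searchTerms} with its query string -->
  <Url type="application/rss+xml"
       template="http://example.org/catalog.rss?q={searchTerms}&amp;page={startPage?}"/>
</OpenSearchDescription>
```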
Atom/RSS (Blacklight)

All search results expose a discoverable Atom/RSS feed. Blacklight also provides functionality, through the Document Extension Framework, that allows clients to request specific representations of objects as part of the content of the feed.
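The exact feed markup is Blacklight's; this hypothetical Atom entry just illustrates the idea of advertising alternate machine-readable representations of a document alongside the HTML view:

```xml
<entry>
  <title>Example interview</title>
  <id>http://example.org/catalog/123</id>
  <updated>2011-01-01T00:00:00Z</updated>
  <!-- the human-readable page -->
  <link rel="alternate" type="text/html"
        href="http://example.org/catalog/123"/>
  <!-- an alternate, machine-readable representation of the same record -->
  <link rel="alternate" type="application/xml"
        href="http://example.org/catalog/123.xml"/>
</entry>
```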
unAPI (Blacklight plugin)

The unAPI endpoint allows applications to discover structured information based on an identifier and a content type.
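unAPI is a deliberately tiny protocol: a single endpoint, queried first with only an identifier (to list available formats) and then with an identifier and format (to fetch one representation). A sketch of the two request URLs, with a hypothetical endpoint:

```ruby
# Sketch of the unAPI two-step lookup. The endpoint URL is hypothetical.
require 'uri'

UNAPI_ENDPOINT = 'http://example.org/unapi'

def unapi_url(id, format = nil)
  params = { id: id }
  params[:format] = format if format
  "#{UNAPI_ENDPOINT}?#{URI.encode_www_form(params)}"
end

unapi_url('abc123')            # first request: list available formats
unapi_url('abc123', 'oai_dc')  # second request: fetch one representation
```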
oEmbed (Blacklight plugin)

oEmbed allows a client to discover the embeddable properties of an asset (and construct a player) in a standard way, rather than forcing implementers to find media assets through page scraping or unAPI. It provides an easily parseable set of metadata required for embedding and, possibly, a pre-generated player implementation.
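An oEmbed response for a video asset has a small, well-defined shape (the `version`, `type`, `html`, `width`, and `height` fields come from the oEmbed specification; the values here are hypothetical):

```json
{
  "version": "1.0",
  "type": "video",
  "title": "Example interview",
  "provider_name": "Open Vault",
  "width": 512,
  "height": 384,
  "html": "<iframe src=\"http://example.org/embed/123\" width=\"512\" height=\"384\"></iframe>"
}
```

A consuming site can drop the `html` value directly into its page to get a working player, without knowing anything about the underlying media delivery.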
HTML <meta> tags

While encouraging re-use of materials, we documented possible improvements to make ad-hoc innovation and mash-up creation significantly easier, including:
- oEmbed: rather than forcing implementers to discover media assets (through page scraping or unAPI), introspect the assets for technical metadata, and then construct a player themselves, oEmbed provides an easily parseable set of metadata required for embedding and, possibly, a pre-generated player implementation;
- additional information in the Atom/RSS feeds, in particular ensuring the data contained within the feed representations is comparable to the normal user interface;
- and, exposing additional information on the page for developer use, which, in the case of technical or rights metadata, is less relevant to our primary audience but may be essential to building third-party interfaces to the content.
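The last point can be as simple as embedding machine-readable technical and rights metadata in the page head. A hypothetical sketch (the tag names and values here are illustrative, not the markup Open Vault actually emits):

```html
<!-- Hypothetical developer-facing metadata alongside the human-readable page -->
<meta name="DC.title" content="Example interview" />
<meta name="DC.rights" content="Courtesy of WGBH Educational Foundation" />
<meta name="video.duration" content="PT58M30S" />
<meta name="video.src" content="http://example.org/media/123.mp4" />
```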