15 ways to improve PBCore
This post describes shortcomings of, and potential improvements to, PBCore, an XML metadata standard for exchanging descriptions of media materials. These suggestions try to work within the current confines of PBCore rather than introduce radical changes (which could bring PBCore more in line with the rest of the XML and linked data worlds). Further, we recognize that PBCore's strength is descriptive metadata, so these suggestions primarily aim to strengthen those components rather than compete on technical metadata.
- Define what all the data dictionary terms mean (“clip”, “element”, “actuality”, “version of”, etc.). They need to be defined so the community can apply them consistently. Other communities have already defined these terms; we just need to determine which definitions apply to which elements. For example, the European Broadcasting Union does a nice job of distributing machine-readable XML definitions for its data dictionary.
- Enhance the semantics of relation types by creating an ontology (using RDFS or similar, like the Fedora RELS-EXT ontology). For example, instead of simply “version of”, allow “derivation of”, “copy of”, “identical to”, etc.
- PBCore only records contextual dates on individual instantiations, but we want an overall, asset-level date with types such as created or issued (e.g., the date an interview was conducted). A similar issue exists for locations. Both differ from pbcoreCoverage, which describes the content rather than the context.
- Format of the content: whether it is an interview, a panel discussion, a live event, b-roll, beauty shots, etc. formatGenerations provides a piece of this puzzle, but this is ultimately descriptive metadata, which probably doesn't belong in an instantiation. EBUCore covers part of this with a controlled vocabulary for editorial formats, but it isn't granular enough (e.g., Discussion/Interview/Debate/Talkshow is a single term). We suggest exploring an extension of the genre data dictionary to include archival descriptors like “interview” and “b-roll”, which would solve this in a backwards-compatible way.
- Machine-parseable rights language. We're embedding the Open Digital Rights Language (ODRL) within pbcoreRightsSummary, but it would be nice to have a common way to express rights (both the rights the publisher holds and the rights the publisher grants to the user). An alternative (and perhaps desirable and necessary) solution would be to at least investigate better ways to combine PBCore with established schemas like ODRL, MODS, etc.
- A way to identify the primary title and description of an asset, for use in a discovery interface. Existing solutions, like picking titles based on hierarchy, or using a separate metadata document, are flawed.
- A formal way to order, prioritize, and relate instantiations within a record (e.g. programs within a series, provenance/hierarchy of digital instances).
- A way to label what type of entity a pbcoreSubject is (e.g., person, organization, place, date), in addition to the existing authority reference.
- Authority references should be available in most (if not all) PBCore containers, which could help enable linked data applications. This could be done through new XML attributes, which legacy applications would ignore and which would perhaps be more in line with other standards.
- Better handling of “element”-level materials, such as archival raw footage. The existing PBCore handles finished programs decently, but its data dictionaries aren't designed for this level.
- Adopt proper RDF relationships for PBCore relations.
- Consider adding educational levels and standards. PBCore currently addresses this tangentially with audienceLevel and audienceRating.
- Better way to handle metadata about people, whether by enhancing the existing structure, supporting an hCard microformat, or otherwise.
- Semantics for attaching thumbnails and other visual representations or facsimiles of a PBCore media instantiation, for use in discovery interfaces. This is probably a low-priority, nice-to-have change.
- Content flags, which include advisory messages about sensitive content, are regularly created for broadcast programs, but PBCore doesn't provide a way to capture them. Perhaps the best approach is to add time-based metadata to the descriptive material (but then, what is the timecode relative to? See next.)
- BONUS: Add timecode information to instantiations and relationships to identify sections of content, in order to support time-based metadata, content flags, etc.
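To make the relation-type ontology idea more concrete, here is a minimal Python sketch in which each refined relation type names a broader type, in the spirit of `rdfs:subPropertyOf`. The three refined terms come from the list above, but the hierarchy itself (for instance, “identical to” sitting under “copy of”) is a hypothetical assumption, as are the `asset:A`/`asset:B` identifiers. It uses plain dicts rather than an RDF library.

```python
# HYPOTHETICAL relation-type hierarchy, not an official PBCore vocabulary.
# Each refined type names its broader type, in the spirit of rdfs:subPropertyOf.
BROADER = {
    "derivation of": "version of",
    "copy of": "version of",
    "identical to": "copy of",
}

def ancestors(relation_type):
    """Return the type itself followed by each broader type, most specific first."""
    chain = [relation_type]
    # Stop when there is no broader type, or if the vocabulary loops back on itself.
    while chain[-1] in BROADER and BROADER[chain[-1]] not in chain:
        chain.append(BROADER[chain[-1]])
    return chain

def downgrade(relation_type, understood):
    """Map a relation type to the most specific type a consumer understands, or None."""
    for candidate in ancestors(relation_type):
        if candidate in understood:
            return candidate
    return None

def entailed_triples(triple):
    """Subproperty entailment: (s, p, o) also implies (s, q, o) for every broader q."""
    subject, predicate, obj = triple
    return [(subject, q, obj) for q in ancestors(predicate)]

print(ancestors("identical to"))                  # ['identical to', 'copy of', 'version of']
print(downgrade("identical to", {"version of"}))  # version of
print(entailed_triples(("asset:A", "identical to", "asset:B")))
```

The design goal is backwards compatibility: a record can use the most precise term available, while an application that only knows “version of” still gets a usable answer. In a full RDF deployment, a reasoner or SPARQL property paths would do this walking instead of hand-written code.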
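One way to read the suggestion about ordering instantiations by provenance is sketched below: if each instantiation recorded which instantiation it was derived from, a record could be ordered master-first by counting derivation hops. PBCore has no such link today; the `derivedFrom` field, the instantiation ids, and the ordering rule (generation, then id) are all hypothetical assumptions for illustration.

```python
# HYPOTHETICAL records: PBCore has no standard "derivedFrom" link between
# instantiations; the ids and field names here are illustrative only.
INSTANTIATIONS = [
    {"id": "mp4-web", "derivedFrom": "mezzanine"},
    {"id": "master-tape", "derivedFrom": None},
    {"id": "mezzanine", "derivedFrom": "master-tape"},
    {"id": "mp3-audio", "derivedFrom": "mezzanine"},
]

def generation(inst_id, parent_of):
    """Count derivedFrom hops back to an original (generation 0).

    Raises KeyError for a link to an unknown id and ValueError for a cycle.
    """
    depth, seen = 0, {inst_id}
    parent = parent_of[inst_id]
    while parent is not None:
        if parent in seen:
            raise ValueError(f"provenance cycle at {parent}")
        seen.add(parent)
        depth += 1
        parent = parent_of[parent]
    return depth

def order_by_provenance(instantiations):
    """Sort instantiations by generation, breaking ties by id for a stable order."""
    parent_of = {inst["id"]: inst["derivedFrom"] for inst in instantiations}
    return sorted(instantiations,
                  key=lambda inst: (generation(inst["id"], parent_of), inst["id"]))

print([inst["id"] for inst in order_by_provenance(INSTANTIATIONS)])
# ['master-tape', 'mezzanine', 'mp3-audio', 'mp4-web']
```

Sorting by generation gives a simple master-first list; a tree view that groups derivatives under their source would use the same links with a different traversal.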
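The suggestions about typing subjects and adding authority references as XML attributes rest on the claim that legacy applications would simply ignore new attributes. Here is a minimal Python sketch of that claim using the standard-library `xml.etree.ElementTree`. The fragment is simplified and namespace-free, modeled loosely on PBCore 1.x's pbcoreSubject container; the `subjectType` and `authorityURI` attributes, the example.org URI, the name, and the authority label are hypothetical, and the "legacy" reader is an assumption about an older application that reads only element text.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment modeled loosely on PBCore 1.x's pbcoreSubject.
# The subjectType and authorityURI attributes are HYPOTHETICAL extensions.
EXTENDED = """<pbcoreDescriptionDocument>
  <pbcoreSubject>
    <subject subjectType="person"
             authorityURI="http://example.org/authority/person/123">Doe, Jane</subject>
    <subjectAuthorityUsed>Example Name Authority</subjectAuthorityUsed>
  </pbcoreSubject>
</pbcoreDescriptionDocument>"""

def legacy_subjects(xml_text):
    """A 'legacy' reader: looks only at element text, so unknown attributes are ignored."""
    root = ET.fromstring(xml_text)
    return [(s.findtext("subject"), s.findtext("subjectAuthorityUsed"))
            for s in root.iter("pbcoreSubject")]

def extended_subjects(xml_text):
    """An extension-aware reader: also picks up the hypothetical attributes."""
    root = ET.fromstring(xml_text)
    result = []
    for s in root.iter("pbcoreSubject"):
        subj = s.find("subject")
        if subj is None:
            continue
        result.append({
            "label": subj.text,
            "type": subj.get("subjectType"),        # None when the attribute is absent
            "authority": subj.get("authorityURI"),  # None when the attribute is absent
        })
    return result

print(legacy_subjects(EXTENDED))    # [('Doe, Jane', 'Example Name Authority')]
print(extended_subjects(EXTENDED))
```

Because ElementTree only surfaces attributes when code asks for them, the legacy reader returns the same result whether or not the extra attributes are present. An application that strictly validates against the XML Schema is a different case: undeclared attributes would fail validation, so the schema itself would need to allow them.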
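To show how timecode could anchor content flags, here is a minimal Python sketch that converts `HH:MM:SS:FF` timecode to frame counts and finds the flags overlapping a segment. It assumes non-drop-frame timecode at a fixed integer frame rate (30 fps by default) and measures timecode from the start of the instantiation, which is one possible answer to the "relative to what?" question. The `ContentFlag` structure and the sample flags are hypothetical, not anything PBCore defines.

```python
from dataclasses import dataclass

def timecode_to_frames(tc, fps=30):
    """Convert non-drop-frame 'HH:MM:SS:FF' timecode to a frame count (assumes integer fps)."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"invalid timecode: {tc}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

@dataclass
class ContentFlag:
    """HYPOTHETICAL advisory flag anchored to a span of an instantiation."""
    message: str
    start: str  # timecode relative to the start of the instantiation (assumption)
    end: str

def flags_in_segment(flags, seg_start, seg_end, fps=30):
    """Return the flags whose [start, end) span overlaps the segment [seg_start, seg_end)."""
    s0, s1 = timecode_to_frames(seg_start, fps), timecode_to_frames(seg_end, fps)
    hits = []
    for flag in flags:
        f0, f1 = timecode_to_frames(flag.start, fps), timecode_to_frames(flag.end, fps)
        if f0 < s1 and s0 < f1:  # two half-open intervals overlap
            hits.append(flag)
    return hits

flags = [
    ContentFlag("strong language", "00:05:10:00", "00:05:20:00"),
    ContentFlag("graphic images", "00:30:00:00", "00:31:00:00"),
]
print([f.message for f in flags_in_segment(flags, "00:00:00:00", "00:10:00:00")])
# ['strong language']
```

Using half-open intervals means a flag that ends exactly where a segment begins is not counted as overlapping it, which avoids double-counting at segment boundaries.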