blog.cbeer.info

Oct 26, 2010

Useful Standards for Public Media Projects: unAPI

The oEmbed standard is great for exposing embeddable content in a way that, most of the time, the average user never has to think about -- it just works. However, more complex data, including multiple content objects or underlying metadata formats, calls for something less opinionated. This is the niche where unAPI plays a simple but powerful role. Like oEmbed and other standards, unAPI defines a service endpoint with a handful of basic operations and a discovery mechanism, and does so in a plain and obvious manner that makes it easy for tech-inclined folk to work with.

For an archives project at work, we implemented unAPI as a simple way to segment the page content and expose underlying metadata formats, in order to offer our partners a quick (and content-agnostic) way to pull elements into existing tools. The API endpoint is exposed within the page header:

<link rel="unapi-server" type="application/xml" title="unAPI" href="http://openvault.wgbh.org/api/unapi/" />

Source: http://openvault.wgbh.org/catalog/org.wgbh.mla:0119be4cad49d0c0f47e9eca1d343e0464539a4c

Within the page, any number of unAPI IDs are embedded:

<abbr class="unapi-id" style="display: none" title="org.wgbh.mla:0119be4cad49d0c0f47e9eca1d343e0464539a4c"></abbr>

Source: http://openvault.wgbh.org/catalog/org.wgbh.mla:0119be4cad49d0c0f47e9eca1d343e0464539a4c
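To make the discovery step concrete, here is a minimal sketch of how a client might pull the unAPI server URL and the embedded identifiers out of a page. It uses only Python's standard library; the class and function names (UnapiDiscovery, discover) and the SAMPLE_PAGE wrapper are hypothetical, made up for illustration, and only the two elements inside the sample come from the page above.

```python
from html.parser import HTMLParser

# A stripped-down page containing the two elements shown above.
# The surrounding <html>/<head>/<body> wrapper is an assumption for the demo.
SAMPLE_PAGE = """
<html><head>
<link rel="unapi-server" type="application/xml" title="unAPI" href="http://openvault.wgbh.org/api/unapi/" />
</head><body>
<abbr class="unapi-id" style="display: none" title="org.wgbh.mla:0119be4cad49d0c0f47e9eca1d343e0464539a4c"></abbr>
</body></html>
"""


class UnapiDiscovery(HTMLParser):
    """Collect the unAPI server URL and any unAPI identifiers in a page."""

    def __init__(self):
        super().__init__()
        self.server = None
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "unapi-server":
            # The href of the autodiscovery link is the service endpoint.
            self.server = attrs.get("href")
        elif tag == "abbr" and "unapi-id" in (attrs.get("class") or "").split():
            # The identifier is carried in the title attribute.
            self.ids.append(attrs.get("title"))


def discover(html):
    """Return (server_url, [identifiers]) found in an HTML string."""
    parser = UnapiDiscovery()
    parser.feed(html)
    parser.close()
    return parser.server, parser.ids
```

Calling discover(SAMPLE_PAGE) hands back the endpoint and the list of identifiers, which is everything a client needs before it starts asking the service for formats.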

An unAPI client can request a list of formats from the service:

<?xml version="1.0" encoding="UTF-8"?>
<formats id="org.wgbh.mla:0119be4cad49d0c0f47e9eca1d343e0464539a4c">
  <format type="application/xml" docs="http://www.openarchives.org/OAI/2.0/oai_dc.xsd" name="oai_dc"/>
  <format type="application/xml" docs="" name="pbcore"/>
  <format type="image/jpeg" name="jpeg"/>
</formats>

Source: http://openvault.wgbh.org/api/unapi?id=org.wgbh.mla:0119be4cad49d0c0f47e9eca1d343e0464539a4c
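As a rough sketch of the client side, the snippet below builds unAPI request URLs and reads a formats response like the one above. It assumes the usual unAPI query parameters (id, plus format to fetch one representation); the helper names unapi_url and parse_formats are hypothetical, and no network request is made, the sample response is simply copied from above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Formats response copied from the example above. It is kept as bytes so
# the parser handles the encoding declaration itself.
SAMPLE_FORMATS = b"""<?xml version="1.0" encoding="UTF-8"?>
<formats id="org.wgbh.mla:0119be4cad49d0c0f47e9eca1d343e0464539a4c">
  <format type="application/xml" docs="http://www.openarchives.org/OAI/2.0/oai_dc.xsd" name="oai_dc"/>
  <format type="application/xml" docs="" name="pbcore"/>
  <format type="image/jpeg" name="jpeg"/>
</formats>"""


def unapi_url(server, identifier=None, fmt=None):
    """Build a request URL: formats list (id only) or one object (id + format)."""
    params = {}
    if identifier is not None:
        params["id"] = identifier
    if fmt is not None:
        params["format"] = fmt
    return server + ("?" + urlencode(params) if params else "")


def parse_formats(xml_bytes):
    """Map each advertised format name to its MIME type."""
    root = ET.fromstring(xml_bytes)
    return {f.get("name"): f.get("type") for f in root.iter("format")}


formats = parse_formats(SAMPLE_FORMATS)
# Pick a format by name, then build the URL that would fetch that representation.
pbcore_url = unapi_url(
    "http://openvault.wgbh.org/api/unapi",
    "org.wgbh.mla:0119be4cad49d0c0f47e9eca1d343e0464539a4c",
    "pbcore",
)
```

A partner tool would typically list the formats first, choose the one it understands (say, pbcore), and only then fetch the object itself.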

These formats could be used to share any type of data -- different flavors of metadata, or content outside the application context. As with oEmbed, this offers a basic way to provide federated and aggregated data within a common framework. Besides being a convenient service for sharing content among applications, unAPI is also supported by the Zotero citation management tool.

What could this look like for accessing NPR.org content? Instead of using a rich API, an application could simply feed a story URL into the unAPI service and retrieve the different content elements -- a text/html or text/plain representation of the story, the audio/mpeg from the broadcast, an image/jpeg feature image, and perhaps an application/rss+xml feed of the series, comments, or category. It lacks the power of the stand-alone API, but provides data in a form that makes it a little easier to craft new ways of highlighting this content on station websites.

Oct 24, 2010

Useful Standards for Public Media Projects: oEmbed

oEmbed is a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.

oEmbed allows the content provider to expose content from a web page (along with a basic set of supporting metadata), so an application can embed content on behalf of a user without assuming the user knows what to do with raw HTML code. Here's example JSON output from YouTube:

{
  "provider_url": "http:\/\/www.youtube.com\/",
  "title": "NOVA | Emergency Mine Rescue",
  "html": "<object width=\"480\" height=\"295\">[...]<\/object>",
  "author_name": "NOVAonline",
  "height": 295,
  "thumbnail_width": 480,
  "width": 480,
  "version": "1.0",
  "author_url": "http:\/\/www.youtube.com\/user\/NOVAonline",
  "provider_name": "YouTube",
  "thumbnail_url": "http:\/\/i3.ytimg.com\/vi\/NUrLEKfHB_0\/hqdefault.jpg",
  "type": "video",
  "thumbnail_height": 360
}

Source: http://www.youtube.com/oembed?url=http://www.youtube.com/watch?v=NUrLEKfHB_0
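For a sense of how little code a consumer needs, here is a minimal sketch that builds an oEmbed request URL and turns a response like the one above into markup. The function names (oembed_request_url, embed_html) are hypothetical, the sample is a trimmed copy of the response above with the elided [...] left as-is, and the handling of the photo type is my reading of the oEmbed response types rather than anything tested against a live provider.

```python
import json
from html import escape
from urllib.parse import urlencode

# Trimmed copy of the YouTube response above (raw string keeps the JSON escapes).
SAMPLE_RESPONSE = r'''{"type": "video", "version": "1.0",
 "title": "NOVA | Emergency Mine Rescue",
 "html": "<object width=\"480\" height=\"295\">[...]<\/object>",
 "width": 480, "height": 295}'''


def oembed_request_url(endpoint, resource_url, maxwidth=None, maxheight=None):
    """Build the query an application would send to an oEmbed endpoint."""
    params = {"url": resource_url, "format": "json"}
    if maxwidth is not None:
        params["maxwidth"] = maxwidth
    if maxheight is not None:
        params["maxheight"] = maxheight
    return endpoint + "?" + urlencode(params)


def embed_html(response_text):
    """Return markup for an oEmbed response, or None if nothing is embeddable."""
    data = json.loads(response_text)
    kind = data.get("type")
    if kind in ("video", "rich"):
        # The provider supplies ready-made embed code.
        return data["html"]
    if kind == "photo":
        # Photos come as a bare image URL, so build the tag ourselves.
        return '<img src="{}" width="{}" height="{}" alt="{}">'.format(
            escape(data["url"], quote=True),
            int(data["width"]),
            int(data["height"]),
            escape(data.get("title", ""), quote=True),
        )
    return None  # e.g. "link" responses carry metadata only


request_url = oembed_request_url(
    "http://www.youtube.com/oembed",
    "http://www.youtube.com/watch?v=NUrLEKfHB_0",
)
snippet = embed_html(SAMPLE_RESPONSE)
```

An application trusting provider-supplied HTML this way should only do so for providers it has chosen to whitelist; that policy decision is outside this sketch.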

Given a URL for which an oEmbed endpoint is defined or discoverable, an application can query the oEmbed service to retrieve the embed code and automatically insert it into the page. The great thing about this standard is that aggregating media from any compliant source is now as easy as writing text, with all the heavy lifting done in the background.

---

~~In preparing this post, I noticed PBS Video is offering the oEmbed discovery endpoint, however the offered URL returns a 404 error rather than embed content. So close.~~ (Looks like oEmbed works on some videos and not others.)

Oct 17, 2010

Useful Standards for Public Media Projects: FOAF

This series of posts was inspired by Barrett Golding's Hacks & Hackers digital projects round-up, which highlights some high-level initiatives coming out of public media, some of which may be developing or adopting standards for content distribution, aggregation, or preservation. The challenge with many standards in public media is that they depend on infrastructure, systems, or getting a large enough group within the community to commit to supporting them. I'm hoping to offer a practical set of easy-to-implement standards for public media digital projects, especially as organizations are thinking about ways of building communities, becoming better neighbors, talking about aggregation and decentralization, and more. The standards I'm discussing are used in contexts much larger than public media, are inherently useful, and are systems-agnostic.

Friend of a Friend (FOAF)

There is currently no centralized, accessible database of public media organizations, and no group willing to take on the headache of populating, maintaining, and administering such a creation. Fortunately, most organizations (and likely all the salient ones) have some web presence, and we can use the already-decentralized web to model our decentralized organizational structure. From the project website:

The Friend of a Friend (FOAF) project is creating a Web of machine-readable pages describing people, the links between them and the things they create and do; it is a contribution to the linked information system known as the Web. FOAF defines an open, decentralized technology for connecting social Web sites, and the people they describe.

Within the context of public broadcasting, what can FOAF do? As a gross simplification: if every organization published an authoritative FOAF document, containing whatever information each station thought relevant, we could link, aggregate, and query the decentralized data set to begin answering any number of questions programmatically (Where is the closest NPR station? What is the URL for streaming audio for station XYZ? What is the pledge phone number for every station in Wisconsin?). Here's a quick demonstration document for a public media station:

<rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:admin="http://webns.net/mvcb/">
  <foaf:PersonalProfileDocument rdf:about="">
    <foaf:maker rdf:resource="#wkar"/>
    <foaf:primaryTopic rdf:resource="#wkar"/>
  </foaf:PersonalProfileDocument>
  <foaf:Organization rdf:ID="wkar">
    <foaf:name>WKAR</foaf:name>
    <foaf:age>78</foaf:age>
    <foaf:mbox rdf:resource="mailto:webmaster@wkar.org"/>
    <foaf:phone rdf:resource="tel:+15174329527" />
    <foaf:homepage rdf:resource="http://wkar.org"/>
    <foaf:weblog rdf:resource="http://www.publicbroadcasting.net/wkar/news.newsmain" />
    <foaf:tipjar rdf:resource="http://wkar.org/give/" />
    <foaf:tipjar rdf:resource="tel:5174323120x371" />
    <foaf:isPrimaryTopicOf rdf:resource="http://en.wikipedia.org/wiki/WKAR_(AM)"/>
    <foaf:isPrimaryTopicOf rdf:resource="http://en.wikipedia.org/wiki/WKAR-FM"/>
    <foaf:isPrimaryTopicOf rdf:resource="http://en.wikipedia.org/wiki/WKAR-TV"/>
    <foaf:depiction rdf:resource="http://wkar.org/images/wkar-w-140x50.gif"/>
    <foaf:logo rdf:resource="http://wkar.org/images/wkar-w-140x50.gif"/>
    <foaf:member rdf:resource="http://www.mprn.org/foaf.rdf#mprn" />
    <foaf:member rdf:resource="http://www.pbs.org/foaf.rdf#pbs" />
    <foaf:member rdf:resource="http://www.npr.org/foaf.rdf#npr" />
  </foaf:Organization>
</rdf:RDF>

I'm making no claims about the accuracy, correctness, or well-formedness of this document; I'm just offering it as an example of what could be done. Because this format is just an RDF document, it is trivial (and encouraged) to extend the FOAF vocabulary with domain-specific information from other sources, e.g. the BBC Programmes ontology. FOAF has rules and methods for FOAF document discovery, and by creating a mesh of organizations we could assemble a full, real-time picture of public media organizations without the overhead of centralization, and contribute a tiny piece to the larger web of knowledge without significant work on the part of any single individual.
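To give the "query it programmatically" idea some shape, here is a small sketch that reads a trimmed copy of the demonstration document and answers the pledge-phone question for whatever stations it finds. It is deliberately naive: it walks the RDF/XML with Python's standard XML parser rather than a real RDF library, so it only understands documents laid out like the example. The function names (describe_organizations, pledge_phones) are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Trimmed copy of the demonstration document above.
SAMPLE_FOAF = """<rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Organization rdf:ID="wkar">
    <foaf:name>WKAR</foaf:name>
    <foaf:homepage rdf:resource="http://wkar.org"/>
    <foaf:tipjar rdf:resource="http://wkar.org/give/" />
    <foaf:tipjar rdf:resource="tel:5174323120x371" />
  </foaf:Organization>
</rdf:RDF>"""


def describe_organizations(rdf_xml):
    """Map each foaf:Organization ID to {property name: [values]}."""
    root = ET.fromstring(rdf_xml)
    prefix = "{%s}" % FOAF
    orgs = {}
    for org in root.iter(prefix + "Organization"):
        props = {}
        for child in org:
            if not child.tag.startswith(prefix):
                continue  # ignore non-FOAF properties in this sketch
            key = child.tag[len(prefix):]
            # Values are either rdf:resource links or plain text literals.
            value = child.get("{%s}resource" % RDF)
            if value is None:
                value = (child.text or "").strip()
            props.setdefault(key, []).append(value)
        orgs[org.get("{%s}ID" % RDF)] = props
    return orgs


def pledge_phones(orgs):
    """For each organization name, list tipjar entries that are phone numbers."""
    result = {}
    for props in orgs.values():
        name = props.get("name", ["(unnamed)"])[0]
        result[name] = [v for v in props.get("tipjar", []) if v.startswith("tel:")]
    return result


orgs = describe_organizations(SAMPLE_FOAF)
phones = pledge_phones(orgs)
```

An aggregator would run something like this over every station's document and merge the results; a production version would more likely lean on a proper RDF toolkit so that alternative serializations and extended vocabularies are handled correctly.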

Oct 10, 2010

Public Media Links

Zeitgeist - the most shared BBC links on Twitter: Earlier this summer, the BBC R&D group created the Zeitgeist dashboard. Recently, they released the code as open source on GitHub. It seems like there's a ways to go before it is easily replicable, but it is very interesting nevertheless.
Backstage: Proposed Q&A site for those involved in the technical aspects of television and radio broadcasting, including the publication and consumption of related metadata and APIs, and content delivery over the Internet.
Open Video Conference: There was a sizable public media contingent at OVC this year and a lot of ideas about the direction and future for public media.
HTML5 Video Player Comparison: Following on OVC, this is a great list of HTML5 video players, which often also include a JavaScript API wrapper for fallback players. I need to spend time updating my video utilities to use the new HTML5 APIs.
PublicMediaCamp '10: Public Media Camp 2010 has been scheduled for November 20th and 21st in Washington, D.C.
Michael Edson: Fast, Open, and Transparent: Developing the Smithsonian's Web and New Media Strategy: Created earlier this year, it resurfaced recently. The presentation articulates many of the same issues faced by public media (decentralization, unexpected rivals, brand identity, relevance, and "thermocline" issues are the "pain points" Michael Edson identifies).

Sep 12, 2010

Gallery of Station Websites: Progress

After two weeks of use, the station gallery has had a fair amount of traffic and feedback (bolstered by a write-up in Current and a couple of station resource sites). While I've been revising the site all along, I rolled out some more substantial changes this weekend, which should improve the functionality and performance of the gallery.

First, and most obvious, I did a little bit of styling work to take it from barely functional to ugly-but-usable. If anyone has design ideas for the site, please leave a comment. Other web galleries usually crop their images severely; I'm attracted to full-page screenshots, but they come with a whole range of other design challenges. As part of the design work, I've moved the "user-generated" content forms to the gallery view (and eliminated the station view entirely), which I hope will encourage some conversation. I toyed with the idea of integrating with services like Twitter and Delicious to provide comments and tagging, respectively, but I couldn't come up with a good way to distinguish between conversation and criticism.

I've been working on a couple of ideas for a proper home page for the gallery, to give the site some context besides these blog posts. I'll try to make some more progress on this as long as people believe this could be a useful resource for the public media system.

Shortly after the first prototype, I started migrating data into Solr (using the Ruby Sunspot library), and earlier this week I added full-text search for page content (which may or may not match the screenshot content). I'm still playing with approaches to crawling station websites with Anemone to extract different types of pages (schedules, contact, news stories, features, etc.); hopefully I will have something interesting in the next couple of weeks.

I'm still working out ways to automate (and schedule recurring) screenshot updates, which is complicated by the lack of decent cross-platform tools. On Mac OS X, I've been using webkit2png, which has been great, but this server is running Debian Linux and the only comparable utility I've found is wkhtmltopdf, which requires a patched version of Qt. Message queues or workflow engines seem like overkill, so in the meantime I'll do manual updates occasionally.

As always, the source code for the site is available at http://github.com/cbeer/publicmediatech-stations for anyone interested in hacking on the gallery or just seeing how it was put together.