blog.cbeer.info

Jun 14, 2009

Managing WordPress plugins with svn:externals

I'm using Subversion to make updating WordPress easy (installing an upgrade is a simple `svn update`). To extend that simplicity to plugin management, I'm using the `svn:externals` property so that the same `svn update` also pulls in my plugins. Setting it up takes one command, run from the plugins directory (`wp-content/plugins` in a standard install): `svn propset svn:externals -F externals .` My externals file looks like:

advertising-manager http://svn.wp-plugins.org/advertising-manager/trunk/
akismet http://plugins.svn.wordpress.org/akismet/trunk/
amazon-showcase-wordpress-widget http://svn.wp-plugins.org/amazon-showcase-wordpress-widget/trunk/
google-analyticator http://svn.wp-plugins.org/google-analyticator/trunk/
google-sitemap-generator http://svn.wp-plugins.org/google-sitemap-generator/trunk/
google-syntax-highlighter http://svn.wp-plugins.org/google-syntax-highlighter/trunk/
latex http://svn.wp-plugins.org/latex/trunk/
smart-archives-reloaded http://svn.wp-plugins.org/smart-archives-reloaded/trunk/
stats http://svn.wp-plugins.org/stats/trunk/
wp-rdfa http://svn.wp-plugins.org/wp-rdfa/trunk/
xrds-simple http://svn.wp-plugins.org/xrds-simple/trunk/
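
The externals file can also be generated rather than typed by hand. The following is a minimal sketch, assuming every plugin sits at `<slug>/trunk/` under one base URL (true of most, but not all, of the entries above). `externals_lines` and `BASE_URL` are illustrative names, not part of Subversion.

```shell
#!/bin/sh
# Sketch: print one svn:externals definition per plugin slug.
# BASE_URL is an assumption; adjust it if a plugin lives on another host.
BASE_URL="http://svn.wp-plugins.org"

externals_lines() {
    for slug in "$@"; do
        # Format: <local directory> <repository URL>
        printf '%s %s/%s/trunk/\n' "$slug" "$BASE_URL" "$slug"
    done
}

# Preview the definitions for two of the plugins listed above.
externals_lines latex stats

# To apply them, redirect into the file and set the property:
#   externals_lines latex stats > externals
#   svn propset svn:externals -F externals .
```

After setting the property, `svn propget svn:externals .` shows what is stored, and the next `svn update` checks out each plugin into its own directory.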

Jun 1, 2009

Disseminating Broadcast Archives: Exposing WGBH Materials for Scholarly Use

Beer, C., and Michael, C. 2009. Disseminating Broadcast Archives: Exposing WGBH Materials for Scholarly Use. OpenRepositories 2009.

The WGBH Media Library and Archives is currently prototyping an online archive of moving image content. Funded by The Andrew W. Mellon Foundation, the project seeks to serve scholars in their efforts to incorporate media into their research and communications activities. We are currently prototyping a Fedora-backed online archive incorporating search, browse, data visualization, and web services. We will present the open source infrastructure behind our web project which includes Fedora, Solr and a PHP front end. Our Fedora content model addresses the specific needs of a moving image archive, allowing for the expression of complex relationships between conceptual and instantiated assets. In addition, it allows us to express the myriad permutations and oddities occurring within broadcast asset relationships. We will share lessons learned and new challenges regarding the representation of archival moving image collections online, the unique cataloging and metadata needs of the online researcher, and barriers to the use of online archives by scholarly researchers. Finally, we will cover technical challenges involving storage and delivery of long form video content, rights management, and user authentication and sustainable business models.

May 22, 2009

Open Repositories ‘09, May 18-21

I attended the Open Repositories conference, May 18-21 in Atlanta, which “attempts to create an opportunity to explore the challenges faced by user communities and others in today’s world”. In general, the OR community is very relevant to our work with repositories (for Mellon/OpenVault, Teachers’ Domain, the DAM system, etc), and so many people are facing the same problems with cataloging, preservation, and dissemination.

The California Digital Library had a presentation that provided a connection between curation and preservation goals (which I think is something we’re very interested in), saying: Lots of [copies, description, services, uses] keeps stuff [safe, meaningful, useful, valuable].

John Wilbanks, VP of Science at Creative Commons, gave the keynote, “Locks and Gears: Digital Repositories and the Digital Commons”, stressing the importance of Open Data, which helps bring together isolated knowledge pools. The ultimate goal is to turn databases into the web, to allow useful “stuff” to happen rather than locking it away. Making this information available, linked, and shared could help solve existing problems that lack funding (a cure for Huntington’s Disease was one example). The HD Foundation is funding some of the Science Commons’ work opening up genetic databases and creating semantic web endpoints (with SPARQL) for that data to make it more accessible; Wilbanks drew an analogy between the ability to easily edit an HTML page and the ability to easily edit a SPARQL query, which allows for more “hackability” (potentially in the face of copyright or IPR).

The OR community is finally starting to think about video material, which makes our appearance very timely, and allowed us to make several excellent connections, both on a technical level (Glasgow’s Spoken Word project; U. of Alberta’s digitization, encoding, and cataloging workflows; Rutgers’ work with NJVid + RUcore to form a state-wide educational video delivery network; etc) and around content and preservation (the educational TV collection of Indiana University and a collection at Northwestern).

We also connected with a community group interested in creating repository-backed tools for scholarly research, trying to provide solutions and tools to support scholars and make repositories useful and exciting new mediums, and doing so in an open manner to “cross-pollinate” across disparate groups, which can lead to previously unrealized benefits. There was also a lot of interest around creating multiple, light-weight interfaces to collections to meet the needs of a group of users, rather than “building the death star”. This community seemed split between people using existing applications (Drupal: UPEI, among others) on top of Fedora and people building front-ends on top of a framework (PHP/Zend Framework (WGBH, NASA), Ruby on Rails (MediaShelf, Hydra), Django).

On the other end, there was interest around Sun’s OpenStorage platform (which apparently will still have life inside Oracle), the iRODS distributed storage repository, and DuraCloud, a cloud/distributed storage abstraction layer for repositories.

Tony Hey, VP of Microsoft External Research, convinced me that MS isn’t wholly evil, and is trying to do the right thing among scholarly communities by embracing open standards and interoperability (obviously, when it suits them, but still an improvement). They’ve done some great work with MS Office add-ins to connect the suite with institutional repositories. Finally, on Wednesday, MS launched Zentity, their new repository offering built on the MS stack (IIS, MS SQL, etc), perhaps useful for institutions to get up and running with a repository; everyone recognizes this is not a new product line, but a research project, and MS is trying to break into a monopolized market.

Our presentation was well received, and our poster won the poster session (out of 30+ posters; note: in future posters, specify the Pantone/CMYK/etc colors + don’t be afraid of obvious branding).

May 10, 2009

inbflat mixer

I’ve quickly hacked together a mixer-style interface (using jQuery + the YouTube chromeless player) for the inbflat YouTube mashup media project. If I get really motivated, there are a couple of features that would be nice:

Solo-mode
Media scrub bar — I’m not sure what the appropriate interface would be for this
Add/remove video clips
Port this to the HTML5 canvas/audio system for fun

May 9, 2009

Getting mail server statistics into ganglia

There are a number of fine stand-alone packages for generating mail statistics out there, most based on RRD. We, however, are happily using Ganglia to aggregate and display statistics in a dashboard. Here’s the quick and dirty patch to take the simple mailgraph Perl script and transform it into a gmetric data source:

--- mailgraph.pl        2007-08-29 09:06:01.000000000 +0000
+++ mailgraph-ganglia.pl        2009-05-09 17:21:52.000000000 +0000
@@ -871,15 +871,13 @@
        return 0 if $m < $this_minute;

        print "update $this_minute:$sum{sent}:$sum{received}:$sum{bounced}:$sum{rejected}:$sum{virus}:$sum{spam}\n" if $opt{verbose};
-       RRDs::update $rrd, "$this_minute:$sum{sent}:$sum{received}:$sum{bounced}:$sum{rejected}" unless $opt{'only-virus-rrd'};
-       RRDs::update $rrd_virus, "$this_minute:$sum{virus}:$sum{spam}" unless $opt{'only-mail-rrd'};
-       if($m > $this_minute+$rrdstep) {
-               for(my $sm=$this_minute+$rrdstep;$sm< $m;$sm+=$rrdstep) {
-                       print "update $sm:0:0:0:0:0:0 (SKIP)\n" if $opt{verbose};
-                       RRDs::update $rrd, "$sm:0:0:0:0" unless $opt{'only-virus-rrd'};
-                       RRDs::update $rrd_virus, "$sm:0:0" unless $opt{'only-mail-rrd'};
-               }
-       }
+        system("gmetric -c /etc/gmond.conf --name=mail_sent --value=$sum{sent} --type=uint32 --units=messages");
+        system("gmetric -c /etc/gmond.conf --name=mail_received --value=$sum{received} --type=uint32 --units=messages");
+        system("gmetric -c /etc/gmond.conf --name=mail_bounced --value=$sum{bounced} --type=uint32 --units=messages");
+        system("gmetric -c /etc/gmond.conf --name=mail_rejected --value=$sum{rejected} --type=uint32 --units=messages");
+        system("gmetric -c /etc/gmond.conf --name=mail_virus --value=$sum{virus} --type=uint32 --units=messages");
+        system("gmetric -c /etc/gmond.conf --name=mail_spam --value=$sum{spam} --type=uint32 --units=messages");
+
        $this_minute = $m;
        $sum{sent}=0;
        $sum{received}=0;

I’m sure there are plenty of other ways to do this, but this seems like the most straightforward.
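
Before wiring the calls into mailgraph, it can help to preview the exact `gmetric` command lines from a shell. The following is a minimal sketch, not part of the patch: `gmetric_cmd` is a hypothetical helper that only prints the command, the counter values are made up, and it uses `--type=uint32` on the assumption that per-minute counts can exceed 127, the most a signed 8-bit type can hold.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the gmetric call for one counter.
gmetric_cmd() {
    name=$1    # counter name, e.g. "sent"
    value=$2   # count for the current minute
    printf 'gmetric -c /etc/gmond.conf --name=mail_%s --value=%s --type=uint32 --units=messages\n' \
        "$name" "$value"
}

# Made-up sample values, in "name:value" form.
for pair in sent:12 received:40 spam:3; do
    # ${pair%%:*} is the part before the colon, ${pair#*:} the part after.
    gmetric_cmd "${pair%%:*}" "${pair#*:}"
done
```

Piping the output to `sh` would actually submit the metrics, assuming `gmetric` is installed and `/etc/gmond.conf` is the right config path for the host.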