Suggested Next Steps from IMA Presenter

John Tynan, March 28th, 2007

Just got off the phone with Seth Gottlieb (formerly of optaros.com, now at contenthere.net). He presented at IMA2007 as part of the discussion on choosing a CMS.

Seth had some great advice that helped me form my thinking about how I should proceed as a technologist as well as how the folks rallying together at pubforge.org might best proceed as a group.

As someone who has built a good part of a station site using a particular brand of open source technologies (let’s say, I’ve chosen to drive our station around in the open source equivalent of a Ford), I will be facing a decision, given that there seems to be some considerable inertia in the Chevy camp. But now may not be the time to jump from one moving car to another, at least not yet.

Seth suggested that some good first steps would be for us to:

  • Identify a group of stations (or individuals) who are willing to work together around a specific technology or goal.
  • Arrange for a week-long training session for the group in a single physical location. Either choose the city as a group, or let the location of the training decide it. (For Plone users, he suggested contacting Joel Burton about a Plone Bootcamp; for Drupal users, he suggested talking with Jeff Robbins at lullabot.com.)

He went on to say that getting together in the same place would:

  • be an indicator of commitment, since those willing to travel would be more invested
  • get us out of the office, which would allow us to focus better
  • give us an opportunity to forge bonds socially and increase networking opportunities

He suggested we identify which projects are currently in development (such as the Drupal station modules project, or find/start a broadcasting equivalent to the ploneforartists project), and which aspects of these projects we would like to see improved or extended. He suggested that we could gain an economy of scale by either collaborating on code as a group, or by pooling our cash to pay for additions to the codebase.

He suggested that we check into the price points for training. If we have x number of participants, what will it cost us?

He suggested, in looking for people who would be willing to attend the training, that we should start with the folks who initially put the module together. For instance, the Drupal station modules were originally designed for KPSU, a college radio station in Portland, Oregon. Maybe this station would be a good place to start with a partnership, and then look outward from there.

I guess that leads to the question: is there a listing of folks from the latest IMA conference who were interested in using Drupal, Plone, or Alfresco (or perhaps frameworks such as JBoss, Ruby, or Django, or even closed source CMSs like Jack Brighton’s work with Expression Engine)? The list goes on. Do you think such a list should be put together at pubforge.org?

To get a better idea of how these discussions might be beneficial to Seth in his work, I asked “what was in it for him?” He replied that he wanted to keep tabs on the progress of these initiatives, and that he would be interested in helping us form an organization, decide how such an entity would be structured, and work out how we are going to make decisions. His emphasis is on identifying the requirements for a product, on product selection, and on enabling developers and companies to work together using collaborative techniques and open source tools. Perhaps we’ll draw on his expertise again further down the road?

PBCore for publishing, sharing, and preservation

Jack Brighton, February 28th, 2007

(I’m moving this from the comment section of John Proffitt’s post “RSS a good start, but a federated PBCore-based metadata archive would be better” at his suggestion. Comments are perhaps getting buried, but please do see that thread for more context and great points by all participants. Of course I edited this since I can’t leave anything alone…)

John’s and Dale’s ideas here about using PBCore are excellent, and this is a great place to discuss shaping new practices with media and metadata. I do think XML is the key to unlock access to content, and to expose metadata in whatever flavor and variety is wanted for particular purposes. So an RSS 2.0 feed is good for one purpose, and a PBCore XML record works for something a little more full-blown. In the latter case, we could use PBCore records to connect the dots in a federated collection of public media content at a highly granular level, by developing applications to parse, sift, search, and serve the data. It could look like one collection, but the content could be anywhere. This model is becoming more common in the library world where an XML protocol like OAI-PMH is used. (See http://www.openarchives.org/ )
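To make the "one collection, content anywhere" idea concrete, here is a minimal sketch in Python. It is not an OAI-PMH client and does not use real PBCore element names: the XML layout, the station names, and the example.org URLs are all assumptions made up for illustration. It only shows the shape of the idea, which is to parse records from several sources into one searchable list while the media stays where it is.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-station XML. The element names are simplified stand-ins,
# not actual PBCore elements, and the URLs are placeholders.
STATION_FEEDS = {
    "station-a": """<records>
      <record><title>Prairie Fire: Episode 1</title><subject>agriculture</subject>
        <mediaUrl>http://a.example.org/ep1.mp3</mediaUrl></record>
    </records>""",
    "station-b": """<records>
      <record><title>River Stories</title><subject>history</subject>
        <mediaUrl>http://b.example.org/river.mp3</mediaUrl></record>
      <record><title>Farm Report</title><subject>agriculture</subject>
        <mediaUrl>http://b.example.org/farm.mp3</mediaUrl></record>
    </records>""",
}


def parse_feed(station, xml_text):
    """Turn one station's XML into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    parsed = []
    for rec in root.findall("record"):
        parsed.append({
            "station": station,
            "title": rec.findtext("title", default=""),
            "subjects": [s.text or "" for s in rec.findall("subject")],
            "media_url": rec.findtext("mediaUrl", default=""),
        })
    return parsed


def build_index(feeds):
    """Merge every station's records into one flat, searchable list."""
    index = []
    for station, xml_text in feeds.items():
        index.extend(parse_feed(station, xml_text))
    return index


def search(index, subject):
    """Return records carrying the given subject, whichever station holds them."""
    return [r for r in index if subject in r["subjects"]]
```

Searching the merged index for "agriculture" returns one record from each station, each still pointing at its own station's media URL.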

With this in mind I recently developed templates in my content management system to output various XML formats, including RSS, Atom, PBCore, and Dublin Core. You might have seen me fumble thru a demo of this at the IMA Tech Session Show and Tell. You can see the beta version of this at http://will.atlas.uiuc.edu/index.php/prairiefire/ . Scroll down to find the Syndication menu in the left nav. The PBCore link will generate PBCore XML for the latest 10 episodes of this show we produce called Prairie Fire. When you are on one of the Episode content pages, the PBCore URL reflects just that episode. Same for Segment PBCore URLs. The URL calls the template to display the specific record or set of records, so it becomes the key to everything.
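The "URL is the key" pattern can be sketched roughly as follows. This is Python, not the CMS templates described above, and the URL paths, the in-memory episode list, and the output element names are all hypothetical. The point is only that the path decides which records get rendered: a show-level path yields the latest 10 episodes, an episode-level path yields just that one.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory content store standing in for the CMS database.
EPISODES = [{"id": n, "title": f"Prairie Fire episode {n}"} for n in range(1, 13)]


def select_records(path):
    """Map a URL path to the records it should expose (paths are assumptions)."""
    parts = [p for p in path.strip("/").split("/") if p]
    if parts == ["prairiefire", "pbcore"]:
        # Show-level URL: the latest 10 episodes, newest first.
        return sorted(EPISODES, key=lambda e: e["id"], reverse=True)[:10]
    if (len(parts) == 4 and parts[:2] == ["prairiefire", "episode"]
            and parts[3] == "pbcore"):
        # Episode-level URL: only the episode named in the path.
        return [e for e in EPISODES if str(e["id"]) == parts[2]]
    return []


def render(path):
    """Render the selected records as XML (simplified, not real PBCore)."""
    root = ET.Element("records")
    for ep in select_records(path):
        rec = ET.SubElement(root, "record")
        ET.SubElement(rec, "identifier").text = str(ep["id"])
        ET.SubElement(rec, "title").text = ep["title"]
    return ET.tostring(root, encoding="unicode")
```

Calling `render("/prairiefire/pbcore")` produces ten records, while `render("/prairiefire/episode/3/pbcore")` produces one. A segment-level path would follow the same pattern.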

To what end? Right now, it’s just a demonstration or proof of concept. Eventually this could be used by Content Depot or NGIS to suck in metadata and media objects for system-wide syndication. (You know, as in Syndication.) In this case, the primary media item would be a broadcast-quality file, not a streaming archive. Then you’d also have a reference to the streaming archive as part of the PBCore record, along with other versions and related assets like a thumbnail image, etc. But I’m not sure PBCore is the right format to wrap up related media assets, so we could use standards like MODS or METS, which can include PBCore records as nested elements. In fact, when people begin using our media we’ll want to harvest tags and trackbacks, which add valuable metadata to the existing record. So we’ll want a way to encode this metadata and allow the total package to evolve. PBCore can be the item-level metadata format, but all related items might best be encoded in something else. Then everything can live and breathe as an item, a collection of related items, and a collection of collections. (Am I getting too meta here?) I’m suggesting that this method leads to media objects that harness collective intelligence, with metadata records that evolve with use. Our technical systems should allow for preservation of this metadata along with the media object at its core.
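One way to picture the layering is the data-structure sketch below. It is plain Python, not METS or MODS, and every class and field name is an assumption chosen for illustration. The item-level record sits at the core, related assets and harvested tags and trackbacks hang off a wrapper, and collections can nest inside collections.

```python
from dataclasses import dataclass, field


@dataclass
class MediaPackage:
    """Wrapper around one item-level record (the role PBCore would play)."""
    item_record: dict
    assets: dict = field(default_factory=dict)       # role -> URL, e.g. "stream"
    tags: list = field(default_factory=list)         # harvested from users
    trackbacks: list = field(default_factory=list)   # harvested from the web

    def add_tag(self, tag):
        # Normalize and de-duplicate so the record grows cleanly with use.
        tag = tag.strip().lower()
        if tag and tag not in self.tags:
            self.tags.append(tag)

    def add_trackback(self, url):
        if url not in self.trackbacks:
            self.trackbacks.append(url)


@dataclass
class Collection:
    """A collection of packages, or a collection of collections."""
    name: str
    members: list = field(default_factory=list)

    def count_items(self):
        # Recurse through nested collections and count the packages.
        total = 0
        for member in self.members:
            total += member.count_items() if isinstance(member, Collection) else 1
        return total
```

The item record itself never changes when a tag or trackback arrives; only the wrapper evolves, which is what would let the original metadata be preserved alongside what users add.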

So what to do next? I’m going to finish building out my little CMS implementation and see where it leads. There are zero actual PBCore applications that can use this stuff, far as I know. But this is really easy to do, and it might lead to some other easy ideas…which I think are often the best kind!

RSS a good start, but a federated PBCore-based metadata archive would be better

John Proffitt, February 27th, 2007

I’d like to echo Dale’s posting, and expand upon it just a bit more.

First off, I agree that the political hurdles to implementing a standardized and centralized media back-end for the public media world are daunting. Further, I think what we see as “public media” is going to shift around rapidly in the next couple of years, so determining who is “allowed” into the fold will become increasingly difficult (e.g. can a library join, or do you have to be a broadcaster with an active high-power FM or TV license?). There are other challenges as well, but let’s leave that issue alone for the moment. Back to the tech…

I think a centralized storage system is probably a bad idea, or at least one that would be difficult to achieve for all kinds of reasons. It’s also unnecessary. Why does everything have to be stored together, under one roof? The storage can be anywhere. It’s the live, searchable content index that would be most useful to the public, to other stations, to search engines and more. Let’s just remember that storage and indexing do not have to occur at the same place.

Now, about RSS. I think RSS is a great syndication system for short-form and linked media for recently published items. But RSS strikes me as insufficient as a deep-catalog syndication system. For example, how would I syndicate, using RSS, a catalog of 50,000 items or 100,000 items, in which the items are drawn from a variety of subjects and media formats and sources, each with various rights and authors associated with them? Theoretically, RSS could do this, as it’s just a string of XML. However, RSS 2.0 in its baseline configuration doesn’t carry all the data a centralized search system would need. Sure you can extend RSS with your own additional XML tags (just look at iTunes), but it still sounds a little silly to me to do it that way.
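For reference, extending RSS 2.0 with your own tags looks roughly like the Python sketch below. The namespace URI, the "pm" prefix, and the `rights` and `format` elements are hypothetical, invented here only to show the mechanism. The extra fields ride along in a separate XML namespace next to the standard `title` and `link`.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace; not a published standard.
EXT_NS = "http://example.org/pubmedia-ext"
ET.register_namespace("pm", EXT_NS)


def build_item(title, link, rights, media_format):
    """One RSS item carrying two extra, namespaced metadata fields."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, f"{{{EXT_NS}}}rights").text = rights
    ET.SubElement(item, f"{{{EXT_NS}}}format").text = media_format
    return item


def build_feed(items):
    """Wrap the items in a minimal RSS 2.0 channel and serialize it."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example catalog"
    ET.SubElement(channel, "link").text = "http://example.org/"
    ET.SubElement(channel, "description").text = "Illustrative feed"
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")
```

This works, but every field a central search system needs has to be bolted on the same way, and every consumer has to know the private namespace, which is the awkwardness described above.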

What I would propose is the establishment of a standard metadata description and storage pointer language, based on the PBCore schema (which is pretty complete already). Each public media entity would then expose its metadata index and its digital media archive to the public, to other stations, and to a centralized repository that would periodically accept updates from the edge storage and indexing systems. Access to the data could be tiered as desired, exposing only those items you wish to expose to various users or partners.
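A rough sketch of how such a central repository might behave, in Python. The tier names, field names, and class are assumptions for illustration, not part of PBCore or any existing system. Each edge repository pushes batches of records, the index keeps only metadata plus a pointer to where the media lives, and each record states the minimum tier allowed to see it.

```python
# Hypothetical access tiers, ordered from least to most privileged.
TIER_RANK = {"public": 0, "station": 1, "partner": 2}


class CentralIndex:
    """Holds metadata and storage pointers only; media stays at the edge."""

    def __init__(self):
        self._records = {}  # (station, identifier) -> record dict

    def accept_update(self, station, records):
        """Insert or overwrite a batch of records from one edge repository."""
        for rec in records:
            self._records[(station, rec["identifier"])] = {**rec, "station": station}

    def search(self, term, viewer_tier="public"):
        """Title search limited to records the viewer's tier may see."""
        rank = TIER_RANK[viewer_tier]
        term = term.lower()
        return sorted(
            (r for r in self._records.values()
             if TIER_RANK[r.get("min_tier", "public")] <= rank
             and term in r["title"].lower()),
            key=lambda r: (r["station"], r["identifier"]),
        )
```

A periodic update simply calls `accept_update` again with fresh records. Because records are keyed by station and identifier, a re-sent record replaces the old copy rather than duplicating it.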

Using this metadata standard would allow the proposed central index to gather information from repositories both inside and outside the public media world.

In this way, we have the local control required (for whatever reasons) over media assets, yet the central searchability of our content is not impaired. Local entities would be required to meet certain metadata standards (and tests) before being accepted into the central indexing system. And getting into the system would be a high priority for any media companies wanting to be “found” online, especially in areas beyond the reach of any legacy transmitters.
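The "metadata standards (and tests)" gate could be as simple as the Python sketch below. The required field list and the pass threshold are assumptions picked for illustration; an actual baseline would come from whatever the participating stations agree on.

```python
# Hypothetical baseline: fields every record must carry to be indexed.
REQUIRED_FIELDS = ("identifier", "title", "description", "rights", "media_url")


def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {name}")
    url = record.get("media_url")
    if isinstance(url, str) and url.strip() and not url.startswith(("http://", "https://")):
        problems.append("media_url must be an http(s) URL")
    return problems


def admit_repository(records, max_failure_rate=0.0):
    """Accept a repository only if few enough of its records fail the checks."""
    if not records:
        return False
    failures = sum(1 for r in records if validate_record(r))
    return failures / len(records) <= max_failure_rate
```

With the default threshold a single bad record keeps a repository out; loosening `max_failure_rate` would let a station join while it cleans up its catalog.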

The big plus is that while there would have to be an entity building and maintaining the indexing service, the various players would only have to meet a baseline standard protocol, mostly eliminating the politics. Yes, fights break out at the IEEE from time to time, but in the end, they do reach broadly interoperable standards.

Or… and here’s a subversive bit… do we just implement the metadata standard and then call up Google and tell them how and where to index all our content?