Managing Metadata: Common Issues for Publishers and Librarians

Half a million. That is the number of additional records per year that major book wholesalers Baker & Taylor and Ingram estimate they are processing as digital formats proliferate: half a million records on top of the approximately 200,000 new books published each year. That is a lot of metadata, and it is more important than ever at every step of the book supply chain.

Book metadata often needs to contain much more than title, author, ISBN, and price to make the leap from warehouse to reader—or database to device. Tables of contents, cover images, detailed subject headings, reading level, available formats, and reviews: all help consumers, retailers, and librarians discover and procure new (and old but relevant) books. The trick, for everyone in the book world, is creating and sharing accurate metadata for all of those millions of records.

The burgeoning challenge of book metadata was the subject of a recent symposium and white paper sponsored by OCLC Online Computer Library Center. In March 2009, OCLC gathered experts and interested parties from the publishing, library, and standards worlds in Dublin, OH, to discuss common problems and potential solutions. Judy Luther was at that time completing research for the paper "Streamlining Book Metadata Workflow," commissioned by OCLC and the National Information Standards Organization (NISO).

While clearly an "interested party" rather than an expert, I was invited to speak to the group about the general experience of university presses dealing with metadata. Of course, in a community that ranges from presses publishing fewer than 20 to more than 2,000 titles per year, and where the term "metadata" has not yet been fully adopted to describe bibliographic and marketing information, a general picture is not easily drawn. Before trotting off to Dublin, I spoke with several member presses, including Johns Hopkins University Press, a member with large book and journal publishing programs, and two presses that fall near the AAUP average: Cornell University Press, producing up to 140 new titles per year, and the University of Georgia Press, publisher of about 80 new titles per year. Not unexpectedly, the processes of metadata creation and management differed considerably. Johns Hopkins' in-house database has an ONIX component and pushes data to both the press web site and trading partners (via either ONIX or spreadsheet). Both Cornell and Georgia were at the time researching ONIX solutions, including off-the-shelf software and service providers such as NetRead or Firebrand, and were providing data via spreadsheets or online interfaces to key sales channels.

Despite their differences, all three presses mentioned the same difficulty with providing ONIX. "The standard just isn't standard enough," I said to the OCLC audience. That choice of phrasing raised some eyebrows (and maybe a few hackles), but we cleared up the vocabulary. Publishers are asked for many flavors of ONIX: almost every channel has its own requirements as to which ONIX elements and tag variations are preferred (if they even accept ONIX). For example, while the Book Industry Study Group (BISG) recommends best practices, and will certify the quality of publishers' ONIX feeds on 30 core elements, Barnes & Noble requires tailored compliance on half again as many data elements to be classified as a top-grade ONIX supplier.
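For readers unfamiliar with the format, ONIX is an XML vocabulary. The toy sketch below builds a skeletal ONIX 2.1-style product record in Python; the element names (RecordReference, ProductIdentifier, Title) follow the ONIX 2.1 reference tags, but the record reference and ISBN values are placeholders, and a real feed requires a message header and many more elements, with each trading partner preferring its own subset, which is exactly the "flavors" problem described above.

```python
import xml.etree.ElementTree as ET

# Toy sketch of a minimal ONIX 2.1-style <Product> record.
# Values are placeholders; real feeds carry dozens more elements.
product = ET.Element("Product")
ET.SubElement(product, "RecordReference").text = "press.example-0001"  # hypothetical

identifier = ET.SubElement(product, "ProductIdentifier")
ET.SubElement(identifier, "ProductIDType").text = "15"  # code 15 = ISBN-13
ET.SubElement(identifier, "IDValue").text = "9780000000002"  # placeholder ISBN

title = ET.SubElement(product, "Title")
ET.SubElement(title, "TitleType").text = "01"  # distinctive title
ET.SubElement(title, "TitleText").text = "An Example Monograph"

xml_out = ET.tostring(product, encoding="unicode")
print(xml_out)
```

The "standard isn't standard enough" complaint is visible even at this scale: one partner may insist on reference tags as above, another on ONIX "short tags," and each certifies against a different required-element list.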

But these retail-chain ONIX issues were only one small part of what was discussed in Dublin. The real crux of the symposium was the misalignment between the standards that have grown up separately in the library and publishing communities. MARC (Machine Readable Cataloging) records serve libraries' needs from ordering to online catalogs. In many cases, librarians require at least basic MARC records in advance of purchase, and more and more expect MARC records to be provided with purchased titles (particularly with e-book collections). Even subject classification schemes differ between these two sides of our community. From the publishers' end, BISAC codes are heavily weighted to trade books and were designed to help with store placement rather than broad consumer discoverability. Library of Congress (LC) subject headings are highly detailed and provide much greater authority control.

Though these classifications and standards were designed to serve different needs, each side of the market has an even greater need for the metadata created on the other. The authority-controlled subject and author data from LC and MARC records can only help digital discovery and sales of publishers' works. The book marketing information provided through ONIX to the retail supply chain is now just as important for library patrons and, with the growing adoption of purchase-on-request policies, for library collections specialists. Crosswalks between MARC and ONIX for Books will be needed to combine this data into effective and sharable information flows. OCLC is particularly interested in that concept, and recently undertook a pilot project to experiment with ingesting publishers' ONIX records, matching and enhancing the data with existing WorldCat records, and feeding back optimized metadata. That project has led to a new suite of metadata services for publishers. A second symposium, intended to move forward not just the conversation but also what is possible, is planned for next year. Broadening representation, and easing metadata reuse and collaboration, will be goals for the next meeting.
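The crosswalk idea can be sketched in a few lines. The pairings below are illustrative only (title text to MARC 245, personal name to 100, ISBN to 020); real ONIX-MARC crosswalks, such as those under discussion at OCLC and NISO, must handle indicators, subfields, repeated fields, and code-list translations, none of which this toy mapping attempts.

```python
# Illustrative pairings between ONIX reference tags and MARC 21 fields.
# A production crosswalk is far more nuanced than a flat dictionary.
ONIX_TO_MARC = {
    "TitleText":  "245$a",  # title statement
    "PersonName": "100$a",  # main entry, personal name
    "IDValue":    "020$a",  # ISBN
}

def crosswalk(onix_record: dict) -> dict:
    """Map a flat dict of ONIX tag -> value onto MARC field labels."""
    return {ONIX_TO_MARC[tag]: value
            for tag, value in onix_record.items()
            if tag in ONIX_TO_MARC}

marc_fields = crosswalk({
    "TitleText": "An Example Monograph",
    "IDValue": "9780000000002",  # placeholder ISBN
})
print(marc_fields)
```

Even this toy version shows why the mapping matters in both directions: publisher ONIX can seed a library record, and authority-controlled MARC data can flow back to enrich the publisher's feed.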

In the meantime, standards continue to change and evolve to serve the book communities' needs. In April 2009, ONIX for Books 3.0 was released; it is not backwards compatible with previous versions. The ISTC, or International Standard Text Code, is being promulgated as "a global identification system for textual works"—that is, to identify a text rather than a product or format, as the ISBN is used. Progress is being made on the International Standard Name Identifier (ISNI) to help in the correct identification of authors, a task that is required not just for better discovery but also by royalty and rights systems (such as the proposed Book Rights Registry from the Google settlement). In July 2009, CrossRef announced it had registered 1.7 million DOIs (Digital Object Identifiers) for book chapters and references. While the complexity of metadata standards is growing, so too are the support systems for producing and sharing accurate metadata. In the coming months, AAUP is planning to survey its membership about shared problems and needs in this area.

OCLC Publisher and Librarian Symposium Reports
Metadata White Paper: Streamlining Book Metadata Workflow
BISG Product Metadata Information and Best Practices

Brenna McLaughlin
Electronic & Strategic Initiatives Director, AAUP