bite-size

Interesting readings from around the web:

“Because wide-spread full text indexing abounds, the problem of find is not as acute as it used to be. In my opinion, it is time to move away from the problem of find and towards the problem of use. What does a person do with the information once they find and acquire it? Does it make sense? Is it valid? Does it have a relationship to other things, and if so, then what is that relationship and how does it compare? If these relationships are explored, then what new knowledge might one uncover, or what existing problem might be solved? These are the questions of use. Find is a means to an end, not the end itself. Find is a library problem. Use is the problem everybody else wants to solve.”
Eric Lease Morgan, “Next-generation library catalogs, or ‘Are we there yet?’”

“My favorite worlds have always been natively game-like. In their basic world rules you immediately want to interact with them. When you know that Anne McCaffrey’s Pern has five types of colored dragons, you immediately want to match yourself to one. When you know that in Piers Anthony’s Xanth every person has a unique magical talent, you want to pick out a talent for yourself. These rule structures are very game-like and enhance the poetry of a world. In addition to making it accessible, they give you a framework that exposes the theme and meaning in a world much more clearly than worlds that do not have these structures. Character classes are extremely powerful things.”
author and game designer Erin Hoffman in an interview with Clarkesworld Magazine

“It’s strange, but start talking to hard-bitten, seasoned executives about information in the enterprise and they automatically switch off their critical faculties. They’ll believe anything. Really. Like, information and how it is used in your organisation can be understood by a piece of software, out of the box. Like, you don’t need to actually understand your information environment in order to manage it. Like, the best people to ask about making your information generally accessible, are narrow subject matter specialists. Like, you can fix your information environment once, and it’ll stay fixed forever without paying any more attention to it. In this article we explore three fairy tales about taxonomies that executives seem particularly prone to believing:

1. That you don’t need taxonomies if you get a good search engine;
2. That taxonomies can look after themselves or can be delegated piecemeal to non-taxonomists;
3. That the best people to advise on taxonomy development are subject matter experts.”

-from Innotecture, citing Taxonomy Times No. 6, April 2011

Things I wish I could attend

ASIS&T 2010’s conference theme is “Navigating Streams in an Information
Ecosystem”. The full-day SIG CR workshop detailed below will “give participants a chance to reflect on essential questions related to information classification, representation and organization while exploring the future of the field.”

The morning session will include papers from theoreticians and practitioners
in the field, including:

Molly Tighe, Time Capsules Project Cataloguer, the Warhol Museum,
Pittsburgh, PA. Ms. Tighe will describe her work at the Warhol Museum, where
she is involved with a project to arrange and describe over 600 boxes of
items contained in the Andy Warhol Time Capsules.

Grant Campbell, Associate Professor in the Faculty of Information and Media
Studies at the University of Western Ontario. Professor Campbell will
present a paper, “New Life for an Old Theory: Italo Calvino, the Future of
the Web, and the Theory of Integrative Levels.” This presentation will use
Italo Calvino’s analysis of creativity and cybernetics to suggest that the
growth of sophisticated semantic networks in the Web of the future depends
on a process that Feibleman identified years ago with his theory of
integrative levels.

Joe Tennis, Assistant Professor at the School of Information at the
University of Washington. His paper “Form, Intention, and Indexing: The
Liminal and Integrated Conceptions of Work in Knowledge Organization” will
propose a dual conception of “the work” in knowledge organization.

Tim Spalding, Founder of LibraryThing. In this presentation, Mr. Spalding
will discuss the intersection of traditional and social cataloging,
specifically how LibraryThing for Libraries allows librarians to harness the
“wisdom of the crowd” in unprecedented ways. Traditional library OPACs
currently lack the mechanisms for collecting the knowledge and preferences
of library patrons. Although the traditional cataloging and classification
model – where a small group of specialists describe materials for the
general public – works well enough for the job for which it was designed,
the expectations of users have changed with the advent of Web 2.0
technologies like Wikipedia, Flickr, and Amazon recommendation systems.
(*Note: this is a change from the original speaker from LibraryThing)

The afternoon session will build on the ideas presented in the morning
session and will be devoted to small group and general discussion regarding
the limits of classification research.

Specific questions include:

– Where is classification research headed?

– How can we best communicate our ideas and theories to researchers,
students, and practitioners?

– What are some of the strengths of our current research methods, and what
are our weaknesses?

– Are we working under any unexplored assumptions or biases?

– What are the goals of classification research?

Attendees will be asked to break into small groups in the afternoon to
discuss these questions, then return for general discussion towards the end
of the workshop.

Important Information:

EARLY REGISTRATION ENDS: September 17, 2010 (register and make hotel
reservations by this date)

( http://www.asis.org/asist2010/index.html )

For more information:

http://www.asis.org/asist2010/workshop-SIGCR.html

Enthusiasm for word clouds –> musings on indexing

Have you heard about Wordle?  I think I’m behind the times again; I’ve used wordclouds and always liked having a tag cloud on my blog, but I didn’t realize you could go to a site and paste text into a box and get a customizable cloud.  I just pasted one of my college papers in and *pooof* out comes a beautiful word cloud:

I had to use IE instead of Firefox for it to work, but the fun factor is worth it.   Next, I put in the paper I wrote on metadata standards for cultural heritage collections:

What I tried to do but couldn’t really accomplish was to paste in text from all the reference questions I’ve answered in the past months.  It would be cool to see which terms would predominate.  Probably just “database”, but it could be something surprising.  These wordclouds could easily be tweaked to make a nice blog header…and I feel like there are probably lots of excellent ways to use them just lurking in the back of my brain. It’s so tempting to think that if you can just see which terms are most frequently used, you’ll know the subject of the document. Like…if you want to quickly present the major concepts in any chunk of text, this is ideal, right?  Wellll….

I was trying to think through what types of content could usefully be wordclouded, other than blogs. I (half-seriously) started thinking that maybe in addition to an abstract you could have a mini wordcloud at the beginning of a scholarly article, so people could decide even more quickly if they wanted to read it. But really, the idea behind wordclouds (representing content with simple words or phrases) is already applied to most scholarly articles. The cloud is just arranging terms in an attractive way and, what I think is most useful, translating things like term frequency into color and/or size. The question is whether a cloud contains automatically generated keyword terms based on word frequency, user-created keyword terms (tags), or indexing terms that came from a thesaurus and have somehow been verified (by a human) to correspond to the actual subject of the document, not just the words in it.

I think it’s great that we can use algorithms to do part of the work of assigning subject terms to content, but I worry about people (perhaps non-indexers/catalogers/librarians) forgetting that it’s not enough on its own.  You know that I wrote a paper about gorgons and Amazons, but you have no idea what I said about them or what my paper was really about.  In this paper from 1961, Bernier & Crane make a similar point:

On the average fewer than 1% of the words of a chemical document are required to subject-index it. Thus, more than 99% of the words that the author uses are useless and even undesirable in a subject index. They are useless to the index user because they do not guide him to the new information reported by the author; they are undesirable because they dilute the useful [index terms] to make for confusion and time-wasting reading.[…] The detection of subjects and their translation into index language is the function of the subject indexer.

Although this article is from the age of print indexes, the same principles still apply in our world of full-text searching and electronic documents. Word frequency algorithms can help subject indexers by showing them terms that might be useful for translation into controlled indexing terms. The indexer must assign terms that cover what the author of a document is actually talking about, not just what they’re saying. If you only index the words and not the subjects, you’ll never be able to connect (or collocate) documents that talk about the same things using different terminology. This seems obvious but I think it’s easy to forget and/or be confused about. Even Bernier and Crane say that “New indexers often have difficulty in understanding the difference between subject- and word-indexing.”
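To make the word-versus-subject distinction concrete, here is a minimal sketch of the kind of word-frequency pass that could hand an indexer a list of candidate terms. Everything in it is an assumption for illustration: the tiny stopword list, the sample sentence, and the `candidate_terms` helper are all made up, and a real tool would need far more care (stemming, phrases, a proper stopword list).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "are"}

def candidate_terms(text, top_n=5):
    """Return the top_n most frequent non-stopword words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

# Made-up sample text, not taken from any real paper.
sample = (
    "The gorgon appears on the vase. The gorgon faces Perseus, "
    "and Perseus holds the head of the gorgon."
)

top = candidate_terms(sample, top_n=2)
print(top)  # [('gorgon', 3), ('perseus', 2)]
```

The output tells you which words dominate, but not what the text says about them. Turning "gorgon" and "perseus" into controlled subject terms is still the indexer's job.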

In the course of thinking about this stuff, I realized I was associating wordclouds ONLY with natural language processing / automatic indexing. But wordclouds (or other word frequency visualizations) don’t necessarily have to be wedded to automatic indexing with all its weaknesses. The clouds themselves are not the enemy; it’s the lack of vocabulary control going into most of them that is causing this unexpectedly long blog post to spill out of me. Obviously you could generate a wordcloud from terms pulled from a controlled vocabulary, and I know a number of people (e.g. librarians) are doing this with LC or AAT subject terms in digital collections. There’s a difference between the visualization itself, and what you put into it. Garbage in, garbage out, eh? The same thing goes for clouds based on tags, unless you’ve got some fancy term combination stuff going on like LibraryThing does. I get kind of worked up thinking about library catalogs that have implemented user tagging and word clouds, but without integrating any of the subject headings in the catalog into the word cloud, or mapping the most popular tags onto subject headings in some way (is this possible? I hope so….). Just think how many end-users probably assume that when they click on a tag, whether in a word cloud or in a record display, they’re getting results for everything in your collection on that topic. Cringe.  (But maybe users of library catalogs don’t even think that anymore?  Did patrons know the benefits of using subject headings to search catalogs before we started offering tags and keyword searching? Did they ever think that they could use subject headings to get a really thorough list of results? Maybe the only people who think something is being lost or obscured are people like me who are obsessed with authority control?)

In the end, what I’m thinking is that wordclouds, whether based on controlled index terms or just pure word frequency, are probably most useful for groups of documents (even though I had fun putting my term papers into Wordle). Collection-level clouds can give users a glimpse of the coverage and strengths of a collection, just like document-level word clouds can indicate (SORT OF) the topic of a paper. But it doesn’t really make sense to create a wordcloud for an individual document because 1. if you’re applying controlled index terms to it (which obviously you should, if the document is part of a collection), there probably won’t be enough of them to justify needing “visualization”, and 2. if you’re making a wordcloud using the entire text (aka just word frequency) of the document, you’re not really accurately representing what the document is about. On the other hand, though, you have to admit a wordcloud for a document would be prettier than something like this:

(From a LexisNexis document: a list of index terms (in caps) with relevance indicated in percentages instead of color or font size. Near the end are the keyword terms – possibly automatically generated – in lower case.)

For collections of documents, I think it’s useful to show people a wordcloud of all your indexing terms/subject headings, especially because you can then visually indicate which subject terms you attach to documents most frequently. This benefit of providing a casual glimpse of a collection’s coverage is probably why tag clouds are popular for indicating what a blog (a collection of “documents”, or posts) is “about”. I wonder if more people think of tag clouds in this way, as content indicators, or if they think of them primarily as tools for accessing blog content via tags.
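As a rough illustration of the "frequency into size" idea for a collection-level cloud, here is a small sketch that scales how often each subject heading is assigned into a font size. The headings, counts, point sizes, and the `font_sizes` function are all hypothetical, and linear scaling is just one simple choice; this is not how Wordle or any particular catalog does it.

```python
def font_sizes(heading_counts, min_pt=10, max_pt=36):
    """Linearly scale each heading's assignment count to a font size in points."""
    if not heading_counts:
        return {}
    lo = min(heading_counts.values())
    hi = max(heading_counts.values())
    span = hi - lo
    sizes = {}
    for heading, count in heading_counts.items():
        if span == 0:
            # Every heading is used equally often: give them all a middle size.
            sizes[heading] = (min_pt + max_pt) / 2
        else:
            sizes[heading] = min_pt + (count - lo) / span * (max_pt - min_pt)
    return sizes

# Hypothetical counts of how many documents carry each subject heading.
counts = {"Vases, Greek": 40, "Indexing": 25, "Metadata": 10}
sizes = font_sizes(counts)
print(sizes)  # {'Vases, Greek': 36.0, 'Indexing': 23.0, 'Metadata': 10.0}
```

Because the input is a set of assigned headings rather than raw words, the sizes reflect what the collection is about, not just which words appear in it.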

In praise of local newspaper indexing

“To the newspapers of our communities the historian turns to interpret the mind of the people during any crisis. For this quest the newspaper index is the indispensable key to the treasure house of facts locked away in newspaper files…Yet in the United States there is but one such index, published by The New York Times.”

–Paul P. Foster, librarian of the Philadelphia Inquirer, March 2, 1935. (in a Special to The New York Times)

In many cities, the newspaper with the most readers and the largest distribution area cannot possibly report on the happenings in the myriad neighborhoods and communities it covers. For example, in a 1995 article on collection building, John Yewell wrote that the Minneapolis Star Tribune “is no different from most other large city newspapers. The news decisions they make are intended to appeal to a wider audience. Their withdrawal over the years from local politics in Minneapolis has left a void that several small, scrappy newspapers are attempting to fill.” In Pittsburgh, we have the excellent City Paper, but also an impressive number of publications dedicated to reporting on happenings in our many neighborhoods and communities. The Greenfield Grapevine, the Spirit of Bloomfield, and the Pittsburgh Senior News are just a few examples.

While many large city newspapers (and even some small ones) have adapted to the Web and provide keyword searchable content online, most community publications remain confined to the world of print (or PDFs, if you’re lucky). Few small newspapers possess the funding or expertise to mount their archive of print content online. Their content is not included in news databases, and libraries often neglect to collect them at all. This is unfortunate, because community and neighborhood newspapers can provide journalistic detail and a unique historical perspective that is increasingly absent from larger newspapers. Now that news information is more accessible and ubiquitous than before, community-produced information about local history and current events can be easily lost in the chaos or overshadowed by more easily accessible digital resources.

While in library school, I became especially interested in local news indexing practices and history. Indexing “small, scrappy newspapers” arguably requires knowledge of local history and language, in addition to familiarity with the needs and interests of potential users of the index. Since community publications may never make it to the Web, an index may be the only way to provide subject access to the years of history contained in their pages. It’s hard to argue for the time-intensive effort of indexing when digitization seems so quick and supposedly timeless. Despite the many benefits of news indexes, librarians and archivists have often found it difficult to convince users to use them, and library administrations to support their production at a local level. The format of indexes can be especially daunting for users accustomed to keyword searching, but these users have been found to appreciate indexes once they are made aware of their usefulness (Knapp, 2008). The issue of funding and staffing for the production of a local news index is a larger problem. I admit that researching this topic is a little self-indulgent. Still, you never know what can be accomplished with some tenacious grant-writing. Even if a small newspaper is capable of offering full-text content online, its users could still benefit from the improved retrieval accuracy that a good index can provide. (On a related note, see this article by Heather Hedden on book-style indexes for websites.)

Journalists don’t use standardized terminology within a single issue of a paper, much less over a number of years. Indexing with the use of a controlled vocabulary (or “heading authority”) brings together all references to constantly changing personal, organizational, and community names. Local news indexers often use an already established heading authority like Sears or LCSH, and modify it for local use. “To accurately assign subject headings requires an intimate knowledge of the community and its history” (Weaver, 2006), and indexers with local expertise can better recognize associative relationships that should be created between concepts and entities in the index. Community news indexers may also develop a list of recommended or modified headings designed to reflect local terminology and respond to the needs and viewpoints of anticipated users (Sholtys, 1984). Genealogists, local historians, reporters, authors, business people, county agency staff, students, and historical preservation society members are all potential users of a local news index, and their information-seeking habits should be considered in the wording of terms and their variants.
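The collocating job a heading authority does can be sketched in a few lines. This is only an illustration under loud assumptions: the organization name, its variants, the locators, and the helper functions are all invented, and a real authority file (Sears, LCSH, or a local one) is far richer than a lookup table.

```python
from collections import defaultdict

# Hypothetical local authority file: variant form -> preferred heading.
# The organization and its variants are invented for this sketch.
AUTHORITY = {
    "mhcl": "Maple Hill Civic League",
    "maple hill civic assn.": "Maple Hill Civic League",
    "maple hill civic league": "Maple Hill Civic League",
}

def preferred_heading(name):
    """Return the preferred heading for a name, or the name itself if unknown."""
    key = " ".join(name.lower().split())  # normalize case and whitespace
    return AUTHORITY.get(key, name)

def build_index(entries):
    """Group article locators under the preferred form of each name."""
    index = defaultdict(list)
    for name, locator in entries:
        index[preferred_heading(name)].append(locator)
    return dict(index)

# Made-up (name as printed, locator) pairs from an imaginary paper.
entries = [
    ("MHCL", "1998-03, p. 2"),
    ("Maple Hill Civic Assn.", "2001-07, p. 1"),
    ("Maple Hill Civic League", "2004-11, p. 5"),
]

index = build_index(entries)
print(index)
```

All three articles end up under one heading, which is exactly what a keyword search on any single variant would miss. Names that aren't in the authority file pass through unchanged, which is where a human indexer with local knowledge would step in.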

The main reasons to create an index are to save users’ time and facilitate their access to information. In 1982, one librarian wrote that the microfilming of local news archives resulted in “readers [being] less able to browse among the yellowing pages of bound volumes and [being] forced instead to endure the microfilm, where soonest found, soonest done” (J.D.L., 1982). This librarian’s indexing project was valuable because it enabled patrons to go directly to an article without having to deal with browsing and switching between reels of dreaded microfilm. I imagine that an electronic index could be especially useful for the community newspapers that exist in Pittsburgh, which aren’t necessarily archived in local libraries or even accessible to the public. I think a lot of them are stored at community centers, where (for all I know) there might be lovely filing cabinets or even a dedicated volunteer toiling away at an index. (I’ve been meaning to find out about this for the last year!) The time-saving principle of indexes applies to the Web environment as well as to print. Online, indexes have been shown to direct users to answers an average of 2 minutes faster than keyword searching (Knapp, 2008) (who knew?!). Users benefit from not having to browse through results only to realize they are irrelevant, and an index reduces the need to perform multiple time-consuming searches using different terms in hopes of getting all results on a topic.

During the 1970s, there was a flurry of local newspaper indexing projects. The interest was not limited by international boundaries nor by library type: projects were initiated in public and academic libraries in Scotland, Great Britain, New Zealand, the United States, and elsewhere. The majority of these projects were grant-funded, and staffed by one dedicated librarian with a few volunteers. Training the volunteers was time-consuming and difficult, and volunteers were often not able to stay interested long enough to complete the project. This was a frequent problem with projects attempting to index a historical collection of newspapers. Projects that began indexing current issues seemed to fare better because they didn’t have such a daunting body of work facing them. In order to make indexing workflows as efficient as possible, one should consult the literature before taking on a project. Though they are outdated in many ways, articles from the 1970s and 1980s provide valuable tips on developing a name and subject authority file, determining depth of indexing, and dealing with staff and money shortages.

People (especially librarians and archivists) who want to begin indexing their neighborhood or community newspapers may be able to advocate for a project by invoking some practical benefits, which may especially appeal to administrators and funders:

  • A local news index is tangible proof of library activity (Dewe, 1972).
  • Librarians working on an index will be acutely aware of trends and issues that are of interest to the community. This is important for almost all aspects of public library operations, including programming, collection development, and marketing.
  • The production of an index is a chance for the library to form partnerships with other institutions. Librarians in Napa, CA used their indexing project as an opportunity to achieve closer cooperation among a community college, two small city libraries, four high school libraries, a private parochial college, and a historical society (Vierra & Trice, 1980).

The Web seems to lead many information-seekers to neglect the importance of information that is not available online. Local newspapers provide unique information about people, places, and events that may never appear on the Web or in any major publication. The time and money it takes to produce an index are significant, and it is harder than ever to convince non-cataloging types that such “old-fashioned” efforts are still worth it. However, without any way of facilitating the retrieval of information from small local publications, their “treasure house of facts” will be “locked away” when it could instead be helping people access relevant information about their communities and their histories.

Works Cited + Further Reading:
American Society for Indexing. “ASI Publications.” http://www.asindexing.org/site/asipub.shtml

Aslin, P. (2001). Raising Rochester history: The history of an index. Key Words 9 (3), 76-82.

Beare, G. (1989). Local newspaper indexing projects and products. The Indexer 16(4), 227-233.

Dewe, M. (1972). Indexing local newspapers. American Libraries 3(4), 59.

J.D.L. (1982). Local newspaper indexing in the UK. The Indexer 13(2), 103.

Knapp, C. (2008). Breakout session C1: indexes and the Google Generation: what you don’t know CAN hurt you. 2008 CALI Conference Report. Key Words 16(3), 95.

Knee, M. (1982). Producing a local newspaper index. The Indexer 12(2), 101-103.

Napier, K. (1982). Indexing a local newspaper. New Zealand Libraries 43(12), 197-199.

Sandlin, L. (1985). Indexing of smaller-circulation daily newspapers. The Indexer 14(3), 184-189.

Sholtys, P. (1984). Adapting Library of Congress Subject Headings for newspaper indexing. Cataloging & Classification Quarterly, 4(4), 99-102.

Special to The New York Times. (1935, March 2). Newspaper indexing urged as library aid. New York Times (1857-Current file), p. 13. Retrieved November 12, 2008, from ProQuest Historical Newspapers The New York Times (1851 – 2005) database. (Document ID: 97144991).

Vierra, B. & Trice, T. (1980). Local newspaper indexing: a public library reports its experience. The Serials Librarian 5(1), 87-92.

Weaver, C. (2006). The Indexer as consultant: collaborative indexing of community newspapers. Key Words 14(1), 18-33.

Yewell, J. (1995). Why libraries must subscribe to – and preserve – the neighborhood and community press. Collection Building 14(2), 47-48.

FYI: much (but not all) of this post was distilled from a paper I wrote in 2008 as part of my MLIS coursework. Please give me credit if you’re citing or using any part of it.

Horrific serial title changes

Ever since I started working with serials I haven’t been able to help noting some of the bizarre and seemingly unwise title changes that are all too common. Yesterday I was faced with what was probably the worst case of publisher indecision I’ve ever laid barcodes on.

At some undetermined point in history, Electronic Purchasing changed its title to Electronic Business. Then, one fine September in 1993, the title changed again to become Electronic Business Buyer. The complications involved in changing the title in the middle of a volume didn’t seem to concern anybody (except probably the catalogers who had to deal with it). In any case, that title lasted a mere two and a half volumes; it became Electronic Business Today in September 1995. Then (perhaps they were influenced by the fast rate of change in the industry they were covering?) the title changed AGAIN in November 1997….back to where it started: welcome back, Electronic Business! It appears that this title was finally laid to rest – or subsumed by something else – in 2007.

That’s just the most recent example. Here are some others I’ve accumulated on my list of shame:

Working women leave us tongue-tied
Executive Female Digest became The Executive Female which became Working Woman which became Executive Female (again) which then became EF : Executive female – or is it NAFE Magazine? It seems like when things get too crazy the common solution is USE ACRONYMS. Case in point:

Training and Development Journal became Training and Development which became plain ol’ T+D.

Mo’ money mo’ problems
Bulletin for International Fiscal Documentation became Bulletin for International Taxation

National Public Accountant and the PA (!?) became NPA Magazine, which became Tax Magazine. Or maybe it became CPA Magazine? Or maybe NPA and CPA have combined to form Tax Magazine?! Hopefully the next issue will bring a revelation.

Banking Law Journal split into Business Law Journal and Bankers Magazine (1964). The latter became United States Banker. But there are two other titles that are also listed as becoming United States Banker: FutureBanker and United States Investor/Eastern banker. Now the title appears as USBanker on the cover.

We’re board with our title
Management Record and Conference Board Business Record became Business Management Record. That then became Conference Board Record, which combined with another publication, Focus, to become Across the Board, which eventually became Conference Board Review.

Focus on people
Human Resource Planning became People & Strategy
Sales & Field Force Automation became Sales & Marketing Automation which is now CRM: Customer Relationship Management. Interesting how disciplines change the way they describe themselves over time.

Did you know BusinessWeek (now Bloomberg BusinessWeek) was called System in 1900, and it was published in Muskegon, MI? Wikipedia says Chicago, but whatever. Gotta represent for Michigan, and there has to be some reason the catalogers listed that as the publication locale. Apparently it became Magazine of Business in 1929, but most sources say that’s when “BusinessWeek” started. (“BusinessWeek up for sale.” The Online Reporter (2009:July 17):21.)

I’m not even gonna touch Best’s Insurance News.

The titles on my list can’t even compare to many of those featured during the glorious years of the ALCTS Worst Serial Title Change of the Year Committee. From 1984 to 2003, the Committee gave awards for the worst title changes based on criteria such as:

“a frivolous title change for no apparent reason and producing no advantage; the unnecessary change of an old, respected title; repeated changes, the latest being no better than any earlier ones; and the “Snake in the Grass” or “Et tu, Brute?” category for library publications.” (Serialist archives)

Librarians and catalogers could also suggest their own categories for special awards when submitting titles for consideration. Lest this all be perceived as nitpicky whining, let me highlight a good point made by Mary Curran (writing about e-serials) in the February 2008 issue of The Serials Librarian:

“Publishers should hesitate before significantly changing the title of one of their publications, not only because of the inconveniences it causes librarians and users, which are well enunciated in Louise Cole’s article entitled “A Journey into E-Resource Administration Hell,” but also because it may temporarily influence the journal’s impact factor. ISI, now Thomson Scientific, notes this effect in reference to the 8,700 periodicals included in its database:

‘A title change affects the impact factor for two years after the change is made. The old and new titles are not unified unless the titles are in the same position alphabetically. In the first year after the title change, the impact is not available for the new title unless the data for old and new can be unified. In the second year, the impact factor is split. The new title may rank lower than expected and the old title may rank higher than expected because only one year of source data is included in its calculation.'”
-Curran, Mary. “The Worst E-Serials Tracking of the Worst Serial Title Change of the Year Award Goes to…” The Serials Librarian 53.4 (February 2008):47-57.

In other words: mo’ titles, less IMPACT FACTOR.

Public domain photos from the National Media Museum and the Powerhouse Museum on Flickr.

Resources on cyberscholarship; classification systems

A few weeks ago I attended a talk by Geoffrey Bowker, professor and “Senior Scholar in Cyberscholarship” at the University of Pittsburgh School of Information Sciences. This was part of the Digital Libraries & Cyberscholarship Colloquium Series, which continues next Tuesday and again in December.

I don’t have time to write a post that would do justice to Prof. Bowker’s talk, so I just decided to link to his website, which can provide a better idea of the topics he has been investigating (data sharing, interoperability, classification, standards, meta-narratives…). The main subject of the talk I attended was “social and organizational features of emerging scientific cyberinfrastructures”, but now that I’ve skimmed this paper, “How things (actor-net)work: Classification, magic and the ubiquity of standards”, I’m more curious about the book Bowker co-authored with Susan Leigh Star: Sorting Things Out: Classification and its Consequences (MIT Press, 1999).

What I find most interesting about Prof. Bowker’s approach (as it was represented in his talk at SIS) is how historical it is, and how he draws his examples from various disciplines and makes connections between them, especially in terms of “storytelling” and organizational behavior. I think this type of analysis is especially enjoyable when applied to classification systems, since they have such a long history and are such dependable vessels of ideology. In fact, I’ve read another book that takes a similar approach: Glut: Mastering Information Through the Ages by Alex Wright (National Academies Press, 2007). Wright’s website offers an annotated bibliography to his book; this is a great resource for anyone interested in the history of human efforts to represent and organize knowledge and information.

the MARC market

This study of the North American MARC records market, commissioned by the Library of Congress, is pretty interesting. An excerpt:

…for the moment MARC remains central for libraries. In large measure this is due to the installed base of library systems, which expect and work well with this data exchange format. This will continue to be true until the next generation of discovery and inventory systems are in place. But its limitations are increasingly clear. […] This will undoubtedly change over time, but for now, most libraries will continue to need cataloging records delivered in MARC format—it is the only usable solution.
There remain strong arguments for use of standard cataloging principles‐‐‐controlled vocabulary, classification, subject analysis, and authority control—packaged and delivered in a consistent format. While MARC records may need to be extended, embellished (supplemented with full text, flap copy, excerpts, user tags), for now they provide a common standard and a cooperative infrastructure that controls costs. In the long term, there may emerge better solutions. For at least the next 5‐10 years, however, continued savings can be realized by improvements to the production and distribution systems for cataloging records.

I especially liked the diagram on page 32 depicting the traditional and non-traditional “tiers” of organizations involved in the larger information resource market.

Fischer, Ruth, & Lugg, Rick. Study of the North American MARC Records Marketplace. Washington, DC: Library of Congress, October 2009.
http://www.loc.gov/bibliographic-future/news/MARC_Record_Marketplace_2009-10.pdf.

A back-of-the-book index to images of ancient Greek vases

Before I dive into this, here’s my experimental image index that is discussed in this post. You can see some sample pages from the text using Amazon’s “look inside” feature.

Background
Over the summer I took a class on indexing and abstracting. As part of my final project, I indexed some of the images of Greek vases in the book The History of Greek Vases, by John Boardman (London: Thames & Hudson Ltd, 2001). It seems kind of quaint to produce a back-of-the-book image index; maybe that’s just because I’m too steeped in digital stuff. Indexes are crucial for print materials, and I know it would have been useful to my art historical research to have the subjects, vase types etc. indexed. However, this probably isn’t the most practical of endeavors. Any good index to the text would probably index image captions, or the locators would at least get you close enough that you could find that one image you remembered that showed Perseus with a detached Gorgon head in hand. This exercise was more just to see what would happen if I indexed multiple attributes but proceeded as if I were just creating a traditional subject index to a text. What would the index look like? What would the cross-references be like, and how would the image index differ from an index to the text?

Method and Meanderings
I didn’t index drawings, maps, or photographs of sections of vases; I focused on images depicting vases in their entirety, and on the descriptive information about them that was included in captions. The locators in my index are for page numbers, although each image was numbered in the text. It just seemed easier to navigate to a page than to an image number.

When deciding what aspects of the images to index, I took inspiration from the access points used by the Beazley Archive at Oxford University. Some of the elements by which one can search the Beazley pottery database are:

  • Fabric
  • Technique
  • Shape name
  • Date range
  • Inscription type
  • Inscription
  • Artist name
  • Scholar name
  • Decoration description
  • Collection name
  • Publication name

There are even more than that, and it’s pretty impressive. Not all of these categories would be useful for my purposes (a back-of-the-book image index), but several of them (inscription type and vase shape name, for example) are especially useful for various types of art historical research in this field.

I looked at the VRA Core 4.0 metadata schema and noted which of its elements might correspond to those used by the Beazley Archive. The Beazley’s “fabric” element combines VRA’s “cultural context” and “style/period” elements. “Technique” corresponds to VRA’s “material” and “technique”. VRA does have an element for “inscriptions”, but it’s not clear to me if the “type” attribute for the “text” sub-element could be used to indicate the type of inscription (e.g. epoisen or egraphsen signatures). I was mainly using VRA Core as a point of reference, to get an idea of the types of attributes generally deemed important in creating descriptive metadata for images. (At the time, I didn’t know about Cataloging Cultural Objects, but since VRA Core is based on it I don’t see that as a big deal.) I also considered the facets of AAT, and what they indicate about elements that can be combined (e.g. style and period).
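To make the comparison a little more concrete, here is a rough sketch of that crosswalk as a lookup table. The labels are just the ones used in this post, and the names BEAZLEY_TO_VRA and vra_elements_for are made up for illustration; nothing here is an official mapping from either the Beazley Archive or VRA Core. "Inscription type" is deliberately left unmapped, since I wasn't sure how VRA would express it.

```python
# Hypothetical crosswalk: Beazley search element -> VRA Core 4.0 element labels.
# This only records the correspondences noted above; it is not an official mapping.
BEAZLEY_TO_VRA = {
    "fabric": ["cultural context", "style/period"],
    "technique": ["material", "technique"],
    "inscription": ["inscription"],
}


def vra_elements_for(beazley_element):
    """Return the VRA labels noted for a Beazley element, or [] if none was identified."""
    return BEAZLEY_TO_VRA.get(beazley_element.lower(), [])


print(vra_elements_for("Fabric"))
print(vra_elements_for("Inscription type"))
```

An empty list here just means "no clear correspondence found yet", which is different from saying that VRA Core can't represent the element.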

The attributes I finally chose to index were:

  • artist name
  • vase shape
  • technique
  • decoration
  • inscription
  • subject (both “things” depicted (e.g. warriors) and mythological figures (e.g. Achilles))

I don’t think any of the vases in my sample set ended up having inscriptions. I chose to index only these six elements because I had a limited amount of time to devote to this. If I were indexing images in an online setting I would definitely want to use more access points.

It was difficult to differentiate between imagery that could be both a subject and a decorative element. Many of the vases I was indexing were decorated with rows of animals in a repeating pattern. This is a common motif, so I needed to decide on a policy for how to index it. At first I was making very specific subheadings indicating the type of vase on which the pattern was appearing, but then I realized that this was creating too much work, and it probably wouldn’t be all that useful. It was also starting to conflate the “fabric” and “technique” elements with the “subject”, and I wanted to keep them separate in hopes of having a less chaotic index.

So, for a vase that had a motif of lions in a row, I decided to just give the locator after the heading (“lions, 30-31”). However, if there was a vase with a lion in any other, non-decorative context, I made a more specific subheading (“lions – being hunted, 20”). I worried that it would be confusing to have some locators listed after the heading, and then a subheading with more locators. So I decided that if any subheadings were required, I would instead list the decorative appearances of the subject with the subheading “as decorative element”. Hopefully more examples will make this clearer:

Sphinxes were only used as decorative elements in the vases I indexed. So the entry for sphinxes is:

sphinxes, 20, 33, 41, 45, 46

Lions were used as decorative elements on some of the vases, but sometimes they were part of the narrative scene. So my entry for lions is:

lions
  being hunted, 20, 23, 30-31
  as decorative element, 29, 33
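
For anyone who thinks better in code, the policy above can be sketched as a small function. This is only an illustration of the rule, not how I actually built the index (that was done by hand); the function name format_entry and the convention of using None to mark a decorative appearance are assumptions made for the example.

```python
from collections import defaultdict

DECORATIVE = "as decorative element"


def format_entry(heading, appearances):
    """Format one index entry.

    appearances is a list of (context, locator) pairs, where context is None
    for a purely decorative appearance and a string such as "being hunted"
    for a narrative one. Locators are strings so that ranges like "30-31" work.
    """
    by_context = defaultdict(list)
    for context, locator in appearances:
        by_context[context].append(locator)

    narrative = {c: locs for c, locs in by_context.items() if c is not None}

    # Only decorative appearances: locators go straight after the heading.
    if not narrative:
        return f"{heading}, " + ", ".join(by_context[None])

    # Otherwise every locator sits under a subheading, and the decorative
    # appearances get their own "as decorative element" subheading.
    lines = [heading]
    for context, locs in narrative.items():
        lines.append(f"  {context}, " + ", ".join(locs))
    if None in by_context:
        lines.append(f"  {DECORATIVE}, " + ", ".join(by_context[None]))
    return "\n".join(lines)


sphinxes = [(None, "20"), (None, "33"), (None, "41"), (None, "45"), (None, "46")]
lions = [
    ("being hunted", "20"),
    ("being hunted", "23"),
    ("being hunted", "30-31"),
    (None, "29"),
    (None, "33"),
]
print(format_entry("sphinxes", sphinxes))
print(format_entry("lions", lions))
```

Run on the two examples above, this prints the same sphinxes and lions entries shown in the post.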

Problems / Discussion
Many of my locators appear multiple times within the same entry for a couple of reasons. First is the fact that one page could contain multiple images of different vases with the same subject matter, technique, or fabric. Second: most vases have multiple sections of varying imagery. I may have been indexing too deeply, but I thought that failing to index all the different aspects of the subjects depicted would be akin to giving the researcher a list of undifferentiated locators. A good example of this is the heading for “warriors”. Since I was creating a heading for “hares – being hunted by warriors” it made sense to have “warriors – hunting hares” instead of just listing all the pages for warriors. Why not give the user as much information as possible? Additionally, my entry for “hares” has duplicate locators because in one section of the same vase (on page 33) the hares are a decorative motif, and in another section they are part of a narrative scene in which warriors are hunting them. (confused yet?!)
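
The warriors/hares pairing is what indexers call double-posting: the same scene is entered under both headings, so the user finds it from either direction. A minimal sketch of the idea, with a made-up function name (double_post) and argument layout chosen only for illustration:

```python
def double_post(actor, action, passive_action, target, locator):
    """Return the two mirrored entries for one narrative scene."""
    return [
        f"{actor} – {action} {target}, {locator}",
        f"{target} – {passive_action} {actor}, {locator}",
    ]


# The scene on page 33 in which warriors are hunting hares.
for entry in double_post("warriors", "hunting", "being hunted by", "hares", "33"):
    print(entry)
```

Because both entries come from the same scene, they always carry the same locator, which is one reason page 33 shows up so often in my index.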

As mentioned above, deciding to index both “subject” and “decorative elements” caused problems for me when the two were hard to differentiate. I wanted my imaginary user to be able to use the index to find lion friezes and not have to go through a bunch of locators just to find irrelevant images of lions NOT in friezes. But then again, I always expect too much of my information resources.

In hindsight, I think my index would be more useful if I had created separate indexes for each attribute I was indexing. But there are pros and cons to having the index divided by facets instead of in one long alphabetical sequence. One of the books I examined had one index for mythological figures and one for objects that are commonly attributed to specific gods/goddesses (I thought this was awesome). The book I was indexing had three separate indexes: (1) Artists, Groups, and Wares; (2) Mythological and Divine figures; and (3) General (three indexes in a mere 3 pages!). Though it makes sense to divide the index this way, it could lead to confusion if it spanned more than 3 pages. A user could be looking for Odysseus in the wrong index and not know it unless each page was very clearly marked with a header (I think I remember committing this error as an undergraduate). Another benefit to having one long alphabetical index is that it didn’t force me to always differentiate between the aforementioned troublesome decorative elements and subjects, which overlap so often on the same pot.
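
The trade-off between one alphabetical sequence and separate indexes per facet is easy to see if the headings are tagged with a facet. The sketch below uses a handful of headings mentioned in this post; the facet labels and the function names (single_sequence, by_facet) are illustrative assumptions, not the structure of any real index.

```python
from collections import defaultdict

# (heading, facet) pairs; the facet labels are made up for this example.
entries = [
    ("warriors", "things depicted"),
    ("Achilles", "mythological figures"),
    ("lions", "things depicted"),
    ("Odysseus", "mythological figures"),
    ("hares", "things depicted"),
]


def single_sequence(entries):
    """One long alphabetical index: the facet is simply ignored."""
    return sorted((heading for heading, _ in entries), key=str.casefold)


def by_facet(entries):
    """Separate indexes: every heading must be assigned to exactly one facet."""
    grouped = defaultdict(list)
    for heading, facet in entries:
        grouped[facet].append(heading)
    return {
        facet: sorted(headings, key=str.casefold)
        for facet, headings in sorted(grouped.items())
    }


print(single_sequence(entries))
print(by_facet(entries))
```

The single sequence never needs the facet at all, which is exactly why it let me dodge the subject-versus-decoration decision. The faceted version forces that decision for every heading, and a user has to know which facet to look under before they can find Odysseus.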

In real life, it would probably be best to keep the index as simple as possible, only providing sub-headings when the list of locators for a certain topic, shape, or artist got excessively long. Nevertheless, if my methods in this exercise could be used on a larger scale, the index would be very useful. Of course that’s unrealistic because of how time-consuming it is, and how “everything is online” now. Back-of-the-book image indexes aren’t unheard of, though. I inspected the indexes in the back of Boardman’s book and another book on Greek art. One of them only listed very general topics (“women” or “fighting scenes”). The other gave some sparse subheadings (and I think this one was the better index overall, though neither of them had any “see” references!).

Conclusion
Indexing a set of images in a book is a good way to become familiar with the basic issues of image description and index construction. It requires decisions that appear simple on the surface, but force you to carefully consider the nature of your subject and the needs of your potential users. You’re forced to make decisions about what to index, how deeply to index, and how to best express what you’re indexing…and those are all just as important in the online environment. So this is a good exercise for a rainy day when you need a rest from computer eye strain.

Also, I think it would be interesting to survey scholarly texts in any image-focused field and try to get an idea of how people deal with images in back-of-the-book indexes. Projects for the future!