64th IFLA General Conference
August 16 - August 21, 1998

Code Number: 007-126-E
Division Number: IV.
Professional Group: Cataloguing
Joint Meeting with: -
Meeting Number: 126.
Simultaneous Interpretation: No

Cataloguing vs. Metadata: old wine in new bottles?

Stefan Gradmann

Pica, Leiden, Netherlands

Abstract:

The 'metadata' approach reflects a need to re-think the relation between descriptive data and referenced elements specifically in the field of electronic Document Like Objects (DLO). In this field, a certain number of basic parameters seem to undergo major transitions. The article identifies some of the fundamental differences between traditional cataloguing activity and metadata in the areas of metadata production, the context of usage and the relation between metadata and the objects referenced by metadata and cataloguing records respectively.

The paper argues that recognition of these differences is a fundamental requirement for a re-definition of the role of librarians in a newly emerging and rapidly evolving information paradigm.

Paper

Introduction

A paper by R. Heery published in 1996 (and thus a long time ago by the standards of internet development) states: "The familiar library catalogue record could be described as metadata in that the catalogue record is 'data about data'." (HEERY 1996a) If this statement remains valid (and, at least semantically, this seems to be the case), a correct but somewhat naïve reaction from the librarian's point of view might be simply to consider cataloguing a specific type of metadata creation and to ignore all the non-librarian fuss around this buzzword: basically, to go on cataloguing as if nothing had happened.

One of the specific goals of the present paper is to give some indications concerning the inadequacy of such a reaction: the basic aim is to identify some of the points of concern the metadata issue is likely to create for libraries in the near future.

This paper is thus by no means an introduction to the metadata issue; it assumes a basic knowledge of the subject. Such knowledge is easy to obtain on the WWW: starting points such as the "Metadata Resources" area (http://ifla.inist.fr/II/metadata.htm) provided by IFLA or UKOLN's metadata site (http://www.ukoln.ac.uk/metadata/) provide extensive information regarding all aspects of metadata. Anyone familiar with these information sites or with the subject of metadata generally will readily understand why I have to narrow the focus more than slightly here: this paper will not attempt to cover all metadata standards and activities but will rather concentrate on one example, maybe the most prominent one at the moment: the so-called "Dublin Core" (DC) set (for background information first see http://purl.org/metadata/dublin_core).

Nor does this paper attempt to contribute to the relevant standardising processes in the field of DC or of existing/emerging library cataloguing rules and formats (such as ISBD(ER)), or to argue in favour of either of these working models: there are contexts more suitable for this (such as the respective mailing lists), and there surely are specialists in both fields who are entitled to such contributions to a much higher degree than the author of this paper.

What I am concerned with here is rather the question of the possible mutual relationship between the cataloguing and the metadata approach, with only very timid and tentative attempts to give any answers. It has been maintained that metadata and 'conventional' cataloguing records are complementary to some extent, whereas the main point I would like to make in this contribution is that they are fundamentally different, if not conflicting, working models, and that the working concepts underlying both models differ substantially, too.

There are, after all, a few good reasons - some explicit, others implicit - why the metadata community did not start off proposing MARC amendments but created a completely new frame of attributes. Some of the reasons for this have their roots in the outside view of what librarians are doing: a vital point to reflect on for librarians.

On the other hand, the metadata approach today benefits from the bonus of any fresh start - once this is over, metadata based activities are likely to rediscover some of the problems and pitfalls librarians have been experiencing during the past 30 years: while reinventing wheels may even sometimes be justified (and has been current practice in the field of library automation until now, anyway) there are good reasons to at least avoid errors already made by others.

This contribution is intended to provoke and stimulate discussion: I thus apologise for all the necessary simplifications and analogies I am going to use in this context: they are as wrong as any simplifications and analogies ...

Who does it, and How is it done?

When taking a look at the typical results of DC compliant metadata production a tempting thought - at least from a librarian point of view - is to consider DC-metadata some kind of simplified cataloguing format. Such a view is encouraged by definitions of metadata like the following one: "Metadata is data about data, and therefore provides basic information such as the author of a work, the date of creation, links to any related works, etc. One recognisable form of metadata is the card index catalogue in a library; the information on that card is metadata about a book. Perhaps without knowing it, you use metadata in your work every day, ..." (MILLER 1996)

This accords perfectly with a similar point made very early in the DC discussion context by P. Caplan. In an attempt to answer the question "What is Metadata, Anyway?" she asserted that "Metadata really is nothing more than data about data; a catalog record is metadata; so is a TEI header, or any other form of description. We could call it cataloging, but for some people that term carries excess baggage, like Anglo-American Cataloging Rules and USMARC. So to some extent this is a 'you call it corn, we call it maize' situation, but metadata is a good neutral term that covers all the bases." (CAPLAN 1995) (1)

In another attempt at giving an overview of metadata formats R. Heery still places cataloguing and DC within the same continuous paradigm but indicates a difference in complexity:

A variety of formats have been placed in this table, positioned along a continuum from simple records (Band One) to complex, rich records (Band Four). The variety of record types identified in the bibliographic control process can be placed on this continuum as shown below.

 
Band One                      Band Two      Band Three                Band Four

Proprietary simple records:   Dublin Core   MARC                      ICPSR
NetFirst                      IAFA          TEI independent headers   FGDC
[...]                         [...]         [...]                     [...]
Publishers' CIP forms         CIP MARC      EDI messages

(HEERY 1996a)

All this seems to indicate that the basic concern of this paper is in fact a non-issue, a mere matter of slightly changing terminology and of varying degrees of complexity.

A more than slight difference, however, can be perceived in the following definition given by T. Berners-Lee: "Metadata is machine understandable information about web resources or other things." This passage continues: "The phrase 'machine understandable' is key. We are talking here about information which software agents can use in order to make life easier for us, ensure we obey our principles, the law, check that we can trust what we are doing, and make everything work more smoothly and rapidly." (BERNERS-LEE 1998)

This already differs appreciably from the "We could call it cataloging" position: while the overall objectives (reliability and authentication of meta-information) could be claimed for cataloguing activity, too, the context of information usage is different (software agents rather than library users), and the explicit concern for efficiency actually implies that things are intended to "work more smoothly and rapidly" than cataloguing!
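What "machine understandable" can mean in practice may be illustrated by a minimal sketch. It assumes the convention of embedding DC elements in HTML as meta elements whose names begin with "DC."; the sample page, the class name DublinCoreMetaParser and the function extract_dc are invented for this illustration and are not taken from any of the sources cited here.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = {}  # element name -> list of values (elements may repeat)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Only names with the (assumed) "DC." prefix count as Dublin Core.
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()
            self.elements.setdefault(element, []).append(content)


def extract_dc(html_text):
    """Return the DC elements found in html_text as a dict of lists."""
    parser = DublinCoreMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.elements


# Hypothetical page header describing the present paper.
SAMPLE_PAGE = """
<html><head>
<title>Cataloguing vs. Metadata</title>
<meta name="DC.title" content="Cataloguing vs. Metadata: old wine in new bottles?">
<meta name="DC.creator" content="Gradmann, Stefan">
<meta name="DC.date" content="1998">
<meta name="keywords" content="not Dublin Core, ignored by the parser">
</head><body>...</body></html>
"""

if __name__ == "__main__":
    for element, values in extract_dc(SAMPLE_PAGE).items():
        print(element, "=", values)
```

The point of the sketch is only that the description travels with the resource and is read by software (a harvester or search engine) rather than by a person consulting a catalogue.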

The difference becomes even clearer once we take into account another aspect that initially led to the DC initiative and that has recently been recalled by Stu Weibel: "One of the original motivations for the DC workshop series was the notion that authors could supply their own descriptions." (WEIBEL 1998) (2) Not only does the production flow differ, but the originators of meta-information are basically not library cataloguers.

An additional aspect to keep in mind is that another original focus of the DC initiative was "to facilitate resource discovery in a networked environment" (LAGOZE 1997), and thus not primarily resource description. The metadata approach thus fits only accidentally within the descriptive paradigm of library cataloguing.

In fact, all this comes down to a clearer notion of the explicit and implicit assumptions connected to the term metadata: these are intended for a context of usage different from library catalogues, they are typically not created by professional cataloguers, they are intended to be produced more efficiently than cataloguing records, they cover a specific kind of material (electronic resources) and - this point is intended to be made further down - the relation between metadata and the resources referenced differs substantially from the relation between a cataloguing record and a book held by a library.

Even though the results of metadata production, the actual DC records, may be semantically similar to a simplified cataloguing record (and can easily be mapped to a MARC format (3)), the whole context of production and usage of this information is substantially different and driven by the intention to bypass the traditional cataloguing paradigm. Considering the process of metadata creation to be some kind of simplified cataloguing thus probably would be a serious misunderstanding.
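The semantic closeness mentioned above can be made concrete with a small sketch of a DC-to-MARC crosswalk. The element-to-field pairs below loosely follow commonly published crosswalks and are an assumption of this illustration, not the mapping of the OCLC discussion paper cited in note 3; the function name dc_to_marc_fields and the sample record are hypothetical.

```python
# Illustrative crosswalk: DC element -> (MARC tag, subfield code).
# Assumed for this sketch; not the mapping given in the OCLC paper (note 3).
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "identifier": ("856", "u"),
}


def dc_to_marc_fields(dc_record):
    """Map a DC record (dict of lists) to (tag, subfield, value) triples.

    Returns the mapped triples, sorted by tag, and separately the
    (element, value) pairs for which the crosswalk has no entry, so that
    nothing is silently dropped. A real converter would also merge
    subfields that share a tag (e.g. 260 $b and $c) into one field.
    """
    fields, unmapped = [], []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element.lower())
        for value in values:
            if target is None:
                unmapped.append((element, value))
            else:
                fields.append((target[0], target[1], value))
    fields.sort(key=lambda triple: triple[0])
    return fields, unmapped


if __name__ == "__main__":
    record = {
        "title": ["Cataloguing vs. Metadata: old wine in new bottles?"],
        "creator": ["Gradmann, Stefan"],
        "date": ["1998"],
        "rights": ["(hypothetical value)"],
    }
    mapped, left_over = dc_to_marc_fields(record)
    for tag, subfield, value in mapped:
        print(f"{tag} ${subfield} {value}")
    print("unmapped:", left_over)
```

That such a table can be written in a few lines shows how similar the element semantics are; it says nothing about who produced the values or in what context they will be used, which is the difference argued for here.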

For whom is it done? And how is it used?

Cataloguing records as traditionally produced by libraries are rather generic in the sense that they make relatively few assumptions about the potential users of these records. The future context of usage (integration in an OPAC or placing of printed cards in a sequentially organised catalogue) has until now had very little impact on the way such information was created in the cataloguing process and has not influenced the semantics (as formalised in cataloguing rules such as AACR2) to any great degree. This fact can be seen as an advantage; however, the library community is increasingly aware of the drawbacks of this lack of end-user orientation in cataloguing and is forced to reconsider some of its principles, if only because of increasing cost awareness in the political context.

The same is not true for DC and other metadata initiatives: one of their main characteristics, it seems to me, is that they are driven to a very high degree by very specific end-user requirements. This could be seen as a disadvantage, since changes in end-user behaviour and in the context of usage are likely to affect such an approach fundamentally, with the risk of lacking continuity; however, this characteristic is probably considered a positive aspect today. Whenever DC is introduced, arguments are developed with specific kinds of resources in mind (electronic objects in the WWW environment), they come with specific assumptions regarding the context of usage (enhancing the precision of internet search engines, for example, is one of the recurrent arguments), and they are often developed with a specific user group in mind: the 'digital tourist' metaphor cherished by the DC community is significant in this sense.

This is already true to some extent for DC semantics. To give just one example, one of the basic assumptions here seems to be the uniqueness of resources, not accounting for the fact that a 'work' (in the 'Functional Requirements' terminology) may have different representations / manifestations, and that copies of these may exist. The result is a 1:1 relation between metadata and physical resources tailored to the 'flat' information paradigm of the WWW (4). This fact becomes even more tangible in the context of the corresponding syntax proposals, which are clearly oriented towards a WWW usage environment (5).

This fundamental difference may perhaps best be illustrated by comparing the relation between cataloguing records and books with the relation between metadata and the resources they reference.

In most local library systems, bibliographic records are typically complemented with copy records containing the 'pointers' (i.e. shelf-marks) indicating the location of a book. Such a 'pointer' typically uses the library system's proprietary circulation functionality as a mediating instance and often even requires additional human activity from library staff to provide the user with the object of desire, the actual book or document. The basic point is that this context of usage has few or no consequences for the bibliographic record and the cataloguing activity.

The situation is fundamentally different for metadata, as already pointed out by R. Heery:

"Metadata also differs from traditional catalogue data in that the location information is held within the record in such a way to allow direct document delivery from appropriate application software, in other words the record may well contain detailed access information and the network address(es)."

Metadata as such are part of a specific technical information infrastructure, and this is true to some extent even for the semantic level, which was originally intended to be context free: the actual value of a metadata record is determined to a very high degree by the fact that the access pointers contained in the record actually work (this explains the strong concern in DC-related discussion with the 'broken links' problem and its necessary involvement with URN or other identifier-related standardising processes), and that these access pointers comply with the technical requirements of the application software used to access the information. Simplifying this aspect considerably, one could say that a metadata record containing an invalid resource pointer is worth almost less than no record at all.
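The structural contrast described in this section can be sketched as two record shapes. This is an illustration under stated assumptions, not a model of any actual library system or DC syntax: all class and function names are hypothetical, and the resolves argument stands in for whatever network check an application would really perform.

```python
from dataclasses import dataclass, field
from typing import Callable, List
from urllib.parse import urlparse


@dataclass
class CopyRecord:
    shelf_mark: str  # resolved via circulation software and staff, not by the reader


@dataclass
class BibliographicRecord:
    title: str
    copies: List[CopyRecord] = field(default_factory=list)  # location kept apart


@dataclass
class MetadataRecord:
    title: str
    identifier: str  # network address held inside the record itself


def is_well_formed_pointer(identifier: str) -> bool:
    """Syntactic check only: a known scheme and a host must be present."""
    parts = urlparse(identifier)
    return parts.scheme in ("http", "https", "ftp") and bool(parts.netloc)


def is_usable(record: MetadataRecord, resolves: Callable[[str], bool]) -> bool:
    """A metadata record is only as good as its pointer.

    `resolves` is supplied by the caller and reports whether the address
    currently leads to the resource (the 'broken links' problem).
    """
    return is_well_formed_pointer(record.identifier) and resolves(record.identifier)


if __name__ == "__main__":
    # Hypothetical set of addresses that currently resolve.
    live = {"http://example.org/paper.html"}
    for rec in (
        MetadataRecord("Still there", "http://example.org/paper.html"),
        MetadataRecord("Moved away", "http://example.org/old.html"),
        MetadataRecord("Not an address", "paper.html"),
    ):
        print(rec.title, "->", is_usable(rec, live.__contains__))
```

A bibliographic record stays valid when a copy is reshelved, because only the copy record changes; in the second shape the description and the access path stand or fall together.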

The conclusion of this section thus is that metadata not only belong to a different production paradigm, but that they are also intended to be part of a usage context different from that of cataloguing records, and that they are technically linked to this context to a very high degree. While this may seem to simplify things enormously (enabling direct document access using standardised pointing methods), it paradoxically complicates things at the same time, since the role of metadata records in this information infrastructure depends on rapidly evolving internet standards (and I should make clear that this point, too, is a mere statement of fact, not intended to be read as a criticism of the metadata approach).

... and a chance for librarians?!

Some of the fundamental differences between bibliographic records and metadata, as well as between the respective production paradigms, should have become clearer by now: clear enough, anyhow, to understand that both approaches are part of different information infrastructures and necessarily react to these, even though there may be areas of contact and similarities.

It may of course be possible to tentatively combine both information paradigms as in the proposal of using the library OPAC as a gateway to access the metadata repository made by XU (1998). I do not want to discuss this in detail, even though I have my personal doubts regarding its immediate practicability. This is, however, one important direction to investigate for librarians, and some of the recent and current work being done in my institution - Pica - goes in the same direction combining library automation and internet information techniques as we did in our WebDOC project or in our DELTA project.

There are, however, other areas, where the metadata community may benefit from specific librarian expertise and experience (or where this is already the case, due to the presence of many persons representing the library world in this community), and this probably is true for so called 'qualified DC' to a much higher degree than for 'simple DC'. I am thinking of examples such as the use of repeatable items and the lessons possibly to be learned from the MARC experience and its subfield architecture or the use of controlled vocabulary, which may lead to discussions strangely resembling those in the library world about authority forms in the past. There are more areas of this kind, where the necessary reinvention of the metadata wheel may (and already does) avoid problems already identified in earlier contexts.

I would like to end this paper by indicating two areas, where substantial and continuous contributions from the library world may be especially useful for the metadata approach. One participant in the meta2 mailing list recently stated:

"My own experience shows that what allows for better search results in library catalogs is not so much the format itself, as the information that is put into the format. Librarians have traditionally followed the concept of consistency when they create library records (a consistent form of name, of title, and of subject analysis). I readily admit that being able to search "Green" as a name separately from "green" in the title is a large step, but it pales in comparison with being able to select the correct "David Green" from a multitude of names."

Consistency assurance and authority control have indeed been areas of major concern for librarians, and there may again be a systematic role for them in the metadata context, namely to contribute specifically to the overall consistency of production results. It should be made very clear, however, that the intention here cannot be to reconvert the metadata approach into some kind of traditional cataloguing activity!
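The difference between searching a name field and selecting the correct person can be shown in a minimal sketch. Everything in it is invented for illustration: the authority file, the records, the identifiers and the function names are hypothetical and do not describe any existing system.

```python
# Hypothetical authority file: one identifier per person, one authorised
# heading, and the variant name forms that should lead to that heading.
AUTHORITY_FILE = {
    "auth-1": {"heading": "Green, David (historian)",
               "variants": {"david green", "green, david", "d. green"}},
    "auth-2": {"heading": "Green, David (botanist)",
               "variants": {"david green", "green, david"}},
    "auth-3": {"heading": "Other, A. N.",
               "variants": {"a. n. other", "other, a. n."}},
}

# Hypothetical records: free-text creator as entered, plus the authority link.
RECORDS = [
    {"title": "Medieval trade routes", "creator": "David Green", "creator_id": "auth-1"},
    {"title": "Alpine flora", "creator": "Green, David", "creator_id": "auth-2"},
    {"title": "The green flash", "creator": "A. N. Other", "creator_id": "auth-3"},
]


def normalise(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def name_search(name):
    """Fielded search on the free-text creator: depends on the form entered."""
    key = normalise(name)
    return [r["title"] for r in RECORDS if normalise(r["creator"]) == key]


def candidate_headings(name):
    """All authorised headings a name form may refer to."""
    key = normalise(name)
    return [(authority_id, entry["heading"])
            for authority_id, entry in AUTHORITY_FILE.items()
            if key in entry["variants"]]


def titles_by(authority_id):
    """Retrieval by authority identifier: independent of the name form used."""
    return [r["title"] for r in RECORDS if r["creator_id"] == authority_id]


if __name__ == "__main__":
    print(name_search("David Green"))         # finds one form only
    print(candidate_headings("David Green"))  # offers both persons
    print(titles_by("auth-2"))                # works of the chosen person
```

The fielded search already avoids matching "green" in a title, but it cannot tell the two persons apart and misses the inverted name form; only the authority link does both.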

The second area I am thinking of is closely related to this and concerns the problem of metadata authentication. The recent report on the EC Metadata Workshop in Luxembourg states "that the current take-up of Dublin Core is slow and that there is a lack of critical mass". Among other reasons, one of the problems underlying this fact is the relatively little use that search engines like AltaVista currently make of metadata beyond mere keyword indexing, and the lack of metadata authentication has in turn been suggested as one of the major reasons for that. A message from S. Weibel to meta2 reacts to this problem by stating:

"But I have come to believe that we are moving from a where-do-I-click mentality to a who-do-you-trust position. As a representative of the library community, I see this as an opportunity as much as a problem, given that the public trust is among our most important assets.

Other formal communities also are positioned to provide trusted resource description... museums, governments, publishers, professional and trade organizations. There is room for abuse in any such system, and there will be (already is) in the metadata realm. This just makes it that much more critical that those with a mission to provide reliable resource description find common conventions (including means for validation) on which we may build the future we envision." (WEIBEL 1998)

The following suggestion has been made in that context:

We suspect that navigational meta-information will be brokered by trusted third parties. I would expect it to develop as something like Yellow Pages -- it will cost money to describe your resources; the more you pay, the more listings you can have. I'm referring here to commercial listings; I would expect services like Altavista, Yahoo! and so forth to continue their no-charge services, but I wouldn't be surprised if they focus away from describing resources that are for sale.

I am not sure whether this is a promising or desirable path: there may be a way of involving public institutions like libraries in this necessary process of information brokering. While I agree that trusted third parties will be needed in this process, I am not sure whether all of us would be happy to depend entirely on the brokering services of commercial institutions in this vital context of information validation. Even if this idea is countercyclical in the sense of moving against the current wave of deregulation, I do think that this is an important point to consider.

To take up the title of this paper again: it should have become clear by now that metadata is indeed not a mere buzzword and not at all old wine in new bottles. The approach this term stands for stems from an information paradigm differing from that of library cataloguing, and I think that libraries should feel invited to follow its evolution closely and to perceive it not as a possible threat but rather as a chance to re-define their role in the context of newly emerging information paradigms.

References

Arnett, Nick: Re: authentication of metadata. meta2@mrrl.lut.ac.uk (23 Jan. 1998) (= ARNETT 1998)

Berners-Lee, Tim: Metadata Architecture. Documents, Metadata, and Links. Last edit Date: 1998/02/06 17:06:46. http://www.w3.org/DesignIssues/Metadata.html (= BERNERS-LEE 1998)

Caplan, Priscilla: You Call It Corn, We Call It Syntax-Independent Metadata for Document-Like Objects. In: The Public-Access Computer Systems Review 6, no. 4, 1995. http://info.lib.uh.edu/pr/v6/n4/capl6n4.html (= CAPLAN 1995)

Heery, Rachel: Metadata Formats. December 1996. Deliverable D1.1 - Work Package 1 of Telematics for Libraries project BIBLINK (LB 4034) http://www.ukoln.ac.uk/BIBLINK/wp1/d1.1/ (= HEERY 1996a)

Heery, Rachel: Review of Metadata Formats. In: Program, Vol. 30, No. 4, October 1996, pp. 345-373 (= HEERY 1996b)

Lagoze, Carl: From Static to Dynamic Surrogates. Resource Discovery in the Digital Age. In: D-Lib Magazine, June 1997. http://www.dlib.org/dlib/june97/06lagoze.html (= LAGOZE 1997)

Miller, Paul: Metadata for the Masses. In: Ariadne, 5, Sept. 1996. http://www.ariadne.ac.uk/issue5/metadata-masses/ (= MILLER 1996)

Metadata Workshop, Luxembourg - 1-2 December 1997. Workshop Report. http://hosted.ukoln.ac.uk/ec/metadata-1997/report/

Miller, Paul: An Introduction to the Resource Description Framework. In: D-Lib Magazine, May 1998 (= MILLER 1998)

Olson, Nancy B. (Ed.): Cataloging Internet Resources. A Manual and Practical Guide. Second Edition. http://www.oclc.org/oclc/man/9256cat/toc.htm (= OLSON)

A User Guide for simple Dublin Core. Draft version 4.0 (15/05/1998) ; http://128.253.70.110/DC5/UserGuide4.html (= USER GUIDE)

Weibel, Stuart: Re: authentication of metadata. meta2@mrrl.lut.ac.uk (23 Jan. 1998) (= WEIBEL 1998)

Weibel, Stuart and Hakela, Juha: DC-5: The Helsinki Metadata Workshop: A Report on the Workshop and Subsequent Developments. Official report of the Helsinki DC Meeting. In: D-Lib Magazine, February 1998. http://www.dlib.org/dlib/february98/02weibel.html (= WEIBEL/HAKELA 1998)

Weinheimer, James: Re: authentication of metadata. meta2@mrrl.lut.ac.uk (23 Jan. 1998) (= WEINHEIMER 1998)

Xu, Amanda: Metadata Conversion and the Library OPAC. In: The Serials Librarian 33 (1-4) (Spring 1998), http://web.mit.edu/waynej/www/xu.htm (= XU 1998)

End Notes

(1) In the context of this definition, the just emerging DC drafts are put on one level with other "standards defining metadata element sets, from AACR2 to GILS" (CAPLAN 1995).
(2) Correspondingly, professional publications from the library context such as OLSON do not mention DC as a relevant production context.
(3) As is demonstrated in Mapping the Dublin Core Metadata Elements to USMARC. OCLC Discussion Paper No. 86. May, 1995. (http://ifla.inist.fr/documents/libraries/cataloging/dublin1.txt) and elsewhere.
(4) This principle is still upheld even with clear awareness that the "complexity of relationships among related works resists coherent explication" (WEIBEL/HAKELA 1998).
(5) The current proposals for using XML-based RDF syntax are a good example of this; cf. MILLER 1998.
