Much has been written speculating on how classification can be used in online catalogs to improve information retrieval. While some empirical studies have explored whether traditional classification schemes designed for a manual environment are effective and efficient when used directly in the online environment, none has manipulated these manual classifications in such a way as to take full advantage of the power of both the classification and the computer. Some authors, such as Wajenberg and Drabenstott, have suggested that this power could be realized if the individual components of synthesized DDC numbers could be identified and indexed. This paper looks at the feasibility of automatically decomposing DDC synthesized numbers and the implications of such decomposition for information retrieval.
Based on an analysis of the instructions for synthesizing numbers in the main class Arts (700) and all DDC Tables, 17 decomposition rules were defined, 13 covering the Add Notes and four the Standard Subdivisions. 1,701 DDC synthesized numbers were decomposed by a computer system called DND (Dewey Number Decomposer), developed by the author. From the 1,701 numbers, 600 were randomly selected for examination by three judges, each evaluating 200 numbers. The decomposition success rate was 100%, and it was concluded that synthesized DDC numbers can be accurately decomposed automatically.
The study has implications for information retrieval, expert systems for assigning DDC numbers, automatic indexing, switching language development, enhancing classifiers' work, teaching library school students, and providing quality control for DDC number assignments. These implications were explored using a prototype retrieval system.
So it seems logical to ask: can DDC synthesized numbers be decomposed automatically? It would be very significant if they could be, because the result could then be applied in any library throughout the world where the DDC is used. In my dissertation study I found that DDC synthesized numbers can be decomposed automatically. Because of the time restriction here, I will not discuss the study in detail, but will briefly describe the algorithms I developed for decomposing DDC synthesized numbers and the conclusions of my study.
There are basically two types of rules classifiers use to synthesize DDC numbers: Add Notes and Standard Subdivisions. I examined all the Add Notes and instructions for Standard Subdivisions in the main class Arts (700) and in the DDC Tables. The purpose was to classify these Add Notes and Standard Subdivisions into a few categories based on their characteristics and patterns, so that the resulting set of decomposition rules would be small enough to be manageable and efficient. After examination and analysis, I was able to classify Add Notes into 13 groups and Standard Subdivisions into four groups. I then defined 17 decomposition rules, 13 for Add Notes and four for Standard Subdivisions. The following illustrates a typical decomposition rule:
RULE for Add Note 1 (AN1)
If AN1, concatenate the number after "the numbers following" and the target number minus the base number; insert a decimal after the third digit if the concatenated number has more than three digits and does not have a decimal; match the number against the Schedule.
Example: Decompose 751.422436 (Painting little landscapes: small-scale watercolors).
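The AN1 rule above might be sketched in code roughly as follows. This is only an illustration, not the DND implementation: the function name is hypothetical, and the base number 751.4224, the Add Note source 704.94, and the one-entry Schedule excerpt are assumptions chosen for the example rather than values quoted in this paper.

```python
def apply_add_note_1(target: str, base: str, following: str) -> str:
    """Recover the Schedule number that was added to `base` to build `target`."""
    # Work on bare digit strings so the decimal point does not get in the way.
    t, b, f = (s.replace(".", "") for s in (target, base, following))
    if not t.startswith(b) or len(t) == len(b):
        raise ValueError(f"{target} is not built on base number {base}")
    remainder = t[len(b):]   # the target number minus the base number
    joined = f + remainder   # concatenate with the number after "the numbers following"
    # Insert a decimal after the third digit if there are more than three digits.
    return joined if len(joined) <= 3 else joined[:3] + "." + joined[3:]


# Assumed one-entry excerpt of the Schedule, for illustration only.
SCHEDULE = {"704.9436": "Landscapes"}

component = apply_add_note_1("751.422436", base="751.4224", following="704.94")
print(component, "->", SCHEDULE.get(component, "no match in Schedule"))
```

Under these assumptions the recovered component matches the Schedule entry for landscapes, which agrees with the subject of the example title.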
I developed a computer system called DND (Dewey Number Decomposer, see Appendix B) to implement these algorithms and used it to decompose 1,701 DDC synthesized numbers. From the 1,701 numbers, 600 were randomly selected for examination by three judges, each evaluating 200 numbers. The decomposition success rate was 100%, and it was concluded that synthesized DDC numbers can be accurately decomposed automatically.
Not included in this statistic were 42 numbers that could not be decomposed. These 42 numbers were sent to the DDC Division of the Library of Congress for clarification. The DDC Division verified that the 42 numbers that could not be decomposed were in fact incorrectly constructed. Although the Dewey Number Decomposer was not designed to detect errors in classification, it is an extra bonus that it exhibited this unplanned ability to identify incorrect DDC numbers already in the OCLC bibliographic database. Identifying incorrect DDC numbers in a database as large as OCLC would be a task too monumental to accomplish without computer aid. With the Dewey Number Decomposer, however, we can easily and quickly find out whether there are any incorrect DDC numbers in a database that consists of millions of bibliographic records.
Although I examined only DDC synthesized class numbers in the main class Arts (700), I strongly believe that synthesized numbers in all the DDC main classes can be accurately decomposed automatically, because the synthesizing rules for the Arts (700) are representative of the rules throughout the schedule. Synthesizing rules for the main classes Literature (800) and Language (400) may differ somewhat from those for other classes. However, I looked briefly at the synthesizing rules for these two classes and believe that the differences are not significant and that, with some modification or augmentation, the decomposing rules that were defined for the 700s could be extended to synthesized numbers in all DDC main classes. Nevertheless, the next logical step is to extend and apply the methodology to other DDC classes. Future research on the automatic decomposition of synthesized numbers in the other nine main DDC classes would validate and supplement what I have done so far.
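The quality-control use described above can be pictured as a simple pass over a file of class numbers: any number that no rule can decompose is set aside for review. The sketch below assumes a decomposer that returns None on failure; `toy_decompose` and its two lookup tables are hypothetical stand-ins (built around the 779.082 example from this paper), not the DND rules, and 779.0082 is an invented malformed number.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical lookup tables, far smaller than the real Schedule and Tables.
BASES = {"779": "Photographs"}
SUBDIVISIONS = {"082": "Women"}


def toy_decompose(number: str) -> Optional[List[str]]:
    """Split a number into base + subdivision, or return None if it cannot be."""
    digits = number.replace(".", "")
    base, rest = digits[:3], digits[3:]
    if base in BASES and rest in SUBDIVISIONS:
        return [base, rest]
    return None


def find_suspect_numbers(
    numbers: Iterable[str],
    decompose: Callable[[str], Optional[List[str]]],
) -> List[str]:
    """Return the numbers that the decomposer could not take apart."""
    return [n for n in numbers if decompose(n) is None]


print(find_suspect_numbers(["779.082", "779.0082"], toy_decompose))
```

A number flagged this way is only a candidate for review; as in the study, a classifier still has to confirm that it was incorrectly constructed.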
While the chief practical implication of automatic decomposition relates to the retrieval options that can be realized, there are implications also for automatic indexing, switching language development, enhancing classifiers' work, teaching library school students, and providing quality control for DDC number assignments.
To explore these implications, I created two test bibliographic databases, one with decomposed DDC numbers and the other with synthesized DDC numbers. Both databases were created from 11,662 bibliographic records collected from the OCLC Union Catalog. After removing records containing numbers that were not synthesized or not assigned according to the 20th edition, 3,749 records remained; this was the number of records in each of the two databases. The DDC numbers were decomposed and, along with pointers to their original records, stored in a separate inverted file for searching purposes.
The decomposition of DDC numbers can improve information retrieval in a way not possible before, by providing multiple access points represented by component numbers and by enabling different combinations of component numbers. To illustrate this, I designed a prototype retrieval system that utilizes the power of decomposed DDC numbers. I will now demonstrate a search I performed using the prototype system, which illustrates how component numbers can improve information retrieval.
Figure 1 is a DDC schedule display. The number to the right is the number of records associated with the class on the left. I was interested in Photographs, so I started my search by selecting the class Photographs from the Schedule Display. At this point, I could either browse titles classed under Photographs or request a display of the aspects from which Photographs are treated. Because many titles are associated with the class Photographs (215, as shown by the number on the right), I wanted to improve precision and therefore requested a display of aspects to narrow my search. The system then responds with an aspect table of Photographs, again showing the number of associated titles (see Figure 2).
Because the synthesized numbers have been decomposed into component numbers, the system can enumerate the aspects by which a given topic is treated in a particular database. For instance, in the sample database there is a record with the synthesized DDC number 779.082, which decomposes into 779 for Photographs and 082 for Women. When constructing the aspect table for Photographs, the system can use the fact that these numbers are associated as components and include the aspect Women in the table (see Figure 2). Now, if I select the aspect Women, I will retrieve all records with DDC numbers containing both component numbers 779 for Photographs and 082 for Women. At this point, I have three options: browse the titles, display the subaspects, or narrow the search with another aspect.
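An inverted file of this kind can be sketched as a mapping from each component number to the set of records that contain it. The record ids and the three sample records below are hypothetical, and the function name is illustrative; the component numbers 779 (Photographs), 082 (Women), and 73 (United States) are the ones used as examples in this paper. A production system would also need to record whether a component comes from the Schedule or from a Table, which this sketch ignores.

```python
from collections import defaultdict


def build_inverted_file(records):
    """Map each component number to the ids of the records that contain it."""
    inverted = defaultdict(set)
    for record_id, components in records.items():
        for component in components:
            inverted[component].add(record_id)  # pointer back to the record
    return dict(inverted)


# Hypothetical records: id -> component numbers of its decomposed DDC number.
RECORDS = {
    101: ["779", "082"],
    102: ["779", "73"],
    103: ["779"],
}

inverted = build_inverted_file(RECORDS)
print(sorted(inverted["779"]))  # every record on Photographs
print(sorted(inverted["73"]))   # records reached through a non-lead component
```

Because every component is an entry in the file, a record can be reached through any of its components, not only through the one that happens to come first in the synthesized number.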
Suppose I want to find out if the database contains any work on Photographs of the United States; I highlight Historical, geographical, persons treatment (associated with 185 titles) and select the second option. The system responds with a display of subaspects of Historical, geographical, persons treatment (see Figure 3). The same three options as above are provided. I move the cursor to highlight the United States and select the first option, which results in a display of the six titles on Photographs of the United States (see Figure 4).
Improving retrieval precision using number decomposition could proceed in a number of ways. I could refine the search to retrieve a particular kind of photograph, such as photographs of Men, Animals, Plants, or Landscapes. For example, instead of browsing titles after selecting the United States, I could select the third option to narrow the search with another aspect. After the system responds with a list of aspects, I could highlight the aspect Landscapes and then request to browse the titles, which would retrieve three titles on Photographs of Landscapes in the United States (see Figures 5 and 6).
What this shows is that, using automatic decomposition, it is possible to retrieve a small set of documents consisting only of works on Photographs of Landscapes in the United States. Precision of this sort would have been very difficult, if not impossible, without such decomposition. For example, given only synthesized numbers, one would have to search the truncated class number 779 for Photographs, which would retrieve 215 records in the test database, and try to discover the three relevant records by browsing through all 215. One could argue that the truncated class number 779.36 for photographs of landscapes should be searched instead of 779. However, considering that 779.36 is itself a synthesized number obtained by adding two class numbers together, it is probably safe to say that no one but professional classifiers could perform such a search. Moreover, even such a search would still retrieve more irrelevant documents than relevant ones, since the United States is only one of hundreds of countries covered in DDC. In fact, I did perform a truncation search on 779.36 and retrieved 15 records from the test database; as we know, only three of them are relevant (see Figure 7).
In addition to improving precision, decomposition can also improve recall. Suppose that a student of comparative literature wishes a comprehensive listing of works about symbolism in poetry. In a retrieval system using synthesized DDC numbers, to obtain such a listing he has to search the class number 809.1915 for general and comparative studies of symbolism in poetry as well as the class numbers for symbolism in poetry in all possible countries, e.g. 811.00915 (for American poetry), 841.00915 (for French poetry), etc. It would be very difficult for anyone to perform this search with reasonable recall. However, with decomposed numbers, one needs to search only the number 1009 (for history, description, critical appraisal of poetry) or 809.19 and 15 (for symbolism, allegory, fantasy), and all works about symbolism in poetry will be retrieved. This is how the use of component numbers can promote high recall. Their use also makes a search for scattered information much easier and simpler.
In addition to improving recall and precision, the automatic decomposition of DDC numbers can facilitate retrieval by bringing together related materials through any component number. It has always been a problem in the design of a classification to decide which component or facet is most important when determining citation order. Citation order is the order in which components or facets are sequenced. Normally, in a manual environment only the first facet can be used to bring together related materials. For example, we can retrieve all materials about Photographs through the number 779, because 779 always occurs as the first facet of a synthesized number. However, we cannot easily retrieve materials about a particular country, say, the United States, because the number 73 for the United States does not normally occur as the lead facet. Traditionally, it has been assumed that no system can please all the people all the time; all that a system can do is to bring together related materials that fall into the first facet, the one deemed most important by classification designers and, hopefully, also by most users. In this regard decomposition can please all the people all the time. With the use of decomposed component numbers, we can either retrieve all materials on Photographs by searching the number 779, or bring together all related documents on the United States by searching the number 73.
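The aspect tables and the step-by-step narrowing described above can be approximated with two small functions: a Boolean AND over component sets, and a count of the other components that co-occur with the current selection. The five records below are hypothetical and use captions in place of component numbers for readability; the function names are illustrative and are not taken from the prototype system.

```python
from collections import Counter

# Hypothetical records: id -> captions of the components of its DDC number.
RECORDS = {
    "r1": {"Photographs", "United States", "Landscapes"},
    "r2": {"Photographs", "United States"},
    "r3": {"Photographs", "Landscapes"},
    "r4": {"Photographs", "Women"},
    "r5": {"Music", "United States"},
}


def search(records, *components):
    """Return ids of records containing every requested component (Boolean AND)."""
    wanted = set(components)
    return {rid for rid, comps in records.items() if wanted <= comps}


def aspect_table(records, *selected):
    """Count the other components that co-occur with the selected ones."""
    counts = Counter()
    for rid in search(records, *selected):
        counts.update(records[rid] - set(selected))
    # Most frequent first; ties broken alphabetically for a stable display.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(aspect_table(RECORDS, "Photographs"))
print(aspect_table(RECORDS, "Photographs", "United States"))
print(sorted(search(RECORDS, "Photographs", "United States", "Landscapes")))
```

Each additional aspect shrinks the result set, which is the narrowing behaviour shown in Figures 2 through 6.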
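The recall argument can be illustrated with a toy comparison between searching a single synthesized class number and searching shared components. The records below are hypothetical. The component "15" for symbolism is the one named in the text, but "POETRY-CRITICISM" is only a placeholder label standing in for the poetry component, and the component sets assigned to each number are assumptions, not actual DND output.

```python
# Hypothetical records; the component sets are illustrative assumptions.
RECORDS = [
    {"id": 1, "ddc": "809.1915", "components": {"POETRY-CRITICISM", "15"}},
    {"id": 2, "ddc": "811.00915", "components": {"POETRY-CRITICISM", "15"}},
    {"id": 3, "ddc": "841.00915", "components": {"POETRY-CRITICISM", "15"}},
    {"id": 4, "ddc": "779.082", "components": {"779", "082"}},
]


def search_by_class_number(records, number):
    """Exact match on the synthesized number, as in a conventional catalog."""
    return [r["id"] for r in records if r["ddc"] == number]


def search_by_components(records, *components):
    """Match every record whose decomposed number contains all components."""
    wanted = set(components)
    return [r["id"] for r in records if wanted <= r["components"]]


print(search_by_class_number(RECORDS, "809.1915"))
print(search_by_components(RECORDS, "POETRY-CRITICISM", "15"))
```

In this toy data the class-number search finds only the general work, while the component search also gathers the works scattered under individual literatures.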
Appendix A. Figures

System  Browse  Search  Decompose  Report  Adjust  Quit
DDC Schedule                                        - Titles, - Aspects
_______________________________________________________________________________
. Photography and photographs  293
. . Philosophy and theory  0
. . Miscellany  5
. . Historical, geographical, persons treatment  15
. . Techniques, procedures [formerly also 770.28], apparatus, equipment, materials  1
. . Special processes  1
. . Specific fields and special kinds of photography, and related activities  36
. . Photographs  215
. Music  540
. . Philosophy and theory  1
. . Miscellany  50
. . Education, research, performances, related topics  31
. . History and description of music with respect to kinds of persons  28
. . Historical, geographical, persons treatment  80
. . Principles, forms, ensembles, voices, instruments  0
Figure 1. DDC Schedule Browsing, With Photographs Highlighted

Aspects of Photographs                 - Titles, - Subaspects, - Another Aspect
______________________________________________________________________________
Historical, geographical, persons treatment  185
Other specific subjects  34
Landscapes  15
Museums, collections, exhibits [formerly also 069.9]  14
Animals  10
Human figures and their parts  9
Women  7
Men  6
Nudes  5
Children  4
Architectural subjects and cityscapes  2
Plastic arts Sculpture  1
Plants  1
Nature and still life  1
Jazz. . Principles, forms, ensembles, voices, instruments  1
Erotica  1
Figure 2. Aspects of Photographs

Historical, geographical, persons treatment of Photographs
                                       - Titles, - Subaspects, - Another Aspect
______________________________________________________________________________
Persons  128
United States  6
Metropolitan Toronto  5
Union of Soviet Socialist Republics (Soviet Union)####Russia  2
Collected treatment  2
1980-1989  2
1950-1959  2
Los Angeles  1
Santa Barbara County  1
Western United States  1
Southwest central counties of Lower Peninsula  1
Illinois  1
Orleans Parish (New Orleans)  1
Birmingham  1
District of Columbia (Washington)  1
Figure 3. Subaspects of Historical, geographical, persons treatment of Photographs, with United States Highlighted

Titles Display
Photographs AND United States
______________________________________________________________________________
Best friends : #ba pictorial celebration #cby the winners of
Between home and heaven : #bcontemporary American landscape
Measure of emptiness : #bgrain elevators in the American
Motion and document, sequence and time : #bEadweard Muybridge
The changing face of America #cPeter C. Jones.
Typologies : #bnine contemporary photographers #corganized by
Figure 4. Title Display for Photographs of United States

United States of Photographs           - Titles, - Subaspects, - Another Aspect
______________________________________________________________________________
Historical, geographical, persons treatment  185
Other specific subjects  34
Landscapes  15
Museums, collections, exhibits [formerly also 069.9]  14
Animals  10
Human figures and their parts  9
Women  7
Men  6
Nudes  5
Children  4
Architectural subjects and cityscapes  2
Plastic arts Sculpture  1
Plants  1
Nature and still life  1
Jazz. . Principles, forms, ensembles, voices, instruments  1
Erotica  1
Figure 5. Choosing Another Aspect for United States of Photographs, with Landscapes Highlighted

Photographs AND United States AND Landscapes
______________________________________________________________________________
Between home and heaven : #bcontemporary American landscape
Measure of emptiness : #bgrain elevators in the American
The changing face of America #cPeter C. Jones.
Figure 6. Title Display for Photographs of Landscapes in United States

779.36?
_______________________________________________________________________________
*Between home and heaven : #bcontemporary American landscape
Desert landscape : #bphotographs #cby Len Jenshel ; [editor,
Keep it simple : #ba defense of the earth #ctext and
Light on the land #cphotography, Art Wolfe ; text, Art
Light on the land #cphotography, Art Wolfe ; text, Art
*Measure of emptiness : #bgrain elevators in the American
On second glance : #bMidwest photographs #cby Larry Kanfer ;
Southern light #cphotography by James Valentine ; text by
*The changing face of America #cPeter C. Jones.
V svitlovomu koli #cDavyd Firman = In a circle of light /
V svitlovomu koli #cDavyd Firman = In a circle of light /
Wave Hill pictured : #bcelebration of a garden #cby Jean E.
West coast impressions : #bthe dynamic British Columbia
Wilderness scenario : #bpeaceful images of the wild #cby Pat
Figure 7. Title Display for 779.36?

System  Browse  Search  Decompose  Report  Adjust  Quit
Figure 8. Explanation of the Decomposition of 796.323640979494

Appendix B. DND Systems Flowchart
DND: Dewey Number Decomposer
References
Anderson, P. F. "Expert Systems, Expertise, and the Library and Information Professions." LISR. 10:367-388; 1988.
Bates, Marcia J. "Subject Access in Online Catalogs: A Design Model." Journal of the American Society for Information Science. 37, no. 6 (November 1986): 357-376.
Borko, Harold. "Artificial Intelligence and Expert Systems Research and their possible impact on Information Science Education." Education for Information. 3:103-114; 1985.
Burton, Paul F. "Expert Systems in Classification." In Expert Systems in Libraries, ed. Forbes Gibb, 50-66. London: Taylor Graham, 1985.
Byrne, A.; Micco, M. "Improving OPAC Subject Access: The ADFA Experiment." College and Research Libraries. 49 (1988): 432-441.
Chan, Lois Mai. "Library of Congress Classification as an Online Retrieval Tool: Potentials and Limitations." Information Technology and Libraries. 5, no. 3 (September 1986): 181-192.
Drabenstott, Karen Markey. "Subject Searching Experiences and Needs of Online Catalog Users: Implications for Library Classification." Library Resources and Technical Services. 29:34-51; 1985.
Drabenstott, Karen Markey; Demeyer, Ann N. Dewey Decimal Classification Online Project. Dublin, Ohio: OCLC Online Computer Library Center, 1986.
Drabenstott, Karen Markey. "Class Number Searching in an Experimental Online Catalog." International Classification. 13, no. 3 (1986): 142-150.
Drabenstott, Karen Markey. "Searching and Browsing the Dewey Decimal Classification in an Online Catalog." Cataloging and Classification Quarterly. 7(3):37-68; 1987.
Drabenstott, Karen Markey. "Experiences with Online Catalogs in the USA Using a Classification Scheme As a Subject Searching Tool." In Tools for Knowledge Organization and the Human Interface (Advances in Knowledge Organization, vol), ed. Robert Fugmann, 35-46. Frankfurt/Main: Indeks-Verl., 1990.
Duncan, Elizabeth; Williams, James G. "A Rule Based System for Translating Dewey Decimal Numbers." A Project Report. 1991.
Endres-Niggemeyer, Brigitte; Schmidt, Bettina. "Knowledge Based Classification Systems: Basic Issues, a Toy System and Further Prospects." International Classification. 16, no. 3 (1989): 146-156.
Fenly, Charles. Expert Systems: Concepts and Applications. Cataloging Distribution Service, Library of Congress, press. Washington, D.C., 1988.
Jewitt, Clement. "A Subject Indexing Engine." In: Proceedings of 8th International Online Information Meeting, London, December 4-6, 1984. Oxford and New Jersey: Learned Information, 1984, 151-160.
Liu, Songqiao. "Online Classification Notation: Proposal for a Flexible Faceted Notation System (FFNS)." International Classification. 17, no. 1 (1990): 14-20.
Liu, Songqiao; Svenonius, Elaine. "DORS: DDC Online Retrieval Systems." Library Resources & Technical Services. 35, no. 4 (1991): 359-375.
Mandel, C. A. "Enriching the Library Catalog Record for Subject Access." Library Resources and Technical Services. 29 (1985): 5-15.
Mandel, C. A. "Computer Age Classification: Applications for Library Practice." In: Classification Theory in the Computer Age, Proceedings from the Conference, November 18-19, 1988. Albany, New York: University at Albany, 1989, 89-92.
Matthews, Joseph R.; Lawrance, Gray S.; Ferguson, D. K. Using Online Catalogs: A Nationwide Survey. New York: Neal-Schuman, 1983.
Satija, M. P. and John Comaromi. Introduction to the Practice of Dewey Decimal Classification. New Delhi: Sterling Publishers Private Ltd, 1987.
Schultz, Lois. "Designing An Expert System to Assign Dewey Classification Numbers to Scores." In: Proceedings of National Online Meeting, New York, May 9-11, 1989. Medford, NJ: Learned Information, 1989, 393-397.
Seetharama, S. "Compatibility among Classification Systems: A Case Study in the Classification of Cardiovascular Diseases." International Classification. 12, no. 2 (1985): 80-86.
Sharif, C. A. Developing an Expert System for Classification of Books Using Micro-Based Expert System Shells. London: British Library, 1988.
Svenonius, Elaine. "Use of Classification in Online Retrieval." Library Resources & Technical Services. (January/March 1983): 76-80.
Svenonius, Elaine. "An Ideal Classification for an On-line Catalog." In: Classification Theory in the Computer Age, Proceedings from the Conference, November 18-19, 1988, Albany, New York. Albany, New York: The School of Information Science and Policy and the Professional Development Program of Rockefeller College, University at Albany, 1989.
Wajenberg, Arnold S. "MARC Coding of DDC for Subject Retrieval." Information Technology and Libraries. 2 (September 1983): 246-251.
Waterman, Donald A. A Guide to Expert Systems. Reading, MA: Addison-Wesley Pub. Co, 1986.