62nd IFLA General Conference - Conference Proceedings - August 25-31, 1996

Contemporary Classification Systems and Thesauri in China

Zhang Qiyu

Liu Xiangsheng

Wang Dongbo


In China, the research, compilation and application of classification systems and thesauri (CS&T) as tools for organizing knowledge reflect certain aspects of the present development of Chinese librarianship.

The article describes the general progress of CS&T in China over a period of 46 years (especially the most recent 15 years). It covers the brief history and current status of classification systems; the brief history and current status of thesauri; classified thesauri, the tools for organizing knowledge that integrate classification systems with thesauri; general trends in the development of CS&T in China; Chinese natural language searching methods; and research and teaching in the field of CS&T in China.


In China, the research, compilation and application of classification systems and thesauri (hereafter referred to as CS&T) as tools for organizing knowledge reflect certain aspects of the present development of Chinese librarianship.

I Classification Systems

Classification systems (hereafter referred to as classification(s)) are the main tools for organizing knowledge in the libraries of China. The tradition of using classification dates back some 2,000 years.

The modern classifications in China, while markedly Chinese in character, were influenced by DDC (Dewey Decimal Classification) in their principles of compilation, techniques of presentation and systems of division. DDC itself, however, has never been widely adopted in China.

The classifications used in contemporary China (after October 1949) were mainly compiled after the founding of the People's Republic of China. The Chinese socialist government made scientific, cultural and educational undertakings indispensable parts of socialist construction. The task that libraries faced was to change their work styles to serve the new society. Classifications, which reflect the characteristics of their nation and times, had to change accordingly. In the early days, some classifications were compiled on the basis of the old systems. The government realized that it was necessary to compile a completely new classification for general libraries, because the method of "new wine in old bottles" could not solve the problem. As Zheng Zhenduo, former vice minister of the Ministry of Culture, pointed out, "new classification must be compiled by a group, not by individuals."

Over more than 40 years, the following comprehensive classifications have been compiled, listed here in chronological order:

  1. Zhongguo Ren Min Da Xue Tu Shu Guan Tu Shu Fen Lei Fa (Library Classification of People's University of China), the 1st edition in 1953 and the 6th edition in preparation;
  2. Zhong Xiao Xing Tu Shu Guan Tu Shu Fen Lei Fa Cao An (Draft of Library Classification for Medium and Small Libraries), the trial edition in 1957 and out of print at present;
  3. Zhongguo Ke Xue Yuan Tu Shu Guan Tu Shu Fen Lei Fa (Library Classification of the Chinese Academy of Sciences), the 1st edition in 1958 and the 3rd and the latest edition in 1994;
  4. Wuhan Da Xue Tu Shu Fen Lei Fa (Library Classification of Wuhan University), the 1st edition in 1959 and out of print at present;
  5. Zhongguo Tu Shu Guan Tu Shu Fen Lei Fa Cao An (Draft of Chinese Library Classification), also entitled Da Xing Tu Shu Guan Tu Shu Fen Lei Fa (Classification for Large Libraries), Volume 2: part for natural sciences, was published in 1963, and Volume 1 hasn't been published;
  6. Zhongguo Tu Shu Guan Tu Shu Fen Lei Fa (Chinese Library Classification), the trial edition in 1973, the 1st edition in 1975, the 3rd and the latest edition in 1990;
  7. Zhongguo Dang An Fen Lei Fa (Chinese Archive Classification), published in 1978.

These classifications applied advanced compilation techniques and share common features: new systems that reflect the Chinese nation and the times.

Chinese Library Classification (hereafter referred to as CLC) is one of the most important and complete classifications. At present, it is used in over 90% of libraries and information institutions of every type in China, including all public libraries. Compilation of CLC was begun in 1971 by 36 major libraries and information institutions. Its trial edition was published in 1973, the 1st edition in 1975, the 2nd edition in 1980, and the 3rd edition in 1990. It has now grown into a series, including the basic edition (ca. 30,000 classes); the enlarged edition (also entitled Zhongguo Tu Shu Zi Liao Fen Lei Fa (Classification for Monographs and Materials), ca. 50,000 classes); the abridged edition (ca. 4,000 classes); CLC for Children Libraries; CLC for Newspapers and Magazines; the User's Manual; the Compared List of CLC and Chinese Thesaurus (also entitled Zhongguo Fen Lei Zhu Ti Ci Biao (Chinese Classified Thesaurus), which functions as a relative index to CLC); etc. Several classifications for specialized subjects compatible with CLC are now being compiled, and some have been published, e.g. Zhongguo Tu Shu Guan Tu Shu Fen Lei Fa Jiao Yu Fen Lei Biao (CLC for Education).

CLC is divided into 5 categories and 22 divisions; among them, the division of industrial technology is particularly detailed, with 16 subdivisions. CLC is convenient for users because its system is consistent with the development of contemporary science and technology, its classes are arranged reasonably, and it uses a mixed notation of letters and numbers. In 1985, CLC won the first-class prize of the National Award for the Advancement of Science and Technology. The abridged edition of CLC has been published in Uighur and Mongolian, and the basic edition (the 2nd edition) has been translated into Japanese and published by the Konno Institute for Chinese Language (in Japan).

Library Classification of People's University of China and Library Classification of the Chinese Academy of Sciences are used in libraries and information institutions such as those subordinate to the Chinese Academy of Sciences and the Chinese Academy of Social Sciences, some college and university libraries, and some special libraries.

Chinese Archive Classification (CAC) consists of three classification schedules. One is the main body, entitled Chinese Archive Classification, which is used only for classifying the archives of the People's Republic of China. The other two are supplements, entitled respectively Qing Dai Dang An Fen Lei Biao (Classification Schedule for Archives of Qing Dynasty) and Min Guo Dang An Fen Lei Biao (Classification Schedule for Archives in the Republican Period).

CAC is actually a series of classifications used for archives of all levels and kinds throughout the country. At present, several specialized subject classifications, enlarged and adapted from the main body of CAC, are being compiled and published successively.

In addition to the above-mentioned general classifications, there are more than 10 specialized classifications, such as Jun Shi Tu Shu Zi Liao Fen Lei Fa (Classification for Military Science). Quite a few of them, however, have not been openly published. Moreover, many other classification systems are used in abstracting and indexing periodicals.

Some foreign classifications have been translated into Chinese. Universal Decimal Classification (UDC) was used in scientific and technical information institutions in the 1960s and is now used for indexing Chinese standards documents. Dewey Decimal Classification (DDC) is used in several libraries for parts of their collections; a Chinese edition of DDC is expected to be translated and published in China in the near future. International Patent Classification (IPC) is now used for indexing Chinese patent documents.

II Thesauri

In China, few libraries created subject catalogues before the 1960s. Hang Kong Ke Ji Zi Liao Zhu Ti Biao (Subject Headings of Aviation Science and Technology), published in 1964, was the first to be put into practice.

In 1974, "Project 748", a systems project on Chinese information processing, was launched. One part of the Project was the compilation of Han Yu Zhu Ti Ci Biao (Chinese Thesaurus, hereafter referred to as CT), planned for computerized information retrieval systems. CT, one of the largest thesauri in the world, has 91,158 descriptors and 17,410 non-descriptors. Headed by the Institute of Scientific and Technical Information of China and the National Library of China, 1,378 compilers, specialists in their subject fields, from 505 institutions were in charge of its compilation. CT was published in 1980 and won the second-class prize of the National Award for the Advancement of Science and Technology in 1985. With the compilation of CT, Chinese tools for organizing knowledge entered a new era in which classification systems and thesauri developed side by side.

CT is mainly used in general libraries and information institutions. The low specificity of its terms becomes apparent when it is used in specialized libraries and information institutions, especially in specialized databases of journal papers. To meet the needs of specialized databases, specialized subject thesauri providing more specific terms were later compiled on the basis of CT, one after another, by some of the institutions that had participated in its compilation. Since then, more and more thesauri have come into being; more than 100 have been compiled so far. These specialized subject thesauri, openly published one after another and covering almost all subject fields, contain a substantial quantity of terms. It is generally recognized that thesauri are the main indexing tools used in Chinese databases.

There are two other general thesauri in China: Zhongguo Dang An Zhu Ti Ci Biao (Chinese Thesaurus for Archives) and Jun Yong Zhu Ti Ci Biao (Military Thesaurus). Military Thesaurus is a series of large-scale thesauri, with a dictionary of descriptors as one volume.

III Tools Integrating CS&T for Organizing Knowledge -- Classified Thesauri

China has a tradition of using classifications. Library staff members are used to indexing and organizing materials with classification, and they find indexing with a thesaurus somewhat difficult, which has been an obstacle to popularizing thesauri. For this reason, library staff members hope to find a simple method of transition from classification to thesaurus or, at the least, a tool for organizing knowledge that links class numbers and terms, thus lowering the difficulty of subject indexing. Integration of CS&T, that is, the establishment of a two-way correspondence list between class numbers and descriptors, is one scheme for solving the problem. The scheme is inspired by the "thesaurofacet" but differs from it. In China, it is most practical to integrate the two main tools for organizing knowledge, CLC and CT. Establishing a correspondence list of class numbers and descriptors is more difficult than constructing a thesaurofacet, because class numbers are pre-coordinate while descriptors are post-coordinate. However, our research and experiments have confirmed that the integration of CLC and CT is quite feasible.

At the end of 1986, 40 institutions, headed by the National Library of China, began to compile the Chinese Classified Thesaurus (hereafter referred to as CCT). The compilation of CCT was a huge and complicated project and was finished in 1994. The printed CCT, in 2 parts and 6 volumes totalling 6,215 pages, was published in the same year.

CCT is a multifunctional indexing tool: it is not only a complete CLC and Classification for Monographs and Materials, but also a revised and enlarged CT. Through the two-way correspondence of class numbers and descriptors, the classification schedule can serve as the category index and hierarchical index of CT, and the thesaurus can serve as the relative index of CLC and of Classification for Monographs and Materials. This greatly facilitates classifying, subject indexing and retrieval. Subject searching in a bibliographic database whose records carry class numbers but no descriptors, when linked with the machine-readable version of CCT, has been demonstrated to give satisfactory retrieval effectiveness.
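
The two-way correspondence between class numbers and descriptors can be pictured as a pair of lookup tables. The following Python sketch is only an illustration of that idea, not the actual data format of the machine-readable CCT; the class name `CorrespondenceList`, the function `search_by_descriptor`, and all class numbers, descriptors and record identifiers are invented for the example.

```python
from collections import defaultdict


class CorrespondenceList:
    """Two-way lookup between class numbers and descriptors (illustrative only)."""

    def __init__(self):
        self._terms_by_class = defaultdict(set)
        self._classes_by_term = defaultdict(set)

    def link(self, class_number, descriptor):
        # Record the pair in both directions so either side can be the entry point.
        self._terms_by_class[class_number].add(descriptor)
        self._classes_by_term[descriptor].add(class_number)

    def descriptors_for(self, class_number):
        # Thesaurus terms that correspond to a class number.
        return sorted(self._terms_by_class.get(class_number, ()))

    def classes_for(self, descriptor):
        # Class numbers that correspond to a descriptor (a relative-index lookup).
        return sorted(self._classes_by_term.get(descriptor, ()))


def search_by_descriptor(table, records, descriptor):
    """Find records that carry only class numbers, starting from a descriptor."""
    wanted = set(table.classes_for(descriptor))
    return sorted(rec_id for rec_id, class_number in records.items()
                  if class_number in wanted)


# Invented sample data: these are NOT real CLC class numbers or CT descriptors.
table = CorrespondenceList()
table.link("C001", "library classification")
table.link("C001", "thesauri")
table.link("C002", "thesauri")

# Bibliographic records indexed with a class number but no descriptors.
records = {"rec-1": "C001", "rec-2": "C002", "rec-3": "C003"}

print(search_by_descriptor(table, records, "thesauri"))
```

In this sketch a searcher enters a descriptor, the table translates it into class numbers, and the records are matched on class number alone, which mirrors the kind of subject search described above for databases without descriptors.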

Two other similar classified thesauri have been published, in the fields of medical science and education. There are also two thesaurofacets: Educational Thesaurofacet, and Retrieval Vocabularies of Social Sciences and Humanities, a large-scale multi-disciplinary thesaurofacet.

IV Development Trends in the Field of CS&T

In the modernization of Chinese librarianship, people in the library field are aware that CS&T must attain standardization, compatibility, systematization, computerization, combination and integration in order to meet the needs of networked information retrieval.

The China National Technical Committee for Standardization of Information and Documentation (CNTCSID), founded in 1979 and corresponding to ISO TC46, set up its 5th Subcommittee to work on the standardization of CS&T and indexing. Since 1980, the Subcommittee has worked vigorously and carried out research on standardization in this field, for example: recommendation of CLC and CT as national standards; quality inspection of CAC; and formulation of the following national standards:

  1. Wen Xian Zhu Ti Biao Yin Gui Ze (Documentation--Guidelines for Determining the Subject and Choosing Terms);
  2. Wen Xian Fen Lei Gui Ze (Documentation--Guidelines for Determining the Subject and Choosing Class Numbers);
  3. Han Yu Xu Ci Biao Bian Zhi Gui Ze (Documentation--Guidelines for the Establishment and Development of Chinese Thesauri);
  4. Duo Yu Zhong Xu Ci Biao Bian Zhi Gui Ze (Documentation--Guidelines for the Establishment and Development of Multilingual Thesauri);
  5. Dang An Fen Lei Gui Ze (Documentation--Guidelines for Classifying Archives);
  6. Tong Lei Shu Pai Lie Yong Shu Ci Hao Bian Zhi Gui Ze (Documentation--Guidelines for Book Number in the Same Class), etc.

All of them except (ii) and (vi) have been promulgated as national standards. These standards are consistent with international standards.

The compatibility of thesauri has also received close attention. More than 100 specialized subject thesauri have been compiled since the 1980s. These thesauri are consistent not only with ISO 2788 Documentation--Guidelines for Establishment and Development of Monolingual Thesauri and GB13190 Documentation--Guidelines for Establishment and Development of Chinese Thesauri in their construction techniques, but also with CT in their choice of terms. To make compatibility easier to achieve, a project called the National Term Bank is being carried out. So far, many thesauri have been entered into the Bank, which serves as a compatibility center. In addition, the term bank of Military Thesaurus is being set up.

Another way to achieve compatibility is to develop series of classifications and thesauri. CLC, CAC, Military Thesaurus, etc., each of which consists of a series of classifications or thesauri, are compatible large-scale knowledge tools. This practice may be characteristic of China.

In China, computers are widely applied in library and information work, and the computerization of CS&T is an inevitable trend. Computers are commonly used in the production of printed CS&T. In addition, the use of computers has reached a higher level in the automatic generation of some parts of CS&T and in their management, for example in Educational Thesaurus, CCT, Retrieval Vocabularies for Social Sciences and Humanities, Military Thesaurus, etc. Some progress has also been made in the automatic assignment of class numbers. This practical experience will speed up the computerization of CS&T.

The reform of enumerative classification through combination, namely the increase of combined components, has become a trend, because it can overcome the limited capacity for expressing concepts and the contradiction between centralization and decentralization, and can develop retrieval functions more effectively in computerized information retrieval systems. The library community has held two nationwide seminars and reached a common view on the problem. At present, a technically better reform scheme is being explored.

Because its feasibility has been verified in practice, the integration of CS&T will be an important trend in the field of tools for organizing knowledge in China in the future.

V Research on the Application of Natural Language to Information Retrieval

To solve the problem of timely processing and effective use of the vast amount of materials, library and information scientists have, over the past ten years, been researching the use of natural language (Chinese) in information retrieval. Automatic language processing, i.e. automatic term extraction, is taken as the core of the use of natural language in information retrieval. Unlike sentences in English, French, German and Russian, Chinese sentences have no separation marks between words. A Chinese character can combine with many other characters to form words and phrases with different meanings. It is therefore difficult for a computer to recognize whether a character stands alone or is part of a word made up of several characters, and so to separate words automatically; it is also difficult to distinguish exactly between useful and useless words. For retrieval that uses Chinese natural language directly, it is thus necessary to develop a technique by which the computer can automatically separate the words in Chinese sentences. This is called the Chinese word separation technique. Research in this field has been carried out, and many proposals for term separation have been offered in recent years. Some of them can meet actual needs and have been used in working systems; one such practical system is Word Extraction by Component Dictionary. Most, however, are still at the experimental stage. Because automatic Chinese term extraction is difficult, China, unlike Europe and America, has few keyword indexes created automatically by computer and few information retrieval systems based on automatic term extraction. It can be said, however, that a solution to the problem of automatic Chinese term extraction is not far off.
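
To make the word separation problem concrete, the sketch below shows forward maximum matching, a simple dictionary-based segmentation method. It is chosen here only as an illustration and is not claimed to be the method of the Word Extraction by Component Dictionary system mentioned above. The function names, the tiny dictionary, the stop list and the `max_len` limit are all assumptions made for the example.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Split an unsegmented sentence into words by longest dictionary match."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def extract_terms(sentence, dictionary, stop_words):
    """Segment the sentence, then drop 'useless' words listed in stop_words."""
    return [w for w in forward_max_match(sentence, dictionary)
            if w not in stop_words]


# Toy dictionary and stop list, assumed for illustration only.
DICTIONARY = {"中国", "图书", "图书馆", "分类", "分类法"}
STOP_WORDS = {"的"}

print(forward_max_match("中国图书馆分类法", DICTIONARY))
print(extract_terms("中国的图书馆", DICTIONARY, STOP_WORDS))
```

With this toy dictionary the first sentence is split into 中国 / 图书馆 / 分类法, although 图书 and 分类 are also valid words. Preferring the longest match is one way of handling that ambiguity, and it can fail on sentences where a shorter word is the correct reading, which is part of why the problem is hard.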

Besides automatic Chinese term extraction, other ways of using natural language for searching are also being explored. One is to input one or several Chinese characters, or one or several words formed from Chinese characters, and then search the machine-readable text directly; this is an inherent function of computer word processing and is slow. A quicker way is single Chinese character searching, i.e. indexing by Chinese character, or the non-indexed system. Here every Chinese character from the title, abstract or text is put into the index (useless words can be excluded), and several Chinese characters can then be combined when searching. In addition, there are keyword extraction by man-machine interaction, free indexing without a word list, etc. Compared with automatic Chinese term extraction, these ways are all easier to realize and have already been used in some systems.
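
Single Chinese character searching can be sketched as a character-level inverted index with positions. This is a minimal illustration under stated assumptions, not a description of any particular system: the function names and sample titles are invented, and requiring the query characters to be adjacent is just one possible way of combining characters at search time.

```python
def build_char_index(docs, stop_chars=frozenset()):
    """Map each character to {doc_id: [positions]}; no word segmentation needed."""
    index = {}
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            if ch.isspace() or ch in stop_chars:
                continue  # optionally leave "useless" characters out of the index
            index.setdefault(ch, {}).setdefault(doc_id, []).append(pos)
    return index


def search(index, query):
    """Return ids of documents where the query characters occur consecutively."""
    chars = list(query)
    if not chars:
        return []
    hits = []
    # Start from every occurrence of the first character, then check the rest.
    for doc_id, starts in index.get(chars[0], {}).items():
        for start in starts:
            if all(start + k in index.get(ch, {}).get(doc_id, [])
                   for k, ch in enumerate(chars)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Sample titles, invented for the example.
docs = {1: "中国图书馆分类法", 2: "汉语主题词表", 3: "图表与书目"}
index = build_char_index(docs)

print(search(index, "图书"))
print(search(index, "表"))
```

Document 3 contains both 图 and 书 but not next to each other, so the adjacency check keeps it out of the result for 图书. A plain Boolean AND of the two characters would have retrieved it.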

The disadvantages of all ways of using natural language are the great number of synonyms, near-synonyms, polysemous words and ambiguous meanings, and the lack of semantic connection between words. These factors affect not only the recall ratio but also the precision ratio. Control of these factors, i.e. post-control, is therefore still needed. Research on post-controlled vocabularies is receiving more and more attention from researchers of CS&T, and there have been some good results.
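
One simple form of post-control is to leave the indexed text uncontrolled and expand the query with synonyms at search time. The sketch below assumes a hand-made list of synonym groups; the function names, the synonym group and the sample documents are invented for illustration and are not taken from any vocabulary described in this article.

```python
def expand_query(term, synonym_groups):
    """Return the term together with every synonym in its group, if it has one."""
    for group in synonym_groups:
        if term in group:
            return set(group)
    return {term}


def search(docs, term, synonym_groups=()):
    """Free-text search; with synonym_groups, the query is expanded first."""
    variants = expand_query(term, synonym_groups)
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(v in text for v in variants))


# Assumed sample data: 计算机 and 电脑 both mean "computer".
SYNONYM_GROUPS = [{"计算机", "电脑"}]
docs = {1: "计算机检索系统", 2: "电脑检索", 3: "图书分类"}

print(search(docs, "电脑"))                  # uncontrolled search
print(search(docs, "电脑", SYNONYM_GROUPS))  # search with post-control
```

Without the synonym group, a search for 电脑 misses document 1, which lowers recall. Handling polysemy and ambiguity, which affect precision, would need more than this synonym expansion.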

VI Research and Teaching of CS&T

In China, many institutions are engaged in scientific research or in organizing research activities. Among them are the National Library of China, the Institute of Scientific and Technical Information of China, the Documentation and Information Center of the Chinese Academy of Sciences, the China Society for Library Science and its branches at the provincial and municipal levels, the China Society for Scientific and Technical Information, the Chinese Archives Association, the Index Science Association of China, the 5th Subcommittee of the China National Technical Committee for Standardization of Information and Documentation (CNTCSID), and several departments of library and information science or of information management in universities and colleges. The Classifications and Thesauri Section of the National Library of China is not only the editorial and management body of CLC, CT and CCT, but also an important organizer of many research activities and sponsor of many academic conferences.

Study and discussion of CS&T in China can be divided into two main fields: (i) research on the compilation, revision and evaluation of CS&T, with essays published by scholars and library staff members nationwide and seminars organized; (ii) research on the theories of the field, mainly conducted by teachers and postgraduates in universities and colleges. Active researchers in CS&T in China include Pi Gaopin, Zhang Qiyu, Li Xinghui, Bai Guoying, Liu Xiangsheng, Zhao Zongren, Qian Qilin, Hou Hanqing, Qiu Feng, Wang Yongcheng, Chen Shunian, Zeng Lei, Dai Weimin, etc.

More than 40 departments of library and information science or information management in universities and colleges offer courses on CS&T. Postgraduates are educated at Wuhan University, Peking University, Air Force Political College (in Shanghai), East China Normal University (in Shanghai), Beijing Normal University, Zhongshan University (in Guangzhou), the Documentation and Information Center of the Chinese Academy of Sciences, etc.

In China, CS&T are known by the common name "Information Retrieval Language". The discipline that comprehensively studies the methodology of organizing knowledge, such as CS&T, is generally called "Information Linguistics". The discipline that studies classification separately is called "Classification Science". Many monographs have been published, such as "Xian Dai Xi Fang Zhu Yao Tu Shu Fen Lei Fa Shu Ping (A Review on Main Modern Western Library Classifications)", by Liu Guojun, in 1980, "Tu Shu Fen Lei Xue (Classification Science)", by Bai Guoying, in 1981, "Qing Bao Jian Suo Yu Yan (Information Retrieval Language)", by Zhang Qiyu, in 1983, "Zhu Ti Fa Yu Zhu Ti Biao Yin (Subject Indexing Language and Subject Indexing)", by Liu Xiangsheng, in 1985, "Han Yu Zhu Ti Ci Biao Biao Yin Shou Ce (Guide to Chinese Thesaurus)", by Qian Qilin, in 1985, "Qing Bao Yu Yan Xue Ji Chu (The Fundamentals of Information Linguistics)", by Zhang Qiyu, in 1987, "Qing Bao Jian Suo Yu Zhu Ti Ci Biao (Information Retrieval and Thesaurus)", by Qiu Feng, in 1988, "Tu Shu Fen Lei Xue (Classification Science)", by Zhou Jiliang et al., in 1989, "Zhu Ti Fa Dao Lun (Introduction to Subject Indexing Language)", by Hou Hanqing et al., in 1991, "Dang Dai Fen Lei Fa Zhu Ti Fa Suo Yin Fa Yan Jiu (Research on Contemporary Library Classification, Thesaurus and Indexing)", by Hou Hanqing et al., in 1993, etc. There are also more than 5,000 articles in the field of CS&T.

The changes that have taken place in research on CS&T over the past ten years can be summed up in the following four aspects:

  1. The direction of research has changed greatly. The focus has shifted to how to improve and use CS&T precisely in order to raise retrieval effectiveness.

  2. The range of research has broadened considerably, and researchers are probing deeply into the common laws of organizing knowledge.

  3. The methods of research have improved, and researchers' outlook has become wider and wider.

  4. Overseas research achievements are being actively absorbed, with particular attention to new overseas technologies and methods.

The developments of this period have raised the theoretical level of CS&T in China rapidly and greatly, and have thus narrowed the gap between China and other advanced countries.


About the authors

ZHANG Qiyu, professor of the Department of Library and Information Science, Air Force Political College, former professor of the School of Library and Information Science and former director of the Institute of Library and Information Science, Wuhan University; former director of the department of Library and Archival Science, Air Force Political College.

LIU Xiangsheng, research librarian of the National Library of China; Standing Council Member and Secretary-General of the China Society for Library Science; Editor-in-Chief of Chinese Library Classification; member of China Technical Committee of Standardization for Information and Documentation.

WANG Dongbo, associate research librarian and deputy director of Department of Cataloging for Chinese Monographs, National Library of China.