IFLANET home - International Federation of Library Associations and InstitutionsAnnual ConferenceSearchContacts

63rd IFLA General Conference - Conference Programme and Proceedings - August 31- September 5, 1997

DanBib - a union catalogue applied for user friendly flexible querying

Troels Andreasen,
Associate Professor,
Dept. of Computer Science,
Intelligent Systems Laboratory,
Roskilde University


Tommy Schomacker,
Head of Computer Department,
Danish Library Centre


DanBib is a union catalogue for all libraries in Denmark. This paper presents an approach for applying data mining and fuzzy logic in evaluation of queries to this database. In the approach a fuzzy semantic network relating keywords from the database is used. The network exploits the actual content of the database and is obtained by pure statistical means. Even so the relations between words resemble human association, sufficient to extend answers in an intuitive manner. The approach has been implemented as a web-interface to DanBib, and an experiment has been carried out.



In 1996 the Danish Library Centre (Dansk BiblioteksCenter - DBC) accomplished an evaluation of fuzzy logic, to see if it could be used in the construction of a more user friendly search interface to DanBib. The evaluation was performed in co-operation with the Intelligent Systems Laboratory at Roskilde University. ISL carried knowledge on fuzzy logic, and had performed several studies on the use of this technique for practical purposes. By combining knowledge of the ISL with the bibliographic expertise of DBC it was pointed out that fuzzy logic and related techniques could be very useful. Some of the possibilities were implemented in a www-based test model, which in a very convincing way supported the conclusion that fuzzy logic is a very relevant tool in user friendly search interfaces.

As a result of this comprehension a new project was formulated, now to construct a prototype of a user friendly interface to DanBib. The prototype is made as a plug-in module to the DanBib system. The project team has consists of scientists and programmers from ISL and librarians and programmers from DBC.

In the prototype a query is evaluated by flexible interpretation in two respects, partly by applying fuzzy evaluation of each single criterion stated and partly by applying a tolerant interpretation of the compound query by adjusting the connection of all criteria within a range, taking disjunction as the least and conjunction as the most restrictive connection.

DanBib - the vision of a union catalogue for all libraries in Denmark


Until the beginning of 1994 there were two main on-line catalogues in Denmark:

The Danish Library Centre (DBC) was responsible for BASIS and The Computer Department for Danish Research Libraries (FEK) was responsible for ALBA.

The main part of the Danish bibliographic records in BASIS was created and maintained by the Danish Library Centre as an improvement of the Danish national bibliography, while the majority of the foreign records were delivered by the public libraries. Re-use of these data was made by weekly standardized data deliveries on floppy discs to public and school libraries for use in their purchase of materials - with succeeding reporting to BASIS, so that the database could be used for localization. The research libraries’ re-use of the national bibliographic records was made through weekly transfers from BASIS to ALBA.

ALBA was maintained by bibliographic records from Danish research libraries and national bibliographic data from Denmark and foreign countries. Herewith ALBA could use for localization purposes as well as a reservoir for re-use for cataloguing.

As producer of the Danish national bibliography the Danish Library Centre had a close cooperation with the two legal deposit libraries - the Royal Library and the State and University Library. This cooperation, however, was difficult due to the fact that the three institutions were cataloguing to there separate databases - with partly different cataloguing practices.

It seemed evident that the registration process as well for the national bibliography and the legal deposit administration as for the re-use in the libraries could happen quicker and more rationally if the work could take place in one common database.

A working group established for evaluation of the legal deposit administration also came to this conclusion in its final report in April 1991.


The realization of this evident idea was temporarily hindered, by the above mentioned differences in cataloguing levels and working routines in the individual institutions. Yet, a welcome occasion of hurrying up this matter occurred as it - parallel with the considerations of a legal deposit database - was realized that both central systems were in need of renewal due to outdated techniques. The calculations showed that on procurement and operation of edp equipment alone, great gains were obtainable by combining the systems.

At a common library meeting in September 1990 this idea was specifically formulated and was generally supported. Then the work started with formulating a data model for the common database, i.e. in reality a common reference framework for the libraries. An important element was the determination of the principle “one title - one record” - in other words, contradictory to earlier, where there for each title was a record for each library, there should only be one bibliographic record to a given title. This main principle was kept by the implementation, but the chosen data model still allows the participating to add local data to the title record.

Parallel to the considerations of the data model, which was of great importance to the internal work and cooperation between the libraries, a pure edp technical evaluation of the demands to the new common system took place. The basic idea in these considerations was open standards.

The search for a suitable system began as an invitation to submit tenders, but after a long and complicated process, it was early 1993 ascertained that none of the offered systems were suitable for this specific task. Hereafter it was determined to develop the system on the “building block model”, i.e. piecing the system together by existing final modules together with own development.

The third element of the solution - next to the data model and the edp system - was the organizational part. It was decided to merge the DBC and FEK and hereto assign the task of developing and running of the common system DanBib.

After an intense developing effort in 1993 DanBib opened in January 1994 - first only with ALBA adjusted. Version 2 - with BASIS included - opened July 1995. There is a long line of ongoing and planned developments of DanBib.

DanBib contains approximately 10 million records and serves a total of 2300 users (up to 500 concurrent users). Half a million interlibrary loan orderings are transmitted in a year.

The use of DanBib

DanBib is used for many purposes, of which the three main purposes are:

Central to all tasks is the search facility which today is based on the international standard ISO 8777:1993 Information and documentation -- Commands for interactive text searching - also called Common Command Language (CCL). The basis for this is Boolean logic, which can be an efficient tool to the experienced user of the system. For the less experienced user, however, there is a need for more flexible search tools, which is the reason for the interest in the opportunities of using fuzzy logic.

In CCL the search expression is FIND followed by a search argument. The search argument may consist of search elements combined with Boolean operators, e. g. “FIND gold or silver”. In this paper we will use the term “compound query” instead of “search expression” and “constraints” instead of “search elements”.

Subject indexing in DanBib

In the prototype for flexible querying the starting point is the content of keywords of the database. Assignment of keywords to the Danish records goes back to 1981, where principles for indexing were determined. After a testing period the keyword assignment started in 1986, in the beginning only to books. The practice has gradually expanded to other types of materials, and now the majority of all new titles are provided with keywords. We are speaking of so-called postcoordinated controlled keywords. The indexing practice is according to the international standard ISO/DIS 2788:1986 Documentation -- Guidelines for the establishment and development of monolingual thesauri.

Flexible querying - applying fuzzy logic

Fuzzy logic and fuzzy query evaluation

The approach described in this paper is based on fuzzy evaluation of queries, that involves a fuzzy logic rather than a classical Boolean logic interpretation of queries. Fuzzy logic generalizes Boolean logic in distinguishing degrees of truth in between the extreme values false as true. Truth values are in fuzzy logic denoted from the interval [0,1], with 0 denoting false, 1 denoting true and values within ]0,1[ denoting true to some degree. While classical logic is based on classical set-theory, fuzzy logic is based on fuzzy set-theory. In classical set-theory we can only have an element as either fully a member or not at all a member of a given set. In fuzzy set-theory an element may be member of a given set to a degree, usually denoted by a value in the interval [0, 1], with degree 0 corresponding to not at all member and degree 1 to fully member, while e.g. 0.5 corresponds to member to some degree.

In natural language we often use words and concepts that are better modeled with fuzzy logic than with classical two-valued logic. For instance to express that a book is new, using only two-valued logic, it is necessary to decide on an exact limit on age, to distinguish "new book", say, books published within the latest 3 years are new, and books older than that are not new. Using fuzzy logic it is possible to get a lot closer to the intention with the concept. We could decide on a simple function as shown in figure 1, to express that, books published within the latest two years are new, books published within two years and four years ago are new to some degree, and books older than that, are not new.


Fig. 1. The concept "new book" modeled as a fuzzy set. The function shown is called the membership function for the concept.

Apart from introducing new concepts, we can use fuzzy logic to generalize equality into similarity. For instance the expression "equal to 3" can be relaxed into "around 3" and the expression "between 2 and 4" can be relaxed into "around 2 to 4" as shown in the two membership functions below.


Fig. 2. Relaxing equality into similarity.. The expression "between 2 and 4" relaxed into "around 2 to 4" : Shown are membership functions for the two expressions.

In fuzzy querying one important logical aspect is this kind of similarity for elements of a given domain.

For domains with a natural order it is straight-forward to define similar values for a given value; these are the values closest to the given value according to the order.

Problems arise when the domain in question is without ordering. In that case similarity must be explicitly defined in some kind of network relating similar elements. Of major interest in our approach are domains of words used in describing properties of objects, i. e. bibliographic entities. In a bibliographic database these may be authors keywords, cataloguers keywords, words appearing in titles, notes, abstracts or other textual attributes attached to objects.

Examples of explicitly defined similarity on domains of words are common natural language thesauri or word (and concept) taxonomies. Special cases of the latter are commonly used in the library cataloguing process. Such similarity on domains of words or concepts can, just as it is the case for simple numeric domains, by relaxed into a kind of graded similarity. For instance a relaxed thesauri could express:


Fig. 3. A relaxed thesaurus

In the approach described here we mainly focus on a third kind of similarity that gives graded relations between words of the domain expressing associations between words.

While a thesaurus relates words that are similar – to some degree – with respect to meaning, an association structure relates words that are similar with respect to context:


Fig. 4. A relaxed association structure

We call explicitly defined similarity for semantic networks and, when grades are attached to relations, fuzzy semantic networks. We apply in our approach a statistical based fuzzy semantic network, denoting a special interpretation of associations.

Some methods for fuzzy querying in a bibliographic database

The bibliographic databases used in this project are of quite complex database schemes called the danMARC-format. However, to explain the general idea we need only to focus on a simplified subset of the scheme:

That is, bibliographic objects (bibliographic references) described with 6 attributes, where 3 of these can take multiple values (indicated with *). Of the attributes mentioned, especially the multi-value attribute Keyword is of interest. This attribute holds one or more words selected from a classification system or a controlled list of keywords and attached by the cataloguer.

Our approach to fuzzy querying extends conventional Boolean querying in two respects:

The former leads to a tolerant aggregation of constraints, whereas the latter is about relaxing a single constraint into a more tolerant counterpart based on predefined parameters delimiting the fuzzyfication.

We present below first the fuzzy interpretation of a compound query. We introduce then a special semantic network applied in the prototype, which is purely based on statistics on the state of the database. Finally we describe the overall principle of query evaluation and give examples of queries evaluated against the small database used in the experiment with the implemented prototype.

Simple weakening of a compound query

The implemented prototype is prepared for general fuzzy aggregation of query constraints. However to keep calculations simple and easy to check, we use only simple arithmetic average for aggregation. Further studies need to be done in order to investigate the influence of more advanced aggregation on the capability to compute good answers to queries.

The principle of aggregation of a compound query is as follows.

Thus the truth value of "O satisfies Q" is the average of the n single truth values of "O satisfies Ci".

Notice that this principle of query evaluation may be applied without any other kind of weakening. If all single constraints are crisp (can take only the values true or false, that is, 1 or 0), then the interpretation becomes: n satisfied constraints is considered better (is graded with a better truth value) than n-1 satisfied constraints, which in turn is considered better than n-2 satisfied constraints, etc.

Even though this principle is simple it may appear quite powerful in practice. It supports a new and better means for users querying the database. It is no longer necessary to minimize the set of constraints posed in a query as normally done by users who prefer non-empty answers. Suddenly it becomes fruitful to express all potentially relevant constraints describing the users intention, since the process of minimizing the set of constraints posed in a single query, which is usually performed by the user who prefer non-empty answers, is so to say replaced by the simple weakening performed by the system.

Semantic networks

Our main interest is as mentioned the association type of relation of words and a major issue in the project is how to establish such association networks. Since we did not have access to any useful manually created association networks apart from the available classification systems and since these constitute only very sparse networks (networks with only very few edges), we started early to focus on automatically discovery of associations by, so to say, mining data in the database. We have discussed an analyzed a range of different ways to perform such a task, but we do by no means consider our analysis as exhaustive. A lot more work has to be done in this direction. However we did deduce rules of thumb during discussion and experiments with controlled data mining for semantic networks.

For this purpose the most interesting domains in the bibliographic database is the set of words used in different descriptions of objects in the database, e.g. in titles, abstracts, notes, related keywords and even author.

For the experiment discussed in this paper we have chosen to build a network based on cataloguers keywords with no influence on associations from words as used in other fields describing objects. Thus even titles are ignored when associating words. We have chosen a simple function that relates two words A and B based on the frequencies with which they appear in descriptions of objects in the database. The function is asymmetric and takes as arguments:

The word A relates to B to the degree:

B relates to A similarly by:

We should emphasize here that this function has been chosen as an ad hoc first solution and that further studies has to be done to investigate the problem of statistical association.

Thus with the frequencies of the words A and B, as shown in figure 4a, that is, #(A) = 7, #(B) = 19 and #(A and B) = 4, we obtain the association from A to B as #(A and B) / #(A) = 4 / (7+1) = 0.5 and the association from B to A as #(A and B) / #(B) = 4 / (19+1) = 0.2.


Fig. 4. (a): Frequencies of words A and B, (b): Resulting association between A and B

As an example of what this means of deducing associations between words might lead to, consider the following small section of a semantic network, derived from a real world database used in the experiment:


Fig. 5. A small section of the network of associations derived from the bibliographic database in the experiment

The approach described in this paper is, as far as softening of words as query constraints are concerned, solely based on semantic networks as described above. So an important question is of course: What are these kind of associations and how can their meaning be described?

The intention is to capture relations that resemble aspects of human associations. As compared to the other extreme to achieve this; knowledge acquisition by interviewing an expert in the domain in question, one major drawback appears to be obvious. Words that are closely related by association may also be closely related by meaning, as for instance parent and mother. When two words are closely related by meaning the typical description will include only one of these words, and therefore we obtain no or only a weak association between the two words in this case.

However, since our semantic networks are based only on the set of attached keywords and since each word in this set is carefully chosen to be with a distinct meaning, this drawback have only little influence on the quality of the resulting network.

Overall fuzzy evaluation of queries

The evaluation of queries applied in this approach is basically a general fuzzy evaluation. However we have deliberately chosen to narrow to a simplistic variant in the experiments carried out, because the main focus is not to obtain the best possible solution, but rather to reveal the extend to which automatically derived domain knowledge might correspond to real world human knowledge about a domain, and to indicate the power of applying such knowledge in fuzzy query evaluation. The simplistic approach is the following.

A query is restricted to be conjunctive, thus of the form

The general form of a simple query constraint Qi is Ai Fi Vi, where Ai is a database attribute, Fi is a comparison operator and Vi a value. However we shall, to keep things simple, ignore constraints other than on the attribute "keyword" using the "="-operator and then take each Qi to be just a value for keyword.

A query, such as

is evaluated under two threshold values. A threshold TA that delimits the extend to which the aggregation can be relaxed and a threshold TC that delimits the extend to which a single constraint can be relaxed. With TA=1 and TC=1 the query is interpreted as a Boolean query. Keeping TC=1 and lowering TA to say 0.75, matching objects will include those that meet only 3 out of the 4 constraints, while TA=0.5 accept 2 out of the 4 constraints. The influence of lowering TC is as follows. If we set TC=a then each constraint Vi is relaxed to be matched by the set r(Vi), listing ("matching keyword", "grade"), r(Vi) = {(Vi,1), (Vi1,s1), … (Vik,sk)}, that includes all values Vij that Vi associates to, to a grade sj>=a.

Thus for the query

we get, when referring to figure 5 with TC=0.8

while TC=0.6 leads to

We use as mentioned average for aggregation, thus, when still referring to figure 5, we get objects satisfying Q to the grades:

	Objects including one of the subsets as keywords	  match to grade

	{Death,Childhood}	                                        1.00
	{Death,Children} {Grief,Childhood}	                        0.90
	{Grief,Children} {Children,Childhood}	                        0.85
	{Parents,Childhood}		                                0.80
	{Death},{Childhood}	                                        0.50
	{Grief},{Children}		                                0.45

Fig. 6.  Average aggregation over objects described by keywords 
including one of the subsets shown

Now setting thresholds TA = 1 and TC = 1 is the conventional Boolean query and results only in objects including {Death,Childhood} as keywords. Keeping TA=1 and lowering TC does not change anything, since TA delimits the aggregation, which is average and thus falls below 1, when one constraint is satisfied below 1. Keeping TC=1 and setting TA=0.5 corresponds to the simple weakening principle described above and adds to the answer objects that include either {Death} or {Childhood} as keywords. Setting TA=0.85 and setting TC=0.7 or lower leads to an answer with objects including as keywords either of the subsets listed in the 3 first lines of figure 6.

To give an impression of how this works in practice an answer to the query Q = Death and Childhood is shown in figure 7 below. The query is evaluated on the about 6000 objects database, which is also the basis for the fuzzy semantic network shown in figure 5. Without relaxing the query, thus with TA=1 and TC=1, the answer is empty. For the answer shown the chosen thresholds are TA=0.80 and TC=0.80. Only titles and query grades for the books are listed.

1	0.95	Only a broken heart knows
2	0.95	Baby's death – "life's always right"
3	0.95	We have lost a child
4	0.95	Loosing a child
5	0.95	Cot death
6	0.95	When parents die
7	0.95	To say good-bye
8	0.95	We have lost a child
9	0.95	A year without Stig
10	0.95	Taking leave
11	0.95	Grief and care in school
12	0.89	Can I die during the night 
13	0.89	Children's grief
14	0.89	Children and grief 

Fig. 7.  The answer to the query Q = Death and Childhood, when evaluated on the database in the experiment.

We should emphasize that the database used in the experiment is not fake. The content is descriptions of "real world" books on issues related to children, school and education. All descriptions are made by cataloguers, that have carefully examined the books and chosen distinct, describing keywords from a list of controlled keywords and according to generally accepted international guidelines (ISO 2788 and ISO 5963).

The resulting network is thus the implicit work of the cataloguers.

Other methods considered

As mentioned the approach described above is a result from a project on “Flexible querying to DanBib” performed in cooperation between the Danish Library Centre and Roskilde University. Within this ongoing project a range of methods have been suggested of which, until now, only the above approach have been tried out on real bibliographic data.

Of other ideas under current discussion we can mention the following.

First of all we consider domain knowledge of different forms as basis for advanced query processing against the bibliographic database. Fuzzy semantic networks as exemplified above may be complemented with thesaurus structures that capture narrower and broader terms and with structures that cluster synonyms for uniform interpretations in query specifications. Further we intend to include morphological conversions of terms to the standard forms used in the chosen vocabularies. We are doing experiments with semantic networks based also on titles, notes and authors from bibliographic objects and especially for networks based on these attributes, the handling of synonyms and morphological conversions are needed.

As a second type of knowledge to be applied as basis for advanced query processing we are considering users profiles. A user profile represent what the system knows about the user and may be applied by the system to customize the answer. The profile may in part be given by the user and in part be adapted by the system from the users behavior when using the system. A profile can include knowledge about topics of interest, preferred settings of for instance display formats and of default constraints for queries, measures for determination of relevance and assumptions that becomes interesting when violated.

Based on the associations drawn from data catalogued under two or more systems we believe to be able to construct cross system relations, that represents vague, but useful conversions among the systems. Thus when a satisfactory large data set is available we can build a concordance for the classification systems used. It is in practice, by no means, possible to construct a complete concordance, but we should stress here that the result of an automatically generated concordance must be expected to be "more incomplete" than what could be expected as a result of the work of experts on the systems involved in he concordance. Even so, an automatically generated concordance can be used as a guide and, as maybe the most important property, when the concordance is available, keywords from one system can be used in queries to data that is not catalogued in that system, because translations can be performed based on the available concordance.

As a further refinement of query specifications we are experimenting with importance weighting of individual constraints in the compound query. While importance of course are annoying when it is difficult or even irrelevant to decide on these, they can be very helpful in cases where for instance a few constraints are almost ultimate while a number of additional constraints may be quite relevant to describe the need, but only when they can be subject for relaxation.

To handle answers enclosing to many objects we suggests methods clustering to groups, each representing a topic. Two approaches are considered here, both based on cataloguers keywords attached to descriptions of objects. One approach is based on generalization to broader terms from the union of keywords in the answers, the other is based on eliciting the most frequently associated words, also from the union of keywords.

Finally we can mention that we are experimenting with generation of explanations of small or empty answers to queries. Such explanations could be given as a guide to the user and include for instance information, concerning the posed query, about single constraints that are unsatisfied or sets of constraints that conflicts in the current state of the database.


The approach described above was implemented in a www-based environment as a stand-alone test-model based on an Oracle-database for handling domain knowledge and bibliographic data and evaluated with a range of different databases of small to medium size (5000-100000 objects). In conclusion of this experiments it was decided to implement the approach as an interface to the DanBib union catalogue, still however as a prototype-system. This task is expected to be finished during summer 1997.

Summary and concluding remarks

We have presented above ideas and preliminary results from an ongoing project on introducing flexible interpretation of queries to the on-line catalogue DanBib. The general idea is to apply fuzzy query evaluation based on domain knowledge derived from the database state, by mining for "statistical associations".

The functions used in this process are deliberately chosen as simple and naive, partly because we want the principles to be as easily comprehensible as possible, and partly because further research has to be done to reveal and identify properties that make functions suitable and more accurate.

Even so, we conclude that the derived associations in general appears to be meaningful and that when used in the described processing of queries, the associations leads to meaningful and quite useful extensions of answers to queries.

The derived associations are based on cataloguers keywords and is therefore a special indirect result of the cataloguing process, as a kind of inversion of the cataloguing data.

The keywords are specifically chosen as “descriptors” from a continuously controlled vocabulary on approved words, that have a very low degree of ambiguity (most words/concepts have unique meaning).

We must give the credit for the useful derived associations to the high quality of the available cataloguing data.


Andreasen T.: Dynamic Conditions, Datalogiske Skrifter, nr. 50, 1994, Roskilde University

Andreasen T.: On flexible query answering from combined cooperative and fuzzy approaches. in Proc. VI IFSA World Congress, Sao Paulo, Brazil, 1995.

Andreasen T. and Pivert O.: Improving answers to failing fuzzy queries. in Proc. VI IFSA World Congress, Sao Paulo, Brazil, 1995.

Andreasen T. and Pivert O.: On the weakening of fuzzy relational queries. in Proceedings ISMIS '94, EIGHT INTERNATIONAL SYMPOSIUM ON METHODOLOGIES FOR INTELLIGENT SYSTEMS. 1994. Charlotte, North Carolina: Springer Verlag, Lecture Notes in Artificial Intelligence.

Christiansen H., Larsen H. L. and Andreasen T.: Proceedings of the second workshop on Flexible Query-Answering Systems. Datalogiske Skrifter no. 62. 1996, Roskilde University.

Emneord på bøger og artikler: indekseringsregler. Version 2.0. Dansk BiblioteksCenter, 1996 (internal guidelines)

Friis-Hansen J. B.: Vejledning i indeksering (informal translation of ISO 5963:1985)

ISO 5963:1985 Documentation -- Methods for examining documents, determining their subjects, and selecting indexing terms

ISO 8777:1993 Information and documentation -- Commands for interactive text searching

ISO/DIS 2788:1986 Documentation -- Guidelines for the establishment and development of monolingual thesauri

Klir G. J. and Volger T. A.: Fuzzy Sets, Uncertainty, and Information. 1988, Printice Hall.

Larsen H. L.: Recognition of implicit concepts in unstructured queries. Proc. VI IFSA World Congress, Sao Paulo, Brazil, 1995, Vol. 1, pp. 233-236.

Larsen H. L. and Andreasen T.: Proceedings of the first workshop on Flexible Query-Answering Systems. Datalogiske Skrifter no. 58. 1995, Roskilde University.

Larsen H. L. and Yager R. R.: An approach to customized end user views in information retrieval. In J. Kapcprzyk and M. Fredrissi, Eds., Multiperson Decision Making Using Fuzzy-sets and Possibility Theory, Kluwer Academic Publishers, pp.128–139, 1990.

Larsen H. L. and Yager R. R.: Query Fuzzification for Internet Information retreival. Datalogiske Skrifter No. 60, 1996. To appear in Fuzzy-Set methods in Information Engineering: A Guided Tour of Applications, John Wile & Sons.

Larsen H. L. and Yager R. R.: The use of fuzzy-relational thesauri for classificatory problem solving in information retrieval and expert systems. IEEE J.on System, Man, and Cybernetics 23(1):31–41, 1993.