64th IFLA General Conference
August 16 - August 21, 1998

Code Number: 114-134-E
Division Number: VI.
Professional Group: Statistics
Joint Meeting with: -
Meeting Number: 134.
Simultaneous Interpretation: Yes

A Decade of Experience in Measuring Academic Database Use

Harry East

Database Resources Research Group
The City University
London, United Kingdom

Abstract:

The paper traces the changes in the supply of databases and their services to the UK academic community, and the consequent need to modify the methods of data collection. The commercial suppliers are restrictive in the data they make available: it has been necessary to turn to user institutions for the appropriate data. Academic networking makes statistics of use more accessible, but the protection of the user's privacy brings additional problems.

Paper

Background

Although the Group I belong to is not strictly speaking a statistical unit, we have nevertheless been involved for over ten years in collecting statistical data and assessing methods for doing so. For the past six years we have concentrated entirely on the British higher education system, assessing how academic institutions have been incorporating externally-produced electronic reference databases into their service activities. This was not, however, our original starting point: initially we wanted to get a broad picture of the supply side of the electronic component of the information industry. We soon realised that this was too large and difficult a task. In 1988 one experienced observer of the field wrote:

I know of no other industry which is quite so difficult to research. You can buy a plastic kit of the US Air Force's most secret plane, and yet cannot get any information on the revenues, users, profits, ownership, computer installations etc of any European online database host, with the notable exception of West Germany. Anyone trying to gauge the size and growth of the industry has to fall back on various market research reports. Some of these are appalling catalogues of errors and omissions, and few begin to get close to the 'truth'.[1]

In 1988 we were concerned almost exclusively with the academic market for online services supplied by commercial online hosts. We wanted an indicator for "use". Defining what exactly is a measure of "use" is almost a philosophical problem. Ultimately we settled for an indicator proposed by Schwuchow, who concluded that the best indicator is one based on revenues, in that it is:

the only indicator which reduces the size of the market for online services . to a common denominator.[2]

Expenditure on commercial online services

In fact Martha Williams in the USA had described in 1985 her method for capturing use and revenue data for the online database industry [3]. Her data were obtained by analysing copies of invoices sent by service suppliers to a representative panel of user organisations. We adopted this method, with a few modifications, for UK higher education by setting up representative panels of universities and polytechnics from whom we received regular copies of their invoices. By capturing - and re-keyboarding (a very labour-intensive task) - selected data from these copies, we were able to obtain annual summaries of expenditure by host and by individual database. Fig 1 illustrates some of our condensed tables for universities in the years 1988 and 1990.

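Fig. 1: condensed tables of university expenditure by host and by database, 1988 and 1990 (table not reproduced)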
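The invoice analysis described above amounts to grouping keyed-in invoice lines by host and by database and summing the amounts. The following is a minimal sketch of that aggregation, not the Group's actual procedure: the record layout `(host, database, amount)`, the function name `summarise_expenditure` and the sample figures are all assumptions made for illustration.

```python
from collections import defaultdict


def summarise_expenditure(invoice_lines):
    """Total spending per host and per (host, database) pair.

    invoice_lines: iterable of (host, database, amount) tuples, a
    hypothetical layout for data keyed in from copies of invoices.
    """
    by_host = defaultdict(float)
    by_database = defaultdict(float)
    for host, database, amount in invoice_lines:
        by_host[host] += amount
        by_database[(host, database)] += amount
    return dict(by_host), dict(by_database)


# Invented figures, for illustration only.
sample = [
    ("Host A", "Database X", 120.0),
    ("Host A", "Database Y", 80.0),
    ("Host B", "Database X", 50.0),
    ("Host A", "Database X", 30.0),
]
by_host, by_database = summarise_expenditure(sample)
```

Keeping the database totals keyed by the (host, database) pair, rather than by database name alone, preserves the distinction between the same database offered by different hosts.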

In the event, our measurement of commercial online service supply proved to be of diminishing relevance. Information services are traditionally regarded as a free public good in academia. Commercial hosts charged mainly on a "pay as you go" sessional basis, and at the outset most academic access was mediated by librarians. This method of payment created problems for library managers: (a) were such services to be regarded as value-added and therefore chargeable back to the user or his department and (b) how did libraries deal with the unpredictability of demand and concomitant cost? The availability of databases in the CD-ROM format significantly modified the nature of demand. Fig. 2 illustrates how academic expenditure shifted from the commercial online hosts to the CD-ROM medium. This shift also persuaded us eventually to discontinue the labour-intensive task of analysing online invoices: it was no longer cost-effective to do so.

Fig. 2: shift of academic expenditure from commercial online hosts to CD-ROM (table not reproduced)

The advent of CD-ROMs

Libraries were able to acquire databases in the CD-ROM format through a fixed annual subscription. This practice fitted more comfortably with the traditional library accounting and satisfied a desire for predictable expenditure. Moreover, CD-ROM technology relieved the librarians from much of the task of mediation: endusers could do their own searches, initially in the library and later by accessing the databases elsewhere on the campus through local area networks. Searching returned to the enduser.

By 1989 we had begun to collect data on CD-ROM acquisitions, using the same representative panels, but asking the respondents to complete a questionnaire in an annual survey. Initially we collected only how much institutions were spending on the medium and which titles they were buying. As the take-up of CD-ROMs progressed, we also requested information on which titles were stored on hard disc, which were available through campus networks, and the title cancellation rate. Increasingly libraries have CD-ROM holdings that are financed externally from departmental funds (ie not acquired directly through general library funds), and we collect this information also. Fig 3 illustrates the number of current CD-ROM holdings in 1996 and the extent to which they are networked [4].

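Fig. 3: current CD-ROM holdings in 1996 and the extent to which they are networked (table not reproduced)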

The annual questionnaire surveys present an opportunity to get more general indications from service librarians as to how the balance of different forms of media delivery is modifying in the light of technological change. Recent evidence suggests that there may be a move towards subscription to online services supplied through the World Wide Web, though it is too early to suggest a definite trend in this direction.

Networked services from university datacentres

A very significant development came with the availability of databases (of commercial origin) from university datacentres through the Joint Academic Network (JANET). In 1992, the government higher education funding councils - which are responsible for public money invested in universities - created a Joint Information Systems Committee (JISC). This body is responsible for overall academic computer developments and networking activities. One of the early decisions of JISC was to invest in database acquisition for the benefit of the academic community as a whole [5]. The first resource to be acquired was the ISI (Citation Indexes) database: this was installed at a centre in the University of Bath. Access to this database (and all subsequent database acquisitions) was possible to users in all universities via the JANET network.

These developments produced a fundamental change in one aspect of our statistical activities. Each JISC acquisition was made through a single negotiation with the database owner and on the assumption that the information would be available to any bona fide member of the academic community. The licence fee paid reflected, to some extent, predicted levels of use. Individual universities (ie JANET sites) were required to pay a fixed annual subscription which was the same for each institution, regardless of the level of use. (1) The JISC policy was that access to the service would be "free to endusers at the point of use".

The effect of this strategy was that, provided that a university paid its annual subscription, any of its users (both staff and students) who had a terminal connected to the JANET network could have unlimited access to a particular database, free of any personal charge. (2) With users relieved from the pay-as-you-use restriction, it was now possible to get a clearer indication of actual use. For commercial online services, and for subscriptions to agencies marketing CD-ROMs, our indicators were based solely on the revenues due to the supplier. This is arguably a somewhat imperfect measure of actual use (though it is, of course, of significance to library budget holders).

There are now three major datacentres (in Bath, Manchester and Edinburgh) providing access to databases that have been acquired through JISC negotiation. Each produces regular service statistics of use. The most significant measure is the number of connections made to the database service. (The amount of time spent connected to the service is also a useful indicator: we have shown that, in practice, there is a high correlation between the number of log-ins and the mean time spent during connection.) Also recorded is the number of connections made from each individual site: in this way the volume of use per university is measurable.
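The two measures mentioned here, connections per site and time spent connected, can be tallied from a simple session log. The sketch below assumes a hypothetical log of `(site_code, minutes_connected)` pairs; the datacentres' real log formats are not described in this paper, and the function name `usage_by_site` is invented for illustration.

```python
from collections import Counter, defaultdict


def usage_by_site(sessions):
    """Count connections and mean connect time for each site.

    sessions: iterable of (site_code, minutes_connected) pairs
    (a hypothetical log layout).
    """
    connections = Counter()
    total_minutes = defaultdict(float)
    for site, minutes in sessions:
        connections[site] += 1
        total_minutes[site] += minutes
    return {
        site: {
            "connections": count,
            "mean_minutes": total_minutes[site] / count,
        }
        for site, count in connections.items()
    }


# Invented log entries; "XYZ" is a made-up site code.
log = [("CIT", 10), ("CIT", 20), ("XYZ", 5)]
summary = usage_by_site(log)
```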

Each datacentre generates usage statistics of its services. For those of us who study the statistics of service growth, this is a considerable improvement both in accessibility and consistency of the data. Fig. 3 gives an example of statistics derived for networked usage of the International Bibliography of the Social Sciences (IBSS). In this table the usage attributed to various academic departments over two years is indicated in terms of the number of sessions conducted, the total elapsed time of the connections, the number of retrieved references displayed and the number of output articles selected.

Such data, centrally collected and readily retrievable in computer-readable form, is a distinct improvement on our earlier methods of gathering commercial host statistics, which succeeded almost by stealth. Unfortunately, more detailed information on the nature of the users (whether they are students, researchers or teaching staff, and what their stated subject interests are) is no easier to obtain. To obtain such information, it is necessary to sample users by questionnaire. Fortunately the nature of the centralised system allows us to sample in real time. This is examined further in the section "Identifying users".

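Fig. 3: networked usage of the International Bibliography of the Social Sciences (IBSS) by academic department over two years (table not reproduced)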

Interpreting statistical findings

Statistics often lead to further questions. The ISI database has been an outstanding success, measured by the number of institutions that register for access to it, and its volume of use. There are 107 registered sites making collectively, on average, in excess of 9,000 accesses per day. Examination of the level of use by individual university sites shows, not surprisingly, considerable variations. High use of the service seems to be a feature of older, well established institutions. The database itself has a wide range of coverage, mostly consisting of the journal literature relating to the sciences, humanities and social sciences, and particularly to those journals publishing highly cited research papers. It seemed reasonable to assume that heavy use was related to universities with a pre-eminent record in research.

In the United Kingdom, universities are evaluated approximately every five years in terms of their research outputs: the so-called "research assessment exercise" (RAE) (3). We correlated each university's usage of the ISI database (ie number of sessions) with that university's total RAE score. Fig. 4 indicates the high correlation obtained [6].
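For readers who wish to repeat this kind of comparison, one common choice of measure is the Pearson product-moment coefficient. The text does not say which coefficient was used in the study, so the following is only a sketch under that assumption; the function name `pearson_r` and the sample numbers are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / math.sqrt(var_x * var_y)


# Invented figures: sessions per university and total RAE score.
sessions = [1200, 3400, 800, 5600]
rae_scores = [40, 95, 30, 150]
r = pearson_r(sessions, rae_scores)
```

A value of `r` close to 1 would indicate that heavier users of the service tend to be the institutions with higher total scores; it would not, by itself, show that one causes the other.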

One of the significant points of this finding was that the uniform - and relatively high - annual subscription for this particular database was favouring research-oriented universities. Universities not in this group might tend to not renew their subscriptions when they considered their relatively low level of use.

Identifying users

The operators and evaluators of any database service, and particularly one that is widely networked, will want to identify the characteristics of its users and where they are. Unfortunately there is a contrary and wholly understandable desire on the part of users that their privacy is not compromised. For each JANET database service, each registered user (or group of users) is required to provide evidence that he is authorised to access it: the conventional user identification codes (userid) plus a password are required for entry. The password contains no meaningful data about the user; the userids contain codes which tell something about the user, but they are both limited and variable.

Each userid has a three-letter code which indicates the site at which the user is registered. (For example, mine contains the letters "CIT", which indicates that I am a City University user.) Depending on the local site administrator, the remaining five digits of the code may contain additional personal information. In some cases, the code is used to indicate the department to which the user belongs (eg PHY for a physics department) and in rarer cases the status of the user is also identified (eg U for undergraduate, S for Staff). But there is no unified approach across the various sites.
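The site-level tabulation that such userids permit can be sketched as follows. The text does not state where the three-letter site code sits within the userid; this sketch assumes it comes first and is followed by the five locally defined characters. The function names and the sample userids are invented for illustration.

```python
from collections import Counter


def split_userid(userid):
    """Split a userid into (site_code, local_part).

    Assumption (not confirmed by the text): the three-letter site code
    comes first, followed by five locally defined characters.
    """
    if len(userid) != 8:
        raise ValueError("expected three site letters plus five characters")
    site_code = userid[:3].upper()
    local_part = userid[3:]
    if not site_code.isalpha():
        raise ValueError("site code should consist of three letters")
    return site_code, local_part


def sessions_per_site(userids):
    """Count how many logged userids belong to each site code."""
    return Counter(split_userid(userid)[0] for userid in userids)


# Invented userids; "XYZ" is a made-up site code.
counts = sessions_per_site(["CITAB123", "CITXY999", "XYZ00001"])
```

Because the meaning of the local part differs from site to site, the sketch deliberately leaves it undecoded; any decoding would need a separate rule for each site.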

Thus, it is easy to tabulate the level of usage at a site (and much more about the nature of the search conducted) but not the characteristics of the user. In early analyses we attempted to use what data was available from userids, by approaching each site administrator in turn and asking what (if any) was the local "translation" of the codes. This was a tedious process, also unsatisfactory in yielding only a sample (and not a particularly random sample) of user information. Moreover, useful data from a particular site rapidly became out of date, due to the inevitable turnover in the academic community. Nevertheless, we were obliged to use this approach in early analyses. This was how the figures were obtained in the examination of networked usage of the International Bibliography of the Social Sciences (IBSS): see Fig 3.

In a more recent survey we tried an alternative approach, which has proved to be much more effective. It was tested experimentally on the Bath University ISI service. We designed a short questionnaire requesting information from the user. This was loaded into the system software to appear on the terminal screen of a (random) sample of users as they logged in to the system. It was designed for the web version of the system, where the user is only required to "point and click" using a mouse. The design used radio buttons, check boxes and pull-down menus. This was to reduce size, control ways in which questions could be answered and generally to maintain clarity. Users were, of course, given the opportunity to say "no" to our questionnaire and depart immediately to their searching. Nevertheless 47% of the questionnaires were completed which is a fairly respectable return for a survey of this type.
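The mechanics of offering a questionnaire to a random sample of log-ins, and of computing the completion rate, might be sketched as below. This is not the software used on the Bath service; the sampling rate, the function names and the counts in the example are assumptions for illustration.

```python
import random


def should_show_questionnaire(sampling_rate, rng=random):
    """Decide at log-in whether this user is offered the questionnaire.

    sampling_rate: fraction of log-ins to sample, between 0 and 1
    (a hypothetical setting; the rate actually used is not stated).
    """
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    return rng.random() < sampling_rate


def completion_rate(completed, offered):
    """Fraction of offered questionnaires that were completed."""
    if offered <= 0:
        raise ValueError("offered must be positive")
    return completed / offered


# Illustrative counts only: 47 completions out of 100 offers gives 0.47.
rate = completion_rate(47, 100)
```

Drawing a fresh random number at each log-in keeps the sample independent of who the user is, which matters when the aim is a representative profile.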

We ran the questionnaire, without a break, for three weeks late last year (1997) and obtained a convincing "profile" of users consisting of data about their age and sex, status (eg undergraduate to professor) and area of subject specialisation. Additional data was gained on the location of the terminal they were using (eg in a library, computer centre, a private office or at home) and their frequency of use of the service. Nobody, of course, was asked to identify themselves by name.

Fig 5 is a small sample of the rich returns we were able to obtain through this technique. This particular example concerns the findings of the "location of use" question above. It indicated that very nearly half of the searchers were accessing the database from terminals other than those located in "public space" locations. In other words, nearly half of the users of this database were not going to the library or computer rooms to access this type of information; the majority of them use equipment installed in private offices or laboratories. Not surprisingly, it is the undergraduate students who resort most to public access locations.

Fig. 5: location of terminals used to access the database (table not reproduced)

Potential user populations

As the availability of centrally acquired databases increases, it is not only useful to know who is accessing them, but also the population of potential users of a particular database. In the survey of the ISI database users described above, data from the Higher Education Statistics Agency - which provides a breakdown of the number of students registered for a particular subject - was compared with the actual number of student users. Fig. 6 shows the result of the comparison.

The ISI database, while covering a wide range of academic interests, is particularly strong in the scientific literature. The high level of use in both the physical and biological sciences is consistent with this. In future research we hope to look at each database in turn and, depending on its subject scope, determine the extent to which the service is reaching its target population. In other words, we want to be able to estimate the degree of penetration of a particular service into different sectors of the academic community.
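One simple way to express the degree of penetration is the ratio of observed users to registered students in each subject. The sketch below assumes that definition, which the paper does not state formally; the function name, the subject labels and the figures are invented for illustration.

```python
def penetration_by_subject(actual_users, potential_users):
    """Ratio of observed users to potential users for each subject.

    actual_users: dict mapping subject -> number of users observed
    potential_users: dict mapping subject -> registered student numbers
    Subjects with no registered students are skipped.
    """
    return {
        subject: actual_users.get(subject, 0) / population
        for subject, population in potential_users.items()
        if population > 0
    }


# Invented figures, for illustration only.
observed = {"physical sciences": 30, "biological sciences": 45}
registered = {
    "physical sciences": 120,
    "biological sciences": 90,
    "humanities": 200,
}
ratios = penetration_by_subject(observed, registered)
```

Where the user counts come from a sample survey rather than a full census of log-ins, the ratios are useful mainly for comparing subjects with one another, not as absolute levels of take-up.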

Future prospects

One of the developments associated with the World Wide Web is the availability of full-text journal articles from their publishers direct to the end user. Such services are already in place and being further developed within the academic community. In the UK they are likely to be mediated by the library. This adds a new chapter to the library's portfolio of electronic services and additional challenges for statistical assessment. Examining the tangible stock of a traditional library, and its activity, was reasonably straightforward. In the expanding resources of the "virtual" library, different measures are needed. Changes in the mode of technological availability have been rapid in the last decade, each innovation necessitating a modification of approach. There is little reason to assume that this process will not continue. One of the main dilemmas for the future will be the need to balance the users' desire for anonymity with the service suppliers' need for feedback on how the service is used.

References

1. White, MS. Don't you know, or have you something to hide? Information World Review, No 24, March 1988, 16.
2. Schwuchow, Werner. The development of the international market for online information services. 44th FID Conference and Congress, August 28 - September 1, 1988 [Helsinki]. Participants' edition, Part 3, 166-178.
3. Williams, Martha E. Use and revenue data for the online database industry. Online Review, 9(3), 1985, 205-210.
4. Leach, Kathryn. A constant demand for new services: a survey of CDROM acquisitions in UK academic institutions, 1996. The Database Resources Research Group, The City University, 1998.
http://www.soi.city.ac.uk/informatics/is/drrg/survey 98/index.html
5. Law, Derek. The development of a national policy for dataset provision in the UK: an historical perspective. Journal of information networking, 1(2), 1994, 103-116.
6. Ajibade, Badekale and East, Harry. Research assessment exercise and usage of BIDS-ISI. The Database Resources Research Group, The City University, 1997.
http://www.soi.city.ac.uk/informatics/is/drrg/rae97/rae97.html

Endnotes:

(1) Sites were also required to commit themselves to a five-year agreement for access to a particular database. This has been one of the least popular conditions as far as the libraries are concerned.
(2) In almost all cases, annual subscriptions are paid from library budgets.
(3) Funding councils use the RAE scores to determine the size of their awards to individual universities for future research funding.
