Attachment J

Interoperability and Conformance Issues in the Development and Implementation of the Government Information Locator Service (GILS)

Cecilia M. Preston and Clifford A. Lynch

May 22, 1994

INTRODUCTION

The Government Information Locator Service (GILS), as described by Eliot Christian [Christian, 1994], defines an initiative to implement a distributed system of autonomous, cooperating database servers attached to the Internet that provide locators for federal government information resources. Users of the GILS locate government information by retrieving records from these database servers; such searching is accomplished by client software that may run locally on workstations at GILS user sites or on host machines accessible to GILS users through the Internet. Government agencies participating in the GILS program will develop or acquire appropriate server software conforming to the GILS profile [McClure & Moen, 1994a] which will make their locator records accessible through the GILS. This server software will typically run on federal agency and other federal government computers.[1] The client software base used to access these GILS databases will be more heterogeneous in nature and diverse in origin. It will include client software designed specifically to provide access to the GILS; such client software, which might be developed by the private sector, the federal government, or both, is likely to be the most capable, incorporating the ability to navigate transparently among multiple GILS locator databases on behalf of the user, as well as perhaps the ability to connect users to at least some resources once located through a GILS locator.
Because the GILS system is based on the American National Standards Institute (ANSI)/National Information Standards Organization (NISO) Z39.50 protocol for information retrieval [NISO Z39.50-1992], other client software already in place or under development for other purposes (for example, clients developed for access to the Wide Area Information Server (WAIS) system [Kahle et al., 1992], or clients developed to communicate with bibliographic databases)[2] should be able to provide at least some access to GILS locator resources immediately even without explicit knowledge of the GILS profile. It is hoped that over time the capabilities of this already-existing software base will be upgraded to provide more extensive support of the GILS through explicit inclusion of support for features defined in the GILS profile.

As part of the GILS program, technical work was carried out by a group of experts led by William Moen and Charles McClure of Syracuse University to develop technical specifications for the GILS. This work resulted in a formal applications profile [McClure & Moen, 1994a], a standards document that is making its way through the National Institute of Standards and Technology's OSI Implementor's Workshop (OIW) under the auspices of the OIW Library Automation group and will likely ultimately become a Federal Information Processing Standard (FIPS). The applications profile is supplemented by several technical papers [McClure & Moen, 1993; McClure & Moen, 1994b]. This effort focused on:

o The development of an architectural model for the GILS system and definition of participant roles and responsibilities for record creation and propagation within the GILS. This was based on the functional specifications defined in [Christian, 1994] and also drew from prior work on locator systems such as [McClure et al., 1992].

o The definition of data elements, interchange formats, and semantics for the locator records that form the contents of the GILS.
o The definition of the computer-to-computer communications protocols that are used to search and retrieve records from GILS servers. GILS uses a layered suite of protocols. At the lower layers, GILS uses the standard Transmission Control Protocol/Internet Protocol (TCP/IP) that is ubiquitous throughout the Internet; on top of TCP/IP the GILS employs Z39.50, the ANSI/NISO standard for computer-to-computer information retrieval.[3] Z39.50 is a very general purpose protocol, and a Z39.50 Applications Profile (essentially, a specialization or restriction of Z39.50) is used to define the specific Z39.50 functions and parameters that are required to implement the GILS.

GILS clients and servers will be developed by many different organizations; a marketplace (in the broad sense of both public domain and commercial server and client software) that explicitly supports the GILS profile is expected to come into being as the GILS initiative moves forward within the federal government. In addition, a specific goal of GILS design is that GILS servers be usable, at least at a limited level, by a substantial base of already existing and deployed client software that supports the Z39.50 protocol; this will greatly enhance the availability of information in the GILS for the general public, particularly in the early stages of the initiative. Because of these objectives, the ability of a diverse base of clients and servers to work together, or interoperate, successfully is a central concern in the development of the GILS specifications and the implementation of the GILS. If agencies implementing GILS servers and users that wish to employ various GILS software clients to access them cannot have a reasonable expectation of such interoperability[4], it is likely that the GILS enterprise will fail.
Further, if non-GILS Z39.50 clients cannot also interoperate successfully with GILS servers, the impact of the GILS effort in facilitating access to government information resources will be lessened. This paper discusses various approaches that can be taken to increase the likelihood of successful interoperation among GILS servers and a variety of clients, including both clients designed to implement the GILS profile and other Z39.50 based clients. It includes both a discussion of the theoretical frameworks employed by standards development organizations to address interoperability considerations and also (and perhaps more importantly) approaches and experience by various implementor communities, such as the Z39.50 implementors, in the development of interoperable distributed systems and applications. The emphasis here, however, is not on the technologies and methodologies of interoperability or conformance testing, but rather what these approaches can contribute to the success of the GILS effort. In this connection, a number of additional means of promoting the development of interoperable systems, such as testbeds and reference implementations, are also discussed. It is vital to recognize that GILS is a system specification, not just a specification for a Z39.50 applications profile. In this sense the use of the term "profile" may be somewhat confusing or misleading; GILS goes far beyond the usual sort of profile covered by an International Standardized Profile (ISP), for example (see [Ledrick & Spring, 1990] for a discussion of ISPs). In our view, this is a real strength of the GILS technical work: it is focused on providing a comprehensive, pragmatic blueprint for real interoperability that also takes account of the context of a large installed base of related systems. The GILS specification weaves together the use of several standards and specifies how they interrelate. The GILS specifications also address record content and its meaning. 
Because of this broad scope, this paper includes a section on issues related to information semantics, interoperability, and quality as well as the coverage of system and protocol interoperability considerations.

CONFORMANCE TESTING AND INTEROPERABILITY TESTING

Basically there are two approaches to testing implementations to ensure that they work effectively together. One approach is conformance testing, in which a single implementation is compared to the standard to be sure that the implementation does what the standard specifies; the theory behind conformance testing is that if implementations all conform to the abstract standard they should interoperate with each other, although in practice this is not necessarily the case, as discussed below. The other approach is interoperability testing, in which two or more implementations are tested directly against each other, with the standard used primarily as a reference to adjudicate problems and incompatibilities, and secondarily as a guide to the functions to be tested and the general behavior to be expected. The objectives and expected results of these two types of testing are somewhat different. As Herbert Bertine and others define these differences:

"Protocol specifications are used to develop products and services. Conformance testing verifies that these products and services comply with their specifications. Interoperability testing supplements conformance testing by verifying the end-to-end behavior of specified complex configurations." [Bertine et al., 1990]

There is considerable debate about the value of conformance testing as a means of achieving interoperability.
To some extent the disagreement has followed cultural lines, with the Internet community emphasizing a culture of running code and interoperability testing, and the traditional formal standards community (including the Open Systems Interconnection (OSI) developers operating within the framework of the national standards bodies such as ANSI and NISO in the United States and the International Organization for Standardization (ISO) internationally) and some academics advocating conformance testing as the primary approach, with interoperability testing as a relatively less important secondary activity.

"There is one school of thought which believes that in order to have true interoperability it is necessary to test each OSI implementation to ensure compliance to the appropriate standards and profiles...

"Many feel that conformance testing is not very effective. From a theoretical perspective, program verification is an unsolved problem, and it is simply beyond that state-of-the-art to produce programs that can guarantee that other programs are correct. There are plenty of instances of two implementations, each able to pass a conformance test, but not being able to interoperate with each other. Conformance testing therefore inspires little confidence in the user community. In contrast, conformance testing potentially provides benefits to an implementor during the early stages of development -- as a simple check.

"From a practical perspective, conformance testing is probably the wrong approach. Consider: when an implementation is put under test, in effect it must interoperate with the test system. A conformance test is nothing more than an interoperability test with another implementation, the test system. The test system isn't placed in the user's environment -- it isn't end-user equipment. The user doesn't buy test systems, the user buys systems to get work done. Interoperability testing against equipment the user will never buy -- what's the point?...
"Interoperability testing is painful, but it is necessary. It is the only guarantee of working open systems." [Rose, 1990, p.588] The recent Federal Internetworking Requirements Panel (FIRP) report [FIRP, 1994] has also endorsed the more pragmatically oriented approach of interoperability testing: "However, the Panel is concerned that testing should be pragmatic (focused on demonstrating real interoperability), rather than theoretical (focused on conformance to specifications). Interoperability testing may consist of multivendor interoperability testing or interoperability testing against a reference implementation." [FIRP, 1994, section 4.5] The FIRP report goes further, however, and begins to express a desire to attempt to link definitions of testing procedures, the performance of these procedures, and the interpretation, documentation and publication of their results to the procurement process for the federal government: "Conformance testing can be a very valuable tool to the developer, but can be difficult and expensive to develop and execute definitively with rapidly evolving and integrated products. Instead, pragmatic tests that prove real world interoperability are required, to decrease the overall costs of testing and decrease the time to deliver tested products to market. On the other hand, once multivendor interoperability testing becomes the agreed criteria, agencies should insist on it, both through formal proof that products have indeed been tested against other products and reference implementations, as well as penalty clauses in system contracts for products that later prove not to be interoperable as advertised. NIST must help here by determining how to identify good, pragmatic interoperability tests and the appropriate procurement language to use them. Agency procurements should give preference to products which have demonstrated interoperability in accordance with the NIST program. 
"For conformance or interoperability testing, test suites and approved means of testing must be available early, preferably at the same time as the standard or profile. Pragmatic testing criteria should be defined for all standards, including IPS and proprietary protocols. The cost of required testing must be proportionate to the value, and registers, of all tested products should be available in a publicly accessible on-line database." [FIRP, 1994, section 4.5] The reader of the passage above should recognize that the FIRP report is specifying directions for new research and development efforts; the current state of the art in defining such tests is quite limited. Immediately after stating the objectives above, the FIRP report again returns to identify (and to a great extent to endorse) current pragmatic approaches such as testbeds (discussed in detail later in this paper): "Recently, there have been a number of multi-vendor interoperability testing groups formed, often sponsored by an independent organization, and with significant end user participation (e.g., the FDDI interoperability test lab at the University of New Hampshire, the OSPF interoperability group). These groups focus on testing multiple vendors' implementations against each other, looking for bugs and areas of less than desirable robustness, etc. The Panel views these sorts of efforts as a step in the right direction by the vendor community. These types of practical testing efforts can improve the quality of real world implementations fielded by commercial vendors, especially when large end user organizations can also participate and bring test scenarios to the table." [FIRP, 1994, section 4.5] This paper will examine the current state-of-the-art in both interoperability testing and conformance testing. 
Interoperability testing will be explored first; while it is a much broader perspective than pure conformance testing, current work in interoperability testing (and its limitations) provides a helpful context for understanding the even more serious limitations to the conformance testing approach. Note that both conformance and interoperability testing can be performed by a number of different groups, including vendors and developers, system users (perhaps as part of a procurement process) or by neutral third parties that may offer some form of product certification or simply report test results back to the user and/or vendor communities, as suggested by the quotation from the FIRP report above.

INTEROPERABILITY TESTING AND ITS LIMITATIONS

While a precise definition of interoperability is somewhat elusive, functionally the meaning is clear: components of a system such as GILS communicate with one another effectively and correctly, and provide the expected services to the user of a GILS client. In a very real sense, users don't care why components of a system like GILS fail to interoperate, or what component is at fault; while there can be many causes for failure, a successfully functioning operational system is clearly demonstrable to users. Further, users will view GILS as a totality; while there are a large number of standards and agreements involved in making the GILS work (each with its conformance and interoperability issues), users are only concerned that the entire constellation of standards, agreements and system components interoperate together effectively. While interoperability testing among implementors can address these concerns, it is essential to recognize that they transcend individual standards, and hence the work that standards developers do on conformance and interoperability for individual specific standards cannot, by definition, address the full range of interoperability concerns that are central to the success of a system such as GILS.
Several other points about interoperability are of vital importance and need to be emphasized here; they all relate to the limitations of interoperability as a guarantee of developing a successful system. Interoperability is a necessary but not a sufficient criterion for successful development of distributed systems, and standards alone are not a sufficient basis for the development of successful systems.

o Performance is a separate issue from interoperability. Systems may successfully interoperate but still suffer from devastating performance problems.

o Just because systems interoperate does not necessarily mean that they perform the functions that the user needs; it is entirely possible to identify and/or define standards and then develop a range of interoperable systems that implement these standards, only to discover that the system specification fails to successfully solve the problems or provide the functions that it was intended to provide due to errors in problem definition or solution specifications.

o Interoperability is concerned with communication between distributed computing systems; it does not speak to implementation issues such as user interface design or reliability of a given implementation. Interoperable implementations of a set of standards may still be rejected by their user community simply because they are badly implemented, hard to use or unreliable. Issues such as how well a given implementation provides help and diagnostic facilities to its users are not usually interoperability questions, but they are critical acceptance factors.

TESTBEDS AS AN APPROACH TO INTEROPERABILITY TESTING

One approach that has been used successfully in distributed systems implementation is the testbed. Here a focused effort is made over a fairly short period of time to develop a number of implementations based on a set of standards that define a distributed system, and to experiment with using these implementations to interoperate with each other.
It is important to recognize that while one of the primary purposes of a testbed is to explore interoperability issues, a testbed typically takes on a broader role as a large scale experimental prototype for validating a system design. Note that at least in our view testbeds imply active participation by software developers and vendors and standards developers, plus perhaps neutral organizers and facilitators and representatives of the user community; the focus is on system development and testing rather than simple validation and is perhaps most appropriate for products implementing relatively new standards. This is slightly different from the "multivendor interoperability testing groups" discussed by the FIRP report, which tend to emphasize testing by third parties rather than the active participation of product and standards developers. Testbeds were used in the early development of the TCP/IP protocol itself and as an aid to the development of mature, high quality implementations of this protocol; at that time they were called "connect-a-thons". The Z39.50 Interoperability Testbed (ZIT), sponsored by the Coalition for Networked Information (CNI) and involving about ten implementors of the Z39.50 protocol, played an important role in the development of interoperable Z39.50 implementations as the standard emerged in the marketplace in the early 1990s and also in identifying problems with the standard (in terms of ambiguous language in the standard, design problems in the standard, and missing functionality in the standard that was vital for real-world deployment of systems based on the standard). Testbeds can be difficult to manage, however; they require substantial ongoing commitments of resources and energy on the part of the participants.
Testbeds also call for a considerable level of trust and goodwill among the participants (particularly when the implementations involved include what will become commercial products, or are prototypes for potential commercial products from vendors that will later compete with one another in the marketplace, as opposed to experimental implementations from the research and education community) because of the need to share not only information about implementation but also because this is an environment that often reveals implementation problems not only to the implementor but to all participants in the testbed. Because of these issues, successful testbeds are typically organized under the auspices of some organization that is perceived as "neutral" by the participants (in the sense that the sponsoring organization does not favor one participant over another, and is not itself involved in the development of specific products for the marketplace, although it may well benefit from the successful creation of a range of high quality market offerings). A part of the sponsoring organization's role may be diplomatic, sorting out concerns and disagreements among participants. The sponsoring organization can also play a key role in publicizing standards and showcasing interoperable implementations for a possibly skeptical or poorly informed user community; because of its "neutral" position the organizer may be more credible to the user community than the participating vendors would be alone. Because the emphasis is on implementations, testbeds lead to a "whole system" approach to testing rather than one focused on individual standards conformance or interoperability and can be very useful not only in dealing with problems directly related to a given standard but in identifying problems that arise from the interaction between different standards or at the boundaries between standards and implementor agreements often needed to produce real-world interoperating systems. 
They are also helpful in identifying potential performance problems, functional errors in problem specifications and protocol design, and in providing early warning of poor quality implementations. Testbeds are also valuable in providing implementors with insight into features of standards that may be problematic to implement and that can lead to unacceptable implementations, and can provide vital feedback to the standards development process, particularly if some of the standards developers are involved in the testbed process. A successful testbed becomes an incubator for the development of a shared base of engineering know-how for standards implementation. Implementations that have been proven in testbed activities tend to become robust quickly; they include provisions to detect and recover from erroneous information sent from other implementations and to provide extensive diagnostic tools (such as logging facilities) for debugging in these situations. One of the problems with protocol standards in general is that they usually don't address what to do in the case of errors such as incorrectly structured or sequenced protocol data units received from the remote system; immature or poorly designed implementations typically assume that peer systems on the network are sending correctly coded and sequenced protocol data elements and do not include "suspicious" code to check for and recover from such errors. The system that freezes or crashes is typically a result of a simplistic implementation encountering other implementations behaving incorrectly. This behavior is not accepted in high quality production implementations; graceful recovery from incorrect information sent from another system is vital in quality implementations (though it is not a standards issue).
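The kind of "suspicious" code described above can be illustrated with a minimal sketch. It assumes a hypothetical framing in which each message is a 4-byte big-endian length followed by a body; this is NOT the encoding Z39.50 actually uses, and the names decode_frames, DecodeResult and MAX_PAYLOAD are invented for illustration. The point is only that malformed input from a peer is recorded as a diagnostic instead of being trusted or allowed to crash the receiver.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical upper bound on one message body (an assumption, not from any standard).
MAX_PAYLOAD = 1 << 16


@dataclass
class DecodeResult:
    messages: list = field(default_factory=list)     # well-formed message bodies
    diagnostics: list = field(default_factory=list)  # human-readable problems, for logging


def decode_frames(data: bytes, max_payload: int = MAX_PAYLOAD) -> DecodeResult:
    """Decode length-prefixed frames, never trusting the peer's declared lengths."""
    result = DecodeResult()
    offset = 0
    while offset < len(data):
        # A header needs 4 bytes; anything less is a truncated frame.
        if len(data) - offset < 4:
            result.diagnostics.append(f"truncated header at byte {offset}")
            break
        (length,) = struct.unpack_from(">I", data, offset)
        # Refuse absurd lengths rather than waiting for (or allocating) that much data.
        if length > max_payload:
            result.diagnostics.append(
                f"declared length {length} exceeds limit at byte {offset}"
            )
            break
        body_start = offset + 4
        if body_start + length > len(data):
            result.diagnostics.append(f"truncated body at byte {offset}")
            break
        result.messages.append(data[body_start:body_start + length])
        offset = body_start + length
    return result
```

A caller would log the diagnostics and decide whether to close the connection or continue; messages decoded before the first problem remain available either way.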
In a testbed environment implementations are typically exposed to a broad range of pathological behavior from other testbed participants, and thus are able to quickly achieve a high degree of robustness and maturity, which is difficult to obtain through other methods. Testbeds can also yield dividends that will facilitate the overall development of a marketplace of products implementing a standard. Implementation knowledge gained by testbed participants can be used by other implementors later to speed up development and avoid errors, if this information is appropriately captured and shared. Note that information sharing of this type can be a major policy issue in organizing and managing a testbed project, and particularly in the context of adding new members once a testbed project is underway, or relating testbed activities to those of the broader implementor community outside of the testbed group, since commercial participants who are making the investment in testbed participation may regard the implementation engineering knowledge gained as a valuable asset which they are unwilling to share freely, especially with competitors that did not choose to invest resources in testbed participation. Testbed "pioneers" may not be satisfied with simply enjoying the early implementation lead that participation in a testbed can provide, and may wish to maintain that leadership position as long as possible. Another common role of testbeds is as public demonstrations of the viability of a suite of standards that define a distributed system; they can be central not only in convincing a skeptical user community that distributed information systems can work, as discussed above, but also in helping the user community to understand the implications and operational limitations of such systems. 
For example, in the Z39.50 Interoperability Testbed project demonstrations provided the library community with the first real understanding of the implications of decoupling user interfaces (clients) from information servers and helped to advance consideration of the various policy, planning, and user support issues that such a decoupling raised for the library community. It should be emphasized that while there is a good deal of consensus on the value of interoperability testing, either on a case by case basis or in broader contexts such as testbeds, there is very little methodology for how such interoperability testing should be accomplished. The FIRP report, for example, exhorts the National Institute of Standards and Technology (NIST) to devote efforts to defining such methodologies; this is a research problem, although the FIRP report does not explicitly identify it as such. Essentially, the idea behind interoperability testing is simply to exercise the software and systems involved as extensively and as stressfully as possible. In some cases emphasis has been deliberately placed upon stress conditions -- for example, trying to send sequences to other systems that are legal within the protocol but represent boundary cases that might cause the other system to crash, with prizes to the last system left standing. In other cases, such as the Z39.50 testbed, emphasis was also placed on simply understanding the behavior of interoperating systems, and the limits of effective interoperability at the applications level, such as the various ways in which different systems might interpret a given query. Ultimately, successful interoperability testing relies upon sufficient time commitments by energetic testers -- notably those very familiar with the standards in question and with specific implementations of the standards, and also those who represent a user perspective.
This last group is difficult to enroll in a testbed project, since they don't have an obvious direct economic interest as the participating vendors do, though they are certainly members of a community with a strong interest in the successful product of the testbed project; these "end-user" representatives may require funding or other support to participate. But for applications-level standards such as Z39.50 their participation has been essential in successful testbed activities, and we believe that it will be important to engage them in any testbed that supports the GILS initiative.

CONFORMANCE TESTING: LIMITATIONS AND PROBLEMS

Interoperability testing is an art. It is a somewhat imprecise process that achieves its results through the active engagement of a group of implementors and other system testers sharing a common set of goals. In contrast, there has long been a desire to develop a science (or at least an engineering discipline) of standards conformance testing. The analogy to software development here is useful. Computer scientists have performed research in proofs of program correctness as a rigorous science for decades (with very limited practical results); software developers meanwhile have developed a great deal of largely anecdotal and heuristic knowledge about how to test large-scale software programs and now normally include extensive testing and quality assurance programs (much more akin to interoperability testing) as part of their development cycles. Rather more progress has been made in standards conformance testing than in the much more general (and hence complex) problem of proof of program correctness. There are still a number of major problems. For example, it is very difficult to rigorously test that an implementation conforms to a standard that is itself not rigorously specified.
Thus it is fairly tractable to test areas of conformance such as correctness of state transitions where the desired behavior can be modeled in a reasonably clear, unambiguous and formal fashion, or to determine whether protocol data units generated by an implementation of a protocol are syntactically well-formed. Determining whether an implementation conforms to a specification that is only rather loosely defined in prose (as is typical of protocol semantics) is a much more intractable problem. In many key cases, it turns out that the standards are silent on the specific interpretations; for example, in Z39.50 Version 2 [Z39.50-1992] incompatibilities between implementations have arisen because of disagreements about the need for case sensitivity in the interpretation of database names and because of conflicting assumptions about the semantics of omitted or repeated attributes in search queries. This difficulty becomes more acute as one moves up the layers of a hierarchical protocol suite. At the bottom levels of the protocol hierarchy (e.g. the data link layer) one is dealing with the rather mechanistic movement and processing of bits and bytes and models such as finite state automata can very precisely define the protocol's operation. For applications layer protocols, while some parts of the standard may lend themselves to such a mechanistic definition (and thus very clear conformance testing, and also the development of test suites that examine a series of key behaviors) such as the algorithms in the Z39.50 protocol that determine how a server blocks records into protocol data units in a PRESENT RESPONSE, other parts of the standard address semantics of search processing and seem extremely resistant to abstract modeling. 
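The contrast drawn above can be made concrete with a small sketch. The transition table below is a hypothetical simplification invented for illustration (it is not the state table of Z39.50 or of any other standard), as are the names TRANSITIONS, check_trace and same_database. Checking an observed event sequence against such a table is purely mechanical, whereas a question such as database-name matching has two plausible answers when the specification is silent.

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("closed", "init_request"): "init_sent",
    ("init_sent", "init_response"): "open",
    ("open", "search_request"): "search_sent",
    ("search_sent", "search_response"): "open",
    ("open", "present_request"): "present_sent",
    ("present_sent", "present_response"): "open",
    ("open", "close"): "closed",
}


def check_trace(events, start="closed"):
    """Return (True, None) if every event is legal in sequence,
    otherwise (False, index of the first illegal event)."""
    state = start
    for index, event in enumerate(events):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            return False, index
        state = next_state
    return True, None


def same_database(requested: str, offered: str, case_sensitive: bool) -> bool:
    """Compare database names under one of two readings of a silent specification."""
    if case_sensitive:
        return requested == offered
    return requested.casefold() == offered.casefold()
```

Under these assumptions, a trace either passes or fails at an identifiable step, which is what makes this part of conformance testable. By contrast, same_database("GILS", "gils", True) and same_database("GILS", "gils", False) disagree, and neither reading can be called non-conformant if the standard does not say which applies.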
An excellent example of these problems is the ongoing disagreement within the Z39.50 implementor community about the extent to which it is appropriate for a server to apply liberal interpretations to attributes in queries when it does not support the precise attribute combination specified by the client. Another limitation of conformance testing has to do with the generality of the protocols being tested. Particularly at the OSI model applications layer, protocols are typically very complex and general, offering many options and choices. There is often no reason to believe that even two completely correct and conformant implementations of an applications layer protocol will interoperate in any useful way. Thus conformance testing at this protocol level may be of extremely limited value. As is the case with GILS, a protocol standard (such as Z39.50) may be further constrained by an applications profile, and conformance testing against the combination of a protocol standard and an applications profile may be more useful. In the OSI world, an additional specification called a Protocol Implementation Conformance Statement (PICS) is sometimes used both to provide further constraints that define a particular class of implementations of a protocol and to document the ways in which each implementation varies from the combination of protocol standard and applications profile. The International Organization for Standardization (ISO), which has invested a considerable effort in conformance testing methodologies and approaches, is clear about the limitations of not only the current state of the art but also the objectives of conformance testing as they view it.
As stated in ISO/IEC 9646, which sets out the overall framework for specifying conformance test suites for OSI protocols and for defining the procedures to be followed during testing:

"Conformance testing involves testing both the capabilities and behaviour of an implementation, and checking what is observed against both the conformance requirements in the relevant International Standards or CCITT Recommendations and what the implementor states the implementation's capabilities are.

"Conformance testing does not include assessment of the performance nor the robustness or reliability of an implementation. It cannot give judgments on the physical realization of the abstract service primitives, how a system is implemented, how it provides any requested service, nor the environment of the protocol implementation. It cannot, except in an indirect way, prove anything about the logical design of the protocol itself.

"The purpose of conformance testing is to increase the probability that different OSI implementations are able to interwork. However it should be borne in mind that the complexity of most protocols makes exhaustive testing impractical on both technical and economic grounds. Also, testing cannot guarantee conformance to a specification since it detects errors rather than their absence. Thus conformance to a test suite alone cannot guarantee interworking. What it does do is give confidence that an implementation has the required capabilities and that its behaviour conforms consistently in representative instances of communication." [ISO/IEC 9646-1 p. v]

There is a substantial research literature on conformance testing (see, for example, [Bertine et al., 1990; Bush et al., 1990; EC 1991; Pink, 1990; Probert & Desjardins, 1990; Vermur & Blik 1993]) as well as the ISO/IEC 9646 document which defines the overall framework and specific standards documents addressing conformance tests for individual standards in the OSI context.
ISO/IEC 9646 also defines a taxonomy of conformance requirements (static and dynamic) and of types of tests. We will not attempt to summarize this material here; the interested reader is referred to ISO/IEC 9646.

There is a great temptation to pursue the definition of conformance testing for GILS. The presence of existing literature, standards documents, and taxonomies gives the illusion that one is doing well-defined, well-understood, rigorous engineering and making progress within a generally accepted context. However, as the discussion thus far emphasizes, the payoff from extensive investments in conformance testing may be quite limited. It certainly does not begin to solve the critical problem of ensuring interoperability among GILS components and making the GILS a success.

REFERENCE IMPLEMENTATIONS AND TESTBEDS

Reference implementations can play a very important role in promoting the development of a critical mass of interoperable implementations of a standard or suite of standards. Some reference implementations have been explicitly developed, in the sense that an agency concerned with the deployment of a standard or standards suite has funded an implementation that was widely distributed. Arguably this was the case with TCP/IP, to the extent that an implementation was developed and widely distributed with Berkeley UNIX through funding from the Department of Defense Advanced Research Projects Agency (ARPA) and this TCP/IP implementation became a de facto reference implementation. In some cases, the reference implementation also serves as a code base from which other (commercial) implementations are developed. In other cases, de facto reference implementations have evolved because a number of early implementors (all of whom successfully interoperate with each other) have made servers available for new implementors to test against.
Sometimes these de facto reference implementations will emerge almost by acclamation; they will be among the more robust implementations that may have been part of a testbed early in a protocol's development, or perhaps the most accessible implementations (in that they are publicly available for testing). The extent to which the organization providing such access to its implementation is prepared to offer debugging assistance to other implementors also often plays a role in establishing its implementation as a reference for the implementor community. In some cases a given organization's implementation may be so important to the marketplace that it becomes one of the de facto reference implementations because that organization can mediate disputes about conformance and interoperability by virtue of its marketplace position. Note that source code or even object code (binaries) need not be made available in order for an implementation to serve in the de facto reference implementation role for complex application level standards like Z39.50 or GILS; the key point is that a working service is available on the network for testing new implementations against. This has certainly been proven to be the case with Z39.50 (particularly in bibliographic applications) where there are a number of publicly available servers accessible across the Internet for interoperability testing, such as OCLC, RLIN, DRA, the University of California and Pennsylvania State University. A developer that has tested interoperability against all of these systems successfully will not have perfect code, but can be assured of a reasonably stable and correct implementation of the Z39.50 protocol.

One of the major problems with getting successful, interoperable implementations of a new standard started is the creation of a group of de facto reference implementations, at least for testing purposes. This is essential if interoperability, rather than conformance testing, is to be the primary approach.
The formation of a testbed early in the adoption and implementation of a protocol can play an essential role in creating the initial core of de facto reference implementations to test against, and in helping other vendors who are considering investments in developing products that implement the new protocol to commit resources to product development. It is important to ensure that access to the de facto reference implementations continues to be made available to new developers entering the marketplace even after the first wave of products has appeared, though this usually will occur naturally since easy accessibility is usually part of the characteristics that define the de facto reference implementations.

An advantage of using a testbed populated by a mix of experimental implementations from the research and education community and product prototypes from the commercial sector rather than commissioning a single reference implementation is that it can stimulate the development of industry-wide expertise without the problems of having a funding agency pick a single winner (thus giving the selected implementor of the reference implementation a privileged position for commercial competition) or allowing a single organization to dominate developments through its ability to interpret or even amend a standard. Without selecting a "winner" (a designated reference implementation), funding from a sponsoring agency can still be useful in moving the testbed project forward and thus promoting implementation of the standard, and the creation of a marketplace of products that implement it.
For example, funding might be provided to support testbed activities, the documentation and dissemination of knowledge gained about the protocols involved and the implementation issues involved in developing software that uses them, and perhaps even to ensure that one or more experimental implementations are available early and contain sufficient instrumentation to facilitate protocol analysis and testing. Funding can also be used to ensure that representatives of the user community participate in the testbed.

History suggests that if a single reference implementation must be funded it is vital to differentiate it from products that may later enter the marketplace. One approach has been to have the reference implementation done by an organization that will not compete in a commercial marketplace (such as a university or other nonprofit). The resulting implementation is then placed in the public domain, or licensed at low cost to all who are interested; in some cases it becomes a beginning code base that commercial implementors can build upon, with all of the interoperability advantages that a common initial code base offers[5]. The approach of commissioning a non-commercial reference implementation has been used with mixed success in some cases such as X.500 directory services; a university or other research organization has been commissioned to develop the reference implementation and has used development tools and approaches that have emphasized flexibility and speed of implementation, and evaluation instrumentation, rather than the robustness and high performance that characterize successful commercial implementations. This helps to differentiate the reference implementation from other commercial implementations, but it does have its dangers. The reference implementation may turn out to be widely used, simply because it is free, or be the first implementation that the user community gains much experience with.
If this implementation is slow or unreliable it may lead to a perception by the user community that the standard, rather than the implementation, is irretrievably flawed and may thus play a role in rejection of the standard by the user community.

Well-known, readily accessible reference implementations on the network also serve an important function for customers. Rather than relying on vendor claims about interoperability or having to make complex, costly arrangements for trial implementations and acceptance testing, a potential customer can quickly gain a reasonable degree of confidence about the ability of a vendor system to interoperate simply by asking the vendor to demonstrate interoperation with one or more of the well-known reference implementations to the customer; this can even be done at a trade show if the vendor booths contain systems attached to the network.

The FIRP report endorses the idea of reference implementations and recognizes their importance (though it does not explicitly recognize the developing role of publicly accessible reference implementations as services on the network):

"Interoperable implementations of the standards and profiles must also be available, preferably with at least one reference implementation in the public domain. Standardization on technology which has yet to reach implementation and limited deployment stages has generally been less than successful." [FIRP, 1994, section 4.4]

Several comments should be made on the FIRP recommendations, however, in the context of GILS. First, GILS is a profile based on a set of well-established technologies and standards, not an entirely new standard; Z39.50, for example, is already widely deployed and well-accepted. Thus there is already a sizable base of existing implementations, both commercial and public domain, that will interoperate with GILS implementations to at least a limited extent.
Second, at least for network-based information access applications, it may be that the FIRP report goes farther than necessary in calling for public domain implementations (although these do exist for Z39.50); perhaps more important, at least in our view, is that servers be publicly accessible on the network for interoperability testing, including testing by potential purchasers of commercial products.

CERTIFICATION

Certification, either of conformance or of interoperability, is an extremely attractive idea in the abstract; suppliers of servers and/or clients would receive some kind of certification from a testing agency, and purchasers of products that carry this certification could be assured that the products would interoperate with other products. Certainly, the FIRP report looks forward to a time when certification of products could be available and play a role in the federal procurement process.

This approach carries a number of political (and, very likely, legal) complexities. Who should perform such certification? How should the certification activities be funded? How are disputes adjudicated? What methodology and tests would be used to verify conformance or interoperability? In the case of interoperability, what other products should a given product be tested against?

Beyond the political and legal problems are very real technical ones. Certification is a much more comfortable fit with the rather mechanistic processes of conformance testing: for example, a certification that an implementation successfully processes a test suite of protocol data elements. This does not address interoperability issues (and indeed, it is not clear how one would certify interoperability, except against a limited number of specific systems at a specific point in time). It also does not provide a way to address semantic issues that are very important to the user community.
Finally, it should be noted that certification is not, in our view, particularly useful in moving a suite of standards and implementations towards maturity or in validating a new profile or suite of standards; it often fails to directly engage the key communities of implementors, standards developers and end users.

READY AVAILABILITY OF STANDARDS DOCUMENTS

The effect of having standards documents (both in draft form and in final versions) readily accessible in electronic form on the Internet for public use and review has proven to be of substantial importance in furthering development of a base of interoperable implementations. The difficulty and high cost of identifying and acquiring relevant OSI standards for implementation efforts (or, indeed, for teaching or research purposes) has proven, over time, to be a substantial barrier to the marketplace adoption of these standards. Further, the difficulties in obtaining OSI standards have limited their review by interested communities that might have greatly improved both the quality and the comprehensibility of these standards. This is especially important for applications level standards due to the very broad community that is interested in them.

Internet standards (the so-called Requests for Comments, or RFCs), in contrast, are publicly available at no cost through the Internet, both as drafts during their development and later in final form. A typical Internet standard receives a very wide review from the potential user and implementor communities as part of its development. Internet standards also form a vital part of the base of material used in teaching and research. Relevant RFCs are available instantly to implementors who need to refer to them; thus they are used heavily to resolve questions during design and implementation.
We would argue, with Carl Malamud [Malamud, 1992], that the effects of easy public availability of both draft and final standards in electronic form have been underestimated as a factor both in the quality of standards and in acceptance and adoption of these standards by the implementor and user communities.

The Draft Report of the Federal Internetworking Requirements Panel has endorsed the Internet RFC model as the appropriate approach for federal networking standards:

"The Panel also believes that all standards and profiles used in federal networking need to be widely available in electronic and paper form at low or no cost. Consistent with the policy espoused in OMB Circular A-130, these fees should cover the cost of dissemination of the standard, not the cost of its development." [FIRP, 1994, section 4.4]

The GILS effort has followed the Internet RFC model in making its documents widely available for review and reference through the Internet. Indeed, the initiative has also reached out to a very broad community through the use of List Servers (LISTSERVs) and other electronic mail reflectors to distribute announcements about the availability of draft documents at various points, and by providing versions of these draft documents in a wide range of popular formats. Strategically, this has been an important decision, and one that can be expected to contribute to the development of a large base of interoperable implementations.

CONFORMANCE, INTEROPERABILITY AND DATABASE SEMANTICS

One of the difficulties of GILS from an interoperability testing point of view is that the profile and related documents specify not only computer-to-computer protocols but also discuss the content of locator databases.
A well-constructed GILS client should understand, for example, how to interpret browsing menus and cross reference records and display them usefully to users; such a client should also understand the unique IDs present in GILS records and use them to eliminate duplicate records from result sets obtained by searching across multiple GILS servers before displaying these results to a user. These are not protocol functions, but rather capabilities that are available to the client because it understands the semantics of GILS records. When one speaks of conformance and interoperability in this context, one is speaking about client function and not about protocols. There is very little precedent for discussing system behavior in this context. Yet a client system can interoperate (successfully interchange search requests and receive data) with a GILS locator database to the extent of searching the database and presenting records to the user without supporting any of these features, and without understanding the semantics of GILS records. Indeed, to facilitate limited interoperability with the installed base of bibliographic and WAIS clients GILS servers are expected to support forms of record export that to some extent conceal the full semantics of locator database records. This might be the case in communicating with an existing Z39.50 client designed to support bibliographic searching, for example. Here critical issues are the mapping of the GILS data elements into MARC records which the server will then present to the bibliographic client, and the quality of the resultant MARC records. 
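The duplicate elimination described earlier in this section, in which a client uses the unique IDs in GILS records to merge result sets from several servers, might be sketched roughly as follows. The record representation (a dictionary) and the field name `control_id` are assumptions made for illustration only; the GILS profile defines the actual data elements, and this is not taken from any GILS client.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def merge_result_sets(result_sets: Iterable[List[Record]],
                      id_field: str = "control_id") -> List[Record]:
    """Merge per-server result sets, keeping the first copy of each unique ID.

    `id_field` is a hypothetical name for the unique identifier element.
    Records that lack the identifier cannot be compared, so they are kept.
    """
    seen_ids = set()
    merged: List[Record] = []
    for records in result_sets:
        for record in records:
            record_id = record.get(id_field)
            if record_id is None:
                merged.append(record)
                continue
            if record_id in seen_ids:
                continue  # the same locator record was returned by an earlier server
            seen_ids.add(record_id)
            merged.append(record)
    return merged


# Hypothetical results from two servers that both hold record "A-1".
server_one = [{"control_id": "A-1", "title": "Water quality data"}]
server_two = [{"control_id": "A-1", "title": "Water quality data"},
              {"control_id": "B-7", "title": "Census tract files"}]

for record in merge_result_sets([server_one, server_two]):
    print(record["control_id"], record["title"])
```

Nothing in this sketch touches the protocol itself; it operates only on records after retrieval, which is why a client that omits it can still exchange searches and records with a GILS server.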
In this context it makes little sense to discuss client conformance and interoperability because the clients were never designed to interoperate with GILS servers in the first place; rather the interoperability issues become those of having GILS servers successfully emulate the kinds of servers that these clients were designed to interoperate with, including the ability of GILS servers to perform semantic mappings of data elements from locator database records into record interchange formats that are known to this base of clients. The GILS servers must in fact conform to other standards (such as MARC) and interoperate with systems that implement these other standards. One open question is whether the MARC records exported by GILS servers will meet the minimum standards for completeness and quality that bibliographic clients may establish; this is particularly troublesome because there are no explicit standards for such MARC records broadly accepted within the bibliographic community.

The definition and description of these various levels and forms of interoperability are likely to be quite confusing to potential customers for GILS clients.

CONCLUSIONS AND RECOMMENDATIONS

Interoperability testing, rather than conformance testing, should be the keystone of any program to further the development of an interoperable base of GILS clients and servers. GILS, as an Internet application, has already positioned itself within this tradition, and has already made good use of practices within this tradition such as the public distribution of draft standards documents for wide review and comments.

We do not recommend any substantial investment of effort in the development of conformance tests.
Some highly non-comprehensive conformance test streams were developed within the Z39.50 implementor community as a debugging (rather than conformance testing) tool; these will be useful to GILS implementors, and it might be worth the rather minimal investment of effort to develop some collections of test protocol data units (PDUs) specifically as a debugging tool for GILS implementors.

Certification is not a feasible approach, either politically or technically, at this stage in the development of the GILS. Leaving aside the basic problems of certification, we would argue that the priority at present is to create a base of conformant implementations and of implementation expertise and to validate the GILS profile through implementation (making changes to the profile as required based on implementation experience). This is best achieved by actively engaging implementors, users and standards developers rather than by assigning responsibility for testing or certification to some third party.

It is essential to recognize that the success of the GILS initiative depends on more than simply interoperable software. The GILS system architecture and profile documents contain rules and guidelines for the construction of content for the locator databases. These need to be validated as well; a key issue will be the development of knowledge about how to construct locator databases, and the resolution of questions about how to propagate records from one database to another, the appropriate inclusion of cross-reference records, and the granularity of information resources identified by locator records.

Interoperability testbeds, in part as a way of developing a de facto core of well-known reference implementations and in part as a way of simply moving early implementations to a more mature state and ensuring their mutual interoperability, deserve careful consideration.
A testbed can also be justified as a means of validating the system architecture and the profile, and as a means of gaining experience with content-related questions. Testbeds have been an effective approach in other, similar enterprises, including the development of Z39.50 clients and servers for bibliographic applications. GILS has the advantage that many of the Z39.50 developers who would either produce GILS profile specific clients and servers or want to upgrade their existing clients to work well with GILS servers are already familiar with the testbed model and indeed may have had prior experience participating in testbeds.

If the testbed approach is followed, one important decision would be the selection of an appropriate neutral party to organize the testbed. While we do not have a specific recommendation for a sponsoring organization for the testbed, we feel that, because so many of the testbed issues revolve around content and its representation, the organizer should not be a federal agency that has a vested interest in the way it has structured the contents of its locator database.

Another unique issue for the GILS project is that two closely linked testbed projects may be needed: one for implementors of the GILS profile proper, and a second to explore interoperability issues between GILS profile servers and the existing installed Z39.50 client base. Building these two testbeds and carefully articulating and demonstrating the differences between them might also help to address confusion that will occur in the user community about interoperability expectations for GILS profile and non-GILS profile clients communicating with GILS servers.
Particular attention will need to be given to exploring interoperability issues (and related quality issues) in data content and data element mappings to record types such as MARC, as well as to defining and explaining the various forms of partial interoperability that may be achieved between GILS servers and clients developed by other communities of Z39.50 implementors, particularly the WAIS community and the bibliographic community. The bibliographic Z39.50 implementor and user communities are particularly important because they include the federal depository libraries and the broader community of government documents librarians, who are already familiar with bibliographic systems (including those which are used for organizing government documents) and who will also be an essential user community for the GILS, both as direct users and as intermediaries that assist the broader public in using the GILS. This is poorly explored and complex territory and to some extent falls outside of the usual framework of conformance and interoperability testing for protocols, yet it is critical to the success of the GILS initiative. We strongly recommend that if a testbed is pursued, these communities be brought into the testbed as representing one of the key user perspectives. NOTES 1. 
It should be noted that while the GILS documents only directly address the development of locator databases by federal government agencies, there is nothing in the architecture or design of the system that precludes the expansion of the GILS system to also incorporate locator databases providing access to information resources outside of the federal government or locator databases developed outside the federal government but providing access to a mixture of federal government and other information resources (for example, these locator databases might be developed by libraries, by commercial information providers, by state or local governments within the United States, or, indeed, even by other national governments or international organizations). In addition, the GILS architectural model could also be replicated by other organizations without any content linkage to the US federal government GILS system.

2. For a sense of the scope and diversity of the existing Z39.50 client base, see [Moen 1994].

3. Actually, because the use of Z39.50 is not defined in a TCP/IP environment by the ANSI/NISO Z39.50 standard, GILS also specifies the mapping of Z39.50 on top of TCP/IP [Lynch, 1994].

4. While outside the scope of the GILS program, it seems both desirable and probable that clients designed specifically to support the GILS profile will also be designed to interoperate with other Z39.50 servers, such as WAIS servers or bibliographic servers.

5. One of the issues with a common public domain or other generally available code base is the problems that arise as a standard evolves, and the need for mechanisms to manage this code base. Various approaches have been taken to address this; for example, the code base of Berkeley UNIX was re-released for each new version of the de facto standard, which meant that commercial developers had to repeatedly re-integrate the common source code base. The X Consortium also uses a common code base, and invites implementors to contribute code back to the common code base that becomes part of new releases.

REFERENCES

Bertine, Herbert, Elsner, Wolfgang B., Verma, Pramode K., and Tevani, Kamlish T. "Overview of Protocol Testing Programs Methodologies and Standards". AT&T Technical Journal. (January/February 1990): 7-16.

Bush, Matthew, Rasmussen, Kris and Wong, Tai. "Conformance Testing Methodologies for OSI Protocols". AT&T Technical Journal. (January/February 1990): 84-100.

Christian, Eliot. Government Information Locator Service (GILS); Report to the Information Infrastructure Task Force. (May 2, 1994). Available by anonymous FTP from 130.11.48.107 as /pub/gils.doc (in Microsoft Word format) or /pub.gils.txt (ASCII text).

Conformance Testing and Certification in Information Technology and Telecommunications. Proceedings of the European Conference, Brussels, Belgium 13-15 June 1990. Amsterdam, Netherlands; IOS, 1991. (Cited as [EC 1991])

Federal Internetworking Requirements Panel. Draft Report of the Federal Internetworking Requirements Panel. Prepared for the National Institute of Standards and Technology, January 14, 1994.

ISO/IEC 9646-1. International Organization for Standardization. Information Technology - Open Systems Interconnection - Conformance testing methodology and framework. Part 1: General Concepts. 1991.

ISO/IEC 9646-2. International Organization for Standardization. Information Technology - Open Systems Interconnection - Conformance testing methodology and framework. Part 2: Abstract test suite specification. 1991.

ISO/IEC 9646-3. International Organization for Standardization. Information Technology - Open Systems Interconnection - Conformance testing methodology and framework. Part 3: The Tree and Tabular Combined Notation (TTCN). 1992.

ISO/IEC 9646-4. International Organization for Standardization. Information Technology - Open Systems Interconnection - Conformance testing methodology and framework. Part 4: Test realization. 1991.

ISO/IEC 9646-5. International Organization for Standardization. Information Technology - Open Systems Interconnection - Conformance testing methodology and framework. Part 5: Requirement on test laboratories and clients for the conformance assessment process. 1991.

Kahle, Brewster, et al. "Interfaces for Distributed Systems of Information Servers". Networking, Telecommunications, and the Networked Information Revolution. Proceedings of the ASIS mid-year meeting, May 28-30, 1992, Albuquerque, NM. Silver Spring, MD; ASIS, 1992. 124-148.

Ledrick, Diane P. & Spring, Michael B. "International Standardized Profiles". Computer Standards & Interfaces 11(1990): 95-103.

Lynch, Clifford A. Using the Z39.50 Information Retrieval Protocol in the Internet Environment. Draft Internet RFC.

Malamud, Carl. Stacks: interoperability in today's computer networks. Englewood Cliffs, NJ; Prentice Hall, 1992.

McClure, Charles R. and Moen, William E. Application Profile for the Government Information Locator Service (GILS) (May 7, 1994). Available by FTP from ericir.syr.edu. (Cited as [McClure & Moen, 1994a])

McClure, Charles R. and Moen, William E. Using Z39.50 In An Application For The Government Information Locator Service (GILS): A Background Paper (May 7, 1994). Available by FTP from ericir.syr.edu. (Cited as [McClure & Moen, 1994b])

McClure, Charles R. and Moen, William E. Expanding Research and Development of the ANSI/NISO Z39.50 Search and Retrieval Standard. (October 16, 1993). Available by FTP from ericir.syr.edu.

McClure, Charles R., Ryan, Joe, and Moen, William E. Identifying and Describing Federal Information Inventory/Locators Systems: Design for networked-based Locators. 2 vols. Bethesda, MD; National Audio Visual Center, 1992. Available from ERIC document no. ED349031.

Moen, William E. The Z39.50 Protocol: Information Retrieval in the Information Infrastructure. (June 1994). Available from NISO.

NISO Z39.50-1992. National Information Standards Organization. ANSI/NISO Z39.50-1992, Information Retrieval Application Service Definition and Protocol Specification for Open Systems Interconnection. Gaithersburg, MD; National Information Standards Organization Press.

Pink, Jane. "Conformance Testing". An Analysis of the Information Technology Standardization Process. John L. Berg and Harold Schumny (eds.) Elsevier Science Publishers (North-Holland), 1990. 111-116.

Probert, R.L. and Desjardins, M.M. "Improving Quality and Interoperability of Protocol Implementations via Conformance Testing Standardization". IEEE International Conference on Communications ICC '90. Atlanta, GA, 16-19 April 1990. New York; IEEE, 1990. 1374-1381.

Rose, Marshall T. The Open Book: A practical perspective on OSI. Englewood Cliffs; Prentice Hall, 1990.

Vermur, G.S. and Blik, H. "Interoperability Testing: Basis for the acceptance of communication systems (theory and practice)". (Sixth International Workshop on Practical Test Systems. IFIP TC6/WG6.1 Pau, France, 28-30 Sept. 1993) IFIP Transactions C (Communication Systems), 1993. C-19:315-30.