Global Information Locator Service: history and future developments

Eliot Christian
United States Geological Survey

First, I would like to address a couple of common questions about GILS:

Why is the United States Geological Survey so involved in the Global Information Locator Service?

How does this work relate to the U.S. Government Information Locator Service?

The U.S. Geological Survey is one of many participants in the international "Global Change Research Program". Global Change encompasses many science and socio-political issues dealing with challenging problems such as climate change, loss of biological diversity, and population growth.

My job at the USGS is to help make Earth science data and information more accessible to researchers and the public. In that vein, I helped design the program plan on data management for Global Change, with particular attention to the problem of how people find relevant data and information resources.

The Government Information Locator Service grew out of this work and was established in U.S. law in 1995. With the adoption of GILS in Canada, some adaptations were made to address multiple languages and GILS was then offered as a model for a Global Information Locator Service through the G7 Global Information Society initiative.

Having roots in the Global Change Research program led to certain design choices in the architecture adopted by GILS. At a deep level, these science and application issues closely parallel public policy interests concerning free flow of information and public access to government information.

From a system design perspective, Global Change data management presents some interesting problems. We know that Global Change issues will be in a formative stage for decades. It may be 40 years before scientists even know the right questions to ask in some areas. This reality has profound implications.

For example, we might try to define our user community. Virtually anyone, anywhere, who makes decisions affecting, or affected by, the environment is within the targeted user community. Such users range from children to politicians and specialized researchers, and they communicate in any language and any discipline. Moreover, given the 40-year lead-time, most of these users have not even been born yet. Clearly, it is going to be difficult to interview these folks and gather user requirements.

What is the data and information domain?

Relevant resources range from small data tables to massive, global observation collections. They also include directories of people, chronicles of events, and bibliographic references for publications, books, and maps. Other relevant resources are the holdings of natural history museums and archives of all kinds, from butterflies to genetic libraries to the USGS rock library. Unfortunately, much of the most interesting stuff is in the real world and not itself electronic.

At first blush, it looks like finding data and information is already a huge problem and that this broadly scoped view only makes things worse. It turns out, though, that this broad view is not only necessary, it is very useful in forcing us to focus on fundamentals.

In GILS, the key to making things easier to find is to think of the network itself as a vast, interconnected catalog. We can start by thinking about how people find information.

Let's say a person wants to pick a face out of a crowd, find a library book, or search the Internet. In a basic sense, the searcher is looking for patterns among the observable characteristics of some set of objects.

In the context of electronic networks, there are two kinds of opportunities here. To make it easier for people to find information, we can improve the pattern-matching process, and we can make it easier to observe object characteristics.

We all expect rapid evolution in standards for emerging components like natural language processing. Even as we await these new tools, we have an immediate basic need for a standard way to communicate search requests. To serve this need, GILS adopts the international standard protocol for information search, and the GILS Profile is specially tuned for networked information discovery. Major players worldwide already support the base standard, ISO 23950, including most information services and libraries.

Even the best processes for pattern matching cannot work unless we can pick out specific object characteristics. Among these are common bibliographic characteristics--author, title, subject, date published, and so on. Such characteristics, or "cataloging rules", have been developed over many decades and are widely used throughout publishing, libraries, and archives, in thousands upon thousands of institutions worldwide. Again, the GILS Profile simply adopts this widespread practice and applies it to networked information discovery.

While GILS supports search of traditional bibliographic catalogs, the search protocol applies just as well to machine-aided indexing and techniques of pattern recognition. It is already used for searching Web pages, telephone directories, gene sequence libraries, maps by latitude/longitude, imagery by content, chemicals by structure, and so on.

These pieces are the essence of the Global Information Locator Service. GILS simply points to relevant parts of existing international standards that support the use of networks as catalogs for all manner of data and information resources.

What is the strategy for deploying GILS broadly?

To assure that the free flow of ideas is sustainable over the long term, we cannot merely invoke government fiat. Instead, we must find and promote solutions that work for commercial and entertainment interests as well as Earth science.

Wide diversity is a defining characteristic of the emerging Global Information Society. There is now good consensus that information standards intended for global use should presume not central authorities, but decentralized and interoperable approaches.

My impression is that current bibliographic cataloging practice provides a good example of an effective decentralized approach. Cataloging standards provide interoperability across independently maintained libraries, while allowing wide latitude in how collections are developed and organized. Over the last decade, open standards in the library and information services communities have evolved to take advantage of public networks such as the Internet.

Much progress has been made, but we must acknowledge the bare fact of the currently primitive facilities for handling complex information at the personal, corporate, community, national, and global scales. The focus of GILS on fundamentals is a conscious strategy to deal with the certainty of continuous evolution and occasional drastic revolutions.

With this short history and strategy overview as background, we can look at a few basic definitions.

GILS has been defined as "A decentralized collection of locators and associated information services used by the public either directly or through intermediaries to find information."

In the standards context, GILS is seen as an application profile forming part of a service definition. This "middleware" service performs specific functions useful for locating information. It is available for use by higher level applications and itself makes use of lower level components such as bitways.

A "locator" is defined as an information resource that identifies other information resources, describes the information available in those resources, and provides assistance in obtaining the information. A locator can be modeled as a database of locator records, each of which is a set of related data elements. If you like, you can regard locator records as equivalent to metadata, meta-information, directories, catalogs, or abstracts. All of these descriptive mechanisms are "bibliographic" in a broad sense, and GILS is carefully contrived to be interoperable with existing bibliographic systems.

What information resources can be represented through GILS, and how does it complement other schemes being floated?

GILS is a very useful facility for locating networked information resources. For example, Web documents can act as GILS locator records. Such documents can be searched via GILS, and HTML metadata can take advantage of GILS Core elements. Yet, GILS differs from most Internet metadata schemes because GILS locator records are designed to act as pointers not just to Internet pages, but to ALL kinds of information--including people, organizations, events, books, artifacts, paper documents, and so on.

GILS itself has no prescription for level of aggregation. However, communities of interest usually specify the level by adopting a specific "GILS Usage Guideline". GILS locator records already describe information resources ranging from individual pamphlets up to multi-national programs. North Carolina is using GILS locator records to describe individual fields within databases throughout the state. The U.S. Government Printing Office created GILS locator records at the level of entire agencies.

A GILS locator can be described by a record in another GILS locator. Using the Linkage element, the separate GILS locators form a network for distributed search. This recursive feature can be applied to the "query routing" problem of Internet searching and is basic to the Advanced Search Facility now under development for the U.S. Federal Government. We have also created a searchable database of hundreds of information sources that support search through the ISO 23950 protocol--many of which have many millions of bibliographic or GILS records. Such multi-step locators can encompass all of the Internet-accessible library catalogs, Web crawlers, WAIS servers, gopher, and X.500 databases yet also cover purely non-electronic resources like the USGS Rock Library.
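The recursive arrangement described above can be illustrated with a minimal sketch. It is not the Advanced Search Facility's design: the `LocatorRecord` and `Locator` classes, the `linkage` field (standing in for the Linkage element), and the in-memory dictionary of locators are all assumptions made for illustration. A real deployment would follow each linkage over the network rather than through a dictionary lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class LocatorRecord:
    title: str
    # Hypothetical stand-in for the Linkage element: the name of another
    # locator that this record describes, or None for an ordinary resource.
    linkage: Optional[str] = None


@dataclass
class Locator:
    name: str
    records: List[LocatorRecord] = field(default_factory=list)


def distributed_search(start: str, term: str,
                       locators: Dict[str, Locator],
                       seen: Optional[Set[str]] = None) -> List[str]:
    """Collect titles containing `term`, following linkages between locators."""
    if seen is None:
        seen = set()
    # Skip unknown locators and locators already visited (guards against cycles).
    if start in seen or start not in locators:
        return []
    seen.add(start)
    hits: List[str] = []
    for record in locators[start].records:
        if term.lower() in record.title.lower():
            hits.append(record.title)
        if record.linkage is not None:
            hits.extend(distributed_search(record.linkage, term, locators, seen))
    return hits
```

The `seen` set matters because locators may describe one another, so an unguarded traversal could loop forever.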

Where are we now in terms of GILS implementations?

I have mentioned that GILS is established in U.S. law. It also plays a role in the National Spatial Data Infrastructure, the National Biological Information Infrastructure, and several other national initiatives. The Library of Congress is making its products and search engines GILS-compliant. Moreover, all 1,400 Federal Depository Libraries are required to provide public Internet access with GILS-aware client software.

North Carolina has a law and an executive order for GILS. Several other states and regional organizations also have GILS, including Florida, Massachusetts, Missouri, New York, South Carolina, Texas, and Washington, and there are additional states and cities involved in the National Spatial Data Infrastructure. The Southern Growth Policy Board is using GILS in the context of economic development.

GILS is showing up in Australia, Canada, Japan, Denmark, Germany, and other countries. GILS is especially well known for access to environmental information. In the Global Environmental Information Locator Service, G7 countries and others have agreed to use GILS for environmental information worldwide. This includes the European Environment Agency "Catalogue of Data Sources", and some United Nations conventions such as Climate Change and Biological Diversity.

There is no central registry of GILS implementations, but it is clear they are already many and diverse. These various organizations do not need to be tightly coordinated. They do not submit to a single view of information, nor do they subordinate themselves to some "mother of all GILS". Although their disparate goals are reflected in different Usage Guidelines, they are interoperable with all other GILS as well as libraries, information services, and other major resources worldwide.

How is GILS interoperability maintained?

Interoperability among the many different GILS-compliant servers is defined through the GILS Profile, coordinated internationally through the Open Systems Environment Implementors Workshop. The GILS Profile is defined on peer networks such as X.25 and TCP/IP. Instead of working at the presentation level like Web pages, GILS defines a machine-level network interface, which also supports automated agent clients.

The GILS Profile only specifies the behaviors of server software at the client/server interface--it does not constrain clients at all. In addition, on the back-end of the service definition, server behavior is defined in terms of an abstraction layer. This means that the service definition is completely independent of how servers actually manage content. Content need not even exist prior to the search request.

Viewed from a client, a GILS-compliant server appears to hold a searchable set of locator records. Each locator record can characterize other information of any kind, at any level of aggregation. These may be handcrafted catalog records or on-the-fly products of an automated abstracting or classification process. For example, a locator record that describes another server could include a listing of the words most characteristic of that server's contents. (The Advanced Search Facility uses centroids, for example.) Used this way, the locator record acts as an intermediary resource for information discovery.
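As a rough illustration of a locator record that summarizes another server, the sketch below builds a list of a server's most frequent words and uses such lists to decide which servers are worth searching. This is an assumption-laden stand-in: it uses plain word counts, not the centroid computation of the Advanced Search Facility, and the function names, stopword list, and `top_n` cutoff are all hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real service would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for"}


def characteristic_words(documents, top_n=5):
    """Return the top_n most frequent non-stopword words across documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def candidate_servers(query_word, summaries):
    """Name the servers whose word summary contains the query word."""
    return sorted(name for name, words in summaries.items()
                  if query_word.lower() in words)
```

A client or intermediary could consult such summaries first and send the full query only to the servers returned by `candidate_servers`.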

Searches can be content-based using full-text searching or other feature extraction. The search can also exploit registered attributes such as structured elements (e.g., Title, Author, Subject) and relations (e.g., Equal, Greater Than). GILS offers well-known registered semantics for about 150 metadata elements, all of which have one-to-one semantic equivalents in MARC. GILS locator records can have locally-defined elements as well.

GILS-compliant servers must support the required attributes and all registered elements, and must not degrade locally defined elements. Any specific set of locator records available from a GILS-compliant server may have any number of these or other elements, and may have none at all. (Records without any element structure are searched full-text through a special attribute called "Anywhere".)
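The two kinds of search described above, attribute-based and full-text, can be sketched in a few lines. This is a toy model and not the wire protocol: records are plain dictionaries (or bare strings when they have no element structure), the `Date` element and the function names are assumptions, and only the two relations named in the text are included.

```python
import operator

# Relations named in the text; a real attribute set defines more.
RELATIONS = {"Equal": operator.eq, "Greater Than": operator.gt}


def matches(record, element, relation, value):
    """Test one record against an (element, relation, value) query."""
    if element == "Anywhere":
        # Full-text search: look for the value in all text the record carries.
        if isinstance(record, str):
            text = record
        else:
            text = " ".join(str(v) for v in record.values())
        return str(value).lower() in text.lower()
    # Structured search needs a record that actually has the element.
    if isinstance(record, str) or element not in record:
        return False
    return RELATIONS[relation](record[element], value)


def search(records, element, value, relation="Equal"):
    """Return every record that satisfies the query."""
    return [r for r in records if matches(r, element, relation, value)]
```

For example, `search(records, "Date", 1992, relation="Greater Than")` selects structured records by a registered element, while `search(records, "Anywhere", "rock")` also reaches records that have no element structure at all.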

Interoperability is based on registered elements rather than a record format. This means GILS supports interoperable search of many different metadata structures--HTML, SGML, X.500, SQL databases, Dublin Core, IAFA, Internet mail, Whois++ templates, spatial metadata, and so on. GILS-compliant servers simply map local semantics to the registered elements. Multi-lingual searching is supported because the elements are referenced by number--interfaces simply translate the number to the particular language in use.
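A minimal sketch of this mapping idea follows. The element numbers, the local tag names, and both lookup tables are placeholders invented for illustration, not values quoted from the GILS Profile. The point is only that records from different sources meet at a shared number, and that the number is turned into a label at display time.

```python
# Hypothetical registry: element number -> display label per language.
# The numbers are placeholders, not taken from the GILS Profile.
REGISTERED_LABELS = {
    4: {"en": "Title", "fr": "Titre"},
    1003: {"en": "Author", "fr": "Auteur"},
}

# Hypothetical per-source mappings from local tags to registered numbers.
LOCAL_TO_REGISTERED = {
    "html": {"title": 4, "meta.author": 1003},
    "sql": {"doc_name": 4, "creator": 1003},
}


def to_registered(source, record):
    """Re-key a local record by registered number; keep unmapped tags as-is."""
    mapping = LOCAL_TO_REGISTERED[source]
    return {mapping.get(tag, tag): value for tag, value in record.items()}


def display(record, language):
    """Translate registered numbers into labels for the given language."""
    return {
        REGISTERED_LABELS.get(key, {}).get(language, str(key)): value
        for key, value in record.items()
    }
```

Unmapped local tags pass through unchanged, which mirrors the requirement that locally defined elements not be degraded.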

Why do we need so much of this standards stuff just to search metadata?

Many separate communities worldwide over many decades have used bibliographic techniques to characterize data and information resources. Unfortunately, when each community uses different tags for the bibliographic or metadata elements, any commonality becomes obscured.

The usual result of this independent development is that there is no functional interoperability between the catalog services, unless some organization is able to force the communities to accept imposition of a common format. Such strong-arm tactics may be useful at times, but they are clearly inappropriate on a long-term and global scale.

GILS takes a gentler approach--it encourages interoperability at the semantic level but lets each community develop as much additional interoperability as they want. Often, this is merely coming to agreement on a Usage Guideline. Occasionally, a more specific profile is constructed on top of GILS, as in the Geospatial Profile.

Access to GILS-compliant servers happens through gateways, clients, or agents. Gateway freeware or tools are available for World Wide Web, X.500/LDAP, SQL, and WhoIs++. It may also be possible to implement GILS functions in the new Resource Description Framework (RDF) mechanism.

Any client software capable of access to a server compliant with Z39.50 version 2 or 3 can access GILS-compliant servers. This includes traditional bibliographic systems still widely deployed in libraries in the U.S. and elsewhere in the world. Extra capability is provided by GILS-aware clients such as BookWhere 2000, MetaStar, SIRSI Vizion, and Znavigator.

The GILS application profile specifies, though it does not mandate, spatial searching by latitude and longitude. This extension of search beyond text is an important feature of Z39.50. The protocol has already been applied to imagery, to searching for chemicals by bond angles, and to searching gene sequences--other pattern-matching is certainly feasible.
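Search by latitude and longitude usually comes down to testing whether two bounding rectangles overlap. The sketch below shows that test under stated assumptions: the `BoundingBox` class and function names are hypothetical, coordinates are decimal degrees, and boxes are assumed not to cross the 180-degree meridian. How the profile encodes such a query on the wire is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    west: float   # decimal degrees longitude
    east: float
    south: float  # decimal degrees latitude
    north: float


def overlaps(a: BoundingBox, b: BoundingBox) -> bool:
    """True when the two boxes share any area or edge."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def spatial_search(records, query: BoundingBox):
    """Return titles of records whose footprint overlaps the query box.

    `records` is a list of (title, BoundingBox) pairs.
    """
    return [title for title, box in records if overlaps(box, query)]
```

Because the test is symmetric, it works equally well whether the query box is larger or smaller than the footprint of the described resource.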

We are seeing some work on integrating search into a Web server with advanced database management systems, such as PostGres, Informix DataBlades, and Oracle 8 Cartridges. This approach is in use at the Alexandria Digital Library project, which is exploring the technologies for access to geospatial stores, including search by image content, traditional text, latitude-longitude, and conventional SQL. The beauty of a generalized search protocol is that these four very distinct search methods can be handled in a single search protocol so that the data stores can be treated as a coherent whole.

By applying an existing standard in wide use for many years, GILS takes advantage of existing networks and software to access a vast array of valuable resources--libraries, museums, archives, and spatial data repositories. These professionally maintained resources provide access to information resources collectively valued in the tens of billions of dollars.

As a long-term infrastructure initiative, GILS will clearly evolve in many ways. Yet, there are some basic principles from which GILS should not diverge over the decades:

Policy and technology must support the diverse points of view in our Global Information Society. There should remain no preference toward any particular hierarchy or other way of organizing information, but many organizing structures should co-exist.

Intermediaries will have a crucial role in decentralized access. Content owners and intermediaries should be able to draw from other locators and also make their value-added products known.

GILS must continue to use open standards, and be fully coordinated through the international voluntary standards processes. GILS must continue to be sensitive to the world's many languages, and it must accommodate issues such as copyright, security, and payments.

GILS must continue to point to information in all media and forms, and should be extensible into many information processing modes.

GILS must look to the future, but preserve access to accumulated knowledge represented in libraries, museums, and archives worldwide. Adopted standards must accommodate the variable pace of technology worldwide.

We cannot today predict a single technical basis for GILS over the decades. Evolutionary and sometimes revolutionary changes must be expected and accommodated, but not at the expense of these basic principles.