Relational Databases

The two main software components (library and curatorial) that we have selected for this integrated access project are relational databases. In order to understand what that means and why it is important, it is useful to take a quick look at the predecessors to the relational database.

Originally, most organizations managed their operations with the help of a manual file system, usually consisting of a collection of file folders. The contents within each file folder were logically related and as long as the information needs and reporting responsibilities of the organization were relatively simple, this system worked (and still does work) quite well.

As information needs grow and become more complex, the manual system often becomes too cumbersome and timely information retrieval virtually impossible. Early computerized file systems simply mirrored manual systems, with individual files for different kinds of information (here we might have one file for donor, another for accessions, etc.). Because information could not be shared across files there was a great deal of data redundency, which often leads to a higher level of data inconsistency and data anomalies. For instance, using the museum example, you would need to enter the name of the donor in both the donor file which would record the address, etc. as well as the accession file to make a connection with a museum object. Programmers would write a program to generate even the most rudimentary reports and as reporting requirements grew, so did the level and complexity of programming. Programmers used a third-generation language (3GL) to tell the system both what to do and how to do it. Examples of 3GL include Common Business-Oriented Language (COBOL), Beginner's All-purpose Symbolic Instruction Code (BASIC), and FORmula TRANslation (FORTRAN).

The main drawbacks to data management with the file system are structural dependence and data dependence. Structural dependence means that a change in the file structure, no matter how small, forces modifications in all the programs that use the data in that file (and in most cases your talking LOTS of programs). Data dependence means that any time you make changes in the characteristics of the data (such as from integer to decimal), all programs that access that data must also be changed. Because of these constraints as well as the fact that programming is time consuming, getting a quick result to a new information need is nearly impossible.

As a response to these problems, various database systems evolved. The basic difference between a database system and a file system is that a database consists of a single repository for related data, as opposed to separate and unrelated files as in the file system. As Peter Rob and Carlos Coronel describe in their book, Database Systems,

"Information is derived from data, which are usually stored in a database. To implement a database and to manage its contents you need commercial software known as a DataBase Management System (DBMS). Database design defines the database structure; the DBMS stores the facts about the structure in the database itself. The database thus contains the data you have collected and "data about data" known as metadata. Good database design is important because even a good DBMS will perform poorly with a poorly designed database."

The Relational Database (finally!)

The relational model was first developed in 1970 by E. F. Codd who was working for IBM. It was considered revolutionary but impractical at the time due to a general lack of sufficient computing power (trivia: the computer on which you are reading this now has more computing power than was aboard the Apollo spacecraft that landed on the moon). Rob and Coronel explain:

"The relational database model gives us the luxury of forgetting the actual physical data storage characteristics, thereby allowing us to concentrate on the logical view of the database. That is, we may focus on the human perception of data storage rather than on the often difficult-to- comprehend manner in which the computer sees those same data. Since the relational model achieves both data independence and structural independence, it becomes much easier to design the database and to manage its contents.

One of the reasons for the relational database model's rise to dominance in the database market is its very powerful and flexible query capability. For most relational database software, the query language turns out to be Structured Query Language (SQL)."

The basic data components in a relational database are "entities" and their "attributes" and the basic logical structure is a "table". One of fundamental principles of relational databases is that each table is a separate and independent unit (only logically speaking, because remember that a database stores all data in a single repository), although tables may be related to one another.

It should be noted that the word relation is based on the mathematical set theory (relational algebra) that is the basis of Codd's model. It does not refer to the fact that the relational model utilizes relationships, and it would be erroneous to assume that other database models cannot permit the use of relationships.

It is the fundamental structures of the relational database as well as the powerful ad hoc querying capabilities of SQL that will hopefully enable us to integrate our traditionally separate information sources.

last updated on 10/31/97