You are viewing a Web site, archived on 23:28:33 Dec 11, 2012. It is now a Federal record managed by the National Archives and Records Administration. External links, forms, and search boxes may not function within this collection.
A text database is a collection of related documents assembled into a single searchable unit. The individual documents
can be massive or minuscule, but they should bear some relation to each other.
A database is composed of smaller units called records. In a text database, a record can be an entire document, a
section within a document, a single page, or a fragment of text within a page. When you search a database, you will
retrieve one or more records containing information that satisfies your query.
A record can contain smaller regions of data called fields. A field usually defines a particular type of data common to
several or all records within a database. For instance, in a database of corporate memos, wherein each memo makes up
a record, the following fields might be used: TO, FROM, DATE, SUBJECT, and TEXT. You can narrow the scope of a
search by restricting it to one or more fields. In this example, you might limit your search to the FROM field when
searching for a sender's name. Only those records with the specified name in that field would be retrieved.
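To make the idea of field-restricted searching concrete, here is a minimal sketch in Python. It is not PLWeb Turbo's interface: the Memo class, the FIELDS mapping, the search function, and the sample memos are all hypothetical names and data invented for illustration, and matching is a plain case-insensitive substring test.

```python
from dataclasses import dataclass


@dataclass
class Memo:
    """One record in a hypothetical memo database."""
    to: str
    sender: str   # stands in for the FROM field ("from" is a Python keyword)
    date: str
    subject: str
    text: str


# Map the field names used in the memo example to attribute names (illustrative only).
FIELDS = {"TO": "to", "FROM": "sender", "DATE": "date",
          "SUBJECT": "subject", "TEXT": "text"}


def search(records, term, fields=None):
    """Return records containing `term` in any listed field (all fields if None)."""
    names = fields or list(FIELDS)
    term = term.lower()
    return [r for r in records
            if any(term in getattr(r, FIELDS[n]).lower() for n in names)]


memos = [
    Memo("Lee", "Garcia", "2012-01-05", "Budget", "Please review the attached budget."),
    Memo("Garcia", "Lee", "2012-01-06", "Re: Budget", "Thanks, Garcia. Approved."),
]

everywhere = search(memos, "garcia")            # unrestricted: both memos mention Garcia
from_only = search(memos, "garcia", ["FROM"])   # restricted: only the memo sent by Garcia
```

The unrestricted search retrieves both memos because the name appears in the TO and TEXT fields of the second one; restricting the search to FROM leaves only the memo that Garcia actually sent.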
Unlike a keyword-based system, PLWeb Turbo is full-text retrieval software: it indexes every word in a
document except stopwords. Stopwords are terms that PLWeb Turbo is programmed to ignore during
indexing and retrieval, in order to prevent the retrieval of extraneous records. Generally, a stopword list
includes the articles, pronouns, adjectives, adverbs, and prepositions that are most common in English
(the, they, very, not, of, etc.). After reading about relevance ranking, you'll understand why a stopword list is used.
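The following Python sketch illustrates full-text indexing with a stopword list in general terms. It does not reproduce PLWeb Turbo's internals: the STOPWORDS set is a small invented sample (the product's actual list is not given here), and build_index and lookup are hypothetical helper names.

```python
import re
from collections import defaultdict

# A tiny illustrative stopword list; an assumption, not the product's real list.
STOPWORDS = {"the", "they", "very", "not", "of", "a", "is", "in"}


def build_index(records):
    """Map every non-stopword term to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOPWORDS:
                index[word].add(rec_id)
    return index


def lookup(index, term):
    """Return the ids of records containing `term`; stopwords retrieve nothing."""
    term = term.lower()
    if term in STOPWORDS:
        return set()
    return set(index.get(term, set()))


records = {
    1: "The budget is not very large.",
    2: "They approved the budget of the department.",
}
index = build_index(records)

budget_hits = lookup(index, "budget")   # found in both records
the_hits = lookup(index, "the")         # ignored as a stopword, so no records
```

Every remaining word of each record is searchable, while a query for "the" retrieves nothing, even though the word occurs in both records.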