Knowledge organization system



A generic term for structured knowledge models such as authority files, glossaries, thesauri, taxonomies and ontologies.


The emphasis on developing comprehensive knowledge organization systems (KOSs) can be seen in the works of our earliest philosophers, many of whom continue to influence our view of the world. For example, Aristotle's effort to categorize knowledge into groups (such as physics, politics, or psychology) is reflected in our language, our education and our science.

Knowledge organization systems are deceptively simple, yet complicated at the same time.

They are simple because they are absolutely basic to human consciousness, so everybody manipulates and creates them with great ease. Our world is populated with categories of family, friends, social groups, objects, concepts, activities, feelings, places and many other things.

At the same time they are complicated, because for the most part we use them unreflectively - they are simply part of our mental and social background - and we use categories in a huge variety of ways, which are often contradictory and inconsistent.

The main point is that knowledge organization is not simply about locating and retrieving relevant knowledge; it is a fundamental precondition for managing knowledge effectively.

In general, the term 'knowledge organization' refers to the rules or conventions for ordering and arranging information and knowledge.

The term knowledge organization systems is intended to encompass all types of schemes for organizing information and promoting knowledge management. Knowledge organization systems include classification and categorization schemes that organize materials at a general level, subject headings that provide more detailed access, and authority files that control variant versions of key information such as geographic and personal names. Knowledge organization systems also include highly structured vocabularies, such as thesauri, and less traditional schemes, such as semantic networks and ontologies. Because knowledge organization systems are mechanisms for organizing information, they are at the heart of every information system and archive.

Knowledge organization systems are used to organize materials for retrieval and to manage an information collection. All information systems use one or more KOSs. Just as in a physical library, the KOS in an information system provides an overview of the content of the collection and supports retrieval. The scheme may be a traditional KOS relevant to the scope of the material and the expected audience for the digital information system (such as the INIS Thesaurus, the Dewey Decimal System or the INSPEC Thesaurus), a commercially developed scheme such as the Yahoo or Excite categories, or a locally developed scheme for a corporate intranet.

Main types of knowledge organization systems

There is no single knowledge classification scheme on which everyone agrees.  Culture may constrain the knowledge classification scheme so that what is meaningful to one culture is not necessarily meaningful to another. Therefore, we live in a world of multiple, variant ways of organizing knowledge.

All knowledge organization systems can be grouped into three clusters:

  • Term Lists
  • Classifications and categories (lists with hierarchy)
  • Relationship lists (lists with hierarchy and semantic links)

Term Lists

Term lists are lists of terms, often with definitions. They are split into the following categories:

Authority Files - lists of terms that are used to control different names for an entity or the domain value for a particular field. Examples include names for countries, individuals, and organizations. Non-preferred terms may be linked to the preferred versions. This type of KOS generally does not include a deep organization or complex structure. The presentation may be alphabetical or organized by a shallow classification scheme. Examples of authority files include the Library of Congress Name Authority File and the Getty Geographic Authority File.
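The core mechanism of an authority file, linking every non-preferred variant to one preferred form, can be sketched as a simple lookup table. This is a minimal illustration only: the entries below are hypothetical and are not taken from the Library of Congress Name Authority File or any other real file, and matching is assumed to ignore case and surrounding spaces.

```python
from typing import Optional

# Hypothetical authority file: each preferred form lists its known variant forms.
AUTHORITY = {
    "International Atomic Energy Agency": ["IAEA", "Int. Atomic Energy Agency"],
    "Austria": ["Republic of Austria", "Österreich"],
}


def build_index(authority: dict[str, list[str]]) -> dict[str, str]:
    """Map every form (preferred or variant), case-folded, to its preferred form."""
    index: dict[str, str] = {}
    for preferred, variants in authority.items():
        for form in [preferred, *variants]:
            index[form.casefold()] = preferred
    return index


INDEX = build_index(AUTHORITY)


def preferred_form(name: str) -> Optional[str]:
    """Return the controlled (preferred) form of a name, or None if it is not in the file."""
    return INDEX.get(name.strip().casefold())


print(preferred_form("iaea"))        # International Atomic Energy Agency
print(preferred_form("Österreich"))  # Austria
print(preferred_form("Atlantis"))    # None
```

Because variants point to the preferred form rather than to each other, adding a newly discovered variant only requires one new entry.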

Glossary - a list of terms, usually with definitions. The terms may be from a specific subject field or from a particular work. The terms are defined within a specific environment and rarely include different meanings. An example is the IAEA Glossary of Terms for Nuclear Knowledge Management (available in this course).

Dictionary - an alphabetical list of words and their definitions. Alternative meanings are provided where applicable. Dictionaries are more general in scope than glossaries. They may also provide information about the origin of a word, its variants (by spelling and morphology), and its multiple meanings across disciplines. While a dictionary may also provide synonyms and related words, there is no explicit hierarchical structure or attempt to group them by concept.

Controlled vocabulary - a rather broad term, but in this particular case meaning a closed list of named subjects, which can be used for classification. In library science this is sometimes known as an indexing language. The constituents of a controlled vocabulary are usually known as terms, where a term is a particular name for a particular concept (this is pretty much the same as the common-sense notion of a keyword).

It is common to distinguish concept and term (the term being the name of the concept), since a concept may have multiple names and the same term may name multiple subjects. A controlled vocabulary consists of terms, not directly of concepts, and in general each term is disambiguated to refer to a single subject (that is, there are no duplicate terms). Note that "subject" as we have used it so far is effectively equivalent to "concept".

The purpose of controlling vocabulary is to prevent authors from defining meaningless terms or terms that are too broad or too narrow, and to prevent different authors from misspelling the same term or choosing slightly different forms of it. Thus we can prevent authors from using "topic navigation maps" and "topic map" by forcing them to choose "topic maps".
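The "topic maps" example above can be sketched as a check of author-supplied keywords against a closed list, with known variant forms rewritten to the permitted term and anything else flagged for review. The vocabulary and variant table here are hypothetical, and case-insensitive matching is an assumption of the sketch.

```python
# A closed list of permitted terms (hypothetical example vocabulary).
VOCABULARY = {"topic maps", "thesauri", "taxonomies"}

# Known non-permitted forms mapped to the permitted term (hypothetical).
VARIANTS = {
    "topic map": "topic maps",
    "topic navigation maps": "topic maps",
    "thesaurus": "thesauri",
    "taxonomy": "taxonomies",
}


def normalize_terms(raw_terms: list[str]) -> tuple[list[str], list[str]]:
    """Split author-supplied keywords into accepted controlled terms and rejected ones."""
    accepted: list[str] = []
    rejected: list[str] = []
    for raw in raw_terms:
        term = raw.strip().lower()
        term = VARIANTS.get(term, term)  # replace a known variant by its permitted form
        if term in VOCABULARY:
            if term not in accepted:  # two variants may collapse to the same term
                accepted.append(term)
        else:
            rejected.append(raw)  # not in the closed list: needs an indexer's review
    return accepted, rejected


print(normalize_terms(["Topic map", "topic navigation maps", "ontologies"]))
# (['topic maps'], ['ontologies'])
```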

Gazetteers - a list of place names. Traditional gazetteers have been published as books or have appeared as indexes to atlases. Each entry may also be identified by feature type, such as river, city, or school. An example is the U.S. Code of Geographic Names. Geo-spatially referenced gazetteers provide coordinates for locating the place on the earth's surface. The term gazetteer has several other meanings, including an announcement publication such as a patent or legal gazetteer. These gazetteers are often organized using classification schemes or subject categories.
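A geo-spatially referenced gazetteer pairs each place name with a feature type and coordinates, which makes distance queries possible. The sketch below assumes a spherical Earth (haversine formula) and uses three illustrative entries with approximate city-centre coordinates; the field names and functions are hypothetical, not the structure of any particular gazetteer.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Place:
    name: str
    feature_type: str  # e.g. "city", "river", "school"
    lat: float         # latitude in decimal degrees
    lon: float         # longitude in decimal degrees


# Illustrative entries; coordinates are approximate.
GAZETTEER = [
    Place("Vienna", "city", 48.2082, 16.3738),
    Place("Paris", "city", 48.8566, 2.3522),
    Place("Berlin", "city", 52.5200, 13.4050),
]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, assuming a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def lookup(name: str) -> Optional[Place]:
    """Find an entry by exact name (a real gazetteer would also handle variant names)."""
    return next((p for p in GAZETTEER if p.name == name), None)


def nearest(lat: float, lon: float, feature_type: Optional[str] = None) -> Optional[Place]:
    """Return the entry closest to (lat, lon), optionally restricted to one feature type."""
    candidates = [p for p in GAZETTEER if feature_type is None or p.feature_type == feature_type]
    if not candidates:
        return None
    return min(candidates, key=lambda p: haversine_km(lat, lon, p.lat, p.lon))


vienna, paris = lookup("Vienna"), lookup("Paris")
print(f"{haversine_km(vienna.lat, vienna.lon, paris.lat, paris.lon):.0f} km")  # about 1,030 km
print(nearest(52.4, 13.1, feature_type="city").name)  # Berlin
```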

Classifications and Categories

Classifications and categories are basically lists with hierarchy. They can be split into the following:

Subject headings - provide a set of controlled terms to represent the subjects of items in a collection. Subject heading lists can be extensive and cover a broad range of subjects; however, the subject heading list's structure is generally very shallow, with a limited hierarchical structure. In use, subject headings tend to be coordinated, with rules for how they can be joined to provide concepts that are more specific. Examples include the INIS Subject Headings and the Library of Congress Subject Headings (LCSH).
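Coordination, joining a main heading with subdivisions under rules, can be sketched as follows. The "--" separator follows the way LCSH subject strings are commonly displayed (e.g. "Heading--Subdivision"); the heading list, and the rule that each heading carries its own set of permitted subdivisions, are simplifying assumptions for illustration and do not reproduce the actual LCSH rules.

```python
# Hypothetical heading list: each main heading maps to the subdivisions allowed under it.
HEADINGS = {
    "Nuclear power plants": {"Safety measures", "Design and construction"},
    "Radioactive wastes": {"Storage", "Transportation"},
}


def coordinate(heading: str, *subdivisions: str, sep: str = "--") -> str:
    """Join a main heading with subdivisions into one more specific subject string,
    enforcing a simple (assumed) rule: each subdivision must be allowed for the heading."""
    if heading not in HEADINGS:
        raise ValueError(f"not a controlled heading: {heading!r}")
    for sub in subdivisions:
        if sub not in HEADINGS[heading]:
            raise ValueError(f"{sub!r} may not subdivide {heading!r}")
    return sep.join([heading, *subdivisions])


print(coordinate("Nuclear power plants", "Safety measures"))
# Nuclear power plants--Safety measures
```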

Classification schemes and categorization schemes - the two terms are often used interchangeably. Although there may be subtle differences from example to example, these types of KOSs all provide ways to separate entities into "buckets" or broader topic levels. Some examples provide a hierarchical arrangement of numeric or alphabetic notation to represent broad topics. These types of KOSs may not follow the rules for hierarchy required in the ANSI/NISO Thesaurus Standard (Z39.19) (NISO 1998), and they lack the explicit relationships presented in a thesaurus. Examples of classification schemes include the Library of Congress Classification Schedules (an open, expandable system), the Dewey Decimal Classification (a closed system of 10 numeric sections with decimal extensions), and the Universal Decimal Classification (based on Dewey but extended to include facets, or particular aspects of a topic). Subject categories are often used to group thesaurus terms into broad topic sets that lie outside the hierarchical scheme of the thesaurus. Taxonomies are increasingly being used in object-oriented design and knowledge management systems to indicate any grouping of objects based on a particular characteristic.
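In the Dewey Decimal Classification the notation itself largely expresses the hierarchy: each added digit narrows the class. The sketch below derives the chain of broader classes from a notation, assuming the notation is strictly hierarchical. Real DDC has exceptions, so a production system would read broader classes from the schedules rather than compute them; class captions are deliberately omitted.

```python
def broader_classes(notation: str) -> list[str]:
    """Return broader DDC-style classes, nearest first, assuming strictly hierarchical notation."""
    main, _, decimals = notation.partition(".")
    if len(main) != 3 or not main.isdigit():
        raise ValueError(f"expected a three-digit main class, got {notation!r}")
    chain: list[str] = []
    # Drop decimal digits one at a time: 539.74 -> 539.7 -> 539
    while decimals:
        decimals = decimals[:-1]
        chain.append(f"{main}.{decimals}" if decimals else main)
    # Then zero out the main-class digits from the right: 539 -> 530 -> 500
    digits = list(main)
    for i in (2, 1):
        if digits[i] != "0":
            digits[i] = "0"
            chain.append("".join(digits))
    return chain


print(broader_classes("539.7"))  # ['539', '530', '500']
```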

Taxonomy - the term taxonomy has been so widely used, and abused, that when something is referred to as a taxonomy it can be just about anything, though usually it means some sort of abstract structure. The term is closely associated with Carl von Linné (Linnaeus), who developed a hierarchical classification system for life forms in the 18th century; this became the basis for the modern zoological and botanical system for classifying and naming species. In this course we will use 'taxonomy' to mean a subject-based classification that arranges the terms of a controlled vocabulary into a hierarchy. In practice, however, you will find the term "taxonomy" applied to more complex structures as well.

In a taxonomy, the means for subject description consist of essentially one relationship: the broader / narrower relationship used to build the hierarchy.
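Because a taxonomy rests on essentially one relationship, it can be stored as a single broader-term link per term. The sketch below assumes a strict tree (each term has exactly one broader term) and uses hypothetical terms; it shows how both directions of navigation, up to the top term and down to all narrower terms, follow from that one link.

```python
from typing import Optional

# Hypothetical taxonomy: each term maps to its single broader term (None for the top term).
BROADER: dict[str, Optional[str]] = {
    "Energy": None,
    "Nuclear energy": "Energy",
    "Renewable energy": "Energy",
    "Fission reactors": "Nuclear energy",
    "Fusion reactors": "Nuclear energy",
    "Solar energy": "Renewable energy",
}


def path_to_top(term: str) -> list[str]:
    """Follow broader-term links upward and return the path from the top term down."""
    path = [term]
    while (parent := BROADER[path[-1]]) is not None:
        path.append(parent)
    return list(reversed(path))


def narrower(term: str) -> list[str]:
    """All terms below `term`, found by inverting the broader-term links (depth-first)."""
    children = [t for t, b in BROADER.items() if b == term]
    result: list[str] = []
    for child in children:
        result.append(child)
        result.extend(narrower(child))
    return result


print(" > ".join(path_to_top("Fission reactors")))  # Energy > Nuclear energy > Fission reactors
print(narrower("Nuclear energy"))                   # ['Fission reactors', 'Fusion reactors']
```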

Taxonomy plays an extremely important role in knowledge management to categorize knowledge objects, to link them and to build knowledge maps.

An effective taxonomy in knowledge management should:

  • be a classification scheme
  • have semantic nature
  • represent a knowledge map

A good taxonomy should let users grasp the overall structure of the knowledge domain it covers at a glance. The taxonomy should be comprehensive, predictable and easy to manage.

Relationship lists

Relationship lists are hierarchies with semantic links emphasizing the connections between terms and concepts, such as:

Thesauri - are based on concepts and show relationships between terms. Relationships commonly expressed in a thesaurus include hierarchy, equivalence (synonymy) and association or relatedness. These relationships are generally represented by the notation BT (broader term), NT (narrower term), USED FOR (synonym) and RT (associative or related term). Associative relationships may be more detailed in some schemes; for example, the INIS Thesaurus has defined eight relationships, many of which are associative. Preferred terms for indexing and retrieval are identified, and entry terms (or non-preferred terms) point to the preferred term to be used for each concept.

There are standards for the development of monolingual thesauri (NISO 1998; ISO 1986) and multilingual thesauri (ISO 1985).

Many thesauri are large; they may include more than 50,000 terms. Most were developed for a specific discipline or a specific product or family of products. For example, the INIS Multilingual Thesaurus contains over 40,000 terms and is available in seven languages.
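The thesaurus relationships described above (BT, NT, RT and the equivalence link between entry terms and preferred terms) can be sketched as a small data structure. Two conventions are built in: BT and NT are always recorded as reciprocal pairs, and RT is symmetric. "USE" is used here as the reciprocal of USED FOR, as in common thesaurus practice. This is a minimal illustration with hypothetical terms, not the data model or content of the INIS Thesaurus.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal thesaurus sketch: preferred terms linked by BT/NT/RT, entry terms via USE/UF."""

    def __init__(self) -> None:
        self.bt: dict[str, set[str]] = defaultdict(set)  # term -> broader terms
        self.nt: dict[str, set[str]] = defaultdict(set)  # term -> narrower terms
        self.rt: dict[str, set[str]] = defaultdict(set)  # term -> related terms
        self.use: dict[str, str] = {}                    # entry term -> preferred term

    def add_broader(self, narrower: str, broader: str) -> None:
        # BT and NT are reciprocal: recording one always records the other.
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def add_related(self, a: str, b: str) -> None:
        # RT is symmetric.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def add_entry_term(self, entry: str, preferred: str) -> None:
        # "entry USE preferred", equivalently "preferred USED FOR entry".
        self.use[entry] = preferred

    def preferred(self, term: str) -> str:
        return self.use.get(term, term)

    def used_for(self, term: str) -> set[str]:
        return {e for e, p in self.use.items() if p == term}

    def entry(self, term: str) -> dict[str, object]:
        """Display-style record for a term, resolving an entry term to its preferred term first."""
        t = self.preferred(term)
        return {"term": t, "UF": sorted(self.used_for(t)), "BT": sorted(self.bt[t]),
                "NT": sorted(self.nt[t]), "RT": sorted(self.rt[t])}


th = Thesaurus()
th.add_broader("Fission reactors", "Reactors")
th.add_broader("Fusion reactors", "Reactors")
th.add_related("Fission reactors", "Nuclear fuels")
th.add_entry_term("Nuclear fission reactors", "Fission reactors")
print(th.entry("Nuclear fission reactors"))
# {'term': 'Fission reactors', 'UF': ['Nuclear fission reactors'], 'BT': ['Reactors'], 'NT': [], 'RT': ['Nuclear fuels']}
```

Looking up the entry term returns the record of the preferred term, which is how entry terms guide indexers and searchers to the term to use.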

Semantic Networks - with the advent of natural language processing, there have been significant developments in semantic networks. These KOSs structure concepts and terms not as hierarchies but as a network or web. Concepts are treated as nodes, and relationships branch out from them. The relationships generally go beyond the standard BT, NT and RT, and may include specific whole-part, cause-effect or parent-child relationships. The best-known semantic network is Princeton University's WordNet, which is now used in a variety of search engines.
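A semantic network can be sketched as a set of labelled links (subject, relation, object) between concept nodes. The sketch below stores each link in both directions (marking the inverse with a leading "~", a convention assumed here) and uses breadth-first search to find a shortest chain of relationships between two concepts. The concepts and relation names are hypothetical, and this is not WordNet's actual data model.

```python
from collections import defaultdict, deque
from typing import Optional

# Hypothetical semantic network as (subject, relation, object) triples.
TRIPLES = [
    ("wheel", "part-of", "car"),
    ("engine", "part-of", "car"),
    ("car", "is-a", "vehicle"),
    ("bicycle", "is-a", "vehicle"),
    ("wheel", "part-of", "bicycle"),
    ("friction", "causes", "heat"),
    ("engine", "produces", "heat"),
]

# Adjacency list: node -> [(relation, neighbour)]; inverse links are prefixed with "~".
GRAPH: dict[str, list[tuple[str, str]]] = defaultdict(list)
for s, r, o in TRIPLES:
    GRAPH[s].append((r, o))
    GRAPH[o].append(("~" + r, s))


def related(node: str, relation: str) -> list[str]:
    """Neighbours reached from `node` over links with the given relation label."""
    return [n for r, n in GRAPH[node] if r == relation]


def connection(start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search for a shortest chain of labelled links between two concepts.
    Returns [start, relation, node, relation, ..., goal], or None if they are unconnected."""
    previous: dict[str, tuple[str, str]] = {}  # node -> (relation, node it was reached from)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = [node]
            while node in previous:
                relation, node = previous[node]
                chain[:0] = [node, relation]
            return chain
        for relation, neighbour in GRAPH[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                previous[neighbour] = (relation, node)
                queue.append(neighbour)
    return None


print(related("car", "~part-of"))             # ['wheel', 'engine']
print(" ".join(connection("friction", "car")))  # friction causes heat ~produces engine part-of car
```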

Ontologies - the newest label to be attached to some knowledge organization systems. The knowledge management community is developing ontologies as specific concept models. They can represent complex relationships among objects, and include the rules and axioms missing from semantic networks. Ontologies that describe knowledge in a specific area are often connected with systems for data mining and knowledge management.
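What distinguishes an ontology from a semantic network is the rules and axioms that let new facts be derived. The sketch below applies three rules, modelled loosely on the RDFS entailment rules for subclass and property-domain axioms, by naive forward chaining until nothing new appears. The classes, instances and property are hypothetical, and real ontologies would normally be written in a standard language such as OWL and processed by a dedicated reasoner.

```python
# Hypothetical mini-ontology: class axioms plus instance assertions, as (s, p, o) triples.
FACTS = {
    ("FissionReactor", "subClassOf", "NuclearReactor"),
    ("NuclearReactor", "subClassOf", "Facility"),
    ("operates", "domain", "Organization"),
    ("ReactorA", "type", "FissionReactor"),
    ("UtilityX", "operates", "ReactorA"),
}


def infer(facts: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply three rules until no new facts appear:
    1. subClassOf is transitive;
    2. an instance of a class is also an instance of each of its superclasses;
    3. the subject of a property with a declared domain belongs to that domain class."""
    closure = set(facts)
    while True:
        new: set[tuple[str, str, str]] = set()
        for (a, p1, b) in closure:
            for (c, p2, d) in closure:
                if p1 == "subClassOf" and p2 == "subClassOf" and b == c:
                    new.add((a, "subClassOf", d))          # rule 1
                if p1 == "type" and p2 == "subClassOf" and b == c:
                    new.add((a, "type", d))                # rule 2
                if p2 == "domain" and p1 == c:
                    new.add((a, "type", d))                # rule 3
        if new <= closure:
            return closure
        closure |= new


for fact in sorted(t for t in infer(FACTS) if t[1] == "type"):
    print(fact)
# ('ReactorA', 'type', 'Facility')
# ('ReactorA', 'type', 'FissionReactor')
# ('ReactorA', 'type', 'NuclearReactor')
# ('UtilityX', 'type', 'Organization')
```

The pairwise loop is quadratic per round, which is acceptable for a sketch but not for large ontologies.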