UMBEL - Annex G

UMBEL Annex G: UMBEL SuperTypes Documentation

UMBEL Annex Document - 10 May 2016

Latest version: http://techwiki.umbel.org/index.php/UMBEL_-_Annex_G
Last update: $Date: 2016/5/10 9:22:47 $
Version No.: 1.50
Volume: TR 16-5-10-G
Authors:
  Michael Bergman - Structured Dynamics
  Frédérick Giasson - Structured Dynamics

UMBEL: Upper Mapping and Binding Exchange Layer by Structured Dynamics LLC is provided under the
Creative Commons Attribution 3.0 license. See the attribution section for how to cite the effort.

Copyright © 2009-2016 by Structured Dynamics LLC.

Beginning with UMBEL version 1.20, statistics regarding numbers of reference concepts (RCs) in the ontology and splits between SuperTypes (STs) and modules have been moved to the statistics Annex Z document. As a result, earlier statistics in this and other annexes are no longer being updated, which means any statistics cited below may be out of date. Please consult Annex Z for the current UMBEL statistics.
Major revisions were made to the SuperTypes in UMBEL version 1.50. For the specifics, see the Updates section below. For prior statistics and discussion, see UMBEL - Annex G 20160510.

UPDATES

As UMBEL has evolved and been used in commerce, consistency and coherency of the knowledge graph have come to be of paramount importance. Misassignments undercut the coherency of the system and lower the ability of UMBEL to be used in a computable manner.

Further, the large size and need for frequent updates also place a premium on an UMBEL system that can be built automatically from rather simple input specifications.

The initial introduction of SuperTypes to UMBEL pointed toward a more systematic way to build and organize the system. Early build routines were first introduced in UMBEL version 1.10 (see UMBEL - Annex K). By version 1.50, the complete system was based on these build routines and standard input files.

SuperType assignments and checks are an integral part of these build routines, which were not sufficiently developed until version 1.50. The key changes that were introduced in version 1.50 included the following:

  • Removal of all instance or individual listings from UMBEL. The Reference Concepts (RCs) that remain in UMBEL are now all classes (OpenCyc was used to help make these determinations). This change does NOT affect the punning used in UMBEL's design (see Metamodeling in Domain Ontologies).
  • Corrections and re-alignments of some prior SuperTypes. In version 1.50, these SuperTypes were eliminated:
    1. Earthscape -- which was re-assigned to Forms and LocationPlace
    2. Extraterrestrial -- which was re-assigned to AreaRegion and LocationPlace
    3. Notations and Numbers -- which were moved to a new category of shared Reference Concepts.
  • These corrections led to the introduction of a number of new STs:
    1. AreaRegion - this better generalized earlier problems with spatial relations
    2. AtomsElements - this better set aside concepts shared by other STs
    3. BiologicalProcesses - this better called out shared processes by living things
    4. Forms - this emerged as an important spatial construct
    5. LocationPlace - this better generalized earlier problems with spatial relations
    6. OrganicChemistry - a split of the prior Chemistry ST, which better aligns with the split between living and inanimate matter
    7. Shapes - though largely shared (and non-disjoint), this new ST captures a very common characteristic of all physical objects
    8. Situations - an important ST overlooked in prior efforts that helps to better establish context for Activities and Events
  • A typology was created for each of the disjoint STs, which enabled missing concepts to be identified and added, and the concepts within each ST to be better organized; this was a major focus of effort
    1. Some additional upper-level categories were introduced to better organize the largely disjoint STs.

Finally, because of these changes, earlier designs that had been moving toward multiple modules have been replaced. It is now possible to include or exclude individual STs as a substitute for the earlier module design (see, for example, Annex J).

INTRODUCTION

This report describes the rationale for the class of SuperTypes within UMBEL and how its 34 K reference concepts (RCs) are assigned to one of a few categories of SuperTypes. This report is an update of the SuperTypes design, first introduced in version 0.80. We discuss five categories of SuperTypes below, with one category, the main disjoint category, being the most important.

The first category of SuperTypes is for non-disjoint types, mostly of a shared or building-block nature. By design, these SuperTypes participate in little or no reasoning. Most have shared aspects across all SuperTypes. SuperTypes in this category are designed to be fully non-disjoint, and do not participate in any disjoint assertions. There are seven (7) SuperTypes in this first category, specifically Abstractions, Concepts, Conventions, Primitives, Structures, Symbols and TopicsCategories. Little further is discussed about this category below.

A second category is for the Attributes SuperType. Attributes may be assigned to any of the reference concepts (RCs) associated with any of the other SuperTypes. Attributes are thus inherently non-disjoint. Little further is discussed about this category below.

A third category is for SuperTypes that are parental types for other SuperTypes. These are largely organizational in nature, helping to keep the upper portions of UMBEL manageable. Since their children are specific SuperTypes, this parental category may be used for some minor reasoning, but is not the central focus of the overall SuperTypes design. There are nineteen (19) SuperTypes in this category, specifically Agents, Artifacts, AVInfo, Constituents, Eukaryotes, Information, LivingThings, Manifestations, MentalProcesses, NaturalMatter, OrganicMatter, Places, Relations, SignElements, SocialProcesses, Space, Symbolic, Systems and Time. Though used for organizational purposes below, none of these is discussed individually.

A fourth, somewhat special SuperType is Shapes. About half of the RCs in UMBEL have a Shapes aspect; about half do not. Thus, Shapes can be used for some disjoint analysis, but is shared widely enough to not be that useful in most circumstances. Shapes is thus kept separate from the main SuperTypes category.

The fifth and last SuperTypes category is for those that are largely disjoint with one another. This main SuperTypes category contains 31 SuperTypes, specifically including:

Activities
Animals
AreaRegion
AtomsElements
AudioInfo
BiologicalProcesses
Chemistry
Diseases
Drugs
Events
Facilities
FinanceEconomy
FoodDrink
Forms
Geopolitical
LocationPlace
NaturalPhenomena
NaturalSubstances
OrganicChemistry
Organizations
Persons
Plants
Products
Prokaryotes
ProtistsFungus
Situations
Society
StructuredInfo
Times
VisualInfo
WrittenInfo

In addition, all of these SuperTypes are clustered into 9 "dimensions" (drawn from the third category above), which are useful for aggregation and organizational purposes, but which have no direct bearing on logic assertions or disjoint testing.

SUPPORTING FILE

The master file with all SuperType assignments by Reference Concept (RC) is provided by SuperTypes.csv.
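As a minimal sketch of how such a file might be consumed, the following reads RC-to-SuperType assignments into a lookup table. The column layout of SuperTypes.csv is not described in this annex, so the two-column form (RC name, SuperType name, one row per assignment, no header) is an assumption, as are the sample rows and the function name `load_assignments`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for SuperTypes.csv. The real column
# layout is not specified in this annex; "RC,SuperType" is an assumption.
SAMPLE = """\
Dog,Animals
Aspirin,Drugs
Aspirin,Products
Musician,Persons
"""

def load_assignments(handle):
    """Return {rc: set of SuperTypes} from rows of the assumed form RC,SuperType."""
    assignments = defaultdict(set)
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        rc, supertype = row[0].strip(), row[1].strip()
        assignments[rc].add(supertype)
    return dict(assignments)

assignments = load_assignments(io.StringIO(SAMPLE))
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle, and the column indexes adjusted to the actual layout.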

BASIS AND RATIONALE FOR THE SUPERTYPE CLASS

The assignment of UMBEL reference concepts to SuperTypes was an outgrowth of the observation that many of the concepts within UMBEL may be clustered into disjoint groupings. Most things and concepts about them are based on real, observable, physical things in the real world. Because most of these things cannot occupy both the same moment in time and the same location in physical space, a useful criterion for looking at these things and concepts is disjointedness.

In a broad sense, then, we can split our concepts of the world between those ideas that are disjoint because they pertain to separable objects or ideas and those that are cross-cutting or organizational or classificatory. Attributes, such as color (pink, for example), are often cross-cutting in that they can be used to describe quite disparate things. Inherent classification schemes such as academic fields of study or library catalog systems — while useful ways to organize the world — are not themselves in-and-of the world or discrete from other ideas. Thus, classificatory or organizational concepts are inherently not disjoint.

The potential advantage of clustering into logical, disjoint groups can include:

  • A better basis for organizing a large concept space
  • Possible amenability to the use of templates for displaying similar attributes and information for similar concepts
  • Possible computational efficiency due to being able to segregate concepts into logically coherent groupings
  • Improved disambiguation by assessing concept matches in addition to entity matches via triangulation between the two assessments
  • Structure and integrity testing.

Any classificatory scheme has a degree of arbitrariness. To be useful, it must be perceived as logical and coherent and it should achieve most if not all of the potential advantages above.

Both "bottom up" (coherent clustering of related concepts) and "top down" (selecting top-level concepts and evaluating and clustering all child concepts using union, intersection or complement operators) were used to create the assignments herein. Each approach was iterated multiple times, with logic and coherence testing after every run. For example, analysis of shared parent concepts in the lineage and other structure-wide tests were employed.
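The "top down" step can be illustrated with ordinary set operations. This is only a sketch: the concept names and the two candidate groupings are hypothetical, and the actual build routines are not described in this annex.

```python
# Hypothetical child-concept sets gathered under two candidate top-level concepts.
under_products = {"Aspirin", "Hammer", "Bread"}
under_drugs = {"Aspirin", "Penicillin"}
all_concepts = under_products | under_drugs | {"Dog"}

shared = under_products & under_drugs         # intersection: overlap to review
either = under_products | under_drugs         # union: everything covered so far
unassigned = all_concepts - either            # complement: concepts still to place
products_only = under_products - under_drugs  # the cleanly disjoint part of one grouping
```

A small `shared` set relative to the groupings suggests a defensible split; a large one suggests the two candidates should be lumped or redrawn.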

Classification schemes always are subject to the tension between "lumping" and "splitting": Are three groupings too few? A hundred groupings too many? This tension is also compounded by the possible sense of arbitrary boundaries, such as why "Drugs" gets its own category and not "Toys"?

Classical taxonomists and other classifiers have always attempted to achieve "natural" classification systems. Based on the best information available, is the assignment of one item to Group A more defensible than it is to Group B? New knowledge or perceptions, such as the immense impact of genetics on classical systematics, can thoroughly change perceptions of what is logical and natural.

In the case of these UMBEL reference concepts, the tests employed were to find the highest degree of disjointedness while also maintaining a sense of logical coherence with the observable world. And, where non-disjointness was found, could that degree of overlap be seen as both natural and limited? For example, the SuperType of Persons is non-disjoint with Animals because persons are humans; otherwise the groups are disjoint. Similarly, Persons are non-disjoint with Organizations because some types of agents, such as MusicPerformingAgent, may be either an individual or a group.

These overlaps can be understood, and the design seeks to keep them as minimal as possible.
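A disjointness test of this kind might be sketched as follows: every pair of SuperTypes is compared, and any shared RCs are reported unless the pair has been declared an accepted overlap. The function name, the toy membership data and the RC names other than MusicPerformingAgent are hypothetical.

```python
from itertools import combinations

def unexpected_overlaps(members, allowed):
    """Return {(st_a, st_b): shared RCs} for pairs that overlap but are not in `allowed`."""
    found = {}
    for a, b in combinations(sorted(members), 2):
        shared = members[a] & members[b]
        if shared and frozenset((a, b)) not in allowed:
            found[(a, b)] = shared
    return found

# Toy data; only MusicPerformingAgent comes from the text above.
members = {
    "Persons": {"MusicPerformingAgent", "Author"},
    "Organizations": {"MusicPerformingAgent", "Company"},
    "Plants": {"Oak", "Company"},  # deliberate misassignment for the demo
}
# Overlaps accepted as natural and limited.
allowed = {frozenset(("Persons", "Organizations"))}

problems = unexpected_overlaps(members, allowed)
```

Here the Persons and Organizations overlap is tolerated, while the shared RC between Organizations and Plants is flagged as a likely misassignment.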

DESCRIPTION OF THE SUPERTYPES

Table 1. Description and Organization of Disjoint SuperTypes

ANALYSIS OF THE SUPERTYPES

This section provides an analysis of the reference concept assignments and their possible disjointedness or overlap with other SuperTypes.

Non-disjoint (Shared) Concepts

First, the 31 SuperTypes in our mostly disjoint categories contain 87% of the UMBEL reference concepts. The remaining 13%, which by definition are classificatory or attributes, are non-disjoint (overlapping).

Here is the breakdown for the non-disjoint (overlapping) categories:

Category                      Count
Reference Concepts           33,565
Abstractions                  2,794
Concepts                      3,058
TopicsCategories                256
Shared                        3,793
Upper Level                   4,328
Attributes                    2,794
Total Unique Non-disjoint     4,398

Table 2. Distribution of Non-disjoint SuperTypes

Disjoint Concepts

Here is the breakdown for the (mostly) disjoint (non-overlapping) 31 SuperTypes:

SuperTypes            Count   % of Total   % of Unique   Single ST   % of Single ST
Activities            3,825         9.6%         12.9%          18             0.5%
Animals               8,704        21.9%         29.3%       8,206            94.3%
AreaRegion            1,090         2.7%          3.7%         398            36.5%
AtomsElements           190         0.5%          0.6%          35            18.4%
AudioInfo               196         0.5%          0.7%         107            54.6%
BiologicalProcesses     223         0.6%          0.8%           0             0.0%
Chemistry             1,128         2.8%          3.8%         554            49.1%
Diseases                583         1.5%          2.0%           5             0.9%
Drugs                   779         2.0%          2.6%         418            53.7%
Events                4,404        11.1%         14.8%          87             2.0%
Facilities            1,254         3.2%          4.2%         472            37.6%
FinanceEconomy          603         1.5%          2.0%         104            17.2%
FoodDrink             1,177         3.0%          4.0%         525            44.6%
Forms                    38         0.1%          0.1%          13            34.2%
Geopolitical            203         0.5%          0.7%           0             0.0%
LocationPlace           214         0.5%          0.7%          41            19.2%
NaturalPhenomena         85         0.2%          0.3%          32            37.6%
NaturalSubstances       725         1.8%          2.4%         459            63.3%
OrganicChemistry        668         1.7%          2.2%         123            18.4%
Organizations         1,390         3.5%          4.7%       1,127            81.1%
Persons                 296         0.7%          1.0%          30            10.1%
Plants                2,579         6.5%          8.7%       2,477            96.0%
Products              5,531        13.9%         18.6%       4,304            77.8%
Prokaryotes             447         1.1%          1.5%         446            99.8%
ProtistsFungus          158         0.4%          0.5%         110            69.6%
Situations              697         1.8%          2.3%          24             3.4%
Society                 133         0.3%          0.4%          70            52.6%
StructuredInfo          905         2.3%          3.0%         675            74.6%
Times                   178         0.4%          0.6%         112            62.9%
VisualInfo              536         1.3%          1.8%         128            23.9%
WrittenInfo             869         2.2%          2.9%         187            21.5%
Total                39,808                                 21,287
Unique               29,718                                 21,287

Table 3. Quantification of Reference Concepts by SuperType

More than 21,000 RCs, or 72% of the RCs within the disjoint SuperTypes, occur within only one SuperType.
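The percentage columns in Table 3 can be reproduced from the counts. The short check below uses the Animals row and the Total and Unique rows as published; reading "% of Single ST" as the Single ST count divided by that SuperType's own Count is an interpretation, though it matches the published figures. The helper name `pct` is illustrative.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

total_assignments = 39808  # "Total" row, Count column
unique_rcs = 29718         # "Unique" row, Count column
single_st_rcs = 21287      # "Unique" row, Single ST column

animals_count, animals_single = 8704, 8206  # Animals row

animals_of_total = pct(animals_count, total_assignments)   # "% of Total"
animals_of_unique = pct(animals_count, unique_rcs)         # "% of Unique"
animals_single_share = pct(animals_single, animals_count)  # "% of Single ST"
single_share_overall = pct(single_st_rcs, unique_rcs)      # share cited in the text
```

With these inputs the four values come out to 21.9, 29.3, 94.3 and 71.6, which agree with the Animals row and with the roughly 72% single-SuperType share stated above.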

Even Where Overlaps Occur, They are Minor

Of the 31 mostly disjoint SuperTypes, only a relatively few show potential interactions, and then mostly in minor ways. Key areas of interacting overlap occur for:

  • Activities v Events
  • Within the artifact-related STs of Products, FoodDrink, Drugs and Facilities
  • Others, as shown in Bold in the right-most column of Table 1 above.

We can illustrate the degrees of this interaction using a Venn diagram for the products-related STs, with the fully disjoint category of Organizations included for comparative purposes:

Figure 3. Sample Venn Diagram of Minor SuperTypes Overlap
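The region sizes behind such a Venn diagram might be derived as in the sketch below, which splits each SuperType's RCs into those exclusive to it and those shared with at least one other SuperType. The function name and all membership data are hypothetical and are not the figures plotted in the diagram.

```python
def venn_regions(members):
    """Split each SuperType's RCs into those exclusive to it and those shared with others."""
    regions = {}
    for st, rcs in members.items():
        # Union of every other SuperType's RCs (empty set if there are none).
        others = set().union(*(m for name, m in members.items() if name != st))
        regions[st] = {"exclusive": rcs - others, "shared": rcs & others}
    return regions

# Toy membership data for the products-related STs plus Organizations.
members = {
    "Products": {"Hammer", "Aspirin", "Bread"},
    "Drugs": {"Aspirin", "Penicillin"},
    "FoodDrink": {"Bread", "Milk"},
    "Organizations": {"Company"},
}
regions = venn_regions(members)
```

In this toy data Organizations has an empty shared region, which corresponds to a circle that touches none of the others.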

SuperType Typologies

Each of the 31 disjoint SuperTypes noted in Table 3 is represented by its own typology; that is, a hierarchical organization of related concepts. This design provides multiple levels and perspectives for relating to the UMBEL structure, and also provides a fine-grained entity resource for 31 entity types. The typologies may also be used independently. Copies are provided at https://github.com/structureddynamics/UMBEL/tree/master/Typologies.

FUTURE WORK

We will continue to refine these STs with the intent of producing clean typologies, the maximum amount of disjointedness, and a logical and computable organization.
