UMBEL - Annex L 20160510

UMBEL Annex L: Attributes Ontology and Version 1.20

UMBEL Annex Document - 20 April 2015

Latest version
http://techwiki.umbel.org/index.php/UMBEL_-_Annex_L
Last update
$Date: 2015/4/20 14:28:36 $
Version
Version No.: 1.20
Volume
TR 12-5-21-L
Authors
Michael Bergman - Structured Dynamics
Frédérick Giasson - Structured Dynamics

UMBEL: Upper Mapping and Binding Exchange Layer by Structured Dynamics LLC is provided under the
Creative Commons Attribution 3.0 license. See the attribution section for how to cite the effort.

Copyright © 2009-2015 by Structured Dynamics LLC.

Beginning with UMBEL version 1.20, statistics regarding numbers of reference concepts (RCs) in the ontology and splits between SuperTypes (STs) and modules have been moved to the statistics Annex Z document. As a result, earlier statistics in this and other annexes are no longer being updated, which means any statistics cited below may be out of date. Please consult Annex Z for the current UMBEL statistics.

This annex describes the process followed in the creation of UMBEL version 1.20, with particular reference to the new addition of the UMBEL Attributes Ontology.

The UMBEL Attributes Ontology (AO) is a module of UMBEL explicitly designed to provide a basis for mapping attributes (properties) into UMBEL. In this regard, the AO acts in a similar manner to the concept mapping framework in UMBEL.

General Steps

  1. We followed the general approach of using the UMBEL generator as discussed in Annex K
  2. After initial construction of the Attributes Ontology (see below), the complete pool of candidate reference concepts (RCs) was assembled, including an analysis of dropped and changed concepts in OpenCyc between UMBEL versions 1.05 and 1.10. The construction of the Attributes Ontology also led to new RCs being identified and specified
  3. All candidate concepts were compared to the existing OpenCyc ontology, with OpenCyc IDs assigned. This effort resulted in two major lists being compiled:
    • A correspondence input file of UMBEL RCs with matching OpenCyc IDs
    • An input file of new RCs for UMBEL, which were characterized according to ID, preferred label, alternate labels, definitions and comment notes.
  4. All resulting RCs were then inspected for the assignment of SuperTypes. In the process, these changes were made:
    1. The MarketsIndustries SuperType was removed, and combined into the Attributes listing
    2. The Workspace SuperType was removed, with all entries combined into the Facilities SuperType
    3. A new Entities category was defined (see Entities section below) and all RCs were inspected as to whether they fit into this category or not
    4. After multiple iterations, all RCs were assigned to one or more SuperTypes
  5. The Geo module (see Annex J) was inspected, and 61 RCs were moved to UMBEL core from the earlier Geo module. Two separate listings were produced for the generator:
    • A listing of core Geo RCs
    • The listing of remaining Geo concepts (from both the Earthscape and Geopolitical SuperTypes) that were to be assigned to the Geo module
  6. A new Entities module was created (see Entities section)
  7. Then, all RCs were assigned either to Core or to the Geo, Attributes Ontology or Entities modules
  8. The UMBEL structure was generated, including coherency and consistency checks
  9. One output is the inferred assignments to SuperTypes, a sort of "top down" view of the structure
  10. The new RC SuperType assignments were compared with original manual ones; we used differences to identify further structure improvements
  11. We then reviewed and made modifications (potentially repeating any of the steps above).
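
The comparison in steps 9 and 10 can be pictured with a minimal Python sketch. It is not the UMBEL generator's own code; the function name, the dictionary layout, and the sample RC names are assumptions made for illustration. It simply reports, for each RC, the SuperTypes that appear in only one of the two assignments.

```python
from typing import Dict, Set


def supertype_differences(
    manual: Dict[str, Set[str]],
    inferred: Dict[str, Set[str]],
) -> Dict[str, Dict[str, Set[str]]]:
    """Return, per RC, the SuperTypes found in only one of the two assignments."""
    differences: Dict[str, Dict[str, Set[str]]] = {}
    for rc in sorted(set(manual) | set(inferred)):
        manual_sts = manual.get(rc, set())
        inferred_sts = inferred.get(rc, set())
        if manual_sts != inferred_sts:
            differences[rc] = {
                "manual_only": manual_sts - inferred_sts,
                "inferred_only": inferred_sts - manual_sts,
            }
    return differences


# Hypothetical sample data: RC names and their assignments are illustrative only.
manual = {
    "ExampleRestaurant": {"Facilities"},
    "ExampleWine": {"FoodDrink"},
}
inferred = {
    "ExampleRestaurant": {"Facilities", "Organizations"},
    "ExampleWine": {"FoodDrink"},
}

diffs = supertype_differences(manual, inferred)
for rc, delta in diffs.items():
    print(rc, delta)
```

Each RC reported here is a candidate for review: either the manual assignment was incomplete, or the structure leads the inference somewhere unintended.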

Specifics on the Attributes Ontology

The creation of the Attributes Ontology from the existing UMBEL followed these steps:

  1. All existing UMBEL RCs were inspected for whether a given concept was an "attribute" or not. Attributes were understood to be the ways that entities may be described or characterized. Attributes are NOT the relationship between concepts, but the descriptive properties for individual data records. An attribute includes any descriptive property that might be included in an infobox for a given entity type
  2. Possible attribute values were explicitly understood to include literal values and members of object types
  3. New Attribute RCs were identified
  4. UMBEL was then loaded into Protege and all non-attribute RCs were systematically removed from the listing. This step resulted in a core listing of existing UMBEL concepts that were also of the Attributes SuperType. Note that ALL Attribute RCs were designated to be included in the separate Attributes Ontology module
  5. Over a period of months, this listing was massaged and organized through many versions. The intent was to organize the attributes into like categories, as informed by OpenCyc. In some cases, the organizational concepts were missing in OpenCyc, in which case a new concept was added to UMBEL for this purpose
  6. At the same time, incomplete sets of like attributes were identified, and missing members were added from OpenCyc if they were not already in the UMBEL listing. Further patterned instances helped hone the listing
  7. After multiple reviews and revisions, a working Attributes Ontology was the result. This ontology was then saved and converted to a CSV file according to this recipe
  8. The CSV file was manually inspected and cleaned up, and then split into two parts to provide the input into Steps #7 and #8 in the General Steps section above
  9. We are in the process of mapping external properties in schema.org and DBpedia to the Attributes Ontology.

This process resulted in about 2000 attributes now managed by this module.
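
The annex does not say how the CSV file was split. As a rough sketch only, the Python below assumes the split mirrors the two lists from the General Steps: rows that already carry an OpenCyc ID versus rows for new RCs. The column names (`id`, `pref_label`, `opencyc_id`) and the sample rows are hypothetical, not the actual file layout.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_rcs(csv_text: str) -> Tuple[List[Row], List[Row]]:
    """Split rows into (matched to an OpenCyc ID, new RCs without one)."""
    matched: List[Row] = []
    new: List[Row] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An empty or missing opencyc_id marks the row as a new RC (assumption).
        if (row.get("opencyc_id") or "").strip():
            matched.append(row)
        else:
            new.append(row)
    return matched, new


# Hypothetical sample; the IDs are placeholders, not real OpenCyc identifiers.
SAMPLE = "\n".join([
    "id,pref_label,opencyc_id",
    "ExampleBirthDate,birth date,placeholder-id-001",
    "ExampleFoundingYear,founding year,",
])

matched, new = split_rcs(SAMPLE)
print([row["id"] for row in matched])
print([row["id"] for row in new])
```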

Specifics on Entities and the Entities Module

Entities refer to the notable things that are contained within the current domain at hand. Entities, which are often named individuals, are grouped together into similar sets or types. Automobile products or astronauts are two examples of an entity type. Entities are often classified into people, places, organizations or miscellaneous groupings, but entities can be typed more finely. One of the purposes of UMBEL is to enable rich entity typing.

In order to fulfill this purpose, the various entity RCs in UMBEL needed to be identified, following these steps:

  1. Review all reference concepts (RCs) in UMBEL and assign an entity designator to all appropriate RCs
    • NOTE: Almost all items in the pre-existing Geo module are entities
  2. Split entities into a separate module (the Geo module is used for those entities). In general, entity types were retained in UMBEL core, while specific instances (mostly named entities) were assigned to the Entities module.

This process resulted in about 20,000 entities being identified for version 1.20, with about 9,000 assigned to the Entities module.

Entities are an orthogonal characterization to the SuperTypes in UMBEL. Used in combination, Entities of specific SuperTypes can be filtered, retrieved and analyzed independently using this structure.
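
To illustrate this orthogonality, here is a small Python sketch that treats the entity designator and the SuperType assignment as two independent filters. The `ReferenceConcept` class, the `select_rcs` function, and the sample records are all hypothetical; UMBEL itself is an ontology, and this does not reflect how its data is actually stored or queried.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ReferenceConcept:
    name: str
    supertypes: FrozenSet[str]  # an RC may belong to one or more SuperTypes
    is_entity: bool             # the entity designator, independent of SuperType


def select_rcs(
    rcs: Iterable[ReferenceConcept],
    supertype: str,
    entities_only: bool = True,
) -> List[str]:
    """Return names of RCs in a SuperType, optionally restricted to entities."""
    return [
        rc.name
        for rc in rcs
        if supertype in rc.supertypes and (rc.is_entity or not entities_only)
    ]


# Hypothetical records; the names and flags are illustrative only.
RCS = [
    ReferenceConcept("ExampleFacilityConcept", frozenset({"Facilities"}), False),
    ReferenceConcept("ExampleNamedFacility", frozenset({"Facilities"}), True),
    ReferenceConcept("ExampleNamedDish", frozenset({"FoodDrink"}), True),
]

facility_entities = select_rcs(RCS, "Facilities")
all_facilities = select_rcs(RCS, "Facilities", entities_only=False)
print(facility_entities)
print(all_facilities)
```

Because the two dimensions are independent, either filter can be relaxed without touching the other.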

Future Notes

Some notes or additional activities that appear warranted are:

  • Complete the mapping of external properties (schema.org and DBpedia) to the Attributes Ontology
  • Provide an ST Scaffold view
  • Improve AltLabel matches by using Wikipedia
  • Perform API calls against OpenCyc to find missing object type and attribute type members
  • TopicsCategories remains a disconnected ST
  • Add new section on UMBEL site that shows possible ST overlaps (e.g., at http://umbel.org/super-type/?uri=FoodDrink).

