UMBEL - Annex L 20160510
UMBEL Annex Document - 20 April 2015
- Last update: $Date: 2015/4/20 14:28:36 $
- Version No.: 1.20
- TR 12-5-21-L
- Michael Bergman - Structured Dynamics
- Frédérick Giasson - Structured Dynamics
UMBEL: Upper Mapping and Binding Exchange Layer by Structured Dynamics LLC is provided under the Creative Commons Attribution 3.0 license. See the attribution section for how to cite the effort.
This annex describes the process followed in creating UMBEL version 1.20, with particular attention to the newly added UMBEL Attributes Ontology.
The UMBEL Attributes Ontology (AO) is a module of UMBEL explicitly designed to provide a basis for mapping attributes (properties) into UMBEL. In this regard, the AO acts much like UMBEL's existing concept-mapping framework, but for properties rather than concepts.
- We followed the general approach of using the UMBEL generator as discussed in Annex K
- After initial construction of the Attributes Ontology (see below), the complete pool of candidate reference concepts (RCs) was assembled, including an analysis of dropped and changed concepts in OpenCyc between UMBEL versions 1.05 and 1.10. The construction of the Attributes Ontology also led to new RCs being identified and specified
- All candidate concepts were compared to the existing OpenCyc ontology, with OpenCyc IDs assigned. This effort resulted in two major lists being compiled:
- A correspondence input file of UMBEL RCs with matching OpenCyc IDs
- An input file of new RCs for UMBEL, which were characterized according to
- All resulting RCs were then inspected for the assignment of SuperTypes. In the process, these changes were made:
- The MarketsIndustries SuperType was removed, and combined into the
- The Workspace SuperType was removed, with all entries combined into the
- A new Entities category was defined (see the Entities section below), and all RCs were inspected as to whether they fit into this category or not
- After multiple iterations, all RCs were assigned to one or more SuperTypes
- The Geo module (see Annex J) was inspected, and 61 RCs were moved to UMBEL core from the earlier Geo module. Two separate listings were produced for the generator:
- A listing of core
- The listing of remaining Geo concepts (from both the Earthscape and Geopolitical SuperTypes) that were to be assigned to the Geo module
- A listing of core
- A new Entities module was created (see the Entities section below)
- Then, all RCs were assigned either to Core or to the Geo, Attributes Ontology or Entities modules
- The UMBEL structure was generated, including coherency and consistency checks
- One output is the set of inferred SuperType assignments, a sort of "top-down" view of the structure
- The new (inferred) SuperType assignments for the RCs were compared with the original manual ones, and the differences were used to identify further structural improvements
- We then reviewed and made modifications (potentially repeating any of the steps above).
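The comparison of inferred and manual SuperType assignments in the steps above can be illustrated with a minimal Python sketch. It assumes each set of assignments is available as a mapping from an RC label to a set of SuperType names; the function name `diff_supertypes` and the example RC labels are hypothetical and do not reflect the actual UMBEL generator or its file formats.

```python
from typing import Dict, Set

# Hypothetical shape: RC label -> set of SuperType names assigned to it.
Assignments = Dict[str, Set[str]]


def diff_supertypes(manual: Assignments, inferred: Assignments) -> Dict[str, Dict[str, Set[str]]]:
    """Return, per RC, the SuperTypes present only in the manual or only in the inferred view.

    RCs whose manual and inferred assignments agree are omitted from the report.
    """
    report: Dict[str, Dict[str, Set[str]]] = {}
    # Use the union of keys so RCs missing from either view are also reported.
    for rc in sorted(manual.keys() | inferred.keys()):
        m = manual.get(rc, set())
        i = inferred.get(rc, set())
        if m != i:
            report[rc] = {"manual_only": m - i, "inferred_only": i - m}
    return report


if __name__ == "__main__":
    # Illustrative data only; these are not actual UMBEL assignments.
    manual = {"Mountain": {"Earthscape"}, "Country": {"Geopolitical"}}
    inferred = {"Mountain": {"Earthscape", "Geopolitical"}, "Country": {"Geopolitical"}}
    for rc, delta in diff_supertypes(manual, inferred).items():
        print(rc, delta)
```

Each RC in the report is a candidate for the review-and-modify loop in the last step; an empty report means the two views agree.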
Specifics on the Attributes Ontology
The creation of the Attributes Ontology from the existing UMBEL followed these steps:
- All existing UMBEL RCs were inspected to determine whether each concept was an "attribute" or not. Attributes were understood to be the ways that entities may be described or characterized. Attributes are NOT the relationships between concepts, but the descriptive properties of individual data records. An attribute is any descriptive property that might appear in an infobox for a given entity type
- Possible attribute values were explicitly understood to include literal values and members of object types
- Attribute RCs were identified
- UMBEL was then loaded into Protégé and all non-attribute RCs were systematically removed from the listing. This step resulted in a core listing of existing UMBEL concepts that were also of the Attributes SuperType. Note that ALL Attribute RCs were designated to be included in the separate
- Over a period of months, this listing was refined and reorganized through many versions. The intent was to group the attributes into like categories, as informed by OpenCyc. Where an organizing concept was missing from OpenCyc, a new concept was added to UMBEL for that purpose
- At the same time, incomplete sets of like attributes were identified and missing members were added from OpenCyc if they were not already in the UMBEL listing. Further patterned instances helped hone
- After multiple reviews and revisions, the result was a working Attributes Ontology. This ontology was then saved and converted to a CSV file according to this recipe
- The CSV file was manually inspected and cleaned up, then split into two parts that provided the inputs for Steps #7 and #8 in the previous section
- We are in the process of mapping external properties in schema.org and DBpedia to the Attributes Ontology.
This process resulted in about 2,000 attributes now managed by the Attributes Ontology module.
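The filtering and export steps above were carried out with Protégé and a separate conversion recipe. As a rough stand-in, the following Python sketch shows the same idea using only the standard library: keep the RCs assigned to the Attributes SuperType, then serialize them to CSV. The `RC` record, its fields, the CSV column names and the example labels are all hypothetical assumptions, not the format actually used for UMBEL.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class RC:
    """Hypothetical in-memory record for a reference concept."""
    label: str
    parent: str
    supertypes: Set[str] = field(default_factory=set)


def attribute_rcs(rcs: Iterable[RC]) -> List[RC]:
    """Keep only RCs assigned to the Attributes SuperType (i.e., drop all non-attribute RCs)."""
    return [rc for rc in rcs if "Attributes" in rc.supertypes]


def to_csv(rcs: Iterable[RC]) -> str:
    """Serialize RCs to CSV text with hypothetical columns: label, parent, supertypes."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["label", "parent", "supertypes"])
    for rc in rcs:
        # Multiple SuperTypes are joined with spaces in a single cell.
        writer.writerow([rc.label, rc.parent, " ".join(sorted(rc.supertypes))])
    return buf.getvalue()


if __name__ == "__main__":
    # Illustrative data only.
    sample = [
        RC("BirthDate", "DateAttribute", {"Attributes"}),
        RC("Astronaut", "Person", {"PersonTypes"}),
    ]
    print(to_csv(attribute_rcs(sample)))
```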
Specifics on Entities and the Entities Module
Entities are the notable things within the domain at hand. Entities, which are often named individuals, are grouped together into similar sets, or types. Automobile products and astronauts are two examples of entity types. Entities are often classified into the broad groupings of people, places, organizations or miscellaneous, but they can be typed much more finely. One of the purposes of UMBEL is to enable such rich entity typing.
To fulfill this purpose, the various entity RCs in UMBEL needed to be identified, following these steps:
- Review all reference concepts (RCs) in UMBEL and assign an entity designator to all appropriate RCs
- NOTE: Nearly all items in the pre-existing Geo module are entities
- Split entities into a separate module (geographic entities remain in the Geo module). In general, entity types were retained in UMBEL core, while specific instances (mostly named entities) were assigned to the Entities module.
This process resulted in about 20,000 entities being identified for version 1.20, with about 9,000 assigned to the Entities module.
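The module split described above can be summarized as a small decision rule. The Python sketch below assumes three hypothetical boolean flags per RC: the entity designator from the review step, an instance-versus-type flag, and membership in the remaining Geo module. The `Concept` record, these flags and the example labels are simplifications for illustration, not the actual UMBEL build logic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Concept:
    label: str
    is_entity: bool    # entity designator assigned during review
    is_instance: bool  # True for a specific (often named) individual, False for a type
    in_geo: bool       # True if the RC remains in the Geo module


def assign_module(c: Concept) -> str:
    """Simplified, hypothetical version of the module split rules."""
    if c.in_geo:
        return "Geo"       # geographic entities stay in the Geo module
    if c.is_entity and c.is_instance:
        return "Entities"  # specific instances go to the Entities module
    return "Core"          # entity types and non-entity RCs stay in core


def tally(concepts: Iterable[Concept]) -> Counter:
    """Count how many concepts land in each module."""
    return Counter(assign_module(c) for c in concepts)


if __name__ == "__main__":
    # Illustrative data only.
    sample = [
        Concept("Astronaut", is_entity=True, is_instance=False, in_geo=False),
        Concept("NeilArmstrong", is_entity=True, is_instance=True, in_geo=False),
        Concept("Lake", is_entity=True, is_instance=False, in_geo=True),
    ]
    print(tally(sample))
```

A tally of this kind is one way to arrive at summary counts such as the number of entities assigned to the Entities module.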
Entities are a characterization orthogonal to the SuperTypes in UMBEL. Used in combination, the two allow entities of specific SuperTypes to be filtered, retrieved and analyzed independently.
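Because the entity designation and the SuperTypes are independent dimensions, filtering on both amounts to intersecting two criteria. A minimal Python sketch follows, assuming the SuperType assignments are held as a mapping from RC label to SuperType names and the entity designations as a set of RC labels; both in-memory shapes, the function name and the labels are hypothetical.

```python
from typing import Dict, List, Set


def entities_of_supertype(
    supertype: str,
    assignments: Dict[str, Set[str]],
    entities: Set[str],
) -> List[str]:
    """Return the RCs that carry the entity designator AND belong to the given SuperType."""
    return sorted(
        rc for rc, sts in assignments.items()
        if rc in entities and supertype in sts
    )


if __name__ == "__main__":
    # Illustrative data only.
    assignments = {
        "Astronaut": {"PersonTypes"},
        "Wine": {"FoodDrink"},
        "Flavor": {"FoodDrink", "Attributes"},
    }
    entities = {"Astronaut", "Wine"}
    print(entities_of_supertype("FoodDrink", assignments, entities))
```

Here "Flavor" shares the FoodDrink SuperType with "Wine" but is excluded because it is not flagged as an entity; with the flag kept separate, the same SuperType can be queried for entities only, non-entities only, or both.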
Some notes and additional activities that appear warranted are:
- Complete the mapping of external properties (schema.org and DBpedia) to the Attributes Ontology
- Create a SuperType scaffold view
- Improve AltLabel matches by using Wikipedia
- Perform API calls against OpenCyc to find missing object type and attribute type members
- TopicsCategories remains a disconnected SuperType
- Add new section on UMBEL site that shows possible ST overlaps (e.g., at http://umbel.org/super-type/?uri=FoodDrink).
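The proposed SuperType overlap view could be backed by a simple pairwise count of RCs shared between SuperTypes. The Python sketch below is one possible way to compute such counts, assuming the RC-to-SuperType assignments are available as an in-memory mapping; the function name and example data are hypothetical, and the approach is not taken from the UMBEL site.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def supertype_overlaps(assignments: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count, for each unordered pair of SuperTypes, the RCs assigned to both."""
    overlaps: Counter = Counter()
    for sts in assignments.values():
        # Sorting makes each pair canonical, e.g. ("FoodDrink", "Products").
        for pair in combinations(sorted(sts), 2):
            overlaps[pair] += 1
    return dict(overlaps)


if __name__ == "__main__":
    # Illustrative data only.
    assignments = {
        "Bread": {"FoodDrink", "Products"},
        "Wine": {"FoodDrink", "Products"},
        "Vineyard": {"FoodDrink", "Earthscape"},
    }
    for (a, b), n in sorted(supertype_overlaps(assignments).items()):
        print(f"{a} / {b}: {n}")
```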