UMBEL - Annex C 20120521

UMBEL Annex C: Best Practices using UMBEL

UMBEL Annex Document - 21 May 2012

Latest version
http://techwiki.umbel.org/index.php/UMBEL_-_Annex_C
Last update
$Date: 2012/5/21 16:28:36 $
Version
Version No.: 1.05
Volume
TR 12-5-21-C
Authors
Michael Bergman - Structured Dynamics
Frédérick Giasson - Structured Dynamics

UMBEL: Upper Mapping and Binding Exchange Layer by Structured Dynamics LLC and Ontotext AD is provided under the Creative Commons Attribution 3.0 license. See the attribution section for how to cite the effort.

Copyright © 2009-2012 by Structured Dynamics LLC and Ontotext AD.

Terminology Note

As of version 0.80, 'Reference Concept' (RefConcept) has replaced the notion of 'Subject Concept' (SubjectConcept). Historical documentation may still use the older term, and some use is kept in current documentation for continuity reasons. Please treat the two terms as synonymous.

INTRODUCTION

The UMBEL Reference Concept ontology tries to follow a series of best practices. These same practices, listed below, are also recommended when constructing a domain ontology based on the UMBEL Vocabulary.

NAMING PRACTICES

  • Name all concepts as single nouns. Use CamelCase notation for these classes (that is, class names should start with a capital letter and not contain any spaces, such as MyNewConcept)
  • Name all properties as verb senses, so that triples can be read as statements; e.g., hasProperty. Try to use mixedCase notation for naming these predicates (that is, begin with a lowercase letter, capitalize each subsequent word, and don't use spaces)
  • Try to use common and descriptive prefixes and suffixes for related properties or classes (while they are just labels and their names have no inherent semantic meaning, it is still a useful way for humans to cluster and understand your vocabularies). For example, properties about languages or tools might contain suffixes such as 'Language' or 'Tool' for all related properties
  • Enable multi-lingual capabilities in all definitions and labels. This is a rather complicated best practice in its own right. For the time being, it means being attentive to the language tag, such as xml:lang="en" for English, on the values of all annotation properties and non-annotation properties.
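
The naming conventions above can be sketched in Turtle roughly as follows. The ex: namespace and the names ex:MyNewConcept, ex:hasProgrammingLanguage and ex:hasNaturalLanguage are hypothetical, chosen only for illustration; they are not part of UMBEL.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/ontology#> .   # hypothetical namespace

# Class: a noun in CamelCase, with a language-tagged label
ex:MyNewConcept a owl:Class ;
  skos:prefLabel "my new concept"@en .

# Properties: verb senses in mixedCase, sharing the 'Language' suffix
ex:hasProgrammingLanguage a owl:ObjectProperty ;
  skos:prefLabel "has programming language"@en .

ex:hasNaturalLanguage a owl:ObjectProperty ;
  skos:prefLabel "has natural language"@en .
```

In Turtle the language tag is written as @en after the literal; in RDF/XML the same information is carried by the xml:lang attribute.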

PROPERTIES

Remember the special mapping and reference roles that UMBEL-based ontologies play. Because of these roles, provide inverse properties where it makes sense, and adjust the verb senses in the predicates to match. For example, <Father> <hasChild> <Janie> would be expressed inversely as <Janie> <isChildOf> <Father>.
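
One possible way to declare such a pair is sketched below. The use of owl:inverseOf is an assumption here (the text above does not name a specific construct), and the ex: namespace and all names in it are hypothetical.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/family#> .   # hypothetical namespace

# Declare the two properties as inverses of each other
ex:hasChild a owl:ObjectProperty ;
  owl:inverseOf ex:isChildOf .

ex:isChildOf a owl:ObjectProperty .

# Asserting one direction ...
ex:Father ex:hasChild ex:Janie .
# ... allows an OWL reasoner to infer the other:
#   ex:Janie ex:isChildOf ex:Father .
```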

DEFINITIONS

Recall that reference concepts in UMBEL or based on the UMBEL Vocabulary are referents to real-world concepts with exact definitions and instance members, the combination of which (through both intension and extension[1]) defines what the concept means. The label for the concept is merely a useful handle to the class membership and meaning of that concept, and has no further meaning in and of itself.

As a result, give all concepts and properties a definition. The matching and alignment of things is done on the basis of concepts (not simply labels), which means each concept must be defined.[2]

Clear definitions, together with the coherence of its structure, give an ontology its semantics. Remember not to confuse the label for a concept with its meaning. (This approach also aids multi-linguality.) The UMBEL Vocabulary recommends the use of the property skos:definition, though others such as rdfs:comment or dc:description are also commonly used.
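
As a minimal sketch, a definition attached with skos:definition might look like this. The concept ex:MyNewConcept, its namespace and the definition text are placeholders for illustration only.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/ontology#> .   # hypothetical namespace

ex:MyNewConcept a owl:Class ;
  # The label is only a handle for the concept ...
  skos:prefLabel "my new concept"@en ;
  # ... the definition carries the intended meaning
  skos:definition "A placeholder definition stating what does and does not fall under this concept."@en .
```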

OTHER COMMON ANNOTATIONS

  • Provide a preferred label annotation property that is used for human readable purposes and in user interfaces. For this purpose, the UMBEL Vocabulary recommends the use of the skos:prefLabel property. Only ONE preferred label may be used per language variant
  • Provide a robust set of alternative labels (see SEMSET section next). For this purpose, the UMBEL Vocabulary recommends the use of the skos:altLabel property
  • Provide a robust set of misspellings for the reference concept (see SEMSET section next). For this purpose, the UMBEL Vocabulary recommends the use of the skos:hiddenLabel property
  • If there are authoritative sources of external information, specifically including Wikipedia, provide an rdfs:seeAlso annotation and link to the external URL
  • As noted above, provide language tags for all annotations.
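
Taken together, these annotations might be written as in the sketch below. The concept ex:Project and its namespace are hypothetical, and the label values and the Wikipedia link are illustrative choices rather than content taken from UMBEL itself.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/ontology#> .   # hypothetical namespace

ex:Project a owl:Class ;
  skos:prefLabel   "project"@en ;       # exactly one preferred label per language
  skos:altLabel    "undertaking"@en ;   # alternative label
  skos:hiddenLabel "projeckt"@en ;      # common misspelling
  rdfs:seeAlso     <http://en.wikipedia.org/wiki/Project> .   # external source
```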

SEMSETS

Since the UMBEL Reference Concept ontology and domain ontologies based on the UMBEL Vocabulary are used for reference and tagging purposes, it is essential that the ontology specification be sufficiently robust to inform information extraction systems.[3]

This purpose is served by explicitly providing a “semset” to accompany each reference concept. The semset construct is a series of alternate labels and terms to describe the concept. These alternatives include true synonyms, but may also be more expansive and include jargon, slang, acronyms or alternative terms that usage suggests refer to the same concept. The semset construct is similar to the "synsets" in WordNet, but with a broader understanding of use.

The semset construct has three parts: the single (per language) preferred, human-readable label for the concept, the prefLabel; an embracing list of alternative phrases and terms for the concept (including acronyms, synonyms, and matching jargon), the altLabels; and a list of prominent or common misspellings of the concept or its alternatives, the hiddenLabels.

Alternative labels may also include contemporary jargon or slang drawn from Web tagging or folksonomies. (For example, Web 2.0, Web 20, web20, web_20, web-20, etc., can be expanded variants.) The semset construct may also apply to named entities. In this case, its use is closer to the sense of an alias (such as nicknames, or "great Satan" or "uncle Sam" for the "United States").

There is no limit to the number of altLabels or hiddenLabels a semset may have.

One important feature of the semset construct is its relation to a language. For each language, the parts of the semset construct should be labeled specifically for that language. The construct may then be replaced for alternative languages. This means that a semset can be seen as a bag of related labels and that this bag of labels is related to a language. The goal is to relate a semset construct to a language instead of relating each label to the proper datatype. This means that all contributing members to a semset construct should have a relation to the lexvo[4] instance that describes the current language.

semset Example

Here is an example of the parts of a semset construct for the Project reference concept, rc:Project.

  rc:Project
    skos:prefLabel """project"""@en ;
    skos:altLabel """projects"""@en ;
    skos:altLabel """undertakings"""@en ;
    skos:altLabel """undertaking"""@en ;
    skos:altLabel """enterprises"""@en ;
    skos:altLabel """enterprise"""@en ;
    skos:altLabel """programs"""@en ;
    skos:altLabel """program"""@en ;
    skos:altLabel """programme"""@en ;
    skos:hiddenLabel """programz"""@en ;
    skos:hiddenLabel """projeckt"""@en .
Table 1. semset Example
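
Because a semset is tied to a language, a parallel set of labels can be supplied for another language. The sketch below shows what a French semset for the same concept might look like. The French label values are illustrative assumptions, not taken from UMBEL, and the rc: prefix is assumed to resolve to the UMBEL reference concept namespace. Only plain language tags are used; the relation to a lexvo language instance described above is not shown, since its exact form is not specified here.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rc:   <http://umbel.org/umbel/rc/> .   # assumed UMBEL reference concept namespace

rc:Project
  skos:prefLabel   "projet"@fr ;      # one preferred label for French
  skos:altLabel    "projets"@fr ;
  skos:altLabel    "entreprise"@fr ;
  skos:altLabel    "programme"@fr ;
  skos:hiddenLabel "proget"@fr .      # hypothetical misspelling
```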

OTHER SOURCES OF BEST PRACTICES

For a broader set of guidance, see the TechWiki Ontology Best Practices.

ENDNOTES

  1. The intension of a concept is provided by its definition as the described scope and coverage; the extension of a concept is the set of its members, which collectively imply its scope and coverage. Use of both approaches is encouraged and allowed in the idea of a "reference concept".
  2. As another commentary on the importance of definitions, see http://ontologyblog.blogspot.com/2010/09/physician-decries-lack-of-definitions.html.
  3. http://en.wikipedia.org/wiki/Information_extraction
  4. http://lexvo.org/