Wikipedia Mapping - Defects and Testing

From UMBEL Wiki

POSSIBLE ONTOLOGY OR MAPPING DEFECTS

Mapping defects within or between ontologies may be either syntactic or semantic. They may arise from:

  1. Poorly specified starting ontologies
  2. Conversion of non-ontology structures to ontologies
  3. Import of ontologies into a base system, or
  4. Design differences between linked ontologies.

Though we rely on other sources, the key basis of this article is a Mindswap paper from 2005.[1]

Syntactic Defects

No further discussion is offered here. Check your base ontology(ies) with either a validator or a vetted framework such as Protégé or the OWL API [2].

Semantic Defects

Consistency

As from the Mindswap paper:

Inconsistent ontologies are those which have a contradiction in the instance data, e.g., an instance of an unsatisfiable class. They are also fairly easy for a reasoner to detect, if it can process the ontology at all. In fact, in tableau reasoners, unsatisfiability testing is reduced to a consistency test by positing that there is a member of the to be tested class and doing a consistency check on the resultant knowledge base (KB). However, unlike with mere unsatisfiable classes, an inconsistent ontology is, on the face of it, very difficult for a reasoner to do further work with. Since anything at all follows from a contradiction, no other results from the reasoner (e.g., with regard to the subsumption hierarchy) are useful.
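The reduction described in the quote can be sketched with a toy knowledge base. This is plain Python, not a real tableau reasoner, and the axioms (A ⊑ C and A ⊑ ¬C) are hypothetical; it only illustrates the principle of positing a member of the tested class and checking the resulting KB for a contradiction:

```python
# Toy KB: subclass axioms and complement pairs over named classes.
subclass = {            # hypothetical axioms: A is a subclass of C and of not-C
    "A": {"C", "not_C"},
}
complements = {("C", "not_C")}

def types_of(individual_types):
    """Close an individual's asserted types under the subclass axioms."""
    closed = set(individual_types)
    changed = True
    while changed:
        changed = False
        for t in list(closed):
            for sup in subclass.get(t, ()):
                if sup not in closed:
                    closed.add(sup)
                    changed = True
    return closed

def consistent(individual_types):
    """The KB is inconsistent if some individual falls into a class and its complement."""
    closed = types_of(individual_types)
    return not any(c in closed and n in closed for c, n in complements)

def satisfiable(cls):
    # The reduction from the quote: posit a member of cls,
    # then run a consistency check on the resulting KB.
    return consistent({cls})

print(satisfiable("A"))  # False: A is subsumed by both C and its complement
print(satisfiable("C"))  # True
```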

Satisfiability

As from the Mindswap paper:

Unsatisfiable classes are those which cannot be true of any possible individual, that is, they are equivalent to the empty set (or, in description logic terms, to the bottom concept, or, in OWL lingo, to owl:Nothing). For example, class A is unsatisfiable if it is a subclass of both, class C and ¬C, since it implies a direct contradiction. Unsatisfiable concepts are usually a fundamental modeling error, as they cannot be used to characterize any individual. Unsatisfiable concepts are also quite easy for a reasoner to detect and for a tool to display. However, determining why a concept in an ontology is unsatisfiable can be a considerable challenge even for experts in the formalism and in the domain, even for modestly sized ontologies.

Coherence

Some researchers, such as Qi and Hunter,[3] have used the idea of coherence to describe whether an ontology meets these twin tests of consistency and satisfiability:

A common error for an ontology is incoherence, i.e. whether there are unsatisfiable concepts which is interpreted as an empty set in all the models of its terminology. . . . Incoherence in ontologies corresponds to inconsistency in knowledge bases in classical logic, where a knowledge base is a finite set of classical formulae. A knowledge base is inconsistent if and only if there is no model satisfying all its formulae.

Qi and Hunter also provide metrics for measuring such incoherence.
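As a simplified stand-in (not Qi and Hunter's actual measure), one crude metric is the fraction of named classes that a reasoner reports as unsatisfiable; an ontology is incoherent whenever that fraction is nonzero:

```python
def incoherence_ratio(named_classes, unsatisfiable):
    """Fraction of named classes a reasoner found unsatisfiable."""
    if not named_classes:
        return 0.0
    return len(set(unsatisfiable) & set(named_classes)) / len(named_classes)

# Hypothetical reasoner output for a four-class ontology:
ratio = incoherence_ratio({"A", "B", "C", "D"}, {"A"})
print(ratio)      # 0.25
print(ratio > 0)  # True: the ontology is incoherent
```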

While this approach provides a testable metric, and is therefore one we use, we actually apply a more inclusive sense of coherence as our ultimate test,[4] as captured in this representative quote from Tennis and Jacob:[5]

In the context of an information organization framework, a structure is a cohesive whole or ‘container’ that establishes qualified, meaningful relationships among those activities, events, objects, concepts which, taken together, comprise the ‘bounded space’ of the universe of interest.

Other Possible Issues

Here are some additional ontology issues that may arise:

  • Incomplete specifications (including definitions and labels); see further Ontology Best Practices
  • Unintended inferences (subsumption, realization relationships, etc.) discovered by the reasoner
  • Missing type declarations
  • Missing domain, range or SuperType assignments
  • Possible redundancies, where the same concept is asserted in the same or nearly the same manner more than once
  • Unused atomic classes or properties, with no references anywhere in the system.

TESTING AND METHODS

The Mindswap paper and our testing methods involve the Protégé framework and Pellet, one of its reasoners.

What does it mean to "test an ontology"? The phrase is vague, because it can refer to many different activities. Let's define some of the possible tests that one can apply to ontologies:

  1. Test the usability of an ontology
  2. Test the coverage of a domain ontology
  3. Test the linkage(s) of an ontology

Test the Usability of an Ontology

Testing the usability of an ontology is a social and empirical process. Different systems will use the same ontology to publish, ingest or process different datasets that use that ontology. Other systems will try to present that same information in many different ways.

Testing the usability of an ontology happens over time through the interaction of the ontology's users and developers. A social agreement should eventually emerge, with a consensus that the ontology is now usable for a given set of use cases, whether general or specific.

Test the Coverage of a Domain Ontology

Testing the coverage of a domain ontology means testing whether there are semantic gaps in the ontology's structure. A semantic gap in a domain ontology can be defined as a set of missing classes: classes that should be in the domain ontology because they belong to its scope, but are currently absent from its structure.

The best way to test the coverage of a domain ontology is to try to link other ontologies that have some overlap with your domain ontology, and to see whether there are concepts or classes that cannot be linked into the tested domain ontology. If there are, they may be candidates for inclusion in the domain ontology.

This process is also quite time consuming: it takes time both to find good external ontologies or taxonomies that overlap with the target domain ontology, and to create the linkages between those structures and the target.
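At its simplest, a coverage probe compares the target ontology's class inventory against an overlapping external vocabulary; anything unmatched is a candidate gap. A rough sketch with hypothetical class labels (a real comparison would also need to handle synonyms and near-matches):

```python
# Class labels of the target domain ontology (hypothetical)
domain_classes = {"vehicle", "car", "truck"}

# Class labels of an overlapping external vocabulary (hypothetical)
external_classes = {"vehicle", "car", "motorcycle", "bicycle"}

# External classes with no counterpart in the domain ontology
# are candidates for inclusion.
candidate_gaps = external_classes - domain_classes
print(sorted(candidate_gaps))  # ['bicycle', 'motorcycle']
```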

Test the Linkage(s) of an Ontology

To test whether the linkage between two ontologies is properly done, there needs to be a coherent and satisfiable reference structure against which the linkages can be checked for coherence and satisfiability; hence the need for a "gold standard".

One such gold standard that can be used to test ontology linkages is UMBEL and its Reference Concepts Structure.

There are at least two different things to check for ontology linkages:

  1. Checking if the linked ontologies are still coherent
  2. Checking if the linked ontologies are still satisfiable

To be able to perform these checks, one has to make sure that:

  1. The linked ontologies are also linked to the reference ontology (gold standard)
  2. The reference ontology has restrictions defined
  3. The linked ontologies are available in the same ontology file (via import statements or by merging the two ontology files) along with the actual linkages.

Once these three criteria are met, the coherency and the satisfiability of the linkage(s) can be tested using an OWL reasoner such as Pellet. Any coherency or satisfiability issues should be fixed.

The coherency and satisfiability errors will more than likely pinpoint linkage issues, which could arise for one of these reasons:

  1. The link does not link the proper classes
  2. The restrictions of the reference ontology are too strict or wrong

Once all such tests are resolved, the linkage between the two ontologies is coherent and satisfiable according to the gold standard employed. This assurance means that if you create individuals belonging to any class of either of the two linked ontologies, the instantiation of those individuals will not cause either ontology to become incoherent.

Please note, however, that such tests and their assurances do not mean there are no "errors" in the linkage. These tests are just tools that help you pinpoint potential issues in your linkage. A linkage can be coherent and satisfiable according to the criteria above, but may still have conceptual or semantic errors because of gaps in the reference gold standard ontology.

In the UMBEL Reference Concepts Structure, the restrictions are created at the level of the Super Types structure.

ENDNOTES

  1. Aditya Kalyanpur, Bijan Parsia, Evren Sirin and James Hendler, 2005. "Debugging Unsatisfiable Classes in OWL Ontologies," which is an extended version of the papers presented at the WWW’05 Conference ('Debugging OWL Ontologies') and the DL’05 Workshop ('Black Box Debugging of Unsatisfiable Concepts’). See http://www.mindswap.org/papers/debugging-jws.pdf
  2. The OWL API is a Java interface and implementation for the W3C Web Ontology Language (OWL), used to represent Semantic Web ontologies. The API provides links to inferencers, managers, annotators, and validators for the OWL 2 profiles RL, QL and EL. Two recent papers describing the updated API are: Matthew Horridge and Sean Bechhofer, 2009. “The OWL API: A Java API for Working with OWL 2 Ontologies,” presented at OWLED 2009, the 6th OWL: Experiences and Directions Workshop, Chantilly, Virginia, October 2009. See http://www.webont.org/owled/2009/papers/owled2009_submission_29.pdf; and Matthew Horridge and Sean Bechhofer, 2010. “The OWL API: A Java API for OWL Ontologies,” paper submitted to the Semantic Web Journal; see http://www.semantic-web-journal.net/sites/default/files/swj107.pdf. Also see its code documentation at http://owlapi.sourceforge.net/2.x.x/documentation.html.
  3. Guilin Qi and Anthony Hunter, 2007. "Measuring Incoherence in Description Logic-based Ontologies," in The Semantic Web, 6th International Semantic Web Conference (ISWC’07), volume 4825 of LNCS, 381–394, Springer. See http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.338&rep=rep1&type=pdf.
  4. M. K. Bergman, 2008. "When is Content Coherent," posting from the AI3:::Adaptive Information blog, July 25, 2008, see http://www.mkbergman.com/450/when-is-content-coherent/.
  5. Joseph T. Tennis and Elin K. Jacob, 2008. “Toward a Theory of Structure in Information Organization Frameworks,” presentation at the 10th International Conference of the International Society for Knowledge Organization (ISKO 10), in Montréal, Canada, August 5th-8th, 2008. See http://www.ebsi.umontreal.ca/isko2008/documents/abstracts/tennis.pdf.