Wikipedia Mapping - Other Options

From UMBEL Wiki

Use of redirects to improve UMBEL semsets.

First Sentence Processing

Sentence start stop list:

a
about
after
although
an
as
be
before
down
figure
first
fourth
he
if
is
it
no
none
note
on
one
once
our
over
put
second
she
since
so
table
that
the
then
these
third
those
though
throughout
three
through
to
two
up
upon
using
we
XXXing
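
The page does not say how the stop list is applied. One plausible reading, sketched below in Python, is that a first sentence whose opening word is on the list gets flagged during first sentence processing. The function name `starts_with_stop_word`, the case-insensitive matching, and the reading of `XXXing` as "any word ending in -ing" are assumptions made for illustration; the word set is abbreviated.

```python
import re

# Abbreviated copy of the sentence-start stop list above.
# Assumption: matching is case-insensitive.
SENTENCE_START_STOP_WORDS = {
    "a", "about", "after", "although", "an", "as",
    "the", "then", "these", "those", "we",
}


def starts_with_stop_word(sentence: str) -> bool:
    """Return True if the first word of `sentence` is stop-listed."""
    match = re.match(r"\s*([A-Za-z]+)", sentence)
    if match is None:
        # No leading alphabetic word (empty string, digits, punctuation).
        return False
    first = match.group(1).lower()
    # Assumption: the "XXXing" entry stands for any word ending in "-ing".
    return first in SENTENCE_START_STOP_WORDS or first.endswith("ing")
```

A plain suffix test also flags words such as "King" or "Beijing", so a real filter would likely need a part-of-speech check or an exception list.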

Links and Surface Forms

Mihalcea: piped wiki links of the form [[link|surface form]] pair a link target with the surface form used to refer to it.
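
As a minimal sketch of how the [[link|surface form]] notation could be harvested, the Python below collects, for each link target, the surface forms used to point at it. The function name `surface_forms` and the regular expression are assumptions for illustration; templates, nested markup, and namespace prefixes are not handled.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Matches [[target]] and [[target|surface form]].
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]*))?\]\]")


def surface_forms(wikitext: str) -> Dict[str, Set[str]]:
    """Map each link target to the set of surface forms that link to it."""
    forms = defaultdict(set)
    for target, label in WIKI_LINK.findall(wikitext):
        target = target.strip()
        # An unpiped link [[target]] uses the target title as its surface form.
        forms[target].add(label.strip() or target)
    return dict(forms)
```

For example, `surface_forms("the [[bar (law)|bar]] exam and a [[bar (music)|bar]] of music")` records the surface form "bar" under two different targets, which is the kind of ambiguity that link-based disambiguation relies on.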

Nastase and Strube: analyzing "decoding" categories

  • Similarity
    • Disambiguation pages that also have an article page
  • Equivalence
    • Categories and articles that share the same concept

Semantic Graphs

  • semantic graph of the document
  • semantic graph of abstract (long/short)
  • semantic graph of links
  • semantic graph of categories
  • Keyphrasedness
  • Inverse Wikipedia frequency (IWF)
  • Total Wikipedia keyphrasedness (TWK)
  • Semantic Relatedness
  • Node Degree
  • nodes that share common parents are mostly related
  • UMBEL Analysis
  • Gazetteer generation
  • Concept tagging/annotation
  • Categorization
  • Named entity recognition (NER)
  • Word sense disambiguation (WSD)
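
None of the measures in this list are defined on this page. As a hedged illustration only, the Python sketch below reads keyphrasedness as the share of articles containing a phrase in which that phrase is used as a link (the definition of "keyphraseness" used in wikification work), and treats IWF by analogy with inverse document frequency. Both formulas, and all function and parameter names, are assumptions rather than UMBEL's method. TWK is left out because its definition cannot be inferred from the list.

```python
import math


def keyphraseness(articles_linking_phrase: int, articles_containing_phrase: int) -> float:
    """Fraction of the articles containing a phrase that also use it as a link."""
    if articles_containing_phrase == 0:
        return 0.0
    return articles_linking_phrase / articles_containing_phrase


def inverse_wikipedia_frequency(total_articles: int, articles_containing_phrase: int) -> float:
    """IDF-style score (assumed form): rarer phrases score higher."""
    if articles_containing_phrase == 0:
        return 0.0
    return math.log(total_articles / articles_containing_phrase)
```

Under these assumed definitions, a phrase that is linked in 25 of the 100 articles containing it has a keyphraseness of 0.25, and a phrase that appears in every article has an IWF of 0.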

Possible Added Relations?

FoundIn (YAGO) Using technique
Describe (YAGO)
LinkInCount
LinkOutCount
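
LinkInCount and LinkOutCount are not defined here. Reading them as the number of incoming and outgoing page links per article, a minimal Python sketch could look like the following; the function name `link_counts`, the (source, target) input format, and the choice to count each distinct pair only once are assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple


def link_counts(links: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Return (link_in_count, link_out_count) for (source, target) page links."""
    link_in, link_out = Counter(), Counter()
    # Assumption: repeated links between the same two pages count once.
    for source, target in set(links):
        link_out[source] += 1
        link_in[target] += 1
    return link_in, link_out
```

Pages that never appear as a target simply report an in-count of 0, since `Counter` returns 0 for missing keys.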

Eponymous Categories

From Category:Eponymous_categories on Wikipedia

Re-tagging Content

Analyzing category and other structural information allows pages to be enhanced with additional structural metadata. Some of the categories removed earlier are re-analyzed, and metadata is added to the articles to capture their category assignments.

Additional Metadata

  • Dates
  • Countries
  • Suffixes

Types

  • Use of "lists of..." pages
    • approximately 18,710 in DBpedia

Alternative Mapping Approaches/Algorithms