ESpotter is supported by the Dot.Kom project.
Team:
Dr. Jianhan Zhu
Dr. Victoria Uren
Prof. Enrico Motta
Related Projects:
Magpie
Talks:
KMi Internal Talk (June 14th 2005):
ESpotter: A Domain and User Adaptation Approach for Named Entity Recognition on the Web
Abstract: Named entity recognition (NER)
systems are commonly designed with a "one-size-fits-all"
philosophy. Lexicons and patterns manually crafted or learned from a training
set of documents are applied to any other document without taking into
account its background and user needs. However, when applying NER to Web
pages, due to the diversity of these Web pages and user needs, one size
frequently does not fit all. In this talk, I present a system called
ESpotter, which improves NER on the Web by adapting lexicons and patterns to
domains on the Web and user preferences. My results show that ESpotter
provides more accurate and efficient NER on Web pages from various domains
than current NER systems. ESpotter is implemented as a browser plug-in to
help solve the information overload problem on the Web by discovering
relevant information on the user's behalf. Further work on integrating ESpotter
with the ontology-based semantic browsing tool Magpie and the KMi semantic Web
site is explored.
Keywords: Named entity recognition, information extraction, hierarchies.
Talk slides
Papers:
Jianhan Zhu, Victoria Uren, and Enrico Motta. ESpotter:
Adaptive Named Entity Recognition for Web Browsing. To appear in Proc. of
Workshop on IT Tools for Knowledge Management Systems at WM2005 Conference, Kaiserslautern, Germany, April 11-13, 2005.
Demos:
Jianhan Zhu, Victoria Uren, and Enrico Motta. ESpotter:
A Prototype System for Adaptive Named Entity Recognition Supporting Web
Browsing. The Fifteenth ACM Conference on Hypertext and Hypermedia
(Hypertext'04), Santa Cruz, USA, August 9-13, 2004.
Jianhan Zhu, Victoria Uren, and Enrico Motta. ESpotter: A Prototype System for
Adaptive Named Entity Recognition Supporting Web Browsing. The Fourteenth
International Conference on Knowledge Engineering and Knowledge Management
(EKAW'2004), Whittlebury Hall, Northamptonshire, UK, October 5-8, 2004.
Download ESpotter as a .NET Windows Application:
With a single click, you can extract entities of various types from
documents, e.g., "Open University" as an organization and "Enrico Motta" as a
person. You can select one or more documents in plain text or HTML format and
save the recognized entities in an XML file for further processing.
The tool is based on the .NET Framework and can be downloaded.
Run the ESpotter.msi file to install it (you may need to install .NET Framework
1.0 first). The installation creates a shortcut to the ESpotter executable file
on your desktop. The following example XML output shows entities of various
types and their word offsets in a document.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ESpotter-Processed-Documents corpusSize="284">
  <Document id="0">
    <has-directory>D:\test.xml</has-directory>
    <has-url>D:\test.xml</has-url>
    <has-document-size>284</has-document-size>
    <mentions-location>
      <instance content=" Australia " pos="108" />
    </mentions-location>
    <mentions-organization>
      <instance content=" Monash University " pos="132" />
    </mentions-organization>
    <mentions-person>
      <instance content="Larry Stillman" pos="130" />
    </mentions-person>
    <mentions-research-area>
      <instance content="network" pos="238" alias="TechnologiesCommunity Informatics Research Network" />
    </mentions-research-area>
    <pn>
      <instance content="ICT" pos="22" />
    </pn>
  </Document>
</ESpotter-Processed-Documents>
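Since the XML file is meant for further processing, here is a minimal sketch of how such output could be read back and grouped by entity type. It is written in Python with only the standard library and is not part of ESpotter. The function name entities_by_type and the SAMPLE constant are made up for illustration, and the sketch assumes that every entity group is a direct child of Document containing instance elements with content and pos attributes, as in the example above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the example output above.
# A raw string keeps the backslash in "D:\test.xml" literal.
SAMPLE = r"""<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ESpotter-Processed-Documents corpusSize="284">
  <Document id="0">
    <has-url>D:\test.xml</has-url>
    <mentions-location>
      <instance content=" Australia " pos="108" />
    </mentions-location>
    <mentions-person>
      <instance content="Larry Stillman" pos="130" />
    </mentions-person>
    <pn>
      <instance content="ICT" pos="22" />
    </pn>
  </Document>
</ESpotter-Processed-Documents>""".encode("utf-8")


def entities_by_type(xml_bytes):
    """Map each entity element name to a list of (content, pos) pairs.

    Assumption: entity groups are direct children of <Document> and hold
    <instance> elements; children without instances (e.g. <has-url>) are
    skipped automatically.
    """
    root = ET.fromstring(xml_bytes)
    found = defaultdict(list)
    for document in root.iter("Document"):
        for group in document:
            for instance in group.findall("instance"):
                # Collapse the padding whitespace seen in some content values.
                content = " ".join(instance.get("content", "").split())
                found[group.tag].append((content, int(instance.get("pos"))))
    return dict(found)


if __name__ == "__main__":
    for tag, items in entities_by_type(SAMPLE).items():
        print(tag, items)
```

To process a real output file, the bytes read from that file can be passed to entities_by_type in place of SAMPLE.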
ESpotter uses an MS Access database file, ESpotterResources.mdb, to store
lexicon and pattern information. Currently ESpotter recognizes People,
Organization, Location, Research Area, Email, Telephone, Postal Code, and other
Proper Names. You can customize the lexicons and patterns in the
ESpotterResources.mdb file to recognize any type of entity you are interested
in by adding new lexicon entries and patterns. Lexicons and patterns are
grouped into different tables. When you add a new lexicon or new patterns, you
can create a new table and register it in the TableSchema table. New entity
types need to be registered in the TypeSchema table. Precision for domain
adaptation is not used in this version of ESpotter and can be ignored in the
database file.
For developers interested in ESpotter, the installation
includes a DLL file, ESpotterClass.dll, for easy inclusion in a .NET
application for language engineering. An example is given in the Class1.cs
file. More information on using ESpotter for development is coming soon.