MnM User Manual


Welcome to MnM2's world!
You wanted the Semantic Web, now you need the tools to deal with it!
Thank you for your interest in this application.

Table of Contents

0 Introduction

1 Getting Started
1.1 How to Install
1.2 How to Run
1.3 How to Configure: mnm2.xml
1.4 How to Configure: MnM
1.4.1 General Preferences
1.4.2 Browser Preferences
1.4.3 IE Engine Plugin Preferences
1.4.4 Ontology Plugin Preferences

2 First Run - An Example

3 The Document Browser Window
3.1 Opening files
3.2 Saving files

4 Browsing the Ontology
4.1 Choosing the Ontology
4.1.1 Ontology on Server
4.1.2 Ontology on URL
4.2 Creating a New KB
4.3 Loading a KB
4.4 Browse it
4.4.1 Ontology Viewer
4.4.2 Instance Viewer
4.4.3 Information Viewer

5 Marking-up the Document
5.1 Adding and Removing Tags
5.2 Marking-up Lists
5.3 Saving Marked-up Files

6 Populating the Ontology
6.1 Manual Population
6.1.1 Adding New Instances
6.1.2 Modifying Instances
6.2 Semi-Automatic Population
6.2.1 Adding a single tag to a set of instances
6.2.2 Importing tags to a set of instances
6.3 Automatic Population

7 Integration with Information Extraction Plugins
7.1 Learning
7.2 Extracting
7.2.1 What to do with the results
7.3 Background Learning
7.4 Background Extraction

8 Customization
8.1 Customizing Icons
8.2 Customizing Skins

9 What's Next

10 Troubleshooting

11 Contacts


0 Introduction

MnM is an annotation tool which provides both automated and semi-automated support for annotating web pages with semantic content. MnM integrates a web browser with an ontology editor and provides open APIs for linking to ontology servers and for integrating information extraction tools.


1 Getting Started

This application requires Java 1.4.1 (or higher) and Ant 1.5.1 (or higher) in order to run properly.

Java is not provided; it is available for download from the Sun website.
Ant is not provided; it is available for download from the Ant website.

Other required packages (provided in the zip file):
- Jena-1.5.0;
- JTidy;
- Kunststoff Look&Feel;
- SkinLF;
- tools.jar (belongs to j2sdk's lib directory).

Note: this application has been tested only under Windows2000.

1.1 How to Install

Step1: download and install j2sdk following the instructions provided by Sun. Remember to set the environment variable JAVA_HOME;
Step2: download and install Ant following the instructions provided by the Apache Ant Project. Unless you (or your system administrator) are a security freak and you need a signature (PGP or MD5) for everything you download, you can simply grab the ZIP file. The Ant manual recommends installing Ant in a short path (e.g. C:\Ant). Remember to set the environment variable ANT_HOME;
Step3: download and unzip MnM2.zip in C:\Program Files\.

Note: when entering the path for JAVA_HOME or ANT_HOME, remember to specify only the base directory used for the installation and not the path to the binary files (bin).
Tip1: to set environment variables under Windows2000 select Start>Settings>Control Panel. From the Control Panel double-click on the System icon. Select Environment Variables... from the Advanced tab. A new dialog will appear. Choose New... from the System Variables group and insert the required information.
Tip2: to set environment variables under Linux edit the file .bashrc in your home directory adding the following line:
      export VARIABLE_NAME=<variable_path>;
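
The note above about base directories can be checked mechanically. The following is a minimal Python sketch, not part of MnM: the function name check_home and its messages are made up for illustration. It only reports whether a variable is set and whether it mistakenly ends in a bin directory.

```python
import os
import re


def check_home(name, environ=None):
    """Return a short diagnostic for a *_HOME style environment variable.

    Hypothetical helper for illustration; MnM does not ship it.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        return f"{name} is not set"
    # Look at the last path component, accepting both \ and / separators.
    last = re.split(r"[\\/]", value.rstrip("\\/"))[-1].lower()
    if last == "bin":
        return f"{name} points at a bin directory; use the base directory instead"
    return f"{name} looks fine: {value}"


if __name__ == "__main__":
    for variable in ("JAVA_HOME", "ANT_HOME"):
        print(check_home(variable))
```

Run it from the same console you will use to start MnM, so that it sees the same environment.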

1.2 How to Run

To execute the application under Windows2000 there are different options available:
Option1: open the Command Prompt, go to the directory where MnM has been installed and type "MnM2";
Option2: open Windows Explorer, go to the directory where MnM has been installed and double-click on the "MnM2.bat" icon;
Option3: follow Option2 and instead of a double-click on "MnM2.bat" perform a right-click on it and create a shortcut, so that you can place it on the desktop.

To execute the application under Linux:
- open the console, go to the directory where MnM has been installed and type "ant -buildfile Java/mnm2.xml"

1.3 How to Configure: mnm2.xml

To speed up the application, allow the Java Virtual Machine to use more memory (at least half of the available RAM), because certain plugins are very resource hungry. You can do this by editing the file MnM2/Java/mnm2.xml and changing the value of the maxmemory option in the execute target (e.g. to 32m, 64m, ...).
You can fine tune the way MnM behaves by modifying the configuration file MnM2/Java/mnm2.xml according to the comments included in the file itself and the documentation you can find on the Ant website.
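
If you want to see which memory limit is currently configured without opening an editor, a small script can read it for you. This is only a sketch under stated assumptions: the real layout of mnm2.xml is not reproduced here, so SAMPLE_BUILD is an invented stand-in that assumes the execute target contains an Ant java task carrying the maxmemory attribute (Ant only honours that attribute when the task is forked).

```python
import xml.etree.ElementTree as ET

# Invented stand-in for MnM2/Java/mnm2.xml; the real file may look different.
SAMPLE_BUILD = """\
<project name="mnm2" default="execute">
  <target name="execute">
    <java classname="example.Main" fork="true" maxmemory="64m"/>
  </target>
</project>
"""


def find_maxmemory(build_xml, target_name="execute"):
    """Return the maxmemory values of the <java> tasks inside one target."""
    root = ET.fromstring(build_xml)
    values = []
    for target in root.iter("target"):
        if target.get("name") != target_name:
            continue
        for java_task in target.iter("java"):
            if "maxmemory" in java_task.attrib:
                values.append(java_task.attrib["maxmemory"])
    return values


print(find_maxmemory(SAMPLE_BUILD))
```

To inspect your own installation, read the contents of MnM2/Java/mnm2.xml into a string and pass it to find_maxmemory instead of the sample.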

1.4 How to Configure: MnM

During your first run with MnM it is essential that you spend some time setting up the environment you are going to work with. To do this, open the Preference Dialog from the Settings>Preferences... menu.

1.4.1 General Preferences

In this section you can specify your settings for Directories and Look&Feel.

In the Directories tab you can define:
- Working directory: the base directory for the application;
- Plugins directory: the directory in which MnM will look for plugins, both IE Plugins and I/O Plugins;
- Scenario directory: the directory in which MnM will store the libraries created by the IE Engines.
Figure 1.4.1a: Preference Dialog for General settings with the Directories tab selected

In the Look&Feel tab you can define:
- Look&Feel directory: the directory in which MnM will look for themes and icon sets;
- Look&Feel theme: the theme to be used by MnM (Metal, Kunststoff and SkinLF skins). For more information on skins see section 8.2;
- Look&Feel icon set: the icon set to be used by MnM (Java's default icon set and various icon sets...). For more information on icons see section 8.1.
Figure 1.4.1b: Preference Dialog for General settings with the Look&Feel tab selected

Note: if you have followed our instructions about MnM's installation directory, you will not need to change the defaults here.

1.4.2 Browser Preferences

In this section you can specify your settings for the basic Web Browser provided with MnM:
- Home Page: the initial page to open when MnM starts up;
- Use Proxy: check this option if you want to use a proxy server;
- Proxy Host: the name of the proxy server;
- Proxy Port: the port of the proxy server.
Figure 1.4.2: Preference Dialog for Browser settings

Note: consult your IT guru if you have trouble with this.

1.4.3 IE Engine Plugin Preferences

In this section you can define your specific settings for every IE Engine Plugin stored in the Plugins directory introduced in section 1.4.1.
MnM does not include any plugin for IE. When more IE Engine Plugins become available, simply drop their JAR files in the Plugins directory to integrate them into MnM. If you are interested in developing your own IE Engine Plugin for MnM please refer to the MnM Developer Guide.

MnM has been tested with Amilcare, an IE Engine developed by Fabio Ciravegna from the Department of Computer Science, University of Sheffield. We could not release Amilcare in the same package and under the same license as MnM. For this reason, if you want to use it you will have to contact Fabio Ciravegna and ask for the version of Amilcare that he developed specifically for MnM. Once you have the file Amilcare.zip, unpack it in the C:/Program Files/MnM2/Java/Amilcare directory and restart the application. From then on you will be able to use Amilcare within MnM.

Note: unless you are an expert in Information Extraction techniques or in the IE Engine Plugin you are using, you will not need to change the defaults here.

1.4.4 Ontology Plugin Preferences

In this section you can define your specific settings for every Ontology Plugin stored in the Plugins directory introduced in 1.4.1.
At the moment MnM is bundled with the following Ontology plugins:
- WebOnto;
- Rdf;
- Daml+Oil;
... and many more are on the way...
When more Ontology Plugins become available, simply drop their JAR files in the Plugins directory to integrate them into MnM. If you are interested in developing your own Ontology Plugin for MnM please refer to the MnM Developer Guide.

In the current version no preferences are available for these.



2 First Run - An Example

Before diving into the explanation of the amazing features of MnM let's start with a small example that will introduce the basic functionalities of MnM.

Step0: decide what to do (populate the KB, mark-up documents or both);
Step1: load an ontology (from a server or a file);
Step2: create/load a Knowledge Base;
Step3: choose the ontology and the class you want to use;
Step4: if you want to Manually populate the KB:
       4.a: right-click on the class for which you want to create a new instance and select Add new instance...;
       4.b: fill in the required fields;
       4.c: commit the new instance to the KB (OK button);
Step5: select a set of documents to annotate;
Step6: create a directory to store the marked-up documents. We call this the training corpus directory;
Step7: mark-up the documents (open the class that you want to use if you haven't already done so);
       7.a: load a document;
       7.b: highlight a piece of text in the document;
       7.c: double-click on the slot/relation you want to use for the mark-up;
       7.d: repeat 7.b and 7.c until you are happy with the annotation;
       7.e: save the annotated document (in XML format) in the training corpus directory;
       7.f: repeat from 7.a to 7.e until you have marked-up all your documents;
Step8: if you want to Semi-Automatically populate the KB (this can be done at any moment during Step7):
       8.a: select one or more instances from the KB that you want to modify using the bits of annotated text;
       8.b.i: right-click on the selected instance(s) and choose Import Mark-up...;
       or
       8.b.ii: right-click on the tagged value, in the document, and choose Add to instance(s);
Step9: if you want to Automatically populate the KB:
       9.a: select an Information Extraction plugin;
       9.b.i: start the learning phase on the training corpus (you need 6~8 annotated documents for a small example or at least 30 documents for a real situation);
       or
       9.b.ii: if the class already has a library created by the IE plugin in a previous run just select it;
       9.c: select a set of non-annotated documents;
       9.d: create a directory to store the non-annotated documents. We call this the test corpus directory;
       9.e: start the extraction phase on the test corpus;
       9.f: use the results of the extraction phase to populate the KB (Accept or Accept All buttons);

Now the same stuff but with pictures and funny comments!

Step0: decide what to do
Cannot help you here. Sorry...
...but as far as the example is concerned, let's say we want to mark-up some of the documents in the C:\Program Files\MnM2\Archive directory. All the documents in that folder have the same subject: someone visiting something or someone else. We will then use the annotated documents to train the IE mechanism provided with MnM (Amilcare), so that we can later use the library of rules and templates created by it to extract information from a set of non-annotated documents and populate our ontology.

Step1: load an ontology

Select Editor>Display, if you have just started MnM, or Editor>Change Ontology..., if you were already playing with it. This brings up a dialog that allows you to choose the ontology we are going to use for this example. Select RDF from the Urls group, click on Browse..., enter the Ontologies directory and choose example_ontology.rdf. Click on the OK button to open it.
Figure 2a: load an ontology

Step2: create/load a Knowledge Base
Select Editor>Create KB..., enter the Ontologies directory and create a Knowledge Base file called example_ontology_KB.rdf.

Step3: choose the ontology and the class you want to use
Choosing the ontology is easy, there is only one! Double-click on it to have a look at the classes that it contains. If you remember our goal, probably visiting-a-place-or-people will ring a bell. Double-click on it to reveal its slots.
Figure 2b: choose the ontology and the class

Step4: if you want to Manually populate the KB
       4.a: right-click on the class for which you want to create a new instance and select Add new instance...;
       4.b: fill in the required fields;
       4.c: commit the new instance to the KB (OK button);
             Figure 2c: manually populate the KB

Step5: select a set of documents to annotate
If you remember Step0 the documents are stored in C:\Program Files\MnM2\Archive. For this example we need to annotate at least 6 documents in order to obtain decent results from the extraction phase. You can pick 6 documents at random from the ones that you can find in the Archive directory.

Step6: create a directory to store the mark-up documents (training corpus)
I shouldn't need to tell you how to create new directories so create a new one called example_visiting under C:\Program Files\MnM2\TrainingCorpus.

Step7: mark-up the documents
       7.a: load a document (File>Open...);
       7.b: highlight a piece of text in the document;
       7.c: double-click on the slot/relation you want to use for the mark-up;
       7.d: repeat 7.b and 7.c until you are happy with the annotation;
             Figure 2d: mark-up the documents

       7.e: save the annotated document (in XML format) in the training corpus directory (File>Save As...);
       7.f: repeat from 7.a to 7.e until you have marked-up all your documents. At least 6 of them, come on, it's not so hard...;

Step8: if you want to Semi-Automatically populate the KB
       8.a: select one or more instances from the KB that you want to modify using the bits of annotated text;
       8.b.i: right-click on the selected instance(s) and choose Import Mark-up...;
       or
       8.b.ii: right-click on the tagged value, in the document, and choose Add to instance(s);

Step9: if you want to Automatically populate the KB
       9.a: select an Information Extraction plugin (Settings>Select Plugin>Amilcare);
       9.b: start the learning phase on the training corpus (Action>Learn...). When you are prompted for the path of the training corpus enter C:\Program Files\MnM2\TrainingCorpus\example_visiting;
       9.c: select a set of non-annotated documents. You can use the documents that you haven't marked-up from the C:\Program Files\MnM2\Archive directory;
       9.d: create a directory to store the non-annotated documents (test corpus). What about a new folder called example_visiting under C:\Program Files\MnM2\TestCorpus?;
       9.e: start the extraction phase on the test corpus (Action>Extract...);
       Figure 2e: automatically populate the KB

       9.f: use the results of the extraction phase to populate the KB (Accept or Accept All buttons). The results may vary according to the number of annotated documents used for the learning phase and the way those documents have been annotated. In Figure 2e you can see the sort of results that we achieved.


3 The Document Browser Window

The Document Browser provided with MnM is a very minimalistic one. It has some basic features such as go back, go forward, home, refresh, stop and history management. It can display TXT documents and pure HTML documents (as specified by the W3C) with no frames.

3.1 Opening files

When opening a file, if an IE plugin and an ontology class with an associated library for IE are selected, MnM will perform a background extraction operation (see section 7.4). In this case the newly opened page will be augmented with some suggestions on how to mark-up the document. The user can confirm, remove or simply ignore the suggestions.

3.2 Saving files

When saving a file (that has been previously marked-up) in XML format, if an IE plugin and an ontology class with an associated library for IE are selected, MnM will perform a background learning operation (see section 7.3). This is done in order to improve the IE library associated with the selected class by adding the annotated information included in the document.

Note: by "an associated library for IE" we mean the set of rules and templates that has been created by an IE mechanism during the learning phase (see section 7.1).


4 Browsing the Ontology

In this section you will find some information on how to use the Ontology Browser embedded in MnM to browse the ontology of your choice.

4.1 Choosing the Ontology

Before browsing an ontology you need to load one. You can load an ontology to browse every time the Ontology Browser is displayed (Editor>Display) or every time you decide you want to work with a different ontology (Editor>Change Ontology...). After selecting one of these commands you will be prompted for either a server or a URL for the ontology.
You can choose to access ontologies stored on a remote server, such as WebOnto, or ontologies stored locally in a file, written in RDF, Daml+Oil or OCML.

4.1.1 Ontology on Server

If you choose to browse an ontology from a server you will be asked to enter the host name and host port. If the server accepts the connection, you will then be asked to enter a login name and password.
Figure 4.1.1: accessing ontologies from a server

4.1.2 Ontology on URL

If you choose to browse an ontology stored locally you will be asked to enter the path where it resides.
Figure 4.1.2: accessing ontologies as a file

4.2 Creating a New KB

Before you can populate the ontology you have to create a new Knowledge Base (Editor>Create KB...) or load an existing one.
There are two different ways to create a new Knowledge Base:
- ontology from a server: in this case you will be asked to enter some details for the new KB: ontology name, parent ontology, additional editor(s) and ontology type;
Figure 4.2: creating a new KB from a server

- ontology as a file: in this case you will be asked to enter the path where you want the new KB to be saved. If you omit the file extension, MnM will use the default one according to the format of the current ontology.

4.3 Loading a KB

To load an existing Knowledge Base select Load KB... from the Editor menu.
There are two different ways to load an existing Knowledge Base:
- ontology from a server: in this case the KB associated with the selected ontology will be loaded automatically by the server;
- ontology as a file: in this case you will be asked to enter the path where the existing KB is located.

4.4 Browse it

The Ontology Browser window is composed of 5 units:
- QSearch Toolbar: this quick search facility allows the user to perform incremental searches on the Ontology Viewer (if On is selected) or on the Instance Viewer (if In is selected);
- Ontology Viewer: displays the ontology structure as a tree-like structure (Ontologies, Classes and Slots);
- Instance Viewer: displays the instances belonging to the selected class;
- Information Viewer: displays the information regarding selected ontology elements or selected instances provided by the loaded ontology;
- Status Bar: monitors the progress of background learning and background extraction (see section 7.3 and section 7.4).
Figure 4.4: the Ontology Browser

4.4.1 Ontology Viewer

The Ontology Viewer displays the ontology structure as a tree-like structure (Ontologies, Classes and Slots).
To navigate the ontology tree just double-click on the element you want to expand. To go back one level in the ontology tree you need to click on the arrow at the bottom of the Ontology Viewer or double-click on the root element.
A right-click on a Class will pop up a menu with the following options: Add new instance... (see section 6.1.1) and Available Plugins (if any). The latter lists all the IE Plugins that have an IE library for the class. Selecting one of the plugins from the list will initialize it and make it the active plugin. From this moment on all the learning and extraction processes will be handled by the new plugin. For more information on IE Plugins see section 7.
A double-click on a Slot will mark-up the document currently displayed in the Web Browser adding a tag, unique to the Slot, to the highlighted piece of text (see section 5.1).

In the Ontology Viewer a Class might have different icons depending on whether or not it has some IE library for the active IE Plugin:
- red icon: the class has no IE library associated with it or there is no active IE Plugin;
- green icon: the class has an IE library associated with it belonging to the active IE Plugin, but the plugin developer has not provided a custom icon;
- custom icon: the class has an IE library associated with it belonging to the active IE Plugin and the plugin developer has provided an icon.

You can filter the Classes shown in the Ontology Viewer by turning the option Show only classes with a library, in the Editor menu, on or off.

4.4.2 Instance Viewer

The Instance Viewer displays the instances belonging to the selected class.
A right-click on an Instance will pop up a menu with the following options: Import Mark-up (see section 6.2.2), Rename and Remove.
A double-click on an Instance will open a new dialog that allows the user to modify the instance manually (see section 6.1.2).

4.4.3 Information Viewer

The Information Viewer displays the information regarding selected ontology elements or selected instances provided by the loaded ontology.
All the information provided is in HTML format and is fully browsable (with a double-click on the piece of text you want more info about). It has some basic features such as go back, go forward, home and history management.


5 Marking-up the Document

MnM is a tool for Semantic Mark-up (whatever that means), isn't it? So, let's start talking about it!
In this section you will learn how to annotate a document. The first subsection explains the easy way to add and remove tags from a document. The following subsections (more to come...) introduce some tricks to speed up the process of marking up a document, because annotating can be boring and time consuming.

5.1 Adding and Removing Tags

To add a tag to the document:
- open a Class in the Ontology Browser so that you can see the Slots that it contains;
- highlight the piece of text in the Web Browser window that you want to mark-up;
- double-click on the Slot that you want to use to annotate the document.

Sometimes you make mistakes, other times you change your mind...
To remove a tag from the document:
- in the Web Browser window right-click on the tag you want to remove and select Remove Tag from the popup menu that will appear.

5.2 Marking-up Lists

If you have to mark-up each element in a list (e.g.: black, grey, white, red, green, blue and yellow) there is a better way than highlighting every single element in it and double-clicking on the Slot to add the desired tag.
You can simply highlight the whole list and double-click on the Slot to add the tag, then right-click on the inserted tag and select Tokenize List... from the popup that will appear. At this point you can choose one or more separators, or define your own, to use for tokenizing the list.
Figure 5.2: How to tokenize a list
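
To get a feeling for what tokenizing a list with separators means, here is a rough Python sketch. It is not MnM's own code: the function names, the default separators and the slot name colour are assumptions made up for the example.

```python
import re


def tokenize_list(text, separators=(",", " and ")):
    """Split a highlighted list into its elements using literal separators."""
    # Try longer separators first so that " and " is not shadowed by a shorter one.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "|".join(re.escape(separator) for separator in ordered)
    return [item.strip() for item in re.split(pattern, text) if item.strip()]


def tag_each(items, slot):
    """Wrap every element in its own tag, named after the (hypothetical) slot."""
    return [f"<{slot}>{item}</{slot}>" for item in items]


colours = tokenize_list("black, grey, white, red, green, blue and yellow")
for tagged in tag_each(colours, "colour"):
    print(tagged)
```

Each element of the list ends up with its own tag, which is the effect you get from Tokenize List... without highlighting the elements one by one.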

5.3 Saving Marked-up Files

After you have finished annotating the current document in the Web Browser window you can save it by selecting Save As... in the File menu.
The default format for saving marked-up documents is XML. MnM tries to preserve the structure of the original document. To do so, it uses JTidy to ensure the well-formedness of the annotated HTML document before transforming it into an XML document. XML is also the standard format accepted by most of the IE Plugins for annotated documents to be used during the learning phase. For further information on Information Extraction Plugins see section 7.


6 Populating the Ontology

In this section you will find out how to add and modify instances in the ontology you are browsing.

Tip: before adding or editing instances you have to create a new KB (see section 4.2) or load an existing one (see section 4.3).

6.1 Manual Population

Manual population is done entirely by "hand" by the user without using any information gathered while annotating the document and without any help from the IE Plugins.

6.1.1 Adding New Instances

Right-click on a Class in the Ontology Viewer and select Add new instance... from the popup menu that will appear to open a dialog in which you can insert all the necessary data to create a new instance of the selected class.
Figure 6.1.1a: adding a new instance

In the dialog that is displayed you will see a menu entry called Result. In this menu there are two sub-menus:
- Output Action: here you can decide what to do with the instance you are working on: commit it to the ontology, save it in a local file or print it in the console (Command Prompt) for debugging purposes;
- Output Format: in this sub-menu you can choose the format to give to the instance: default (the format used by the selected ontology), Daml+Oil, Ocml, Rdf or Xml. It is only possible to commit the instance to the ontology when the Default format is selected.
Figure 6.1.1b: the Result menu

6.1.2 Modifying Instances

Double-click on an Instance in the Instance Viewer to open a dialog with which you can modify the selected instance.
Figure 6.1.2: modifying an instance

6.2 Semi-Automatic Population

Semi-Automatic population is done using the information gathered while annotating the document.

6.2.1 Adding a single tag to a set of instances

It is possible to modify the value of a field of one or more instances in the ontology by selecting the tagged value in the document.
To do this:
- mark-up the document;
- select a set of instances (one or more) from the Instance Viewer;
- right-click on the tagged value that you want to use to modify the selected instance(s) and select Add to instance(s) from the popup menu that will appear.

6.2.2 Importing tags to a set of instances

It is possible to modify the values of a set of fields of one or more instances in the ontology with a set of values that have been previously tagged.
To do this:
- mark-up the document;
- select a set of instances (one or more) from the Instance Viewer;
- right-click on the selection and select Import Mark-up... (Import Mark-up into Selection... in case multiple instances are selected) from the popup menu that will appear;
- the Import dialog, containing the marked-up information, will be displayed;
- select the set of values that you want to use to modify the selected instance(s) from the Import dialog and select Ok.
Figure 6.2.2: Import dialog

6.3 Automatic Population

Automatic population is done using the information extracted by the IE Plugins from a set of documents, known as the test corpus.
To populate an ontology using Information Extraction techniques simply activate an IE plugin (Settings>Select Plugin). Then select the Class in the Ontology you want to populate from the Ontology Browser. Start the extraction phase by selecting Action>Extract... and specifying the location of the test corpus. You can then use the results provided by the IE mechanism to populate the ontology. For further information see section 7.2.

Note: Automatic Population can be done only if an IE Engine Plugin is installed in your system.


7 Integration with Information Extraction Plugins

Once upon a time someone asked: "Why don't we try to integrate IE with a Semantic Mark-up tool?".
Well, we did it. ...and it works! ...and it is damn cool! ...and it can also speed up the process of annotating documents and populating ontologies (a couple of side effects that we couldn't get rid of! :P ).

The first thing to do when dealing with an IE mechanism is to teach it what you want it to do for you. In other words you have to make it learn what kind of information is important to you (the learning/training phase), so that eventually it will be able to extract the same kind of information by itself (the extraction phase). In order to train the IE mechanism you have to provide a set of annotated documents (the training corpus) from which it can create rules and templates; those rules and templates will then be used by the same IE mechanism to extract information from a set of new, non-annotated documents (the test corpus).
Annotating documents is the most delicate step when training an IE system: if you annotate the wrong thing, the system will try to extract information from new documents using the wrong rules and templates, so the results will be completely unreliable.

Let's try with an example. Suppose we want to annotate the following sentence:
    "Mickey Mouse visited Minnie. Mickey was accompanied by Pluto and Goofy."
a possible annotation could be:
    "<visitor>Mickey Mouse</visitor> visited <person-being-visited>Minnie</person-being-visited>.<visitor>Mickey</visitor> was accompanied by <other-people-involved>Pluto and Goofy</other-people-involved>."
another way of annotating the same sentence might be:
    "<visitor>Mickey Mouse</visitor> visited <person-being-visited>Minnie</person-being-visited>.<visitor>Mickey</visitor> was accompanied by <other-people-involved>Pluto</other-people-involved> and <other-people-involved>Goofy</other-people-involved>."

Passing the previous sentences to the IE mechanism for the learning phase will produce different rules and templates and consequently different results when extracting information from non-annotated documents. In any case the best way of annotating a document depends on the IE mechanism you are using, so for further details and suggestions please refer to the user manual provided by the developer of your favorite IE tool. If you have a new IE Plugin and you want to add it to MnM just put it in the Plugins directory (see section 1.4.1) and restart the application.
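
To make the difference between the two annotations concrete, the following Python sketch lists the tagged values an IE mechanism would be shown in each case. It is only an illustration: it says nothing about how any particular IE plugin reads its input, and the surrounding doc element is added here just so that the sentence can be parsed as XML.

```python
import xml.etree.ElementTree as ET

# The two annotations of the same sentence, as plain strings.
COARSE = (
    "<visitor>Mickey Mouse</visitor> visited "
    "<person-being-visited>Minnie</person-being-visited>. "
    "<visitor>Mickey</visitor> was accompanied by "
    "<other-people-involved>Pluto and Goofy</other-people-involved>."
)
FINE = (
    "<visitor>Mickey Mouse</visitor> visited "
    "<person-being-visited>Minnie</person-being-visited>. "
    "<visitor>Mickey</visitor> was accompanied by "
    "<other-people-involved>Pluto</other-people-involved> and "
    "<other-people-involved>Goofy</other-people-involved>."
)


def tagged_values(annotated_sentence):
    """Return the (tag, text) pairs of a sentence in document order."""
    # Wrap the sentence in a root element so that it is well-formed XML.
    root = ET.fromstring(f"<doc>{annotated_sentence}</doc>")
    return [(element.tag, element.text) for element in root]


for label, sentence in (("first", COARSE), ("second", FINE)):
    print(label, tagged_values(sentence))
```

The first annotation yields a single other-people-involved value ("Pluto and Goofy"), while the second yields two separate ones, so the examples handed to the learning phase are not the same.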

Another thing to keep in mind while annotating documents is that most of the IE mechanisms out there create rules and templates according to both positive and negative examples. A positive example is when you have a relevant sentence and you mark it up so that the IE mechanism can learn from it and later extract the same kind of information. A negative example is when you have a relevant sentence and you don't annotate it. In this case the IE mechanism will create new rules and templates considering the non-annotated sentence as something that the user is not interested in, therefore during subsequent extraction phases the algorithm will skip similar sentences. All this is to say that if in the same document you have more than one relevant sentence (e.g. Last Monday Mickey Mouse visited Minnie. [...] The next day Mickey received a visit from Goofy. [...] During the week-end they visited Donald Duck and Daisy Duck. [...]) remember to annotate them all.

The number of annotated documents to be used as training corpus varies according to the IE mechanism used, but the following general rule applies: "the more the better". For a small example 6~8 annotated documents will do, but for a real-life situation at least 30 marked-up documents are a must.

Tip: to activate one of the IE plugins open the Settings>Select Plugin menu.

7.1 Learning

Once a set of documents has been annotated and is safely stored in a directory on your machine you can start the learning phase by first selecting, in the Ontology Viewer, the Class that has been used for the mark-up and then selecting Learn... from the Actions menu. At this point you only have to provide the location of the training corpus and wait for the IE mechanism to do its job so that you can continue with your work.

Learning is an active process and while it is executing MnM is frozen and cannot perform any other task. According to the number and length of the documents in the training corpus the learning phase might take from a couple of seconds to some hours.

Note: Every time the same Class is used for the learning phase the rules and templates previously created will be overwritten. For this reason if you want to improve your IE library you will have to add the new annotated documents to the old training corpus.
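Because the old rules are overwritten, the new documents have to end up in the same directory as the old training corpus before learning is run again. The following is only a sketch of that housekeeping step, written in Python outside of MnM; the function name and the choice to skip files whose names already exist in the corpus are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def merge_into_corpus(new_dir, corpus_dir):
    """Copy files from new_dir into corpus_dir and return the copied names.

    Files whose names already exist in the corpus are skipped, so an
    existing annotated document is never silently replaced.
    """
    corpus = Path(corpus_dir)
    corpus.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(Path(new_dir).iterdir()):
        target = corpus / source.name
        if source.is_file() and not target.exists():
            shutil.copy2(source, target)
            copied.append(source.name)
    return copied
```

After merging, run Learn... again on the whole corpus directory so that the new library is built from both the old and the new documents.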

Tip: Remember to store documents annotated using different classes in different directories, or else the IE mechanism will fail to recognize the different sets of tags used for the mark-up. This is because when the learning phase starts the IE mechanism is provided with the set of tags given by the class used to mark up the documents. If during the learning phase the training corpus directory contains documents annotated with a set of tags different from the one provided to the IE mechanism, those documents will be completely ignored or, in the worst case, the IE plugin will abort the learning process and no rules or templates will be generated.
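Before starting a long learning run it can be worth checking that a training corpus directory contains only the tags of one class. The sketch below is not part of MnM: it assumes that annotations are stored in the documents as XML-style tags, and the tag names used in the usage note (visitor, visited) are hypothetical. Ordinary HTML tags that also appear in the pages can be passed in the ignored set.

```python
import re
from pathlib import Path

# Matches the name of an opening tag such as <visitor> or <visitor id="1">.
# Closing tags (</visitor>) are not matched because of the leading "/".
TAG_NAME = re.compile(r"<([A-Za-z_][\w.-]*)[^<>]*>")


def tags_in_text(text):
    """Return the set of opening-tag names found in a marked-up text."""
    return set(TAG_NAME.findall(text))


def unexpected_tags(text, expected, ignored=frozenset()):
    """Return the tag names in text that are neither expected nor ignored."""
    return tags_in_text(text) - set(expected) - set(ignored)


def check_corpus(directory, expected, ignored=frozenset()):
    """Map each file name in directory to its unexpected tags.

    Files that contain only expected or ignored tags are left out.
    """
    report = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            extra = unexpected_tags(text, expected, ignored)
            if extra:
                report[path.name] = extra
    return report
```

For example, check_corpus("corpus", {"visitor", "visited"}, {"html", "body", "p"}) would list every file in the (hypothetical) corpus directory that uses a tag outside the two expected ones; such files probably belong to the training corpus of another class.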

7.2 Extracting

To start extracting information from a set of documents, select in the Ontology Viewer the Class whose library you want to use to "guide" the extraction phase and then select Extract... from the Actions menu. At this point you only have to provide the location of the test corpus. Once the IE mechanism has done its job you will have the opportunity to check the results of the extraction and decide what to do with them.

Extraction is an active process: while it is executing, MnM is frozen and cannot perform any other task. Depending on the number and length of the documents in the test corpus, the extraction phase might take from a couple of seconds to some hours.

7.2.1 What to do with the results

Once the extraction process is over, the Results Browser will be displayed. In the upper part you can find the list of all the relevant documents belonging to the test corpus (a document is relevant if something has been extracted from it). The documents are sorted by filename. The filenames are also used to name the instances that will be created every time you choose to commit the results to the ontology, so if you don't like the name the new instance is going to have, just right-click on it and rename it.
Every time a document is selected from the list, the extracted information will be displayed in the main part of the Results Browser. At this point the user can check, correct and edit the results. It is also possible to add new values and fill in empty fields.
The selected document will also be opened in the Web Browser window. All the concepts found by the IE mechanism will be highlighted in different colors to allow the user to spot them more easily inside the document.

Figure 7.2.1a: Checking the results

Once you are ready you can decide what to do with the results:
- Accept: create a new instance using the information extracted from the selected document;
- Reject: delete the results that the IE mechanism has provided for the selected document;
- Accept All: you trust the IE mechanism or you don't want to bother checking the results, therefore a new instance will be created for each document in the test corpus using the extracted information;
- Reject All: delete all the results that the IE mechanism has provided for all the documents in the test corpus. This works also as a cancel.

It is also possible to customize the behaviour of Accept and Accept All by opening the Result menu which is now available in the menubar. In this menu there are two options:
- Output Action: here you can decide what to do with the instance you are working on: commit it to the ontology, save it to a local file or print it to the console (Command Prompt) for debugging purposes;
- Output Format: here you can choose the format to give to the instance: default (the format used by the selected ontology), Daml+Oil, Ocml, Rdf or Xml.
Figure 7.2.1b: Deciding what to do with the results

7.3 Background Learning

When a previously marked-up file is saved in XML format, and an IE plugin and an ontology class with an associated IE library are selected, MnM will run the learning phase in the background in order to improve the IE rules and templates related to the selected class with the information included in the new document.

Background Learning is a background process and will not affect the normal use of MnM so the user can continue with his work without any interruption.

Note: If this option is turned on remember to save the new annotated document in the same directory where the corresponding training corpus is located or else the IE mechanism will try to create new rules and templates using only one document, the new one! In this case there will be heavy degradation in the precision of the IE mechanism during the extraction phase.

Tip: To turn off this feature uncheck the Enable Background Learn option in the Actions menu.

7.4 Background Extraction

When a file is opened, and an IE plugin and an ontology class with an associated IE library are selected, MnM will run the extraction phase in the background and the newly opened page will be augmented with some suggestions on how to mark up the document. At this point the user can confirm, remove or simply ignore those suggestions.

Background Extraction is a background process and in general will not affect the normal use of MnM so the user can continue with his work without any interruption. It is possible, though, to experience some slight delays when loading documents in the Web Browser window.

Tip: To turn off this feature uncheck the Enable Background Extract option in the Actions menu.


8 Customization

All of the above is really interesting, but we also aim to please your eyes. That's why a bit of Look&Feel will do no harm.
MnM allows the user to choose between a set of default skins and icons to modify the appearance of the application. MnM also allows the user to create his/her own icon set or skin.

8.1 Customizing Icons

To create your own icon theme you just have to provide a ZIP file containing a set of icons and put it in the Look&Feel directory (see section 1.4.1).
The icons must be in PNG format and stored at the base level of the ZIP file (not in a directory). You must include the following icons:
- Back16.png, Back24.png: 16-pixel and 24-pixel versions of the Back icon;
- Forward16.png, Forward24.png: 16-pixel and 24-pixel versions of the Forward icon;
- Home16.png, Home24.png: 16-pixel and 24-pixel versions of the Home icon;
- Refresh16.png, Refresh24.png: 16-pixel and 24-pixel versions of the Refresh icon;
- Stop16.png, Stop24.png: 16-pixel and 24-pixel versions of the Stop icon;
- Up16.png, Up24.png: 16-pixel and 24-pixel versions of the Up icon.
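
The list above is easy to check mechanically before the ZIP file is copied to the Look&Feel directory. The following Python sketch is not part of MnM; it only verifies that the twelve file names are present at the base level of the archive. Treating the names as case-sensitive is an assumption, and the script does not check that the files really are PNG images of the right size.

```python
import zipfile

ICON_NAMES = ["Back", "Forward", "Home", "Refresh", "Stop", "Up"]
ICON_SIZES = [16, 24]


def required_icons():
    """Return the twelve file names an icon theme must contain."""
    return [f"{name}{size}.png" for name in ICON_NAMES for size in ICON_SIZES]


def missing_icons(entry_names):
    """Return the required icons that are absent from a list of ZIP entries.

    An icon stored inside a directory (e.g. "icons/Back16.png") does not
    count, because the icons must sit at the base level of the archive.
    """
    present = set(entry_names)
    return [icon for icon in required_icons() if icon not in present]


def check_theme(zip_path):
    """Open an icon theme ZIP and return the list of missing icons."""
    with zipfile.ZipFile(zip_path) as archive:
        return missing_icons(archive.namelist())
```

An empty list returned by check_theme means that all the required icons were found at the base level of the archive.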

8.2 Customizing Skins

MnM uses SkinLF to provide a skinnable user interface. This package allows the creation of themepacks to change the way the UI is displayed. Some themepacks are already included in the MnM package. If you want to create your own themepack please refer to the SkinLF documentation.


9 What's Next

ToDo:
- more plugins for I/O;
- more plugins for IE;
- deep search mechanism for the whole ontology;
- easy way to annotate complex lists, such as a list of references or bibliographic data;
- ...


10 Troubleshooting

Known issues regarding Amilcare:
- if the set of documents used for the learning phase contains some special characters (e.g.: &) outside a tag, Amilcare will display an error like: "There is a problem in writing the temporary file C:\Program Files\MnM2\Java\Amilcare\Temp/TempAmilcareFile.txt See transcript", followed by an "Amilcare: error: see output for details". This is due to an XML parsing error and can be solved by just replacing "&" with "and".
- sometimes the Amilcare Progress Dialog will appear only the first time the learning or the extraction phase is executed and will not be displayed again until the whole application has been restarted. This bug does not prevent Amilcare from running properly and producing correct results.
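
For a large training corpus the "&" replacement mentioned above can be scripted. The sketch below is one possible way to do it in Python, outside of MnM: it replaces ampersands only in the text between tags and leaves the tags themselves untouched. Keeping character entities such as &amp; unchanged is an assumption; the manual only states that a bare "&" outside a tag causes the error.

```python
import re

# Splits a document into tags (<...>) and the text between them.
TAG = re.compile(r"(<[^<>]*>)")

# An ampersand that does not start an entity such as &amp; or &#38;.
BARE_AMPERSAND = re.compile(r"&(?!#?\w+;)")


def replace_ampersands(document, replacement="and"):
    """Replace bare ampersands that appear outside tags."""
    parts = TAG.split(document)
    cleaned = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd positions hold the captured tags: keep them as they are.
            cleaned.append(part)
        else:
            cleaned.append(BARE_AMPERSAND.sub(replacement, part))
    return "".join(cleaned)
```

Work on a copy of the training corpus when applying a replacement like this, so that the original annotated documents are kept.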

Known issues regarding JTidy:
- if JTidy doesn't know how to correct an error while checking a document for well-formedness, it may remove some of the content from the document.


11 Contacts

Enrico Motta (e.motta@open.ac.uk)