1641 Depositions – witness testimonies, mainly by Protestants but also by Catholics, concerning their experiences during the 1641 Irish rebellion.
These testimonies document incidents such as robbery, military actions, imprisonment and murder.
1916 Rising – documents describing events during the 1916 rising at Enniscorthy (Ireland).
IPSA collection – a digital archive of illuminated medieval manuscripts (mainly on herbs and astrology), with high-quality drawings and metadata.
In our work, entity extraction involved defining a dictionary for each entity type we wanted to extract (e.g. FirstName, LastName, Location, Date),
and a set of parsing rules that use these dictionaries to annotate a piece of text that presumably matches an entity, an entity's attribute, or a relationship between entities.
For example, the parsing rule that identifies a Person matches the person's title (e.g., Sir, Mrs), then FirstName followed by LastName, or LastName followed by FirstName.
We have a few tens of these rules, which together give a comprehensive and, to some extent, accurate annotation of the text.
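As an illustration of the dictionary-plus-rule idea described above, here is a minimal Python sketch of a single Person rule. It is not the CULTURA implementation: the dictionary contents, the `find_persons` function and the annotation format are all hypothetical, chosen only to show how a rule can combine dictionaries to annotate a span of text.

```python
import re

# Hypothetical, tiny dictionaries (assumption: the real ones are far larger).
TITLES = {"sir", "mr", "mrs", "lord", "lady"}
FIRST_NAMES = {"john", "mary", "phelim"}
LAST_NAMES = {"smith", "taylor", "o'neill"}

# Words may contain apostrophes (e.g. O'Neill); punctuation is skipped.
TOKEN = re.compile(r"[A-Za-z']+")


def find_persons(text):
    """Annotate spans matching: title, then FirstName LastName
    or LastName FirstName."""
    tokens = [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(text)]
    persons = []
    i = 0
    while i + 2 < len(tokens):
        title, second, third = (t[0].lower() for t in tokens[i:i + 3])
        order = None  # (offset of first name, offset of last name)
        if title in TITLES:
            if second in FIRST_NAMES and third in LAST_NAMES:
                order = (1, 2)
            elif second in LAST_NAMES and third in FIRST_NAMES:
                order = (2, 1)
        if order is None:
            i += 1
            continue
        persons.append({
            "type": "Person",
            "title": tokens[i][0],
            "first_name": tokens[i + order[0]][0],
            "last_name": tokens[i + order[1]][0],
            "span": (tokens[i][1], tokens[i + 2][2]),  # character offsets
        })
        i += 3  # skip past the annotated tokens
    return persons


if __name__ == "__main__":
    sample = "The deponent saw Sir Phelim O'Neill and Mrs Taylor Mary at Armagh."
    for person in find_persons(sample):
        print(person)
```

Keeping the dictionaries separate from the rule means either can be extended independently, which fits the dictionary-per-entity-type description above.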
The main part of the paper, and therefore of this presentation, focuses on the task of manually updating extracted entities.
Entity extraction cannot achieve 100% accuracy even under perfect conditions (such as clean and well-structured text);
it becomes much more difficult for the 1641 Depositions collection, with its noisy text, huge number of misspellings and inconsistent grammar.
As an example, the same person's name can be written differently within the same deposition; yet we strive to disambiguate these names and conclude that they refer to the same person.
Another example is entity attributes: identifying a Person's occupation, religion or origin becomes challenging due to inconsistent sentence structure.
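To make the name-variant problem mentioned above concrete, here is a small Python sketch that groups spellings by string similarity using the standard library's `difflib.SequenceMatcher`. This is only an assumed technique for illustration; the notes do not say how disambiguation is actually performed, and the `group_variants` function, the 0.8 threshold and the example names are hypothetical.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Return True if two names are close enough to be treated as variants.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_variants(names, threshold=0.8):
    """Greedily group names: each name joins the first group whose
    representative (its first member) is similar, else starts a new group."""
    groups = []
    for name in names:
        for group in groups:
            if similar(name, group[0], threshold):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


if __name__ == "__main__":
    # Hypothetical spellings as they might appear in one deposition.
    print(group_variants(["Phelim ONeill", "John Taylor", "Phelim ONeil"]))
```

A purely string-based grouping like this will both miss some variants and merge some distinct people, which is one reason a manual correction step remains necessary.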
In CULTURA we adopted an approach in which extracted information is modified manually using the PreMapper tool, which I present next.
Note that the Entity-Relationship data is used for several (currently decoupled) purposes: (1) visualization of entities with the PreMapper tool and (2) entity-oriented search (EoS), which is not the subject of this presentation
but, in a few words, is used for retrieval and exploration of extracted entities. Changes to extracted entities should be reflected in the EoS component, which uses the open-source Lucene search engine for retrieval.
One of the next slides focuses on the architecture for distributing the modifications made by users.