The document provides an overview of the methodology for collecting, cleaning, and representing textual data from various sources in a unified XML format for the INDECT project. It reviews existing annotation schemes such as ACE and KBP and proposes a new WP4 annotation scheme. The new scheme is based on a multi-layered ontology that allows flexible tagging of entities, relationships, events, and their properties in news reports, weblogs, and chat data to facilitate information extraction and the development of search and analysis tools for crime prevention.