The Web is large and information is present in many forms. Complex techniques are necessary to discover the hidden structure of content, and a single software provider cannot be expert in all of them. An integration platform is therefore a well-suited solution, enabling the use of the best tool for each function. In this presentation we will present the challenges of OSINT (open-source intelligence) and its growing importance. We will then detail the WebLab approach to building flexible and scalable OSINT applications that match the fast-paced nature of OSINT. From semantic data models to the high-level architecture, by way of the selected technologies, the presentation gives a complete tour of the WebLab project.
WebLab, open source media mining platform, OW2con'12, Paris
1. Open source media mining platform
Gérard Dupont
Research engineer – COEDS2 – Advanced studies
OW2Con'12, November 28-29, 2012
Orange Labs, Paris. www.ow2.org.
2. Media mining platform
From unstructured data from any source...
… to structured and actionable knowledge
4. OSINT challenges
Some activities need to be automated:
Search/Sources assessment
Data Acquisition
Classification, Screening, Indexing
Information retrieval
Knowledge capitalisation
Visualization
Summary
Alert
Some activities cannot be automated:
- expert analysis of content;
- linking and mapping heterogeneous information;
- evaluating reliability and assessing information;
- reporting and synthesis of information.
→ Tools can provide support, but keep humans in the loop.
5. A processing workflow
[Figure: processing workflow over video, audio, speech audio and text sources. Steps shown: collection, audio extraction from video, transcription (Sphinx), cleaning, segmentation, translation, information extraction and alert. Outputs shown: annotated (enriched) text and translated text.]
Example of collected text:
An international Greenpeace alpine team delivers messages of support and hope for the victims of the nuclear disaster at Fukushima Daiichi to the summit of Mt Fuji. Collected from thousands of people in Japan and all over the world, Greenpeace hopes that these messages will help unite the people of Japan in opposition to nuclear power.
Example of translated text:
国際グリーンピース高山チームは富士山の頂上への支援と福島第一に原子力災害の被害者のための希望のメッセージを配信します。日本と世界中の何千人もの人々から収集した、グリーンピースは、これらのメッセージは、原子力発電に反対する日本の人々を団結に役立つ、日本当局はそれらに耳を傾けることを奨励することを期待しています。
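The workflow on this slide can be pictured as a chain of steps that each act only on the media type they understand. The following is a minimal sketch under stated assumptions, not WebLab code: the `Document` class, the step functions and `run_workflow` are hypothetical names, and the transcription, translation and extraction bodies are placeholders that do not call Sphinx, Google Translate or GATE.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Document:
    """Hypothetical container for one collected resource and its annotations."""
    media_type: str                      # "video", "audio" or "text"
    content: str                         # stand-in for the raw payload
    language: str = "en"
    annotations: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


Step = Callable[[Document], Document]


def extract_audio(doc: Document) -> Document:
    # Placeholder: a real step would demultiplex the audio track of a video.
    if doc.media_type == "video":
        doc.media_type = "audio"
        doc.history.append("audio extraction")
    return doc


def transcribe(doc: Document) -> Document:
    # Placeholder: a real step would run a speech recogniser on the audio.
    if doc.media_type == "audio":
        doc.media_type = "text"
        doc.history.append("transcription")
    return doc


def translate(doc: Document) -> Document:
    # Placeholder: tags the text instead of calling a translation service.
    if doc.media_type == "text" and doc.language != "en":
        doc.annotations["translation"] = "[en] " + doc.content
        doc.history.append("translation")
    return doc


def extract_information(doc: Document) -> Document:
    # Toy extraction: keep capitalised words as candidate named entities.
    if doc.media_type == "text":
        words = [w.strip(".,") for w in doc.content.split()]
        doc.annotations["entities"] = ", ".join(w for w in words if w[:1].isupper())
        doc.history.append("information extraction")
    return doc


WORKFLOW: Sequence[Step] = (extract_audio, transcribe, translate, extract_information)


def run_workflow(doc: Document, steps: Sequence[Step] = WORKFLOW) -> Document:
    """Apply each step in order; steps that do not apply leave the document unchanged."""
    for step in steps:
        doc = step(doc)
    return doc
```

Calling `run_workflow(Document("video", "Greenpeace team reaches Mt Fuji"))` would send a video through audio extraction and transcription before extraction, while a text document skips straight to the text steps. The `history` list records which steps actually ran.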
6. Integration approach
A platform providing "plug & play" functionality for integrating tools for collection, processing, analysis and communication...
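One way to read "plug & play" is that each tool is wrapped behind a common contract and registered under the function it provides, so that a tool can be swapped without touching the processing chain. The sketch below illustrates that idea only. The `Service` protocol, the `Platform` class and the toy services are hypothetical names chosen for illustration and are not the WebLab interfaces.

```python
from typing import Dict, List, Protocol


class Service(Protocol):
    """Hypothetical common contract that every integrated tool is wrapped to satisfy."""

    def process(self, resource: dict) -> dict: ...


class Platform:
    """Hypothetical registry mapping a function name to the tool that provides it."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def plug(self, function: str, service: Service) -> None:
        # Registering under an existing name replaces the previous tool.
        self._services[function] = service

    def run(self, chain: List[str], resource: dict) -> dict:
        # The chain names functions, not tools, so tools can change freely.
        for function in chain:
            resource = self._services[function].process(resource)
        return resource


class WhitespaceSegmenter:
    def process(self, resource: dict) -> dict:
        return {**resource, "segments": resource["text"].split()}


class SentenceSegmenter:
    def process(self, resource: dict) -> dict:
        parts = [s.strip() for s in resource["text"].split(".") if s.strip()]
        return {**resource, "segments": parts}


class SegmentCounter:
    def process(self, resource: dict) -> dict:
        return {**resource, "count": len(resource["segments"])}


platform = Platform()
platform.plug("segmentation", WhitespaceSegmenter())
platform.plug("indexing", SegmentCounter())
chain = ["segmentation", "indexing"]

by_word = platform.run(chain, {"text": "Collect data. Analyse it."})

# Swap the segmentation tool; the chain definition stays the same.
platform.plug("segmentation", SentenceSegmenter())
by_sentence = platform.run(chain, {"text": "Collect data. Analyse it."})
```

The design choice illustrated here is that the chain refers to functions by name rather than to concrete tools, which is what lets the best available tool be selected per function.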
7. Technology stack
[Figure: WebLab technology stack]
Components: Java application server (Apache Tomcat), Enterprise Service Bus, portal with portlets, content store, database, maps server.
Standards and specifications: SOA, ESB, JBI, JSR168, BPEL, WSDL, SOAP, XSD, XPath, XQuery, XML, Namespaces, OWL, RDFS, RDF, SPARQL, URI, UTF-8.
8. Standard model & interfaces
9. Standard model & interfaces
15. Thanks for your attention
Take away [weblab.ow2.org]
Logos and names of the tools presented are the property of their respective providers and are shown here only to illustrate the technologies already integrated into WebLab. Neither CASSIDIAN nor EADS claims any ownership of these external tools.
HERITRIX - http://crawler.archive.org
GOOGLE TRANSLATE - http://translate.google.com/
FFMPEG - http://ffmpeg.org/
GATE - http://gate.ac.uk/
SPHINX - http://cmusphinx.sourceforge.net/sphinx4/
JENA - http://jena.apache.org/