Impact of open source search on the intelligence community
1. Impact of Open Source Search on the Intelligence Community
Mats Bjore, Infosphere AB,
opcenter@infosphere.se, 07 OCT 10
2. What I Will Cover
• Impact of Open Source Search on the Intelligence Community
• Who I am
• Defining the intelligence landscape
– Business, Government, Coalition
• Wake-up calls, challenges, and opportunities
• Some policy statements and reactions
• Real-world intelligence examples
3. My Background
4. FIRST WAKE UP CALL
1969 1990 1994 2010
5. Challenge no 1
TECHNOLOGY CHANGES FAST; MINDSETS AND ORGANIZATIONS DON'T
6. 2nd WAKE UP CALL…
[Figure: the intelligence cycle: Planning, Collection, Analysis, Dissemination]
7. Evolution of the Revolution
Information is Power (until 1994)
Knowledge is Power (1995-2007)
Sharing is Power (2008)
Governments
• 200
• Need for intelligence
• Search and analytics intensive
• Creates isolated units even within a nation
• Collect, Store and Re-Retrieve, Analyze, React and Act
• Enforces existing methods on new media
• Build own systems based on existing rules and culture
• Violates copyright rules to save money (sometimes)
• Simple
Corporations
• +77 million
• Need for information
• BI intensive
• Uses a mix of consultants, research reports, in-house knowledge
• Buy, Compare, Analyze and Act
• Benchmark and create rules for market leadership
• Live with media
• Buy rights to use information
• Complex (M&A)
8. Challenges for the IC(s)
• Nature of digital information - From Data to Text to Media Mining
Data mining is sorting through
data to identify patterns and
establish relationships.
Association - looking for patterns
where one event is connected to
another event
Sequence or path analysis -
looking for patterns where one event
leads to another later event
Classification - looking for new
patterns
Clustering - finding and visually
documenting groups of facts not
previously known
Forecasting - discovering patterns
in data that can lead to reasonable
predictions about the future
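The association pattern described above can be illustrated with a small co-occurrence count. This is only a minimal sketch, not a method taken from the presentation: the event names, the sample records, and the `min_support` threshold are all hypothetical, and real association mining would also weigh confidence and lift.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_counts(records):
    """Count how often each unordered pair of events appears in the same record."""
    counts = Counter()
    for events in records:
        # Deduplicate and sort so ("a", "b") and ("b", "a") are the same pair.
        for pair in combinations(sorted(set(events)), 2):
            counts[pair] += 1
    return counts


def associations(records, min_support=2):
    """Keep only pairs seen together in at least min_support records."""
    counts = co_occurrence_counts(records)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical sample data: each record lists events observed together.
records = [
    ["port_visit", "fuel_purchase", "crew_change"],
    ["port_visit", "fuel_purchase"],
    ["crew_change", "customs_check"],
    ["port_visit", "customs_check", "fuel_purchase"],
]

frequent = associations(records, min_support=2)
print(frequent)
```

With this sample, only the pair of `fuel_purchase` and `port_visit` clears the threshold, which is the kind of "one event is connected to another event" pattern the slide refers to.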
Text mining, also known as
intelligent text analysis, text data
mining or knowledge-discovery in
text (KDT), refers generally to the
process of extracting interesting and
non-trivial information and
knowledge from unstructured text.
Text mining is a young
interdisciplinary field which draws on
information retrieval, data mining,
machine learning, statistics and
computational linguistics.
As most information (over 80%) is
stored as text, text mining is
believed to have a high commercial
potential value.
Media Mining
9. Challenges for the IC(s)
• Nature of digital information
– From Data to Text to Media Mining
– Volumes
• Digital copycats
• Languages
– Original, Machine translated, transcribed, mixed
• Snippets
– The Moreover syndrome, Blog posts, Social media
– Location
• Internal Silos
– Mental, Security, Organizational
• External Silos
– Free and for fee - but how to connect?
10. Early Warning
[Figure: information sources plotted by Quality of Data (Raw to Synthesized) and Timeliness of Data (Instantaneous to Historical). Sources shown: academic publications, patents, alternative press, trade publications, research reports, news magazines, periodical magazines, online news sites, offline news, news groups, chat rooms, blogs, personal web sites, e-commerce sites, SMS, MMS, Twitter]
Search Monitor Receive
Shape Control Follow
Active Passive Late Reaction
The Big Challenge = TIME TO PRODUCT
11. Information Value Chain
DATA INFORMATION INTELLIGENCE
PAST PRESENT FUTURE
Contextualized
Categorized
Calculated
Corrected
Condensed
Compared
Connections
Calculated
Consequences
Connections
Conversations
Chances
Data becomes information when asked for
Information becomes data when not needed
Intelligence becomes information when not needed
PASSIVE ACTIVE PROACTIVE
Axes: INFORMATION VOLUME versus LEVEL OF SYNTHESIS (ANALYSIS) AND CONTEXT
12. IC Requirement: making sense
– Who's involved - How they are related - Where it happened - What people
are saying - Who has written about it - Who has written about related
issues - What topics or categories of information are involved
– Predictions (hypothesis based) - Fact-based analytics - etc.
– Storage & Retrieval
Tools that can “document” conclusions, facts, relationships, and
sentiments, and that can constantly be triggered, questioned,
challenged and further validated by the incoming information
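One way to picture such a tool is a set of stored conclusions that are re-examined whenever a new document arrives. The sketch below is an illustration under stated assumptions, not the system shown in the talk: the `Conclusion` class, the `review_incoming` function, the keyword matching, and the sample statements are all hypothetical. A production system would more likely use a search engine's stored queries than plain keyword sets.

```python
from dataclasses import dataclass, field


@dataclass
class Conclusion:
    """A documented conclusion plus the keywords that should trigger a review."""
    statement: str
    keywords: set
    evidence: list = field(default_factory=list)


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def review_incoming(conclusions, document):
    """Return the conclusions that the incoming document touches on.

    Each triggered conclusion records the document so an analyst can
    question, challenge, or further validate it later.
    """
    tokens = tokenize(document)
    triggered = []
    for conclusion in conclusions:
        if conclusion.keywords & tokens:
            conclusion.evidence.append(document)
            triggered.append(conclusion)
    return triggered


# Hypothetical stored conclusions and one incoming report.
conclusions = [
    Conclusion("Company A is negotiating a merger with Company B",
               {"merger", "acquisition"}),
    Conclusion("Port X is closed to traffic", {"port", "harbour"}),
]
hits = review_incoming(conclusions, "Sources report the merger talks have stalled.")
for hit in hits:
    print("Review needed:", hit.statement)
```

Only the merger conclusion is triggered here. The point of the design is that stored conclusions are never final: every incoming item is a chance to revisit them.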
13. IC(s) Legacy and Opportunities
• Legacy vendors have created customs & relationships that are entrenched
within government and the “beltway bandits” (in every country)
• 80 per cent of government IT spending in the UK goes to only five
companies
• Lack of knowledge that open-source equivalents to proprietary software
exist.
14. US DoD guidance memo
• The U.S. Department of Defense
issued a guidance memo in
October 2009 outlining the
positive aspects of OSS that
should be considered when
conducting market research on
software for Department use.
• Some of the benefits noted in the
memo include:
• The continuous and broad peer-review enabled by publicly
available source code supports software reliability and
security efforts through the identification and elimination
of defects that might otherwise go unrecognized by a
more limited core development team.
• The unrestricted ability to modify software source code
enables the Department to respond more rapidly to
changing situations, missions, and future threats.
• Reliance on a particular software developer or vendor due
to proprietary restrictions may be reduced by the use of
OSS, which can be operated and maintained by multiple
vendors, thus reducing barriers to entry and exit.
• By sharing the responsibility for maintenance of OSS with
other users, the Department can benefit by reducing the
total cost of ownership for software, particularly compared
with software for which the Department has sole
responsibility for maintenance.
• OSS is particularly suitable for rapid prototyping and
experimentation, where the ability to "test drive" the
software with minimal costs and administrative delays can
be important.
15. Europe
• Since 3 February 2010, the European Union's Open Source Observatory and Repository (OSOR.eu) has been providing public administrations with access to more than two thousand free and open source applications.
• The OSOR is a platform where public administrations can exchange information and experiences and collaborate in developing free and open source software. The platform has brought together more than 2,000 such open source software applications in just sixteen months after its launch.
• www.OSOR.eu
• http://ec.europa.eu/idabc/en/document/2623
• http://cordis.europa.eu/fp7/ict/ssai/foss-home_en.html
16. So, why the hesitation?
• The arguments in favor of OSS are mostly academic & promotional
• Open source needs to prove to industry that it can deliver cost
savings compared with proprietary technology
• Provide business cases that articulate why open source is cheaper
than proprietary. Shift from the academic discussion to the business
discussion!
• Applications without formal support and training
• Mindsets within organizations & companies
• Legal questions about licenses / uncertainties
• Perception that products change too much
• Businesses want the comfort of having a relationship with a
commercial account manager from a software firm, rather than
relying on the developer community for help and support
• Large organizations demand product warranties and service
agreements.
• Procurement processes are not set up properly.
17. Examples
NATIONWIDE ALL SERVICES INTELLIGENCE PLATFORM
COMMERCIAL INTELLIGENCE APPLICATION
DOCUMENTATION & SEARCH FOR A SECURITY SERVICE
RISK SOLUTION
18. Solution using Lucene as one component in a nationwide
intelligence platform
19. Solution using Lucene as one component in a nationwide
intelligence platform
27. Sharing is power…
“I see by the current issue of ‘Lab News’, Ridgeway,
that you’ve been working for the last 20 years on the same problem I’ve been
working on for the last 20 years.”
28. Solve customers' problems..
WE HAVE CREATED A COMPUTERIZED, INTERACTIVE ARTIFICIAL INTELLIGENCE PROFILING INTRANET DEVICE FOR THE UN WITH ENTITY EXTRACTION AGENTS AND VISUALIZATION. I CALL IT THE “OSINT-CENTER” AND IT IS RESTRICTED TO 40 COMPUTERS.
WONDERFUL. MAKE SOME PHOTOCOPIES AND ROUTE IT AROUND.
BUT I REALLY ONLY ASKED
FOR THE NAME OF THE
GENERAL SECRETARY OF THE
LUCID IMAGINATION
IS IT ABOUT TECHNOLOGY? OR…. BUSINESS
AS IN BUSINESS AS USUAL?
29. Contact
• Mats Björe
• Infosphere AB
• mats.bjore@infosphere.se
• +468 611 22 33
• www.infosphere.se
Know more about open source search at
www.lucidimagination.com