2. About Ontotext
• Provides products and services for creating,
managing and exploiting semantic data
– Founded in 2000
– Offices in Bulgaria, USA and UK
• Major clients and industries
– Media & Publishing (BBC, Press Association, EuroMoney,
NDP Nieuwsmedia)
– Healthcare & Life Sciences (AstraZeneca, UCB, NIBIO)
– Cultural Heritage (The British Museum, The National
Archives, Polish National Museum, Dutch Public Library)
– Government (UK Parliament, United Nations FAO, LMI)
From Big Data to Smart Data (Semantic Days 2013), May 2013
3. Contents
• The Problem with Big Data for BI
• From Big Data to Smart Data
• Success Stories by Ontotext
4. BIG DATA FOR BUSINESS INTELLIGENCE
5. The Problem with Big Data for BI
6. The Problem with Big Data for BI
• It’s not only about Volume, Velocity & Variety
• Too much focus on processing speed & storage
volume
• “Brute force” approaches increase the amount of
data processed…
– But not necessarily the Value & insight derived from data
– May lead to even more data quality & inconsistency
problems
– Problems with data visualisation & exploration
– Often do not lead to better decision making
7. The Problem with Big Data for BI
• BI success is measured not by Volume, Velocity &
Variety, but by the Value derived from the data
• Organisations should learn how to better utilise their
“small data” before targeting Big Data
– Quality over quantity
– Better understanding of the data leads to better decision
making
– Avoid “needle in a haystack” situations
8. The Problem with Big Data for BI
9. Smart Data for Better BI
• Efficiently analyse unstructured data
– Most enterprise data is still unstructured
– Even within structured & transactional data sources there
is a lot of embedded unstructured data
– … and this unstructured data is poorly analysed (if at all) =>
lots of potential value still remains locked
– (sometimes even within semantic / Linked Data with
insufficient granularity)
10. Smart Data for Better BI
• Focus on metadata first, Big Data later
– (As opposed to: Big Data first, metadata later)
• Enrich data
• Interlink data
• Provide a common metadata layer
– Break legacy silos
– Align heterogeneous metadata if necessary
• Better analysis of the data, better insight
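The "common metadata layer" idea above can be illustrated with a minimal sketch. It assumes two hypothetical legacy silos ("crm" and "billing") whose field names differ; the mapping table, the field names and the naive name-based linking key are all illustrative assumptions, not part of any Ontotext product.

```python
from collections import defaultdict

# Hypothetical field mappings: each legacy silo names the same concepts differently.
FIELD_MAP = {
    "crm": {"cust_name": "name", "cust_city": "city"},
    "billing": {"customer": "name", "town": "city"},
}

def to_common(source, record):
    """Rename a silo-specific record's fields to the shared vocabulary.

    Fields without a mapping are dropped from the common view.
    """
    mapping = FIELD_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def interlink(records):
    """Group aligned records that appear to describe the same entity.

    The linking key is deliberately naive (trimmed, lower-cased name);
    real reconciliation would need something far more robust.
    """
    linked = defaultdict(list)
    for source, record in records:
        common = to_common(source, record)
        linked[common["name"].strip().lower()].append((source, common))
    return dict(linked)

records = [
    ("crm", {"cust_name": "Acme Ltd", "cust_city": "London"}),
    ("billing", {"customer": "ACME Ltd ", "town": "London", "invoice": 42}),
]
linked = interlink(records)
```

Here both silo records end up under one key, so a report can be written once against the shared vocabulary instead of once per silo.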
12. UK Job Market Intelligence
• Comprehensive recruitment database for the UK
– 4 million job ads / vacancies (dynamic)
– 220,000 company websites & 700 job boards monitored
• Questions we can answer
– What skills are in demand at present?
– Which are the top job boards in a region?
– Which is the right job board for your industry sector?
– Which are the most active job advertisers / employers?
– Which are the agencies and employers that do not
advertise on your job board?
13. UK Job Market Intelligence
14. UK Job Market Intelligence
• Technology stack
– Web mining & focussed crawling
– KB construction from open & proprietary data sources
– Skills taxonomy (based on DISCO)
– Text mining & semantic enrichment
– Reconciliation & interlinking
– BI reporting & dashboards
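As a rough illustration of how the text mining and taxonomy steps in this stack could feed a question such as "What skills are in demand at present?", the sketch below tags job ads against a tiny skills taxonomy and counts the hits. The taxonomy entries, the sample ads and the function names are hypothetical; the actual system is based on DISCO and a much richer pipeline.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy: preferred skill label -> surface forms seen in ads.
SKILLS = {
    "Java": ["java"],
    "SQL": ["sql", "structured query language"],
    "Project management": ["project management", "project manager"],
}

def extract_skills(text):
    """Return the taxonomy labels whose surface forms occur as whole words in the text."""
    lowered = text.lower()
    found = set()
    for label, forms in SKILLS.items():
        if any(re.search(r"\b" + re.escape(form) + r"\b", lowered) for form in forms):
            found.add(label)
    return found

def skill_demand(ads):
    """Count in how many ads each skill is mentioned."""
    counts = Counter()
    for ad in ads:
        counts.update(extract_skills(ad))
    return counts

ads = [
    "Senior Java developer with strong SQL skills",
    "Project manager for a data migration, SQL desirable",
    "JavaScript front-end engineer",
]
demand = skill_demand(ads)
```

Whole-word matching keeps "JavaScript" from being counted as "Java"; mapping several surface forms to one preferred label is what lets the counts be reported per skill rather than per spelling.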
15. UK Job Market Intelligence
16. UK Job Market Intelligence
17. UK Job Market Intelligence
18. Asset Recovery Intelligence System (ARIS)
• Support Financial Intelligence Units (FIUs) in tracking
stolen assets and fighting corruption & money laundering
• Questions we can answer
– What are the reported activities related to a person?
– What is the person’s personal/professional network?
– What corruption cases are reported in regional news?
• Data sources
– News feeds from major news agencies
– Dow Jones data & news feeds
– Suspicious Activity Reports (SARs) submitted to the FIU
– Open data (people & companies, Wikipedia)
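One simple way to approximate a person's network from such sources is to link entities that are mentioned in the same document. The sketch below assumes entity extraction has already been done and uses placeholder names; it is an illustration of co-mention linking in general, not a description of how ARIS is implemented.

```python
from collections import defaultdict
from itertools import combinations

def co_mention_network(documents):
    """Map each entity to the set of entities mentioned in the same document."""
    network = defaultdict(set)
    for entities in documents:
        # Link every pair of distinct entities found in one document.
        for a, b in combinations(sorted(set(entities)), 2):
            network[a].add(b)
            network[b].add(a)
    return network

# Placeholder entities, assumed to be already extracted from each news item.
docs = [
    ["Person A", "Company X"],
    ["Person A", "Person B", "Company X"],
    ["Person B", "Company Y"],
]
network = co_mention_network(docs)
```

Looking up `network["Person A"]` then gives the people and companies reported alongside that person, which is a starting point for the network question above.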