Presentation by Martin Kaltenböck, Semantic Web Company, at the first workshop of Societal Challenge 6 in the BigDataEurope project, held in Luxembourg on 18 November 2015.
http://www.big-data-europe.eu/social-sciences/
SC6 Workshop 1: Big Data Europe platform requirements and draft architecture: The results of the online survey - SC6 Workshop
1. BIG DATA EUROPE
BIG DATA EUROPE PLATFORM REQUIREMENTS & DRAFT ARCHITECTURE: THE RESULTS OF THE ONLINE SURVEY
BIG DATA EUROPE WORKSHOP: THE CHALLENGES OF BIG DATA FOR SOCIETIES IN A CHANGING WORLD
MARTIN KALTENBÖCK (SEMANTIC WEB COMPANY), 18.11.2015
HTTP://WWW.BIG-DATA-EUROPE.EU/
Integrating Big Data, Software & Communities for Addressing
Europe’s Societal Challenges
2. Semantic Web Company (SWC)
SWC was founded in 2001 and is headquartered in Vienna
30 experts in linked data technologies & text mining
Product: PoolParty Semantic Suite (launched 2009)
Serving customers from all over the world
EU- & US-based consulting services
3. SWC: Customers & Partners
Some of our Customers
● Credit Suisse
● Boehringer Ingelheim
● Roche
● Wolters Kluwer
● BMJ Publishing Group
● Red Bull Media House
● Canadian Broadcasting Corporation (CBC)
● Pearson
● Council of the EU
● DG Environment, EC
● Healthdirect Australia
● Ministry of Finance (Austria)
● World Bank Group
● Inter-American Development Bank (IADB)
● International Atomic Energy Agency (IAEA)
● Buildings Performance Institute Europe (BPIE)
● Renewable Energy & Energy Efficiency Partnership (REEEP)
● Global Buildings Performance Network (GBPN)
● American Physical Society
● Education Services Australia (ESA)
● Norwegian Directorate of Immigration
● Australian National Data Service
Finance / Automotive / Publisher / Health Care / Public Administration / Energy / Education
Selected Partners
● EBCONT
● EPAM Systems
● iQuest
● PwC
● Tenforce
● OpenLink Software
● Ontotext
● MarkLogic
● Gravity Zero
● Altotech
● Wolters Kluwer
● Taxonomy Strategies
● Digirati
● Fraunhofer (IAIS)
● University of Leipzig (INFAI)
● The Open Data Institute (ODI)
We all have one goal in mind: Make machines smart enough so that they can
help us to find those needles in the haystack, which are really relevant to us.
4. The Motivation – Big Data
Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the
world today has been created in the last two years alone.
This data comes from everywhere: sensors used to gather climate information, posts to
social media sites, digital pictures and videos, purchase transaction records, and cell phone
GPS signals to name a few.
This data is big data. Source: IBM
7. BIG DATA EUROPE
STAKEHOLDER ENGAGEMENT
& REQUIREMENTS ENGINEERING
APPROACH
9. Work Packages & Implementation Phases
Phases: M1-M12, M13-M24, M25-M36
Activities: Community Building, Enabling Technologies, Component Integration, Uptake, Integrator Deployment, Community Assessment
Work packages:
• WP2 – Community Building & Requirements
• WP3 – Big Data Generic Enabling Technologies & Architecture
• WP4 – Big Data Integrator Platform
• WP5 – Big Data Integrator Instances
• WP6 – Real-life Deployment & User Evaluation
• WP7 – Dissemination & Communication
10. Orthogonal Dimensions of Big Data Ecosystems
Generic Big Data Enabling Technologies
Data Value Chain:
• Data Generation & Acquisition
• Data Analysis & Processing
• Data Storage & Curation
• Data Visualization & Usage
• Data-driven Services
Societal Challenges (Domain Specific Data Assets & Technology):
• Healthcare
• Food Security
• Energy
• Intelligent Transport
• Climate & Environment
• Inclusive & Reflective Societies
• Secure Societies
11. Methodology of Requirements Engineering
BDE Approach & Methodology
• BDE Core Question Matrix as a basic Tool
• Online Survey (20.5. – 26.6.2015, 394 Participants)
• 7 x 15 Face to Face Interviews (3 x 5 per SC)
• 7 Workshops in 2015 (7 in 2016, 7 in 2017)
• 7 BDE Pilot (Use Case) ideas / specifications
[Diagram labels: Requirements; Online survey; Interviews; Use case pilots]
12. BDE Core Question Matrix
Rows: elements of the RE model
Columns: questions to people within the specific Societal Challenge, grouped by type of interviewee (Business, Strategic, Technical, Domain Experts)
Every element holds questions for each of the four interviewee types.
• Stories: stories that describe the current status and future development are asked for
• Personas: typical personas that play a role are described
• Data: the data is described in terms of amount, quality, type, usage, etc.
• Technologies: the technical requirements for our specific solution are described
• Other
13. BDE Stakeholder Survey
Online surveys as an empirical method generally come with problems of representativeness. Samples generated through online surveys are regarded as biased, especially in terms of age, sex and education.
In addition, lower response rates compared to other methods, self-selection and the lack of verifiability of the demographic information provided by respondents do not allow conclusions to be drawn beyond the sample covered by the survey itself.
19. BDE Stakeholder Survey - Results
[Charts: Importance of Variety; Efficiency of Data Infrastructures]
20. BDE Stakeholder Survey - Results
Big Data: Volume, Velocity, Variety, Veracity
• Not an issue
• Would be nice to have
• Very important
• “mostly economic and Social Science data”
• Not so much data
• “Increasingly important”
• Very important: “Data inconsistencies and ambiguities are solved before processing”
21. BDE Stakeholder Survey - Results
[Charts: Investments in Big Data Technologies; Investments per Organisation Size]
25. BDE Stakeholder Survey – Results: Long Term Preservation
• Long-term preservation of data
o SC6 has the infrastructure in place for long-term preservation of data
o “Current practice is a core service where data is held in a central place within a national infrastructure, and secure remote access is provided to each social research team.”
• Data processing
o “We use small samples or just the ‘main information’ of data needed.”
26. BDE Stakeholder Survey – Results
Need of Processing Large Volumes of Data
per Organisation Size
27. BIG DATA EUROPE
TECHNICAL REQUIREMENTS &
ARCHITECTURE / COMPONENTS
28. Blueprint of the Data Aggregator Platform
Lambda Architecture + Semantic Layer
Input data: Stream, Spatial, Social, Statistical, Temporal, Transactional, Imagery
Real-time data & Transactions …
BDE Platform & Intelligence:
• Batch Layer
• Speed Layer
• Data Storage
• Batch View
• Real-time View
• Message passing between the components
Applications & Showcases:
• Real-time dashboards
• Domain-specific BDE apps
• Big Data Analytics
• In-stream Mining
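The idea behind the Lambda Architecture named on this slide can be illustrated with a minimal sketch. It is not the BDE platform's implementation: the class `LambdaSketch`, its method names and the per-topic event counting are hypothetical choices made only for illustration. Incoming data goes to both layers, the batch layer periodically recomputes a batch view from all stored data, the speed layer keeps a real-time view of recent data, and queries merge the two views.

```python
from collections import Counter


class LambdaSketch:
    """Toy Lambda Architecture: counts events per topic (hypothetical example)."""

    def __init__(self):
        self.master_dataset = []        # data storage: append-only log of all input events
        self.batch_view = Counter()     # precomputed by the batch layer
        self.realtime_view = Counter()  # maintained incrementally by the speed layer

    def ingest(self, event):
        """Send each incoming event to both the batch layer and the speed layer."""
        self.master_dataset.append(event)
        self.realtime_view[event["topic"]] += 1

    def run_batch(self):
        """Recompute the batch view from all stored data, then reset the speed layer."""
        self.batch_view = Counter(e["topic"] for e in self.master_dataset)
        self.realtime_view.clear()

    def query(self, topic):
        """Answer a query by merging the batch view with the real-time view."""
        return self.batch_view[topic] + self.realtime_view[topic]


platform = LambdaSketch()
for topic in ["energy", "health", "energy"]:
    platform.ingest({"topic": topic})

before_batch = platform.query("energy")   # served by the real-time view only
platform.run_batch()                      # batch view now covers the three events
platform.ingest({"topic": "energy"})      # arrives after the batch run
after_batch = platform.query("energy")    # batch view + real-time view
print(before_batch, after_batch)
```

The sketch assumes that a batch run is instantaneous. A real deployment would also have to handle events that arrive while a batch run is in progress, and would typically use distributed storage and stream-processing components instead of in-memory counters.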
33. Announcements….
• HangOut, 23.11.2015, 11.00am -12.00pm CET (SC2)
INRA’s Big Data Perspectives and Implementation Challenges
• HangOut, 25.11.2015, 14.00-15.00 CET (SC1)
Challenge of Health, Demographic Change and Wellbeing
• HangOut, 08.12.2015, 11.00pm -12.00pm CET (SC3)
Big Data in the energy domain
• Big Data Europe MeetUp Vienna, 15.12.2015, 16:00-19:30 CET, LINK
• 2016 Conference on Big Data from Space, March 15, 2016, LINK
SEMANTiCS2016, early September 2016
in Leipzig, Germany, http://www.semantics.cc
EDF2016, 29-30 June 2016
Eindhoven, Netherlands, http://2016.data-forum.eu
34. BDE Channels for Societal Challenge 6
• Overall Website: http://www.big-data-europe.eu
• SC 6 Website: http://www.big-data-europe.eu/social-sciences/
• W3C Community Group: https://www.w3.org/community/bde-societies/
• Subscribe BDE Newsletter: http://bit.ly/1PyhXRS
Contact the BDE Societal Challenge 6 network
Domain: Ivana Ilijasic Versic (CESSDA):
ivana.versic@cessda.net
Technical: Martin Kaltenböck (Semantic Web Company):
m.kaltenboeck@semantic-web.at
35. Workshop 18.11. – Interactive Sessions
Session 1: Data in place in the Social
Sciences and Humanities
• What are the most important data sources in social
sciences available / you are using (open / closed)?
• What are the characteristics of such sources along the 4 Vs of Big Data
(Volume - Variety - Velocity - Veracity)?
Session 2: Risks and Challenges of
successful data management
• What are the most important challenges in data
management in social sciences?
• What are the most dangerous risks you can think of
regarding data management in social sciences?
• SWOT - Analysis
Session 3: Technological demands of data
• What technologies are in place in your organisations?
• What technologies are on your roadmap - or are you
evaluating at the moment?
• What are the most critical technological issues?
Session 4: Legal and policy demands of data
• Open versus closed data in social sciences?
• What are the most important legal issues in place?
• What needs to change regarding policies to enable
more efficient data management in social sciences?
36. Martin Kaltenböck, m.kaltenboeck@semantic-web.at
Semantic Web Company GmbH
Mariahilfer Strasse 70/8, A-1070 Vienna
+43-1-4021235
http://www.semantic-web.at
http://www.poolparty-software.com
http://slideshare.net/semwebcompany
http://youtube.com/semwebcompany
Your Questions please….
www.big-data-europe.eu
27-Nov-15
#BigDataEurope