There is enormous potential in the analysis, refinement, modeling, and visualization of the vast amounts of data that permeate both business and society. Realizing this potential takes more than the capacity to store, transfer, and search data. It requires Big Data Analytics: large-scale analysis of enormous data sets. Here we present the research agenda for Big Data Analytics that SICS is developing together with IBM and other partners from industry and academia.
Speakers: Daniel Gillblad, Research Group Leader, SICS; Anders Holst, Senior Research Scientist, SICS; and Flemming Bagger, IBM
Visit http://smarterbusiness.se for more information.
2. 6,000,000 users on Twitter pushing out 300,000 tweets per day
500,000,000 users on Twitter pushing out 400,000,000 tweets per day
83x more users, 1333x more tweets per day
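The two multipliers on this slide follow from the quoted figures. A minimal Python check, assuming the figures are exact and that the slide truncates the ratios to whole numbers (the variable and function names are illustrative, not from the deck):

```python
# Figures as quoted on the slide; all names here are illustrative.
users_before, users_after = 6_000_000, 500_000_000
tweets_before, tweets_after = 300_000, 400_000_000


def growth_factor(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


user_growth = growth_factor(users_before, users_after)
tweet_growth = growth_factor(tweets_before, tweets_after)

# Truncate to whole multiples, as the slide appears to do.
print(f"users: {int(user_growth)}x, tweets per day: {int(tweet_growth)}x")
```

The point of the comparison is that tweet volume grew far faster than the user base, so data per user grew as well.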
3. Where is big data coming from?
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones world wide
• 100s of millions of GPS enabled devices sold annually
• 2+ billion people on the Web by end 2011
• 76 million smart meters in 2009… 200M by 2014
4. The Characteristics of Big Data
• Volume: cost efficiently processing the growing Volume (50x growth from 2010 to 2020, reaching 35 ZB)
• Velocity: responding to the increasing Velocity (30 billion RFID sensors and counting)
• Variety: collectively analyzing the broadening Variety (80% of the world's data is unstructured)
• Veracity: establishing the Veracity of big data sources (by 2015, 80% of all available data will be uncertain)
– The number of networked devices will be double the entire global population
– The total number of social media accounts exceeds the entire global population
5. Big Data is a Hot topic
- Because it is possible to analyze ALL available data
• The percentage of available data an enterprise can analyze is decreasing in proportion to the data available to that enterprise
– Quite simply, this means that as enterprises, we are getting "more naive" about our business over time
• Just collecting and storing “Big Data” doesn’t drive a cent of value to an organization’s bottom line
• Cost-effectively manage and analyze ALL available data in its native form: unstructured, structured, real-time streaming, internal and external
[Figure: data AVAILABLE to an organization versus data an organization can PROCESS]
6. Business-centric Big Data Platform
• "Big data" isn’t just a technology; it’s a business strategy for capitalizing on information resources
• Getting started is crucial
• Success at each entry point is accelerated by products within the Big Data platform
• Build the foundation for future requirements by expanding further into the big data platform
7. Different data workloads have different characteristics
• System for Transactions: database services that handle large volumes of transactions with high availability, scalability and integrity
• System for Analytics (powered by Netezza technology): data warehouse services for complex analytics and reporting on data up to petabyte scale, with minimal administration
• System for Operational Analytics: operational warehouse services for continuous ingest of operational data, complex analytics, and a large volume of concurrent operational queries
9. Big Data Analytics – A national research initiative
Daniel Gillblad
Research Group Leader, Senior Research Scientist
SICS, Swedish Institute of Computer Science
10. Background
• There is a very large potential, both societal and commercial, in the analysis, refinement, modeling, and visualization of these data sets
• Capacity to store, transfer, and search is not enough; analytics is critical
11. Additional business value of Analytics
• Predict and optimize business outcomes
• New services and applications, both for end-users
and industry
• New value chains, where different actors can create and exchange new analysis services
12. A national Big Data Analytics initiative
① A strategic nation-wide research and innovation agenda
– Input from several sectors and application areas
– Both new businesses built on analytics applications
and traditional industry
– Input from academia, both as developers and as users
② A national Big Data Analytics network
– Open to all interested parties
– Industry and academia with an active interest in Big Data Analytics
13. Focus areas
[Figure: layered stack, top to bottom, with a brace labeled "Focus areas"]
• Control and planning
• Visualization
• Analytics
• Computation
• Storage
• Collection
15. Research and development challenges
• Huge businesses are built on Big Data Analytics today,
but a large number of issues must be resolved to fully
realize the potential
• Three examples
17. Example 2: Social network mining
• Challenges: Unstructured data, biased data, data access
18. Example 3: Access network pattern mining
• Challenges: Integrity issues, distributed
mining, service frameworks
19. Long term trends
• The currently dominant approach will continue to be successful, but will be complemented due to
– Too much data, unstructured data, noisy data
– Limited access – security, integrity, legal, and business
– Fast data generation, situation awareness
• The consequences are
– Analysis closer to data generation / collection
– No storage - Catching information on the fly
– Distributed analysis with incomplete data
– Real time collection, real time analytics
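As an illustration of "catching information on the fly" and "distributed analysis", the sketch below keeps a running mean and variance without storing any readings, and merges summaries from separate collection points. It uses Welford's online update and the standard pairwise merge of partial summaries; this is an assumed example technique, not something the deck prescribes, and the class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-size summary of a stream: count, mean, squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one reading into the summary (Welford's online update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two summaries computed independently on disjoint data."""
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of everything seen so far (0.0 if empty)."""
        return self.m2 / self.n if self.n else 0.0


# Two hypothetical collection points, each seeing part of the data.
site_a, site_b = RunningStats(), RunningStats()
for reading in [1.0, 2.0, 3.0]:
    site_a.update(reading)
for reading in [4.0, 5.0]:
    site_b.update(reading)

combined = site_a.merge(site_b)
print(combined.n, combined.mean, combined.variance)
```

Each site only ever holds three numbers, so the raw readings can be discarded as they arrive, and only the summaries need to travel between sites.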
20. Research challenges
• Research challenges on different levels:
– The sensor/collection level
– The algorithmic/analytical level
– The system level
– The organisational level
21. Technical challenges, examples
• Computational and storage framework development
• Analysis of unstructured data
• Distributed analysis
• Efficient analysis algorithms
• Stream mining
• Managing sample bias
• Managing uncertain and missing data
22. Platform and organisational challenges, examples
• Service and analytics frameworks, exchanging models and data
• APIs and standards
• Privacy, integrity, security, and legal
• Business models
23. Contacts
• If you are interested in the Swedish Big Data Analytics Network,
feel free to contact
Daniel Gillblad, dgi@sics.se, +46 8 633 15 68
Anders Holst, aho@sics.se, +46 8 633 15 93
Editor's notes
* Enormous amounts of data permeate society
* Both the data itself and how it is used
* Deeper analysis of audio and video
* A move from instance-based to model-based approaches