BI(G) DATA
Opportunities for BI professionals
in the Netherlands

Most companies mentioned are Dutch
Our fantasy...

At Last: an IT job is sexy
Agenda
● Big Data views
○ Scientific Method
○ Data Characteristics
○ New Technology
○ Business Opportunities
○ Culture
● Opportunities for BI professionals
Google Trends

The famous McKinsey Report: Big data: The
next frontier for innovation, competition, and
productivity

Big Data became a trending term because of the McKinsey report
Now it’s correlated with Hadoop
Wikipedia Big Data
Big data usually includes data sets with sizes beyond the ability of commonly
used software tools to capture, curate, manage, and process the data within a
tolerable elapsed time.[19]
Big data sizes are a constantly moving target, as of 2012 ranging from a few
dozen terabytes to many petabytes of data in a single data set.
The target moves due to constant improvement in traditional DBMS technology
as well as new databases like NoSQL and their ability to handle larger amounts
of data.[20]
With this difficulty, new platforms of "big data" tools are being developed to
handle various aspects of large quantities of data.

Focus on volume… instead of other V’s
BIG Data
The Scientific method is changing
The Fourth Paradigm: Data-Intensive
Scientific Discovery
Increasingly, scientific breakthroughs will
be powered by advanced computing
capabilities that help researchers
manipulate and explore massive datasets.
Implicit in the idea of a fourth paradigm is
the ability, and the need, to share data. In
sciences like physics and astronomy, the
instruments are so expensive that data
must be shared

Data analysis is the new microscope
Human Genome, Large Hadron Collider
Jim Gray
● Thousand years ago: science was empirical, describing natural phenomena
● Last few hundred years: theoretical branch using models, generalizations
● Last few decades: a computational branch simulating complex phenomena
● Today: data exploration (eScience) unifying theory, experiment, and simulation
○ Data captured by instruments or generated by simulator
○ Processed by software
○ Information/knowledge stored in computer
○ Scientist analyzes database / files using data management and statistics

On Sunday, January 28, 2007, during a short solo sailing trip to the Farallon Islands near San
Francisco to scatter his mother's ashes, Gray and his 40-foot yacht, Tenacious, were reported
missing by his wife, Donna Carnes. The Coast Guard searched for four days using a C-130
plane, helicopters, and patrol boats but found no sign of the vessel.[10][11][12][13]
Gray's boat was equipped with an automatically deployable EPIRB (Emergency Position-Indicating Radio Beacon), which should have deployed and begun transmitting the instant his
vessel sank. The area around the Farallon Islands where Gray was sailing is well north of the
East-West ship channel used by freighters entering and leaving San Francisco Bay. The
weather was clear that day and no ships reported striking his boat, nor were any distress radio
transmissions reported.
On February 1, 2007, the DigitalGlobe satellite did a scan of the area, generating thousands of
images.[14] The images were posted to Amazon Mechanical Turk in order to distribute the work
of searching through them, in hopes of spotting his boat.
In the immediate aftermath of the disappearance, many theories were put forward on how
Gray disappeared.[15]
On February 16, 2007, the family and Friends of Jim Gray Group suspended their search,[16]
but continue to follow any important leads. The family ended its underwater search May 31,
2007. Despite much effort and use of high-tech equipment above and below water, searches
did not reveal any new clues.[17][18][19][20][21][22]

Personal life
While at Berkeley, Gray and his first wife Loretta had a daughter; the couple later divorced.[2]
He is survived by his wife, Donna Carnes, his daughter, three grandchildren, and his sister
Gail.
The University of California, Berkeley and Gray's family hosted a tribute to him on May 31,
2008. The conference included sessions delivered by Richard Rashid and David Vaskevitch.[23]

Microsoft's WorldWide Telescope software is dedicated to Gray. In 2008, Microsoft opened
a research center in Madison, Wisconsin, named after Jim Gray.[24]
Having been missing for five years as of May 16, 2012, Gray is legally assumed to have died
at sea.[4][25]

Jim Gray Award
Each year, Microsoft Research presents the Jim Gray eScience Award[26] to a researcher who
has made an outstanding contribution to the field of data-intensive computing. Award
recipients are selected for their ground-breaking, fundamental contributions to the field of
eScience. Previous award winners include Alex Szalay (2007), Carole Goble (2008), Jeff
Dozier (2009), Phil Bourne (2010), Mark Abbott (2011) and Antony John Williams (2012).

Books
● Transaction Processing: Concepts and Techniques (with Andreas Reuter) (1993). ISBN 1-55860-190-2.
● The Benchmark Handbook: For Database and Transaction Processing Systems (1991). Morgan Kaufmann. ISBN 978-1-55860-159-8.

See also
esciencecenter
Projects
Chris Anderson
This is a world where massive amounts of data and applied mathematics replace every
other tool that might be brought to bear. Out with every theory of human behavior, from
linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why
people do what they do? The point is they do it, and we can track and measure it with
unprecedented fidelity. With enough data, the numbers speak for themselves.
There is now a better way. Petabytes allow us to say: "Correlation is enough." We can
stop looking for models. We can analyze the data without hypotheses about what it might
show. We can throw the numbers into the biggest computing clusters
the world has ever seen and let statistical algorithms find patterns
where science cannot.
The end of theory:
Edge
Wired
Cukier and Mayer-Schönberger
Shift 1: End of Samples
Shift 2: End of exactitude
Shift 3: End of Causality
patterns & correlations
if you know that your customers are going to buy more products
by analyzing a data set or correlation, then the “why” doesn’t matter
— you should try to exploit that.

The technical equivalent in big data is the ability to survey a whole population instead
of just sampling random portions of it.
“With less error from sampling we can accept more measurement error”. According to
the authors, science is obsessed with sampling and measurement error as a
consequence of coping in a ‘small data’ world.
The third and most radical shift implies “we won’t have to be fixated on causality [...]
the idea of understanding the reasons behind all that happens.” This is a straw
Nate Silver

“We're not that much smarter than we
used to be, even though we have much
more information - and that means the
real skill now is learning how to pick out
the useful information from all this noise.”

“I came to realize that prediction in the era
of Big Data was not going very well.”
“If the quantity of information is increasing
[exponentially]… Most of it is just noise.”
“… numbers have no way of speaking for
themselves. We speak for them.”

Nate Silver has lived a preposterously interesting life. In 2002, while toiling away as a
lowly consultant for the accounting firm KPMG, he hatched a revolutionary method for
predicting the performance of baseball players, which the Web site Baseball
Prospectus subsequently acquired. The following year, he took up poker in his spare
time and quit his job after winning $15,000 in six months. (His annual poker winnings
soon ran into the six-figures.)
Nassim Taleb

Big Data is bullshit
This is the tragedy of big data: The more
variables, the more correlations that can show
significance. Falsity also grows faster than
information; it is nonlinear (convex) with respect
to data.

I am not saying here that there is no information
in big data. There is plenty of information. The
problem — the central issue — is that the
needle comes in an increasingly larger
haystack.

1. It is an outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility.
2. It carries an extreme 'impact'.
3. In spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable.

A small number of Black Swans explains almost
everything in our world, from the success of ideas and
religions, to the dynamics of historical events, to
elements of our own personal lives.
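
Taleb's remark that more variables produce more "significant" correlations can be illustrated with a small sketch. The code below is not from the talk. It assumes independent Gaussian noise variables with 30 observations each, and it uses |r| > 0.361 as an approximate two-sided 5% cutoff for that sample size. All function names are hypothetical.

```python
import math
import random


def pair_count(n_vars):
    """Number of distinct variable pairs that can be correlated."""
    return n_vars * (n_vars - 1) // 2


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spurious_hits(n_vars, n_obs=30, cutoff=0.361, seed=42):
    """Count pairs of pure-noise variables whose |r| exceeds the cutoff."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    hits = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if abs(pearson(data[i], data[j])) > cutoff:
                hits += 1
    return hits


if __name__ == "__main__":
    for n in (5, 20, 80):
        print(n, "variables:", pair_count(n), "pairs,",
              spurious_hits(n), "noise pairs above the cutoff")
```

The number of pairs grows quadratically with the number of variables, so the count of chance "findings" in pure noise grows with it, even though none of the variables are related.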
Ludic Fallacy

The discovery of the Higgs particle was a disappointment for some physicists because now they
know what they don’t know: no big things to discover

The ludic fallacy is a term coined by Nassim Nicholas Taleb in his 2007 book The Black
Swan. "Ludic" is from the Latin ludus, meaning "play, game, sport, pastime."[1] It is
summarized as "the misuse of games to model real-life situations."[2] Taleb explains the fallacy
as "basing studies of chance on the narrow world of games and dice."[3]
It is a central argument in the book and a rebuttal of the predictive mathematical models used
to predict the future – as well as an attack on the idea of applying naïve and simplified
statistical models in complex domains. According to Taleb, statistics works only in some
domains like casinos in which the odds are visible and defined. Taleb's argument centers on
the idea that predictive models are based on platonified forms, gravitating towards
mathematical purity and failing to take some key ideas into account:
● It is impossible to be in possession of all the information.
● Very small unknown variations in the data could have a huge impact. Taleb does differentiate his idea from that of mathematical notions in chaos theory, e.g. the butterfly effect.
● Theories/models based on empirical data are flawed, as they cannot predict events that have never happened before, but have tremendous impact. E.g. the 9/11 terrorist attacks, invention of the automobile, etc.
Discover what you (don’t) know you
don’t know?
BIG Data
Data Characteristics are changing
BI community
Colleagues...
● Data integration is already 20+ years old
● Just another source
● We do not have much data
● Small or big data: it has to be managed
● Big data = business analytics
● One-off projects (data is too varied)
● We know what data is all about. Nobody has to tell us what you can do with data.
Gartner’s definition (2001)
Big Data is high-volume, high-velocity, and/or high-variety information assets that require
new forms of processing to enable enhanced decision making, insight discovery and
process optimization.
● Volume: relative size of data sources
● Velocity: speed at which data refresh is handled
● Variety: handling various data formats
● (Validity, Veracity (accuracy, correctness, applicability), Value, and Visibility)
Variety

source: Hortonworks
Velocity

Keeping history for click paths isn’t interesting if the site keeps changing over the years.
Volume
“Information was a pond
and has become a river”
Peter Hinssen

Fantastic, entertaining speaker at the SAS Forum. Good presentation: filtering is becoming (or already is) very
important
Liquid Data

To keep data actionable, you have to react instantly. Fishing in a lake
versus fishing in a river: so much water rushing past
Barry Devlin
The true godfather of data
warehousing.
● Human-sourced information
○ is now largely digitized and electronically stored everywhere, from tweets to movies
● Process-mediated data
○ This data includes transactions, reference tables and relationships, as well as the metadata that sets its context, all in a highly structured form.
● Machine-generated data
○ from simple sensor records to complex computer logs
Impact on the DWH
● The central core business data pillar is the consistent, quality-assured data found in EDW and MDM systems
● Deep analytic information requires highly flexible, large scale processing such as statistical analysis and text mining
● Fast analytic data requires such high-speed analytic processing that it must be done on data in-flight
● Specialty analytic data, using specialized processing such as NoSQL, XML, graph and other databases and data stores

Inmon now focuses on deep analytic information with his text mining
BIG Data
New Tools
Other BIG data related trends
● Elastic cloud
● NoSQL
● Data visualization
NoSQL
A NoSQL database provides a mechanism for storage and retrieval of data that employs less
constrained consistency models than traditional relational databases. NoSQL systems are also
referred to as "Not only SQL" to emphasize that they do in fact allow SQL-like query languages to be
used.
● Document: MongoDB, Couchbase
● Key-value: Dynamo, Riak, Redis, Cache, Project Voldemort
● Graph: Neo4J, Allegro, Virtuoso
NoSQL: MongoDB
● How and Why Leading Investment Organizations are Migrating to MongoDB
● Real World MongoDB: Use Cases from Financial Services
● How Financial Firms Create Single Customer Views Using MongoDB
● How Banks Use MongoDB to Manage Risk
● How Banks Manage Reference Data with MongoDB
● How Banks Use MongoDB as a Tick Database
● Position and Trade Management with MongoDB
NoSQL: Neo4j

Graph database
● Nodes represent entities
● Properties are pertinent information that relate to nodes.
● Edges are the lines that connect nodes to nodes or nodes to properties and they represent the relationship between the two
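
The node/property/edge model above can be sketched with plain Python data structures. This is an illustrative in-memory toy, not Neo4j's API. The class name, relationship labels, and sample data are hypothetical, and as a simplification properties are stored directly on nodes rather than linked by edges.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy in-memory property graph: nodes, properties, and typed edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = defaultdict(list)  # source id -> [(relationship, target id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = dict(properties)

    def add_edge(self, source, relationship, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges[source].append((relationship, target))

    def neighbours(self, source, relationship):
        """Ids of nodes reachable from source over one edge of the given type."""
        return [t for rel, t in self.edges[source] if rel == relationship]


graph = PropertyGraph()
graph.add_node("alice", kind="customer", city="Amsterdam")
graph.add_node("bob", kind="customer", city="Utrecht")
graph.add_node("p1", kind="product", name="bike")
graph.add_edge("alice", "KNOWS", "bob")
graph.add_edge("alice", "BOUGHT", "p1")
graph.add_edge("bob", "BOUGHT", "p1")

# Who bought the same product as alice?
same_buyers = [
    node for node in graph.nodes
    if node != "alice" and "p1" in graph.neighbours(node, "BOUGHT")
]
print(same_buyers)  # ['bob']
```

Questions about relationships ("who bought what alice bought?") become short traversals over edges, which is the kind of query a graph database is built to answer.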
Dataviz: SynerScope

Ooh/aah strategy: first be amazed then understand
Local intelligence: ORTEC/TSS
ORTEC Team Support Systems (ORTEC TSS)
develops decision support & information ICT systems to analyze sport performances.
These software systems are employed before,
during and after sport matches. During a match,
they are used to measure teams’ and players’
performances.
Following top athletes and talents by their clubs,
teams, sponsors, unions and the public has
been brought to a whole new dimension
because of these systems.
Internet of Things
Elastic cloud: Amazon Redshift
$999 per TB per year

Hadoop….

● The ecosystem isn’t stable; many configurations are possible
● Hadoop is complex and requires Java expertise
● Apache Hadoop: Open source Hadoop framework in Java. Consists of Hadoop Common Package (filesystem and OS abstractions), a MapReduce engine (MapReduce or YARN), and Hadoop Distributed File System (HDFS)
● Apache Mahout: Machine learning algorithms for collaborative filtering, clustering, and classification using Hadoop
● Apache Hive: Data warehouse infrastructure for Hadoop. Provides data summarization, query, and analysis using a SQL-like language called HiveQL. By default, stores its metadata in an embedded Apache Derby database.
● Apache Pig: Platform for creating MapReduce programs using a high-level “Pig Latin” language. Makes MapReduce programming similar to SQL. Can be extended by user defined functions written in Java, Python, etc.
● Apache Avro: Data serialization system. Avro IDL is the interface description language syntax for Avro.
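
To make the MapReduce engine mentioned above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word count as the example. It is plain Python, not Hadoop's Java API. The function names and sample documents are assumptions for illustration only.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


docs = ["big data is big", "data is the new microscope"]
counts = word_count(docs)
print(counts["big"], counts["data"], counts["is"])  # 2 2 2
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data across the network; the sketch only shows the shape of the computation.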
● Apache HBase: Non-relational DBMS part of the Hadoop project. Designed for large quantities of sparse data (like BigTable). Provides a Java API for MapReduce jobs to access the data. Used by Facebook.
● Apache ZooKeeper: Distributed configuration service, synchronization service, and naming registry for large distributed systems like Hadoop.
● Apache Cassandra: Distributed database management system. Highly scalable.
● Apache Ambari: A web-based tool for provisioning, managing and monitoring Apache Hadoop clusters
● Apache Chukwa: A data collection system for managing large distributed systems
● Apache Sqoop: Tool for transferring bulk data between structured databases and Hadoop
● Apache Oozie: A workflow scheduler system to manage Apache Hadoop jobs
From a single solution to an
Ecosystem
BIG Data
Business Opportunities
McKinsey’s big data report
"For big data, 2013 is the year of experimentation and early deployment," said Frank
Buytendijk, research vice president at the research firm. "Adoption is still at the early
stages with less than 8 percent of all respondents indicating their organization has
deployed big data solutions. [Across the board], 20 percent are piloting and
experimenting, 18 percent are developing a strategy, 19 percent are knowledge
gathering, while the remainder has no plans or don't know."
Has "Big Data" significantly changed
Data Science principles and practice?

KDnuggets poll (Oct 29, 2013)
Analytics is BIG

Analytics is hotter. The green line is Google Analytics; the blue line should be corrected for that.
Kaggle
● Platform for predictive analytics competitions
● Business hands over part of the data and keeps part of the data sets
● Contenders build models based on the available data
● Contenders predict the values of the kept data sets
● Best prediction wins the competition
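
The competition steps above can be sketched in a few lines. This is an illustrative toy, not Kaggle's actual scoring code. The data set, the team names, the 30% holdout fraction, and the choice of mean absolute error as the metric are all assumptions.

```python
import random


def split_holdout(rows, holdout_fraction=0.3, seed=7):
    """Organiser side: publish part of the data, keep the rest hidden."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * holdout_fraction)
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def score_submissions(holdout, submissions):
    """Rank contenders by error on the hidden targets (lower is better)."""
    actual = [target for _, target in holdout]
    scores = {
        name: mean_absolute_error(actual, [model(x) for x, _ in holdout])
        for name, model in submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical data set: the target is exactly 3 * x + 1.
rows = [(x, 3 * x + 1) for x in range(20)]
public, hidden = split_holdout(rows)

# Contenders would fit models on `public`; here two fixed guesses stand in.
submissions = {
    "team_linear": lambda x: 3 * x + 1,
    "team_constant": lambda x: 30,
}
leaderboard = score_submissions(hidden, submissions)
print(leaderboard[0][0])  # team_linear
```

Because contenders never see the hidden targets, a model that merely memorises the public rows gains nothing on the leaderboard.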
Algoritmica
Science Rockstars
eWaterCycle

A global hydrological model will provide the international community with the best
possible estimates of the state of water resources in the world.
Assimilation of remotely sensed and in situ data will be a major mathematical
and computational challenge.
A successful implementation of the project will lead to a community model for
hydrologists across the globe.
- See more at: http://esciencecenter.nl/projects/project-portfolio/watermanagement/
BIG Data
Cultural shift in using data
“Perhaps the most important cultural trend
today: The explosion of data about every
aspect of our world and the rise of applied math
gurus who know how to use it.”
Chris Anderson
Sharing: Silk

Since Silk first came out of stealth mode in 2011, there have been 300,000 interactive
pages created on its cloud-based, web data-crunching platform designed for nontechnical “knowledge workers.” Taking less easy-to-read data sets and making them
more digestible, results have ranged from the Guardian newspaper in the UK creating
graphics of which countries have the most asylum seekers, through to charting what
products Google has killed and dads mapping out the best playgrounds for his kid in
Amsterdam (where Silk also happens to be founded). It’s been a popular, and free,
tool, with pages created by some 16,000 people growing by 20 percent each month.
Now, Silk is moving on to its next phase: its first paid product, Silk for Teams, aimed
at groups of enterprise users who want to use the platform to produce cleaner internal
data sets, and eventually to create data visualizations that work with paywalls.
Open Data

Any ideas?
“Our research suggests that seven sectors
alone could generate more than $3 trillion a
year in additional value as a result of open
data…”
McKinsey
Open Data

Open data: Unlocking innovation and performance with liquid information
A new McKinsey report says that open data can help create $3 trillion a year of
economic value across seven sectors. In a related podcast, the McKinsey Global
Institute’s Michael Chui discusses the economic
Data.Overheid.nl
Cap Gemini
Data Journalism

New York Times, Guardian, Sargasso, NU.nl
Quantified Self
Quantified Self
Quantified Self
Quantified Self

Combining all the sources of this and the previous 3 slides and finding correlations is
the essence of (big) data analytics.
example: combining sunpower with sleepcycle and fitness and diet
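
Combining personal data sources and looking for correlations, as described above, might look like the sketch below. The readings, source names, and dates are invented for illustration, and the join simply keeps the days that every source has in common.

```python
import math

# Hypothetical daily readings from three separate apps, keyed by ISO date.
solar_kwh = {"2013-11-01": 1.2, "2013-11-02": 3.4, "2013-11-03": 2.8, "2013-11-04": 0.6}
sleep_hours = {"2013-11-01": 6.0, "2013-11-02": 7.5, "2013-11-03": 7.0, "2013-11-05": 8.0}
steps = {"2013-11-01": 4000, "2013-11-02": 9000, "2013-11-03": 7000, "2013-11-04": 3000}


def join_on_date(**sources):
    """Inner join: keep only the dates present in every source."""
    shared = set.intersection(*(set(s) for s in sources.values()))
    return [
        {"date": day, **{name: src[day] for name, src in sources.items()}}
        for day in sorted(shared)
    ]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )


rows = join_on_date(solar=solar_kwh, sleep=sleep_hours, steps=steps)
r = pearson([row["sleep"] for row in rows], [row["steps"] for row in rows])
print(len(rows), "shared days; sleep vs steps r =", round(r, 2))
```

With only a handful of shared days a high correlation means very little, so in practice the join step (aligning sources on a common key) is the reusable part and the correlation needs far more data.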
BIG Data
Opportunities for BI professionals
“The ability to take data — to be able to
understand it, to process it, to extract value
from it, to visualize it, to communicate it — that’s
going to be a hugely important skill in the next
decades.”
Hal Varian

Google guru
“The illiterate of the 21st century will not be
those who cannot read and write, but those
who cannot learn, unlearn, relearn”
Alvin Toffler
McKinsey report highlights
A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in
statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from big
data… Furthermore, this type of talent is difficult to produce, taking years of training in the case of someone with intrinsic
mathematical abilities. (p.10)
Data Scientist
Applying varying degrees of statistics, data visualizations, computer programming, data mining,
machine learning, and database engineering to solve complex data problems.
● Association rule learning
● Pattern Recognition
● Classification
● Predictive Modelling
● Cluster Analysis
● Regression
● Crowd Sourcing
● Sentiment Analysis
● Data Fusion and Integration
● Signal Processing
● Ensemble Learning
● Supervised and Unsupervised Learning
● Genetic Algorithms
● Machine Learning
● Simulation
● Natural Language Processing
● Time Series Analysis
● Neural Networks
● Visualization
Typical Big Data Job is not a BI Job
JOB OPENING: BIG DATA ARCHITECT
We are looking to expand our core product team with a Senior Java Developer/Architect that will contribute in the
product design and development and take pride in the delivery of kick-a** products.

Knowledge, Skills and Experience
● Minimum 4 years Java experience
● Experience with NoSQL Databases, preferably MongoDB (MapReduce, Sharding)
● Experience with Cloud-based infrastructure, esp. AWS
● Expertise with Hadoop eco-system is a plus (examples: Flume, Zookeeper, Ganglia, etc)
● Experience with Web services (REST/SOAP)
● Obsession with performance and big data
● Passion for elegant technical design and good programming practices (TDD, CI)
● Energetic “self-starter”, have the will to take ownership, and be accountable for deliverables
● A true defender of quality and (light-weight) documentation of the designs
● Relevant HBO/University education or experience
● Sense of humor is essential

Not typical BI
hardcore tech..
Personal Strategies
● Do nothing
○ Just sell your personal data
○ Wait until the big DM companies incorporate the Hadoop ecosystem
● Hadoop expert
○ Learn Java and the Hadoop ecosystem
● Data scientist
○ Learn Python/R
○ Learn statistics and all kinds of algorithms (especially Bayes)
● Data architect/manager
○ Learn the principles of Hadoop/NoSQL
○ Learn how to integrate (big) data in the enterprise DWH
○ Data governance / data stewardship / DQ / metadata
● BI(g) Tool Specialist
○ Adopt a big data dataviz or reporting tool (Splunk, Platfora)
○ Adopt a platform (Cloudera, Hortonworks, MapR, Azure, Google, Amazon)
● Data artist
○ Data visualization tools, design infographics
● Data storyteller
○ Data journalism course
Group Activities
● Expert Groups
○ Explore platforms
○ Explore tools
● Open data for personal and group branding
○ Start a project
○ Join open data sites
● Data journalism
○ Start a blog/join a blog
○ Make news with data
● Business Cases
○ Scanning business cases
○ Almere Datacapital

Group Activities BI United
Living in a big-data-augmented
world

More Related Content

What's hot

Introduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningIntroduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningNik Spirin
 
Big data v4.0
Big data v4.0Big data v4.0
Big data v4.0Ian Brown
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI DutchJos van Dongen
 
Facilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic MarkupFacilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic MarkupJames Hendler
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)Prof. Dr. Diego Kuonen
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroData ScienceTech Institute
 
data science @NYT ; inaugural Data Science Initiative Lecture
data science @NYT ; inaugural Data Science Initiative Lecturedata science @NYT ; inaugural Data Science Initiative Lecture
data science @NYT ; inaugural Data Science Initiative Lecturechris wiggins
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 
Data visualisationsummit 2013
Data visualisationsummit 2013Data visualisationsummit 2013
Data visualisationsummit 2013The Pathway Group
 
Ethics in Data Science and Machine Learning
Ethics in Data Science and Machine LearningEthics in Data Science and Machine Learning
Ethics in Data Science and Machine LearningHJ van Veen
 
On Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebOn Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebJames Hendler
 
Big Data, Data Science, Machine Intelligence and Learning: Demystification, T...
Big Data, Data Science, Machine Intelligence and Learning: Demystification, T...Big Data, Data Science, Machine Intelligence and Learning: Demystification, T...
Big Data, Data Science, Machine Intelligence and Learning: Demystification, T...Prof. Dr. Diego Kuonen
 
Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...
Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...
Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...IABmembership
 
Data Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopData Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopIan Hopkinson
 
Strata Conference NYC 2013 Full Version
Strata Conference NYC 2013 Full VersionStrata Conference NYC 2013 Full Version
Strata Conference NYC 2013 Full VersionTaewook Eom
 

What's hot (20)

Introduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningIntroduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine Learning
 
Big data v4.0
Big data v4.0Big data v4.0
Big data v4.0
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI Dutch
 
Facilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic MarkupFacilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic Markup
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
 
Data science and_analytics_for_ordinary_people_ebook
Data science and_analytics_for_ordinary_people_ebookData science and_analytics_for_ordinary_people_ebook
Data science and_analytics_for_ordinary_people_ebook
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
 
THE AGE OF SCALE
THE AGE OF SCALETHE AGE OF SCALE
THE AGE OF SCALE
 
data science @NYT ; inaugural Data Science Initiative Lecture
data science @NYT ; inaugural Data Science Initiative Lecturedata science @NYT ; inaugural Data Science Initiative Lecture
data science @NYT ; inaugural Data Science Initiative Lecture
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
Wither OWL
Wither OWLWither OWL
Wither OWL
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
Democratizing Data Science in the Cloud
Democratizing Data Science in the CloudDemocratizing Data Science in the Cloud
Democratizing Data Science in the Cloud
 
Data visualisationsummit 2013
Data visualisationsummit 2013Data visualisationsummit 2013
Data visualisationsummit 2013
 
Ethics in Data Science and Machine Learning
Ethics in Data Science and Machine LearningEthics in Data Science and Machine Learning
Ethics in Data Science and Machine Learning
 
On Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebOn Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the Web
 
Big Data, Data Science, Machine Intelligence and Learning: Demystification, T...
Big Data, Data Science, Machine Intelligence and Learning: Demystification, T...Big Data, Data Science, Machine Intelligence and Learning: Demystification, T...
Big Data, Data Science, Machine Intelligence and Learning: Demystification, T...
 
Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...
Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...
Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...
 
Data Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopData Science For Social Scientists Workshop
Data Science For Social Scientists Workshop
 
Strata Conference NYC 2013 Full Version
Strata Conference NYC 2013 Full VersionStrata Conference NYC 2013 Full Version
Strata Conference NYC 2013 Full Version
 


Bi(G) data: opportunities for BI Professionals

  • 1. BI(G) DATA Opportunities for BI professionals in the Netherlands Most companies mentioned are Dutch
  • 2. Our fantasy... At Last: an IT job is sexy
  • 3. Agenda ● Big Data views ○ Scientific Method ○ Data Characteristics ○ New Technology ○ Business Opportunities ○ Culture ● Opportunities for BI professionals
  • 4. Google Trends The famous McKinsey Report: Big data: The next frontier for innovation, competition, and productivity BIG Data became trending because of McKinsey Now it’s correlated with Hadoop
  • 5. Wikipedia Big Data Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.[19] Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. The target moves due to constant improvement in traditional DBMS technology as well as new databases like NoSQL and their ability to handle larger amounts of data.[20] With this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data. Focus on volume… instead of other V’s
  • 6. BIG Data The Scientific method is changing
  • 7.
  • 8. The Fourth Paradigm: Data-Intensive Scientific Discovery Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets. Implicit in the idea of a fourth paradigm is the ability, and the need, to share data. In sciences like physics and astronomy, the instruments are so expensive that data must be shared. Data analysis is the new microscope. Human Genome, Large Hadron Collider
  • 9. Jim Gray ● Thousand years ago: science was empirical, describing natural phenomena ● Last few hundred years: theoretical branch using models, generalizations ● Last few decades: a computational branch simulating complex phenomena ● Today: data exploration (eScience) unify theory, experiment, and simulation ○ Data captured by instruments or generated by simulator ○ Processed by software ○ Information/knowledge stored in computer ○ Scientist analyzes database / files using data management and statistics On Sunday, January 28, 2007, during a short solo sailing trip to the Farallon Islands near San Francisco to scatter his mother's ashes, Gray and his 40-foot yacht, Tenacious, were reported missing by his wife, Donna Carnes. The Coast Guard searched for four days using a C-130 plane, helicopters, and patrol boats but found no sign of the vessel.[10][11][12][13] Gray's boat was equipped with an automatically deployable EPIRB (Emergency Position-Indicating Radio Beacon), which should have deployed and begun transmitting the instant his vessel sank. The area around the Farallon Islands where Gray was sailing is well north of the East-West ship channel used by freighters entering and leaving San Francisco Bay. The weather was clear that day and no ships reported striking his boat, nor were any distress radio transmissions reported. On February 1, 2007, the DigitalGlobe satellite did a scan of the area, generating thousands of images.[14] The images were posted to Amazon Mechanical Turk in order to distribute the work of searching through them, in hopes of spotting his boat. In the immediate aftermath of the disappearance, many theories were put forward on how Gray disappeared.[15] On February 16, 2007, the family and Friends of Jim Gray Group suspended their search,[16]
  • 10. but continue to follow any important leads. The family ended its underwater search May 31, 2007. Despite much effort and use of high-tech equipment above and below water, searches did not reveal any new clues.[17][18][19][20][21][22] Personal life: While at Berkeley, Gray and his first wife Loretta had a daughter; the couple later divorced.[2] He is survived by his wife, Donna Carnes, his daughter, three grandchildren, and his sister Gail. The University of California, Berkeley and Gray's family hosted a tribute to him on May 31, 2008. The conference included sessions delivered by Richard Rashid and David Vaskevitch.[23] Microsoft's WorldWide Telescope software is dedicated to Gray. In 2008, Microsoft opened a research center in Madison, Wisconsin, named after Jim Gray.[24] Having been missing for five years as of May 16, 2012, Gray is legally assumed to have died at sea.[4][25] Jim Gray Award: Each year, Microsoft Research presents the Jim Gray eScience Award[26] to a researcher who has made an outstanding contribution to the field of data-intensive computing. Award recipients are selected for their ground-breaking, fundamental contributions to the field of eScience. Previous award winners include Alex Szalay (2007), Carole Goble (2008), Jeff Dozier (2009), Phil Bourne (2010), Mark Abbott (2011) and Antony John Williams (2012). Books: ● Transaction Processing: Concepts and Techniques (with Andreas Reuter) (1993). ISBN 1-55860-190-2. ● The Benchmark Handbook: For Database and Transaction Processing Systems (1991). Morgan Kaufmann. ISBN 978-1-55860-159-8.
  • 12. Chris Anderson This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves. There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot. The end of theory: Edge Wired
  • 13. Cukier and Mayer-Schönberger Shift 1: End of Samples Shift 2: End of exactitude Shift 3: End of Causality Patterns & correlations: if you know that your customers are going to buy more products by analyzing a data set or correlation, then the “why” doesn’t matter; you should try to exploit that. The technical equivalent in big data is the ability to survey a whole population instead of just sampling random portions of it. “With less error from sampling we can accept more measurement error”. According to the authors, science is obsessed with sampling and measurement error as a consequence of coping in a ‘small data’ world. The third and most radical shift implies “we won’t have to be fixated on causality [...] the idea of understanding the reasons behind all that happens.” This is a straw
  • 14. Nate Silver “We're not that much smarter than we used to be, even though we have much more information - and that means the real skill now is learning how to pick out the useful information from all this noise.” “I came to realize that prediction in the era of Big Data was not going very well.” “If the quantity of information is increasing [exponentially]… Most of it is just noise.” “… numbers have no way of speaking for themselves. We speak for them.” Nate Silver has lived a preposterously interesting life. In 2002, while toiling away as a lowly consultant for the accounting firm KPMG, he hatched a revolutionary method for predicting the performance of baseball players, which the Web site Baseball Prospectus subsequently acquired. The following year, he took up poker in his spare time and quit his job after winning $15,000 in six months. (His annual poker winnings soon ran into the six-figures.)
  • 15. Nassim Taleb Big Data is bullshit This is the tragedy of big data: The more variables, the more correlations that can show significance. Falsity also grows faster than information; it is nonlinear (convex) with respect to data. I am not saying here that there is no information in big data. There is plenty of information. The problem, the central issue, is that the needle comes in an increasingly larger haystack. 1. It is an outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility. 2. It carries an extreme 'impact'. 3. In spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable. A small number of Black Swans explains almost everything in our world, from the success of ideas and religions, to the dynamics of historical events, to elements of our own personal lives.
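The claim that more variables yield more correlations "that can show significance" can be illustrated with pure noise. The following is a minimal sketch, not taken from the slides: it assumes 30 observations per variable, independent Gaussian noise, and a cut-off of |r| >= 0.36 (roughly the 5% significance threshold for that sample size; treat the exact value as an assumption). The helper names `pearson` and `spurious_pairs` are hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spurious_pairs(columns, threshold):
    """Count column pairs whose absolute correlation reaches the threshold."""
    return sum(
        1
        for a, b in itertools.combinations(columns, 2)
        if abs(pearson(a, b)) >= threshold
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
N_ROWS = 30             # assumed number of observations per variable
THRESHOLD = 0.36        # assumed cut-off, roughly p < 0.05 for 30 rows

# Every column is independent noise, so any "significant" pair is false.
columns = [[rng.gauss(0, 1) for _ in range(N_ROWS)] for _ in range(80)]

for k in (10, 20, 40, 80):
    pairs = k * (k - 1) // 2
    hits = spurious_pairs(columns[:k], THRESHOLD)
    print(f"{k:3d} variables, {pairs:5d} pairs, {hits:4d} spurious correlations")
```

Because the number of pairs grows quadratically with the number of variables, the count of accidental hits grows with it even though no real relationship exists in the data.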
  • 16. Ludic Fallacy The discovery of the Higgs particle was a disappointment for some physicists because now they know what they don’t know: no big things to discover. The ludic fallacy is a term coined by Nassim Nicholas Taleb in his 2007 book The Black Swan. "Ludic" is from the Latin ludus, meaning "play, game, sport, pastime."[1] It is summarized as "the misuse of games to model real-life situations."[2] Taleb explains the fallacy as "basing studies of chance on the narrow world of games and dice."[3] It is a central argument in the book and a rebuttal of the predictive mathematical models used to predict the future – as well as an attack on the idea of applying naïve and simplified statistical models in complex domains. According to Taleb, statistics works only in some domains like casinos in which the odds are visible and defined. Taleb's argument centers on the idea that predictive models are based on platonified forms, gravitating towards mathematical purity and failing to take some key ideas into account: ● It is impossible to be in possession of all the information. ● Very small unknown variations in the data could have a huge impact. Taleb does differentiate his idea from that of mathematical notions in chaos theory, e.g. the butterfly effect. ● Theories/Models based on empirical data are flawed, as they cannot predict events that have never happened before, but have tremendous impact. E.g. the 9/11 terrorist attacks, invention of the automobile, etc.
  • 17.
  • 18. Discover what you (don’t) know you don’t know?
  • 20. BI community Colleagues... ● Data integration is already 20+ years old ● Just another source ● We do not have much data ● Small or big data: it has to be managed ● Big data = business analytics ● One-off projects (data is too varied) ● We know what data is all about. Nobody has to tell us what you can do with data.
  • 21. Gartner’s definition (2001) Big Data is high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization. ● Volume: relative size of data sources ● Velocity: speed at which data refresh is handled ● Variety: handling various data formats ● (Validity, Veracity (accuracy, correctness, applicability), Value, and Visibility)
  • 23. Velocity: keeping clickpath history isn’t interesting if the site keeps changing over the years.
  • 25. “Information was a pond and has become a river” Peter Hinssen. Fantastic, fun speaker at the SAS Forum. Good presentation: filtering is becoming (or already is) very important.
  • 26. Liquid Data: to keep data actionable, you have to react instantly. Fishing in a lake versus fishing in a river: so much water rushing past.
  • 27. Barry Devlin The true godfather of data warehousing. ● Human-sourced information ○ is now largely digitized and electronically stored everywhere, from tweets to movies ● Process-mediated data ○ This data includes transactions, reference tables and relationships, as well as the metadata that sets its context, all in a highly structured form. ● Machine-generated data ○ from simple sensor records to complex computer logs
  • 28. Impact on the DWH ● The central core business data pillar is the consistent, quality-assured data found in EDW and MDM systems ● Deep analytic information requires highly flexible, large-scale processing such as statistical analysis and text mining ● Fast analytic data requires such high-speed analytic processing that it must be done on data in flight ● Specialty analytic data uses specialized processing such as NoSQL, XML, graph and other databases and data stores. Inmon now focuses on deep analytic information with his text mining.
  • 30.
  • 31. Other BIG data related trends ● elastic cloud ● NoSQL ● data visualization
  • 32. Nosql A NoSQL database provides a mechanism for storage and retrieval of data that employs less constrained consistency models than traditional relational databases. NoSQL systems are also referred to as "Not only SQL" to emphasize that they do in fact allow SQL-like query languages to be used. ● Document: MongoDB, Couchbase ● Key-value : Dynamo, Riak, Redis, Cache, Project Voldemort ● Graph: Neo4J, Allegro, Virtuoso
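To make the difference between the key-value and document categories concrete, here is a minimal sketch using plain Python structures. It does not use any real database client; the names `kv_store`, `documents`, and `find` are hypothetical stand-ins, and the sample records are invented for illustration.

```python
import json

# Key-value model: the store only understands opaque values behind a key.
# Looking inside the value is the application's job.
kv_store = {}
kv_store["customer:42"] = json.dumps({"name": "Ada", "city": "Utrecht"})
customer = json.loads(kv_store["customer:42"])

# Document model: the store understands the structure of each record,
# so it can filter on fields, and records need not share a fixed schema.
documents = [
    {"_id": 1, "name": "Ada", "city": "Utrecht", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "name": "Bob", "city": "Delft"},  # no "orders" field at all
]


def find(collection, **criteria):
    """Return every document whose fields equal all given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(customer["city"])
print(find(documents, city="Delft"))
```

The trade-off sketched here is that a key-value store stays simple and fast for lookups by key, while a document store can answer field-level queries without a schema defined up front.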
  • 33. NoSQL: MongoDB ● How and Why Leading Investment Organizations are Migrating to MongoDB ● Real World MongoDB: Use Cases from Financial Services ● How Financial Firms Create Single Customer Views Using MongoDB ● How Banks Use MongoDB to Manage Risk ● How Banks Manage Reference Data with MongoDB ● How Banks Use MongoDB as a Tick Database ● Position and Trade Management with MongoDB
  • 34. Nosql: Neo4j Graph database ● Nodes represent entities ● Properties are pertinent information that relate to nodes. ● Edges are the lines that connect nodes to nodes or nodes to properties and they represent the relationship between the two
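The node, property, and edge vocabulary above can be sketched as a tiny in-memory property graph. This is an illustration only, not Neo4j's API: the class `PropertyGraph`, its methods, and the sample data are hypothetical, and the sketch models node-to-node edges only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)  # pertinent info about the entity


@dataclass
class Edge:
    source: str    # id of the node the relationship starts at
    relation: str  # name of the relationship
    target: str    # id of the node the relationship points to


class PropertyGraph:
    """Minimal in-memory graph: nodes are entities, edges are relationships."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before connecting them")
        self.edges.append(Edge(source, relation, target))

    def neighbours(self, node_id, relation):
        """Nodes reachable from node_id over edges with the given relation."""
        return [
            self.nodes[edge.target]
            for edge in self.edges
            if edge.source == node_id and edge.relation == relation
        ]


graph = PropertyGraph()
graph.add_node("ada", name="Ada", role="analyst")
graph.add_node("bob", name="Bob", role="engineer")
graph.add_node("acme", name="Acme", kind="company")
graph.add_edge("ada", "WORKS_AT", "acme")
graph.add_edge("ada", "KNOWS", "bob")

print([node.properties["name"] for node in graph.neighbours("ada", "KNOWS")])
```

Queries in this model follow relationships from node to node instead of joining tables, which is why graph databases suit data where the connections matter as much as the records.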
  • 35. dataviz: synerscope Ooh/aah strategy: first be amazed then understand
  • 36. Local intelligence: ORTEC/TSS Ortec Team Support Systems (ORTEC TSS) develops decision support and information ICT systems to analyze sport performances. These software systems are employed before, during and after sport matches. During a match, they are used to measure teams’ and players’ performances. Following top athletes and talents by their clubs, teams, sponsors, unions and the public has been brought to a whole new dimension because of these systems.
  • 38. Elastic cloud: Amazon Redshift $999 per TB per year
  • 39. Hadoop… ● The ecosystem isn’t stable. A lot of configurations are possible ● Hadoop is complex and requires Java expertise ● Apache Hadoop: Open source Hadoop framework in Java. Consists of the Hadoop Common package (filesystem and OS abstractions), a MapReduce engine (MapReduce or YARN), and the Hadoop Distributed File System (HDFS) ● Apache Mahout: Machine learning algorithms for collaborative filtering, clustering, and classification using Hadoop ● Apache Hive: Data warehouse infrastructure for Hadoop. Provides data summarization, query, and analysis using a SQL-like language called HiveQL. By default, stores its metadata in an embedded Apache Derby database. ● Apache Pig: Platform for creating MapReduce programs using a high-level “Pig Latin” language. Makes MapReduce programming similar to SQL. Can be extended by user-defined functions written in Java, Python, etc. ● Apache Avro: Data serialization system. Avro IDL is the interface description language syntax for Avro.
  • 40. ● Apache HBase: Non-relational DBMS, part of the Hadoop project. Designed for large quantities of sparse data (like BigTable). Provides a Java API for MapReduce jobs to access the data. Used by Facebook. ● Apache ZooKeeper: Distributed configuration service, synchronization service, and naming registry for large distributed systems like Hadoop. ● Apache Cassandra: Distributed database management system. Highly scalable. ● Apache Ambari: A web-based tool for provisioning, managing and monitoring Apache Hadoop clusters ● Apache Chukwa: A data collection system for managing large distributed systems ● Apache Sqoop: Tool for transferring bulk data between structured databases and Hadoop ● Apache Oozie: A workflow scheduler system to manage Apache Hadoop jobs
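The MapReduce engine at the heart of the Hadoop components listed above follows a map, shuffle, reduce pattern. The sketch below imitates that pattern in a single Python process with a word count; it does not use Hadoop or any of its APIs, and the function names are hypothetical. On a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data is a river"])
print(counts)
```

Tools such as Hive and Pig exist largely so that analysts can express this kind of job in a higher-level language instead of writing the map and reduce functions by hand.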
  • 42. From a single solution to an Ecosystem
  • 46. "For big data, 2013 is the year of experimentation and early deployment," said Frank Buytendijk, research vice president at the research firm. "Adoption is still at the early stages with less than 8 percent of all respondents indicating their organization has deployed big data solutions. [Across the board], 20 percent are piloting and experimenting, 18 percent are developing a strategy, 19 percent are knowledge gathering, while the remainder has no plans or don't know."
  • 51. Has "Big Data" significantly changed Data Science principles and practice? KDnuggets poll (Oct 29, 2013)
  • 52. Analytics is BIG: analytics is hotter. The green line is Google Analytics; the blue line should be corrected for that.
  • 53. Kaggle ● Platform for predictive analytics competitions ● Business hands over part of the data and keeps part of the data sets ● Contenders build models based on the available data ● Contenders predict the values of the kept data sets ● Best prediction wins the competition
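The competition mechanism described on this slide can be sketched as follows. This is a simplified illustration, not Kaggle's actual scoring code: the data set, the two entries (`fit_mean`, `fit_line`), the 30 percent holdout and the RMSE metric are all assumptions chosen for the example.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, float]            # (feature x, target y)
Model = Callable[[float], float]     # maps x to a predicted y


def make_rows() -> List[Row]:
    """Hypothetical business data: y is roughly 2x + 1 with a small wobble."""
    return [(float(x), 2.0 * x + 1.0 + (x % 3 - 1) * 0.1) for x in range(20)]


def split(rows: List[Row], holdout_fraction: float = 0.3,
          seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """The business hands over one part and keeps the other part hidden."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_hidden = round(len(shuffled) * holdout_fraction)
    return shuffled[n_hidden:], shuffled[:n_hidden]


def fit_mean(train: List[Row]) -> Model:
    """Entry 1: always predict the average target seen in the public data."""
    average = mean(y for _, y in train)
    return lambda x: average


def fit_line(train: List[Row]) -> Model:
    """Entry 2: ordinary least squares fit of a straight line."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def rmse(model: Model, hidden: List[Row]) -> float:
    """Root mean squared error on the kept data; lower is better."""
    return mean((model(x) - y) ** 2 for x, y in hidden) ** 0.5


def run_competition(rows: List[Row],
                    entries: Dict[str, Callable[[List[Row]], Model]]):
    """Fit every entry on the public part, score it on the hidden part."""
    train, hidden = split(rows)
    scores = {name: rmse(fit(train), hidden) for name, fit in entries.items()}
    winner = min(scores, key=scores.get)
    return winner, scores


if __name__ == "__main__":
    entries = {"mean_baseline": fit_mean, "least_squares_line": fit_line}
    print(run_competition(make_rows(), entries))
```

Because the contenders never see the kept rows, a model that merely memorizes the public data gains nothing; only predictions that generalize score well.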
  • 56. eWaterCycle: A global hydrological model will provide the international community with the best possible estimates of the state of water resources in the world. Assimilation of remotely sensed and in situ data will be a major mathematical and computational challenge. A successful implementation of the project will lead to a community model for hydrologists across the globe. See more at: http://esciencecenter.nl/projects/project-portfolio/watermanagement/
  • 57. BIG Data: Cultural shift in using data
  • 58. “Perhaps the most important cultural trend today: The explosion of data about every aspect of our world and the rise of applied math gurus who know how to use it.” Chris Anderson
  • 59. Sharing: Silk. Since Silk first came out of stealth mode in 2011, 300,000 interactive pages have been created on its cloud-based, web data-crunching platform designed for nontechnical “knowledge workers.” The platform takes hard-to-read data sets and makes them more digestible. Results have ranged from the Guardian newspaper in the UK creating graphics of which countries have the most asylum seekers, to a chart of the products Google has killed, to a dad mapping out the best playgrounds for his kid in Amsterdam (where Silk was founded). It has been a popular, and free, tool, with pages created by some 16,000 people, a number growing by 20 percent each month. Now, Silk is moving on to its next phase: its first paid product, Silk for Teams, aimed at groups of enterprise users who want to use the platform to produce cleaner internal data sets, and eventually to create data visualizations that work with paywalls.
  • 61. “Our research suggests that seven sectors alone could generate more than $3 trillion a year in additional value as a result of open data…” McKinsey
  • 62. Open Data Open data: Unlocking innovation and performance with liquid information A new McKinsey report says that open data can help create $3 trillion a year of economic value across seven sectors. In a related podcast, the McKinsey Global Institute’s Michael Chui discusses the economic
  • 65. Data Journalism: New York Times, Guardian, Sargasso, NU.nl
  • 69. Quantified Self: Combining all the sources of this and the previous 3 slides and finding correlations is the essence of (big) data analytics. Example: combining sunpower with sleepcycle, fitness and diet.
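Combining two personal data sources and looking for a correlation can be sketched like this. The two daily series (`sun_output`, `sleep_hours`), their values and the helper names are made up for illustration; the sketch joins the sources on the dates they share and computes a Pearson correlation coefficient.

```python
from math import sqrt
from typing import Dict, List, Tuple


def join_on_date(a: Dict[str, float],
                 b: Dict[str, float]) -> Tuple[List[float], List[float]]:
    """Keep only the days present in both sources, in date order."""
    days = sorted(a.keys() & b.keys())
    return [a[d] for d in days], [b[d] for d in days]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient, between -1 and 1."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two series of equal length with 2+ points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical daily readings from two different apps or devices.
    sun_output = {"2013-11-01": 2.0, "2013-11-02": 4.0,
                  "2013-11-03": 6.0, "2013-11-04": 8.0}
    sleep_hours = {"2013-11-01": 6.0, "2013-11-02": 6.5,
                   "2013-11-03": 7.0, "2013-11-04": 7.5,
                   "2013-11-05": 8.0}
    xs, ys = join_on_date(sun_output, sleep_hours)
    print(pearson(xs, ys))
```

A high coefficient only says the two series move together; it does not show that one causes the other, so a finding like this is a starting point for further analysis.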
  • 70. BIG Data: Opportunities for BI professionals
  • 71. “The ability to take data — to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it — that’s going to be a hugely important skill in the next decades.” Hal Varian, Google guru
  • 72. “The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, relearn” Alvin Toffler
  • 73. McKinsey report highlights: A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from big data… Furthermore, this type of talent is difficult to produce, taking years of training in the case of someone with intrinsic mathematical abilities. (p.10)
  • 74. Data Scientist: Applying varying degrees of statistics, data visualizations, computer programming, data mining, machine learning, and database engineering to solve complex data problems. ● Association Rule Learning ● Classification ● Cluster Analysis ● Crowd Sourcing ● Data Fusion and Integration ● Ensemble Learning ● Genetic Algorithms ● Machine Learning ● Natural Language Processing ● Neural Networks ● Pattern Recognition ● Predictive Modelling ● Regression ● Sentiment Analysis ● Signal Processing ● Supervised and Unsupervised Learning ● Simulation ● Time Series Analysis ● Visualization
  • 75. Typical Big Data Job is not a BI Job. JOB OPENING: BIG DATA ARCHITECT. We are looking to expand our core product team with a Senior Java Developer/Architect that will contribute in the product design and development and take pride in the delivery of kick-a** products. Knowledge, Skills and Experience ● Minimum 4 years Java experience ● Experience with NoSQL Databases, preferably MongoDB (MapReduce, Sharding) ● Experience with Cloud-based infrastructure, esp. AWS ● Expertise with Hadoop eco-system is a plus (examples: Flume, Zookeeper, Ganglia, etc) ● Experience with Web services (REST/SOAP) ● Obsession with performance and big data ● Passion for elegant technical design and good programming practices (TDD, CI) ● Energetic “self-starter”, have the will to take ownership, and be accountable for deliverables ● A true defender of quality and (light-weight) documentation of the designs ● Relevant HBO/University education or experience ● Sense of humor is essential. Not typical BI: hardcore tech.
  • 76. Personal Strategies ● Do nothing ○ Just sell your personal data ○ Wait until the big DM companies incorporate the Hadoop ecosystem ● Hadoop expert ○ Learn Java and the Hadoop ecosystem ● Data scientist ○ Learn Python/R ○ Learn statistics and all kinds of algorithms (especially Bayes) ● Data architect/manager ○ Learn the principles of Hadoop/NoSQL ○ Learn how to integrate (big) data in the enterprise DWH ○ Data governance / data stewardship / DQ / metadata ● BI(g) tool specialist ○ Adopt a big data dataviz or reporting tool (Splunk, Platfora) ○ Adopt a platform (Cloudera, Hortonworks, MapR, Azure, Google, Amazon) ● Data artist ○ Data visualization tools, design infographics ● Data storyteller ○ Data journalism course
  • 77. Group Activities BI United ● Expert groups ○ Explore platforms ○ Explore tools ● Open data for personal and group branding ○ Start a project ○ Join open data sites ● Data journalism ○ Start a blog/join a blog ○ Make news with data ● Business cases ○ Scanning business cases ○ Almere Datacapital
  • 78. Living in a big data augmented world