Sponsored by:
‘Bad Data’ Is Polluting Big Data
Enterprises Struggle with Real-Time Control of Data Flows
A Global Survey of Big Data Professionals
June 2016
Executive Summary
The big data market is still maturing, especially with respect
to data in motion, as evidenced by the lack of best practices
and consistent processes for cleaning data and managing data
quality. Companies that use big data to optimize current
business operations or to make strategic decisions must ensure
that their big data teams have real-time visibility into and
control over the data at all times.
This report finds that companies leveraging big data are rarely
able to control their data flows. Almost 9 out of 10 companies
report that ‘bad data’ pollutes their data stores, and nearly
three-quarters indicate that their stores contain ‘bad data’ right now.
The findings also reveal a chasm between the problem-detection
capabilities data experts have today and those they want. The result is
a lack of real-time visibility into and control over data flows,
operations, quality and security.
Key Findings
• 87% state ‘bad data’ pollutes their data stores while 74% state ‘bad data’ is
currently in their data stores
• Ensuring data quality was the most common challenge cited, by 68% of
respondents, and only 34% claimed to be good at detecting divergent data
• 77% responded that they hand-code their data flows while 53% claimed they
have to change each pipeline at least several times a month
• Tremendous gaps exist between today’s big data flow management tools’
capabilities and what is needed
• Only 12% of respondents rated their performance as good or excellent across all 5
key data flow operational performance areas
• 72% desire a single pane of glass solution to manage all data flows
• 81% state there is a significant operational impact when they upgrade big data
components
METHODOLOGY AND PARTICIPANTS
Goals and Methodology
Research Goal
The primary research goal was to capture how
companies manage the flow of big data. The
research also investigated and documented the
capabilities of current tools, data quality, and efforts
to maintain big data pipelines and infrastructure.
Methodology
Big data professionals worldwide were invited to
participate in a survey on big data, data flow
operations and data quality.
The survey was administered electronically, and
participants were offered token compensation for
their participation.
Participants
A total of 314 participants who manage big data
operations completed the survey.
Companies Represented
Size
• 500 - 1,000: 25%
• 1,000 - 5,000: 29%
• 5,000 - 10,000: 16%
• More than 10,000: 30%
Industry
• Technology: 18%
• Financial Services: 18%
• Manufacturing: 12%
• Healthcare: 10%
• Education: 6%
• Services: 6%
• Government: 6%
• Telecommunications: 5%
• Energy and Utilities: 5%
• Transportation: 5%
• Retail: 4%
• Non-Profit: 1%
• Media and Advertising: 1%
• Hospitality and Entertainment: 1%
• Food and Beverage: 1%
• Other: 2%
Participant Demographics
Role
• IT staff responsible for implementing and operating data infrastructure (e.g. database…): 56%
• IT manager responsible for delivering data initiatives: 52%
• IT executive with data initiatives in my portfolio: 34%
• BI or Analytics Technology Owner (e.g. data architect, head of data platform): 17%
• Business stakeholder who uses data to make decisions: 8%
• Business analyst: 6%
Location
• United States or Canada: 75%
• Europe: 14%
• Mexico, Central America, or South America: 4%
• Australia or New Zealand: 3%
• Middle East or Africa: 2%
• Asia: 2%
DETAILED FINDINGS
Top 3 Challenges for Big Data Flows are
Quality, Security and Reliable Operation
Survey question: What challenges does your company face when managing your big data flows?
• Ensuring the quality of the data (accuracy, completeness, consistency): 68%
• Complying with security and data privacy policies: 60%
• Keeping data flow pipelines operating effectively: 52%
• Building pipelines for getting data into the data store: 47%
• Upgrading big data infrastructure components (Kafka, Hadoop, etc.): 40%
• Adapting pipelines to meet new requirements: 32%
• We have no challenges: 1%
87% State ‘Bad Data’ Pollutes Their Data
Stores
Survey question: Does ‘bad data’ occasionally get into your data stores?
• Yes: 87%
• No: 13%
74% State ‘Bad Data’ is Currently in Their
Data Stores
Survey question: Do you believe there is any ‘bad data’ in your data stores currently?
• Yes: 74%
• No: 26%
77% of Companies Still Use Hand Coding to
Build Big Data Flows
Survey question: How does your company build big data flow pipelines today?
• Coding with Python, Java, etc. or low-level frameworks such as Sqoop, Flume or Kafka: 77%
• Using ETL or data integration tools: 63%
• Using big data ingestion tools such as StreamSets, NiFi, etc.: 27%
53% Change Data Flow Pipelines At Least Several
Times a Month
Survey question: On average, how often are changes or fixes made to a typical data flow pipeline?
• Several times a day: 3%
• Several times a week: 19%
• Several times a month: 31%
• Several times a quarter: 26%
• Several times a year: 12%
• Less often than several times a year: 8%
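The 53% in the slide title can be reproduced by adding up the three most frequent answer categories. The short Python sketch below does that arithmetic with the percentages shown on the slide; the names `change_frequency` and `cumulative_share` are illustrative only and do not come from the survey.

```python
# Response shares from the slide, in percent.
# They add up to 99 rather than 100 because each value is rounded.
change_frequency = [
    ("Several times a day", 3),
    ("Several times a week", 19),
    ("Several times a month", 31),
    ("Several times a quarter", 26),
    ("Several times a year", 12),
    ("Less often than several times a year", 8),
]


def cumulative_share(rows, through_label):
    """Sum the shares from the first row up to and including through_label."""
    total = 0
    for label, pct in rows:
        total += pct
        if label == through_label:
            return total
    raise ValueError(f"unknown label: {through_label}")


at_least_monthly = cumulative_share(change_frequency, "Several times a month")
print(f"Change pipelines at least several times a month: {at_least_monthly}%")
```

The same helper gives the share for any other cut-off, for example respondents who change pipelines at least several times a week.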
85% State Unexpected Structure and Semantic Changes
Have Substantial Impact on Dataflow Operations
Survey question: When data structure or semantics unexpectedly change, how big is the impact on the operation of your big data flows (failures, slowdowns, data corruption, etc.)?
• Significant impact: 31%
• Moderate impact: 54%
• Minor impact: 11%
• Structure and semantic changes have no effect on our big data flows: 2%
• Data structure and semantic changes never occur: 2%
More Than Half of Companies Lack Real-Time
Information About Data Flow Quality
Survey question: How would you assess your ability to detect each of the following issues in real-time?
• Personally identifiable information (credit card numbers, social security numbers) is being inappropriately placed in a data store: Excellent 18%, Good 33%, Average 30%, Poor 13%, None 6%
• The values of incoming data are diverging from historical norms: Excellent 5%, Good 29%, Average 43%, Poor 20%, None 3%
• Error rates are increasing: Excellent 7%, Good 37%, Average 38%, Poor 16%, None 1%
• Data flow throughput is degrading or latency is growing: Excellent 7%, Good 37%, Average 37%, Poor 17%, None 1%
• A specific data flow pipeline has stopped operating: Excellent 16%, Good 46%, Average 29%, Poor 9%, None 1%
Only 12% Rated Their Performance as ‘Good’ or
‘Excellent’ Across All Five Key Data Flow Metrics
Five Key Data Flow Metrics
1. A specific data flow pipeline has
stopped operating
2. Data flow throughput is
degrading or latency is growing
3. Error rates are increasing
4. The values of incoming data are
diverging from historical norms
5. Identify personal information
within the data flows
Number of Key Data Flow Metrics Participants Rated as ‘Good’ or ‘Excellent’
• 0 Metrics: 19%
• 1 Metric: 17%
• 2 Metrics: 19%
• 3 Metrics: 20%
• 4 Metrics: 12%
• All 5 Metrics: 12%
Substantial Value In Real-Time Data Flow
Detection Capabilities
Survey question: In your opinion, how valuable would it be to be able to detect each of these issues in real-time?
• Identify personal information within the data flows: Very valuable 40%, Valuable 35%, Average value 18%, Limited value 6%
• The values of incoming data are diverging from historical norms: Very valuable 23%, Valuable 46%, Average value 26%, Limited value 4%
• Error rates are increasing: Very valuable 33%, Valuable 46%, Average value 17%, Limited value 4%
• Data flow throughput is degrading or latency is growing: Very valuable 28%, Valuable 49%, Average value 20%, Limited value 3%
• A specific data flow pipeline has stopped operating: Very valuable 42%, Valuable 42%, Average value 14%, Limited value 3%
Gap Between Current Pipeline Real-Time
Visibility Capabilities and Stated Value
Issue: A specific data flow pipeline has stopped operating
• Assessed value: Very valuable 42%, Valuable 42%, Average value 14%, Limited value 3% (Very valuable or Valuable: 84%)
• Real-time ability: Excellent 16%, Good 46%, Average 29%, Poor 9% (Excellent or Good: 62%)
Chasm Between Today’s Data Flow
Throughput Metrics and What is Needed
Issue: Data flow throughput is degrading or latency is growing
• Assessed value: Very valuable 28%, Valuable 49%, Average value 20%, Limited value 3%, Not valuable 1% (Very valuable or Valuable: 77%)
• Real-time ability: Excellent 7%, Good 37%, Average 37%, Poor 17%, None 1% (Excellent or Good: 44%)
Significant Gap Between Error Rate
Visibility Value and Current Capabilities
Issue: Error rates are increasing
• Assessed value: Very valuable 33%, Valuable 46%, Average value 17%, Limited value 4% (Very valuable or Valuable: 79%)
• Real-time ability: Excellent 7%, Good 37%, Average 38%, Poor 16% (Excellent or Good: 44%)
Chasm Between Value of Detecting
Divergent Data and Current Capabilities
Issue: The values of incoming data are diverging from historical norms
• Assessed value: Very valuable 23%, Valuable 46%, Average value 26%, Limited value 4%, Not valuable 1% (Very valuable or Valuable: 69%)
• Real-time ability: Excellent 5%, Good 29%, Average 43%, Poor 20%, None 3% (Excellent or Good: 34%)
Large Gap Between Data Privacy Value and
Current Capabilities
Issue: Identify personal information within the data flows
• Assessed value: Very valuable 40%, Valuable 35%, Average value 18%, Limited value 6%, Not valuable 2% (Very valuable or Valuable: 75%)
• Real-time ability: Excellent 18%, Good 33%, Average 30%, Poor 13%, None 6% (Excellent or Good: 51%)
72% Desire A Single Pane of Glass Solution
To Manage All Data Flows
Survey question: How valuable is it to have a single control panel for comprehensive visibility and management across all of your data flows?
• Very valuable: 24%
• Valuable: 48%
• Average value: 24%
• Limited value: 4%
50% State that Data Cleansing at the Source
is the Most Effective Quality Practice
Survey question: Which of the following do you consider to be the most effective approach to ensuring data quality?
• Cleanse data as it flows in from the source: 50%
• Cleanse and update data once it is in the store: 27%
• Data scientists or business analysts cleanse data before using it: 23%
81% State There is Significant Operational
Impact to Upgrading Big Data Components
Survey question: What is the operational impact of upgrading big data components (ingest technologies, message queues, data stores, search stores, etc.)?
• Heavy impact: 17%
• Moderate impact: 64%
• Minor impact: 17%
• No impact: 2%
For more information…
About Dimensional Research
Dimensional Research provides practical marketing research to help technology companies make
smarter business decisions. Our researchers are experts in technology and understand how
corporate IT organizations operate. Our qualitative research services deliver a clear
understanding of customer and market dynamics.
For more information, visit www.dimensionalresearch.com.
About StreamSets
Place holder
For more information, visit www.streamsets.com.
APPENDIX
Tremendous Gaps Exist Between Current Big Data Flow
Management Tool Capabilities and What is Needed
Ability to Detect Each Area in Real-Time Compared Against Stated Value of Detecting It in Real-Time
• Personally identifiable information (credit card numbers, social security numbers) is being inappropriately placed in a data store
Stated Value: Very valuable 40%, Valuable 35%, Average value 18%, Limited value 6%, Not valuable 2%
Current Ability: Excellent 18%, Good 33%, Average 30%, Poor 13%, None 6%
• The values of incoming data are diverging from historical norms
Stated Value: Very valuable 23%, Valuable 46%, Average value 26%, Limited value 4%, Not valuable 1%
Current Ability: Excellent 5%, Good 29%, Average 43%, Poor 20%, None 3%
• Error rates are increasing
Stated Value: Very valuable 33%, Valuable 46%, Average value 17%, Limited value 4%, Not valuable 0%
Current Ability: Excellent 7%, Good 37%, Average 38%, Poor 16%, None 1%
• Data flow throughput is degrading or latency is growing
Stated Value: Very valuable 28%, Valuable 49%, Average value 20%, Limited value 3%, Not valuable 1%
Current Ability: Excellent 7%, Good 37%, Average 37%, Poor 17%, None 1%
• A specific data flow pipeline has stopped operating
Stated Value: Very valuable 42%, Valuable 42%, Average value 14%, Limited value 3%
Current Ability: Excellent 16%, Good 46%, Average 29%, Poor 9%, None 1%
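The size of each gap can be made explicit by comparing the share of respondents who rate an issue ‘Very valuable’ or ‘Valuable’ with the share who rate their own ability ‘Excellent’ or ‘Good’. The Python sketch below is only an illustration of that arithmetic using the percentages from the chart; the shortened issue names, the `survey` dictionary and the `top_two` helper are assumptions made for this example and are not part of the survey.

```python
# Percent of respondents per rating, taken from the appendix chart.
# List order: Excellent/Very valuable, Good/Valuable, Average, Poor/Limited.
# Issue names are shortened here for readability (an assumption of this sketch).
survey = {
    "Pipeline has stopped operating": {
        "value": [42, 42, 14, 3], "ability": [16, 46, 29, 9]},
    "Throughput degrading or latency growing": {
        "value": [28, 49, 20, 3], "ability": [7, 37, 37, 17]},
    "Error rates increasing": {
        "value": [33, 46, 17, 4], "ability": [7, 37, 38, 16]},
    "Values diverging from historical norms": {
        "value": [23, 46, 26, 4], "ability": [5, 29, 43, 20]},
    "Personal information in data flows": {
        "value": [40, 35, 18, 6], "ability": [18, 33, 30, 13]},
}


def top_two(shares):
    """Combined share of the two most favorable ratings."""
    return shares[0] + shares[1]


# Gap in percentage points between stated value and current ability.
gaps = {
    issue: top_two(data["value"]) - top_two(data["ability"])
    for issue, data in survey.items()
}

for issue, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{issue}: {gap} percentage points")
```

Measured this way, every one of the five issues shows a gap of more than 20 percentage points between what respondents value and what they can detect today.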
Various Approaches To Managing Data
Quality Indicate a Lack of Best Practice
Survey question: Which of the following approaches for ensuring data quality does your company utilize?
• Cleanse and update data once it is in the store: 55%
• Cleanse data as it flows in from the source: 54%
• Data scientists or business analysts cleanse data before using it: 43%
Many Must Perform Maintenance and
Troubleshooting on Data Flows Routinely
Survey question: Approximately, what percentage of data flow changes and fixes are made for day-to-day maintenance and troubleshooting purposes?
• More than 80%: 3%
• 60% - 80%: 10%
• 40% - 60%: 24%
• 20% - 40%: 27%
• Less than 20%: 36%

Weitere ähnliche Inhalte

Was ist angesagt?

Constant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake JourneyConstant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake JourneySeeling Cheung
 
Breakout: Operational Analytics with Hadoop
Breakout: Operational Analytics with HadoopBreakout: Operational Analytics with Hadoop
Breakout: Operational Analytics with HadoopCloudera, Inc.
 
End to End Supply Chain Control Tower
End to End Supply Chain Control TowerEnd to End Supply Chain Control Tower
End to End Supply Chain Control TowerDatabricks
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceTony Baer
 
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and HortonworksHow to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and HortonworksHortonworks
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Beyond Big Data: Data Science and AI
Beyond Big Data: Data Science and AIBeyond Big Data: Data Science and AI
Beyond Big Data: Data Science and AIDataWorks Summit
 
Southwest Power Pool big data case study
Southwest Power Pool big data case study Southwest Power Pool big data case study
Southwest Power Pool big data case study Seeling Cheung
 
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...Seeling Cheung
 
The path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial ServicesThe path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial ServicesHortonworks
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015DataWorks Summit
 
Flash session -goldengate--lht1053-lon
Flash session -goldengate--lht1053-lonFlash session -goldengate--lht1053-lon
Flash session -goldengate--lht1053-lonJeffrey T. Pollock
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...StampedeCon
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...Revolution Analytics
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostAtScale
 

Was ist angesagt? (20)

Constant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake JourneyConstant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake Journey
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Breakout: Operational Analytics with Hadoop
Breakout: Operational Analytics with HadoopBreakout: Operational Analytics with Hadoop
Breakout: Operational Analytics with Hadoop
 
End to End Supply Chain Control Tower
End to End Supply Chain Control TowerEnd to End Supply Chain Control Tower
End to End Supply Chain Control Tower
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake Governance
 
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and HortonworksHow to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
The Ecosystem is too damn big
The Ecosystem is too damn big The Ecosystem is too damn big
The Ecosystem is too damn big
 
Beyond Big Data: Data Science and AI
Beyond Big Data: Data Science and AIBeyond Big Data: Data Science and AI
Beyond Big Data: Data Science and AI
 
Southwest Power Pool big data case study
Southwest Power Pool big data case study Southwest Power Pool big data case study
Southwest Power Pool big data case study
 
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
 
Smart data for a predictive bank
Smart data for a predictive bankSmart data for a predictive bank
Smart data for a predictive bank
 
The path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial ServicesThe path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial Services
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015
 
Flash session -goldengate--lht1053-lon
Flash session -goldengate--lht1053-lonFlash session -goldengate--lht1053-lon
Flash session -goldengate--lht1053-lon
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
 

Andere mochten auch

Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudRick Bilodeau
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorCask Data
 
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...DataStax
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsPat Patterson
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudStreamsets Inc.
 
Building Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion PipelinesBuilding Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion PipelinesArvind Prabhakar
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Pat Patterson
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesUX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesNed Potter
 
Designing Teams for Emerging Challenges
Designing Teams for Emerging ChallengesDesigning Teams for Emerging Challenges
Designing Teams for Emerging ChallengesAaron Irizarry
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with DataSeth Familian
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017Drift
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheLeslie Samuel
 

Andere mochten auch (17)

Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
 
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
 
Building Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion PipelinesBuilding Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion Pipelines
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
 
Ten canoes
Ten canoesTen canoes
Ten canoes
 
What is big data?
What is big data?What is big data?
What is big data?
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesUX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and Archives
 
Designing Teams for Emerging Challenges
Designing Teams for Emerging ChallengesDesigning Teams for Emerging Challenges
Designing Teams for Emerging Challenges
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with Data
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 

Ähnlich wie Bad Data is Polluting Big Data

Lead Your Data Revolution - How to Build a Foundation of Trust and Data Gover...
Lead Your Data Revolution - How to Build a Foundation of Trust and Data Gover...Lead Your Data Revolution - How to Build a Foundation of Trust and Data Gover...
Lead Your Data Revolution - How to Build a Foundation of Trust and Data Gover...DATAVERSITY
 
Infographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience AnalyticsInfographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience AnalyticsSapience Analytics
 
Analytic Transformation | 2013 Loras College Business Analytics Symposium
Analytic Transformation | 2013 Loras College Business Analytics SymposiumAnalytic Transformation | 2013 Loras College Business Analytics Symposium
Analytic Transformation | 2013 Loras College Business Analytics SymposiumCartegraph
 
Moving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time DataMoving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time DataVoltDB
 
State of Data Governance in 2021
State of Data Governance in 2021State of Data Governance in 2021
State of Data Governance in 2021DATAVERSITY
 
Keeping the Pulse of Your Data:  Why You Need Data Observability 
Keeping the Pulse of Your Data:  Why You Need Data Observability Keeping the Pulse of Your Data:  Why You Need Data Observability 
Keeping the Pulse of Your Data:  Why You Need Data Observability Precisely
 
3 Tips to improve supplier information management
Bad Data is Polluting Big Data

  • 1. ‘Bad Data’ Is Polluting Big Data: Enterprises Struggle with Real-Time Control of Data Flows. A Global Survey of Big Data Professionals, June 2016
  • 2. Executive Summary. The big data market is still maturing, especially as it relates to data in motion, as evidenced by the lack of best practices or consistent processes to clean and manage data quality. For companies that use big data to optimize current business operations or to make strategic decisions, it is critical that their big data teams have real-time visibility and control over the data at all times. This report finds that companies leveraging big data are rarely capable of controlling their data flows. Almost 9 out of 10 companies report ‘bad data’ polluting their data stores, and nearly three quarters indicate there is ‘bad data’ in their stores currently. The findings also reveal a chasm between the problem detection capabilities data experts have today and what they desire. This translates into a lack of real-time visibility and control of data flows, operations, quality and security.
  • 3. Key Findings
      • 87% state ‘bad data’ pollutes their data stores while 74% state ‘bad data’ is currently in their data stores
      • Ensuring data quality was the most common challenge cited, by 68% of respondents, and only 34% claimed to be good at detecting divergent data
      • 72% responded that they hand-code their data flows while 53% claimed they have to change each pipeline at least several times a month
      • Tremendous gaps exist between today’s big data flow management tools’ capabilities and what is needed
      • Only 10% of respondents rated their performance as good or excellent across 5 key data flow operational performance areas
      • 72% desire a single pane of glass solution to manage all data flows
      • 81% state there is a significant operational impact when they upgrade big data components
  • 4. Methodology and Participants
  • 5. Goals and Methodology
      • Research Goal: The primary research goal was to capture how companies manage the flow of big data. The research also investigated and documented current tools’ capabilities, data quality and efforts to maintain big data pipelines and infrastructure.
      • Methodology: Big data professionals worldwide were invited to participate in a survey on the topic of big data and ensuring data flow operations and data quality. The survey was administered electronically and participants were offered a token compensation for their participation.
      • Participants: A total of 314 participants who manage big data operations completed the survey.
  • 6. Companies Represented
      • Size (employees): 500 - 1,000: 25%; 1,000 - 5,000: 29%; 5,000 - 10,000: 16%; More than 10,000: 30%
      • Industry: Technology 18%; Financial Services 18%; Manufacturing 12%; Healthcare 10%; Education 6%; Services 6%; Government 6%; Telecommunications 5%; Energy and Utilities 5%; Transportation 5%; Retail 4%; Non-Profit 1%; Media and Advertising 1%; Hospitality and Entertainment 1%; Food and Beverage 1%; Other 2%
  • 7. Participant Demographics
      • Role: IT staff responsible for implementing and operating data infrastructure (e.g. database… 56%; IT manager responsible for delivering data initiatives 52%; IT executive with data initiatives in my portfolio 34%; BI or Analytics Technology Owner (e.g. data architect, head of data platform) 17%; Business stakeholder who uses data to make decisions 8%; Business analyst 6%
      • Location: United States or Canada 75%; Europe 14%; Mexico, Central America, or South America 4%; Australia or New Zealand 3%; Middle East or Africa 2%; Asia 2%
  • 9. Top 3 Challenges for Big Data Flows are Quality, Security and Reliable Operation. Question: What challenges does your company face when managing your big data flows?
      • Ensuring the quality of the data (accuracy, completeness, consistency): 68%
      • Complying with security and data privacy policies: 60%
      • Keeping data flow pipelines operating effectively: 52%
      • Building pipelines for getting data into the data store: 47%
      • Upgrading big data infrastructure components (Kafka, Hadoop, etc.): 40%
      • Adapting pipelines to meet new requirements: 32%
      • We have no challenges: 1%
  • 10. 87% State ‘Bad Data’ Pollutes Their Data Stores. Question: Does ‘bad data’ occasionally get into your data stores? Yes: 87%; No: 13%
  • 11. 74% State ‘Bad Data’ is Currently in Their Data Stores. Question: Do you believe there is any ‘bad data’ in your data stores currently? Yes: 74%; No: 26%
  • 12. 77% of Companies Still Use Hand Coding to Build Big Data Flows. Question: How does your company build big data flow pipelines today?
      • Coding with Python, Java, etc. or low-level frameworks such as Sqoop, Flume or Kafka: 77%
      • Using ETL or data integration tools: 63%
      • Using big data ingestion tools such as StreamSets, NiFi, etc.: 27%
  • 13. 53% Change Data Flow Pipelines At Least Several Times a Month. Question: On average, how often are changes or fixes made to a typical data flow pipeline?
      • Several times a day: 3%
      • Several times a week: 19%
      • Several times a month: 31%
      • Several times a quarter: 26%
      • Several times a year: 12%
      • Less often than several times a year: 8%
  • 14. 85% State Unexpected Structure and Semantic Changes Have Substantial Impact on Dataflow Operations. Question: When data structure or semantics unexpectedly change, how big is the impact on the operation of your big data flows (failures, slowdowns, data corruption, etc.)?
      • Significant impact: 31%
      • Moderate impact: 54%
      • Minor impact: 11%
      • Structure and semantic changes have no effect on our big data flows: 2%
      • Data structure and semantic changes never occur: 2%
  • 15. More Than Half of Companies Lack Real Time Information About Data Flow Quality. Question: How would you assess your ability to detect each of the following issues in real-time?
      • Personally identifiable information (credit card numbers, social security numbers) is being inappropriately placed in a data store: Excellent 18%, Good 33%, Average 30%, Poor 13%, None 6%
      • The values of incoming data are diverging from historical norms: Excellent 5%, Good 29%, Average 43%, Poor 20%, None 3%
      • Error rates are increasing: Excellent 7%, Good 37%, Average 38%, Poor 16%, None 1%
      • Data flow throughput is degrading or latency is growing: Excellent 7%, Good 37%, Average 37%, Poor 17%, None 1%
      • A specific data flow pipeline has stopped operating: Excellent 16%, Good 46%, Average 29%, Poor 9%, None 1%
  • 16. Only 12% Rated Their Performance as ‘Good’ or ‘Excellent’ Across All Five Key Data Flow Metrics
      • Five Key Data Flow Metrics: 1. A specific data flow pipeline has stopped operating; 2. Data flow throughput is degrading or latency is growing; 3. Error rates are increasing; 4. The values of incoming data are diverging from historical norms; 5. Identify personal information within the data flows
      • Chart: Number of Key Data Flow Metrics Participants Represented as ‘Good’ or ‘Excellent’. Segment labels as extracted: 1 Metric, 0 Metrics, All 5 Metrics, 4 Metrics, 3 Metrics, 2 Metrics. Segment values as extracted: 19%, 17%, 19%, 20%, 12%, 12%
  • 17. Substantial Value In Real-Time Data Flow Detection Capabilities. Question: In your opinion, how valuable would it be to be able to detect each of these issues in real-time? (Scale: Very valuable, Valuable, Average value, Limited value, Not valuable)
      • Identify personal information within the data flows: Very valuable 40%, Valuable 35%, Average value 18%, Limited value 6%
      • The values of incoming data are diverging from historical norms: Very valuable 23%, Valuable 46%, Average value 26%, Limited value 4%
      • Error rates are increasing: Very valuable 33%, Valuable 46%, Average value 17%, Limited value 4%
      • Data flow throughput is degrading or latency is growing: Very valuable 28%, Valuable 49%, Average value 20%, Limited value 3%
      • A specific data flow pipeline has stopped operating: Very valuable 42%, Valuable 42%, Average value 14%, Limited value 3%
  • 18. Gap Between Current Pipeline Real-Time Visibility Capabilities and Stated Value. Issue: A specific data flow pipeline has stopped operating
      • Real-time ability: Excellent 16%, Good 46%, Average 29%, Poor 9% (Excellent or Good: 62%)
      • Assessed value: Very valuable 42%, Valuable 42%, Average value 14%, Limited value 3% (Very valuable or Valuable: 84%)
  • 19. Chasm Between Today’s Data Flow Throughput Metrics and What is Needed. Issue: Data flow throughput is degrading or latency is growing
      • Real-time ability: Excellent 7%, Good 37%, Average 37%, Poor 17%, None 1% (Excellent or Good: 44%)
      • Assessed value: Very valuable 28%, Valuable 49%, Average value 20%, Limited value 3%, Not valuable 1% (Very valuable or Valuable: 77%)
  • 20. Significant Gap Between Error Rate Visibility Value and Current Capabilities. Issue: Error rates are increasing
      • Real-time ability: Excellent 7%, Good 37%, Average 38%, Poor 16% (Excellent or Good: 44%)
      • Assessed value: Very valuable 33%, Valuable 46%, Average value 17%, Limited value 4% (Very valuable or Valuable: 79%)
  • 21. Chasm Between Value of Detecting Divergent Data and Current Capabilities. Issue: The values of incoming data are diverging from historical norms
      • Real-time ability: Excellent 5%, Good 29%, Average 43%, Poor 20%, None 3% (Excellent or Good: 34%)
      • Assessed value: Very valuable 23%, Valuable 46%, Average value 26%, Limited value 4%, Not valuable 1% (Very valuable or Valuable: 69%)
  • 22. Large Gap Between Data Privacy Value and Current Capabilities. Issue: Identify personal information within the data flows
      • Real-time ability: Excellent 18%, Good 33%, Average 30%, Poor 13%, None 6% (Excellent or Good: 51%)
      • Assessed value: Very valuable 40%, Valuable 35%, Average value 18%, Limited value 6%, Not valuable 2% (Very valuable or Valuable: 75%)
  • 23. 72% Desire A Single Pane of Glass Solution To Manage All Data Flows. Question: How valuable is it to have a single control panel for comprehensive visibility and management across all of your data flows? Very valuable: 24%; Valuable: 48%; Average value: 24%; Limited value: 4%
  • 24. 50% State that Data Cleansing at the Source is the Most Effective Quality Practice. Question: Which of the following do you consider to be the most effective approach to ensuring data quality?
      • Cleanse data as it flows in from the source: 50%
      • Cleanse and update data once it is in the store: 27%
      • Data scientists or business analysts cleanse data before using it: 23%
  • 25. 81% State There is Significant Operational Impact to Upgrading Big Data Components. Question: What is the operational impact of upgrading big data components (ingest technologies, message queues, data stores, search stores, etc.)? Heavy impact: 17%; Moderate impact: 64%; Minor impact: 17%; No impact: 2%
  • 26. For more information…
      • About Dimensional Research: Dimensional Research provides practical marketing research to help technology companies make smarter business decisions. Our researchers are experts in technology and understand how corporate IT organizations operate. Our qualitative research services deliver a clear understanding of customer and market dynamics. For more information, visit www.dimensionalresearch.com.
      • About StreamSets: Place holder. For more information, visit www.streamsets.com.
  • 28. Tremendous Gaps Exist Between Current Big Data Flow Management Tool Capabilities and What is Needed. Ability to Detect Area in Real-Time Compared Against Stated Value To Detect in Real-Time
      • Personally identifiable information (credit card numbers, social security numbers) is being inappropriately placed in a data store. Current ability: Excellent 18%, Good 33%, Average 30%, Poor 13%, None 6%. Stated value: Very valuable 40%, Valuable 35%, Average value 18%, Limited value 6%, Not valuable 2%
      • The values of incoming data are diverging from historical norms. Current ability: Excellent 5%, Good 29%, Average 43%, Poor 20%, None 3%. Stated value: Very valuable 23%, Valuable 46%, Average value 26%, Limited value 4%, Not valuable 1%
      • Error rates are increasing. Current ability: Excellent 7%, Good 37%, Average 38%, Poor 16%, None 1%. Stated value: Very valuable 33%, Valuable 46%, Average value 17%, Limited value 4%, Not valuable 0%
      • Data flow throughput is degrading or latency is growing. Current ability: Excellent 7%, Good 37%, Average 37%, Poor 17%, None 1%. Stated value: Very valuable 28%, Valuable 49%, Average value 20%, Limited value 3%, Not valuable 1%
      • A specific data flow pipeline has stopped operating. Current ability: Excellent 16%, Good 46%, Average 29%, Poor 9%, None 1%. Stated value: Very valuable 42%, Valuable 42%, Average value 14%, Limited value 3%, Not valuable 0%
  • 29. Various Approaches To Managing Data Quality Indicates a Lack of Best Practice. Question: Which of the following approaches for ensuring data quality does your company utilize?
      • Cleanse and update data once it is in the store: 55%
      • Cleanse data as it flows in from the source: 54%
      • Data scientists or business analysts cleanse data before using it: 43%
  • 30. Many Must Perform Maintenance and Troubleshooting on Data Flows Routinely. Question: Approximately, what percentage of data flow changes and fixes are made for day-to-day maintenance and troubleshooting purposes?
      • More than 80%: 3%
      • 60% - 80%: 10%
      • 40% - 60%: 24%
      • 20% - 40%: 27%
      • Less than 20%: 36%
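
The headline figures on slides 13 and 18 to 22 are sums over the response distributions: the 53% figure adds the three most frequent change intervals, and each capability gap compares the top two ratings for current ability with the top two ratings for stated value. The following is a minimal Python sketch of that arithmetic using only the percentages transcribed above; the variable and function names are illustrative assumptions and do not come from the report.

```python
# Percentages transcribed from slide 13 (how often pipelines are changed).
pipeline_change_frequency = {
    "Several times a day": 3,
    "Several times a week": 19,
    "Several times a month": 31,
    "Several times a quarter": 26,
    "Several times a year": 12,
    "Less often than several times a year": 8,
}

# Slides 18-22: (Excellent, Good) for real-time ability and
# (Very valuable, Valuable) for assessed value, per issue.
top_two = {
    "A specific data flow pipeline has stopped operating": ((16, 46), (42, 42)),
    "Data flow throughput is degrading or latency is growing": ((7, 37), (28, 49)),
    "Error rates are increasing": ((7, 37), (33, 46)),
    "The values of incoming data are diverging from historical norms": ((5, 29), (23, 46)),
    "Identify personal information within the data flows": ((18, 33), (40, 35)),
}


def at_least_monthly(freq):
    """Share of respondents changing pipelines several times a month or more often."""
    keys = ("Several times a day", "Several times a week", "Several times a month")
    return sum(freq[k] for k in keys)


def top_two_box_gaps(data):
    """Return {issue: (ability, value, gap)} with gap in percentage points."""
    result = {}
    for issue, (ability, value) in data.items():
        ability_total = sum(ability)
        value_total = sum(value)
        result[issue] = (ability_total, value_total, value_total - ability_total)
    return result


print(f"Change at least several times a month: {at_least_monthly(pipeline_change_frequency)}%")
for issue, (ability, value, gap) in top_two_box_gaps(top_two).items():
    print(f"{issue}: ability {ability}%, value {value}%, gap {gap} points")
```

The ability and value totals reproduce the paired percentages printed on slides 18 to 22 (for example 62% and 84% for a stopped pipeline); the gap in percentage points is a derived quantity that the slides do not state directly.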