TIPS from the Experts
Table of Contents 
Setup is Key 
Think wide 
Tool integration 
Evaluate and Adapt 
Sharing 
Encryption 
A data science mindset 
Innovation 
Real-time action
Grant Unlimited Access 
Create a data lake and give your business and 
data analysts access to all your data – 
structured and unstructured – with SQL engines 
like Hive. They will surprise you with the insight 
and value they can extract, and your 
development team will have less work 
answering ad-hoc queries. 
—Christian Prokopp, Principal Consultant at Big Data Partnership 
Select the Right Tools 
A very common question is when to use
MapReduce/Pig/Hive vs. HBase/Cassandra/Impala.
Non-functional requirements (NFRs) have to be
considered when choosing a framework.
MapReduce/Pig/Hive suit high-throughput,
high-latency workloads such as batch processing
and ETL. HBase/Cassandra/Impala suit
low-throughput, low-latency workloads such as a
customer filling out an online application.
—Praveen Sripati, Hadoop trainer and author of Hadoop Tips 
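The rule of thumb in this tip can be written down as a tiny helper. This is only a sketch: the function name and the single yes/no latency flag are illustrative assumptions, and a real decision would weigh more non-functional requirements than latency alone.

```python
# Hypothetical helper: it encodes the tip's rule of thumb and nothing more.
BATCH_FRAMEWORKS = ("MapReduce", "Pig", "Hive")
INTERACTIVE_FRAMEWORKS = ("HBase", "Cassandra", "Impala")


def suggest_frameworks(needs_low_latency):
    """Return the framework family the tip associates with the requirement."""
    if needs_low_latency:
        # e.g. a customer filling out an online application
        return INTERACTIVE_FRAMEWORKS
    # e.g. batch processing or ETL, where throughput matters more than latency
    return BATCH_FRAMEWORKS


print(suggest_frameworks(needs_low_latency=False))
print(suggest_frameworks(needs_low_latency=True))
```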
Use Presto
Improve query performance by considering
Presto with the RCFile or ORC file format.
—Minesh Patel, Qubole 
Incorporate Machine 
Learning 
Use robust machine learning algorithms to
extract insight from the data. Data collection and
massive storage are only the enabling
infrastructure. You should leverage existing as
well as proprietary machine learning algorithms
that discover hidden patterns and learn from the
data what is important for the analyst to view
and examine, and what is not.
—Idan Tendler, CEO of Fortscale 
Automation is Key 
There is a big need for automation in Big Data.
Security is an important industry that has
proven the value of Big Data, but it has just as
quickly proved that Big Data is valueless
without automation wrapped around it to make
it practical. Only once you make Big Data
practical can you begin to perform analytics,
which is where the value of Big Data in the
security industry really gets unlocked.
—Sean Brady, VP of Product Management at Vorstack 
Identify Easy Wins 
Segment the data based on demographic 
and/or firmographic information. This is an easy 
and inexpensive way to highlight trends in the 
primary customers and industries served. This 
information is very helpful when determining 
what new products and/or services should be 
offered. In addition, look for trends in 
behavioral transaction information and further 
optimize the customer’s experience with 
relevant marketing and messaging. 
—David Handmaker, CEO of Next Day Flyers 
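A minimal sketch of the segmentation idea, assuming order records that already carry a firmographic field. The sample data, field names, and function name are made up for illustration.

```python
from collections import defaultdict

# Made-up sample orders; the field names are assumptions for illustration.
orders = [
    {"customer": "A", "industry": "retail", "amount": 120.0},
    {"customer": "B", "industry": "retail", "amount": 80.0},
    {"customer": "C", "industry": "healthcare", "amount": 300.0},
    {"customer": "D", "industry": "education", "amount": 50.0},
]


def revenue_by_segment(rows, segment_key):
    """Sum order amounts per segment, largest segment first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[segment_key]] += row["amount"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for segment, total in revenue_by_segment(orders, "industry"):
    print(f"{segment}: {total:.2f}")
```

Swapping `"industry"` for a demographic field gives the same view along another dimension.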
Think Broad 
Identify all of the data you have access to and/or will 
produce, and explore possible audiences and use 
cases for it. Often, big data plays are geared 
toward a fairly narrow audience and set of use 
cases based on the original inspiration for the 
solution. Or, there is not an active and explicit 
exploration of the full potential of what you have to 
offer. I can all but assure you that there are major 
opportunities for your offering that you haven’t 
even considered yet. The earlier you have a crisp 
view of the potential of your big data and offering, 
the better able you will be to build the right thing, in 
the right way, to exploit the potential of that idea. 
—Dirk Knemeyer, founder of Involution Studios 
Setup is Key
Careful and Smart Integration 
with BI tools 
Big Data tools (MapReduce, Hive, etc.) are known for
their latency problems, but on the other hand they are
excellent for processing petabytes of data in a
distributed computing environment. When integrating
with BI or reporting tools, use big data technologies
in a way that avoids their weaknesses and leverages
their strengths.
For example, if you are building an integrated pipeline
with BI tools, aggregate as much as you can and
use the caching or cube technologies of the BI tools
to give the end user a faster experience. Real-time
connectivity to big data sources like Hive/HDFS is
not a great end-user experience in the BI space, so it
should be avoided.
—Ashish Dubey, Solutions Architect at Qubole 
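The aggregate-then-cache pattern described in this tip might look like the sketch below. The in-memory list stands in for the output of a batch job, and all names are hypothetical; a real setup would use the BI tool's own cache or cube feature.

```python
from collections import Counter

# Stand-in for rows produced by a batch job (for example a nightly Hive query).
raw_events = [
    ("2014-06-01", "checkout"),
    ("2014-06-01", "checkout"),
    ("2014-06-01", "search"),
    ("2014-06-02", "search"),
]


def build_daily_summary(events):
    """Collapse raw events into (day, event_type) -> count."""
    return Counter(events)


# Built once per batch run; the dashboard then reads only this small table.
daily_summary = build_daily_summary(raw_events)


def dashboard_lookup(day, event_type):
    """Serve a BI request from the pre-aggregated summary, not the raw data."""
    return daily_summary.get((day, event_type), 0)


print(dashboard_lookup("2014-06-01", "checkout"))
```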
Invest in Your Pipeline 
As a rule of thumb, invest 80% of your time in your
data lake and data pipeline (mining, extracting,
cleaning, transforming, loading), and 20% in the
high-level data science and machine learning
effort. Data in the wild is complex, wrong,
contradictory, and hard to find and access.
Consequently, more, faster, and more accurate data
usually has a higher impact than more complex
models and makes for a more robust system.
—Christian Prokopp, Principal Consultant at Big Data Partnership 
Don’t Rush Into Analysis 
Everyone with a Big Data project wants to rush 
straight into analysis. That is where things 
usually fall apart, however, because there is 
simply too much data flowing across the 
network and it is mostly in a format that 
current analytics software cannot handle. 
—Rick Aguirre, president of Cirries Technologies 
Start with Heavy Lifting 
Big Data success requires three steps of heavy lifting first,
before you ever analyze it.
Step 1 is data capture.
Most of the Big Data torrent is a big nothing and not relevant.
Decide what data you want to analyze and set up algorithms to
locate and corral it.
Step 2 is data control.
You want to capture the data you need as it comes
across the network. It may no longer be relevant after just a few
minutes, or you may need to store it for a number of years if, as
one example, it is data that might be needed later for law
enforcement purposes.
Step 3 is data humanization.
This is where you convert whatever format the data is in to a
format that your analytics software can use. Only now, at this
step, do you have the right data in the right format that you can
then use for whatever kind of analytics you have in mind.
—Rick Aguirre, president of Cirries Technologies 
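The three steps above can be sketched as a small pipeline. Everything concrete here is an assumption made for illustration: the `key=value` input format, the one-year retention window, and the function names.

```python
import json
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # assumed retention window


def capture(raw_lines, keyword):
    """Step 1: keep only the records relevant to the analysis."""
    return [line for line in raw_lines if keyword in line]


def control(lines, now):
    """Step 2: stamp each record with its capture time and expiry time."""
    return [
        {"raw": line, "captured_at": now, "expires_at": now + RETENTION}
        for line in lines
    ]


def humanize(records):
    """Step 3: convert raw 'key=value' text into JSON for analytics software."""
    out = []
    for rec in records:
        fields = dict(pair.split("=", 1) for pair in rec["raw"].split())
        fields["captured_at"] = rec["captured_at"].isoformat()
        out.append(json.dumps(fields, sort_keys=True))
    return out


raw = ["src=10.0.0.1 event=login", "src=10.0.0.2 event=heartbeat"]
now = datetime(2014, 1, 1, tzinfo=timezone.utc)
result = humanize(control(capture(raw, "event=login"), now))
print(result)
```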
Think wide
Once data is collected, you have easy
access for advanced analytics. Don’t stop at
analyzing one log source or one dimension
of data; analyze across log sources and
multiple entities. For example, in order to
discover advanced cyber attacks that leveraged
users’ credentials, we profile the behavioral
activity of users, including their permissions
configuration, their access to files and systems,
and their web activity. We analyze their
historical activity and also compare them
against their peers.
—Idan Tendler, CEO of Fortscale 
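One simple way to compare a user against both their own history and their peers is a z-score check. This sketch is not the method used by the tip's author; the threshold, the sample counts, and the choice to require both comparisons to fire are assumptions.

```python
from statistics import mean, pstdev


def z_score(value, baseline):
    """How many standard deviations `value` sits from the baseline mean."""
    spread = pstdev(baseline)
    if spread == 0:
        # Simplification: a perfectly flat baseline is treated as "no signal".
        return 0.0
    return (value - mean(baseline)) / spread


def is_unusual(today, own_history, peer_values, threshold=3.0):
    """Flag activity that is high versus the user's past and versus peers."""
    return (
        z_score(today, own_history) > threshold
        and z_score(today, peer_values) > threshold
    )


# Made-up daily file-access counts.
own_history = [10, 12, 11, 9, 13]
peers_today = [8, 14, 10, 12, 11]
print(is_unusual(95, own_history, peers_today))
print(is_unusual(12, own_history, peers_today))
```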
Use the ODBC Driver 
Perform BI Analytics and Visualization 
with the ODBC Driver. 
—Minesh Patel, Qubole 
Use a Subsample 
I always start by looking at a subsample of the 
data. You often get a very good impression of 
what the main focus of the data munging or 
cleaning will be just by looking at some 
numbers (or characters). 
—Benedikt Koehler, Data Scientist and Blogger at Beautiful Data 
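When the data is too large to load at once, a subsample can be drawn in a single pass with reservoir sampling. The tip does not name a sampling technique, so treat this as one possible sketch; the function name and the fixed seed are assumptions.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Return k rows chosen uniformly at random from an iterable of unknown size."""
    rng = random.Random(seed)
    sample = []
    for index, row in enumerate(rows):
        if index < k:
            sample.append(row)
        else:
            # Keep the new row with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                sample[slot] = row
    return sample


# A generator stands in for a large file read line by line.
rows = (f"record-{i}" for i in range(100_000))
for row in reservoir_sample(rows, k=5):
    print(row)
```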
Evaluate and Adapt
Measure Everything 
Measure and record everything, and keep an
eye on your key metrics. Things change and
tests become obsolete, sometimes in
surprising ways, especially when you depend
on external data. For example, data sources
you mine may introduce rolling changes, which
are hard to catch as errors but easy to
identify in metrics.
—Christian Prokopp, Principal Consultant at Big Data Partnership 
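As a sketch of how a metric can surface a rolling change that raises no error, the example below tracks the share of records missing a field. The metric, the tolerance, and the sample batches are all assumptions chosen for illustration.

```python
def null_rate(rows, field):
    """Fraction of rows where `field` is missing or empty."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if not row.get(field))
    return missing / len(rows)


def drifted(current, baseline, tolerance=0.05):
    """True when a metric moved more than `tolerance` away from its baseline."""
    return abs(current - baseline) > tolerance


# Made-up batches from an external source that quietly renamed a field.
last_week = [{"price": "9.99"}, {"price": "4.50"}, {"price": "12.00"}, {"price": "3.25"}]
this_week = [{"price": "9.99"}, {"cost": "4.50"}, {"cost": "12.00"}, {"price": "3.25"}]

baseline = null_rate(last_week, "price")
current = null_rate(this_week, "price")
print(baseline, current, drifted(current, baseline))
```

Every record in `this_week` still parses, so no error is raised, yet the recorded metric makes the change easy to spot.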
Sharing is Caring 
—Idan Tendler, CEO of Fortscale 
Encryption 
Encrypting data at rest is a good 
practice. 
—Minesh Patel, Qubole 
Pick the Right Distribution 
A common question is whether to go for a 
distribution from Apache or a vendor. When 
there is enough expertise in the organization to 
know the internals of the different frameworks 
for integrating and resolving any issues quickly, 
then go with the Apache distribution. If that expertise is 
not available, use a distribution through a 
vendor and get commercial support to resolve 
any issues that may arise. 
—Praveen Sripati, Hadoop trainer and author of Dattamsha 
Start Small 
Developing a Big Data strategy is all about 
starting small and taking gradual steps toward 
becoming more data-driven. Start by 
breaking down the data silos within your 
organization so that you gain the most insight from 
your data when you start analyzing it 
with a variety of tools. 
—Mark van Rijmenam – CEO / Founder BigData-Startups 
Have a Business Intent 
There is often a perception that there is gold in 
an organization’s data, and that if you just look 
hard enough, you will find it. In reality, this 
perception can lead to fruitless efforts with no 
real direction and no payoff. Instead, start with 
a business intent in mind. What are the actions 
you would take—and the value to your 
business—if data can provide the answer to a 
certain question? 
—Sean Stauth, Director, Client Services, Silicon Valley Data Science 
Update Your Strategy 
Your data strategy should be a living document 
that helps you get the most value from your 
data. As your goals, your technical environment, 
or the market change, keep it updated to help 
you follow those changes and stay on course. 
—Scott Kurth, VP, Advisory Services, Silicon Valley Data Science 
A Data Science Mindset
Data Science Mindset 
Have an always-on data science mindset: 
successful big data initiatives start with a holistic 
360-degree view of the problem space. This includes 
understanding the inputs (data types, sources, 
features), the desired outputs (decisions, goals, 
predictions), and the constraints (model 
parameters, boundary conditions, optimization 
constraints). To achieve this perspective, one must 
be thinking like a scientist from start to finish: 
collect data, infer a testable hypothesis, design an 
experiment, test and evaluate the results, refine 
your hypothesis, and repeat (if necessary). 
—Kirk Borne, Data Scientist, Astrophysicist and 
Big Data Science Consultant 
Return on Innovation 
The most important ROI in Big Data Analytics 
projects is Return On Innovation. What are you 
doing that’s different and consequential? What 
sets you apart from the rest of the multitudes in 
this space? 
—Kirk Borne, Data Scientist, Astrophysicist and 
Big Data Science Consultant 
Focus on the Users 
Developing a big data platform requires focusing 
on the users. Serve a few users well, and let their 
processing scale up with your capabilities. 
“Premature platformization” or trying to satisfy too 
many use cases too early in the project leads to 
failures. Make the initial users successful, and the 
ecosystem will thrive and grow. 
—Owen O’Malley – Sr. Architect and Co-founder of Hortonworks 
Use the API 
Samples are available for the Java SDK, 
the Python SDK, and REST. 
—Minesh Patel, Qubole 
Take Real-Time Action 
If you cannot take real-time action, you have 
no need of real-time processing. There will 
always be batch processing workloads 
supporting the enterprise, and increasingly 
dynamic decision areas can be effectively 
supported by analytical systems because of 
advances in data architectures. 
—Sanjay Mathur, CEO, Silicon Valley Data Science 
Store Denormalized State 
State—the full context of an event, like a 
customer visit or the completion of a step in 
a manufacturing process—can be expensive 
to reassemble after the fact. This is 
particularly true with highly relational 
systems: witness the complex ETL (extract, 
transform, load) workloads that enterprise 
data warehouse systems struggle to scale. 
Storing denormalized state, e.g. rich logs, for 
analysis has proven highly successful for the 
web businesses of Silicon Valley, and those 
techniques can be applied to industries 
across the economy. 
—John Akred, CTO, Silicon Valley Data Science 
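A rich log record of the kind this tip describes could be built as follows. The event shape and field names are hypothetical; the point is only that the context is copied into the record at write time, so it does not have to be reassembled through joins later.

```python
import json


def build_visit_event(customer, product, visit):
    """Copy the context of a visit into one self-contained record."""
    return {
        "event": "product_view",
        "timestamp": visit["timestamp"],
        "customer_id": customer["id"],
        "customer_segment": customer["segment"],
        "product_id": product["id"],
        "product_price": product["price"],
        "referrer": visit["referrer"],
    }


# Made-up inputs that would normally live in separate relational tables.
customer = {"id": 42, "segment": "small_business"}
product = {"id": "flyer-a5", "price": 19.0}
visit = {"timestamp": "2014-06-01T12:00:00Z", "referrer": "newsletter"}

# One JSON line per event; no joins are needed to analyze it later.
log_line = json.dumps(build_visit_event(customer, product, visit), sort_keys=True)
print(log_line)
```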
Build a Common Platform 
Whether you are thinking about migrating 
towards Big Data or whether you are just 
starting out with data altogether, it helps to 
focus on building and maintaining a 
common platform. Similar to software 
development platforms, data platforms 
should also include source control, change 
management, and testing scenarios. This will 
help reduce future migration costs and will 
lead to long-term sustainable, competitive 
data capabilities. 
—Ryan Kirk, SR. Data Scientist at Hipcricket 
Looking for additional big data tips and advice? 
Subscribe to Qubole's email newsletter. 
Sources: 
http://www.qubole.com/new-series-big-data-tips/ 
http://www.qubole.com/setup-is-key/ 
http://www.qubole.com/evaluate-and-adapt/ 
http://www.qubole.com/data-mindset/

Weitere ähnliche Inhalte

Was ist angesagt?

Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?Slim Baltagi
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostAtScale
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014MapR Technologies
 
Analytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret WeaponAnalytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret WeaponDatabricks
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntSteven Moy
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Denodo
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteMark van Rijmenam
 
Migrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksMigrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksDatabricks
 
Performance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and morePerformance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and moreDenodo
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAmazon Web Services
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Scaling Multi-Cloud Deployments with Denodo: Automated Infrastructure Management
Scaling Multi-Cloud Deployments with Denodo: Automated Infrastructure ManagementScaling Multi-Cloud Deployments with Denodo: Automated Infrastructure Management
Scaling Multi-Cloud Deployments with Denodo: Automated Infrastructure ManagementDenodo
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes John Archer
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
Data Warehouse in Cloud
Data Warehouse in CloudData Warehouse in Cloud
Data Warehouse in CloudPawan Bhargava
 

Was ist angesagt? (20)

Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
 
Analytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret WeaponAnalytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret Weapon
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure Hunt
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes Keynote
 
Migrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksMigrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for Databricks
 
Performance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and morePerformance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and more
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
2022 02 Integration Bootcamp
2022 02 Integration Bootcamp2022 02 Integration Bootcamp
2022 02 Integration Bootcamp
 
Scaling Multi-Cloud Deployments with Denodo: Automated Infrastructure Management
Scaling Multi-Cloud Deployments with Denodo: Automated Infrastructure ManagementScaling Multi-Cloud Deployments with Denodo: Automated Infrastructure Management
Scaling Multi-Cloud Deployments with Denodo: Automated Infrastructure Management
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
Data Warehouse in Cloud
Data Warehouse in CloudData Warehouse in Cloud
Data Warehouse in Cloud
 

Andere mochten auch

Grace Hopper Conference Opening Keynote
Grace Hopper Conference Opening KeynoteGrace Hopper Conference Opening Keynote
Grace Hopper Conference Opening KeynoteHilary Mason
 
Extending WSO2 Analytics Platform
Extending WSO2 Analytics PlatformExtending WSO2 Analytics Platform
Extending WSO2 Analytics PlatformWSO2
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseCaserta
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityDaniel Tunkelang
 
Big Data Analytics in Ecommerce industry
Big Data Analytics in Ecommerce industryBig Data Analytics in Ecommerce industry
Big Data Analytics in Ecommerce industryRashed Moslem
 
Big Data Analytics & Travel Industry – The Best Deal Around
Big Data Analytics & Travel Industry – The Best Deal AroundBig Data Analytics & Travel Industry – The Best Deal Around
Big Data Analytics & Travel Industry – The Best Deal AroundSPEC INDIA
 
Big Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesBig Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesGalit Shmueli
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeCaserta
 
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)Hellmuth Broda
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldDataWorks Summit/Hadoop Summit
 
Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Capgemini
 
Big Data Industry Insights 2015
Big Data Industry Insights 2015 Big Data Industry Insights 2015
Big Data Industry Insights 2015 Den Reymer
 
Big Data Analytics for the Industrial Internet of Things
Big Data Analytics for the Industrial Internet of ThingsBig Data Analytics for the Industrial Internet of Things
Big Data Analytics for the Industrial Internet of ThingsAnthony Chen
 
Subscribed 2015: CEO's Keynote
Subscribed 2015: CEO's KeynoteSubscribed 2015: CEO's Keynote
Subscribed 2015: CEO's KeynoteZuora, Inc.
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBernard Marr
 
The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017LinkedIn
 

Andere mochten auch (19)

Grace Hopper Conference Opening Keynote
Grace Hopper Conference Opening KeynoteGrace Hopper Conference Opening Keynote
Grace Hopper Conference Opening Keynote
 
Extending WSO2 Analytics Platform
Extending WSO2 Analytics PlatformExtending WSO2 Analytics Platform
Extending WSO2 Analytics Platform
 
Big Data Commission Report
Big Data Commission ReportBig Data Commission Report
Big Data Commission Report
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the Enterprise
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for Productivity
 
Big Data Analytics in Ecommerce industry
Big Data Analytics in Ecommerce industryBig Data Analytics in Ecommerce industry
Big Data Analytics in Ecommerce industry
 
Big Data Analytics & Travel Industry – The Best Deal Around
Big Data Analytics & Travel Industry – The Best Deal AroundBig Data Analytics & Travel Industry – The Best Deal Around
Big Data Analytics & Travel Industry – The Best Deal Around
 
Big Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesBig Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative Industries
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data World
 
Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry
 
Big Data Industry Insights 2015
Big Data Industry Insights 2015 Big Data Industry Insights 2015
Big Data Industry Insights 2015
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Big Data Analytics for the Industrial Internet of Things
Big Data Analytics for the Industrial Internet of ThingsBig Data Analytics for the Industrial Internet of Things
Big Data Analytics for the Industrial Internet of Things
 
Subscribed 2015: CEO's Keynote
Subscribed 2015: CEO's KeynoteSubscribed 2015: CEO's Keynote
Subscribed 2015: CEO's Keynote
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
 
The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017
 

Ähnlich wie Expert Big Data Tips

Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)mark madsen
 
Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Papershashanksalunkhe12
 
The Four Pillars of Analytics Technology Whitepaper
Expert Big Data Tips

  • 1. TIPS from the Experts
  • 2. Table of Contents: Setup is Key; Think wide; Tool integration; Evaluate and Adapt; Sharing; Encryption; A data science mindset; Innovation; Real-time action
  • 4. Grant Unlimited Access: Create a data lake and give your business and data analysts access to all your data – structured and unstructured – with SQL engines like Hive. They will surprise you with the insight and value they can extract, and your development team will have less work answering ad-hoc queries. – Christian Prokopp, Principal Consultant at Big Data Partnership
  • 5. Select the Right Tools: A common question is when to use MapReduce/Pig/Hive versus HBase/Cassandra/Impala. Non-functional requirements (NFRs) have to be considered when choosing a framework. MapReduce/Pig/Hive suit high-throughput, high-latency requirements, as in batch processing and ETL. HBase/Cassandra/Impala suit low-throughput, low-latency requirements, as in the case of a customer filling out an online application. – Praveen Sripati, Hadoop trainer and author of Hadoop Tips
  • 6. Use Presto: Improve query performance by considering Presto with the RCFile or ORC file format. – Minesh Patel, Qubole
  • 7. Incorporate Machine Learning: Use robust machine learning algorithms to extract insight from the data. Data collection and mass storage are only the enabling infrastructure. You should leverage existing as well as proprietary machine learning algorithms that discover hidden patterns and learn from the data what is important for the analyst to view and examine, and what is not. – Idan Tendler, CEO of Fortscale
  • 8. Automation is Key: There is a big need for automation in Big Data. Security is an important industry that has proven the value of Big Data, but it has just as quickly proved that Big Data is valueless without automation wrapped around it to make it practical. Only once you make Big Data practical can you begin to perform analytics, which is where the value of Big Data in the security industry really gets unlocked. – Sean Brady, VP of Product Management at Vorstack
  • 9. Identify Easy Wins: Segment the data based on demographic and/or firmographic information. This is an easy and inexpensive way to highlight trends in the primary customers and industries served. This information is very helpful when determining what new products and/or services should be offered. In addition, look for trends in behavioral transaction information and further optimize the customer’s experience with relevant marketing and messaging. – David Handmaker, CEO of Next Day Flyers
  • 10. Think Broad: Identify all of the data you have access to and/or will produce, and explore possible audiences and use cases for it. Oftentimes, big data plays are geared toward a fairly narrow audience and set of use cases based on the original inspiration for the solution, or there is no active and explicit exploration of the full potential of what you have to offer. I can all but assure you that there are major opportunities for your offering that you haven’t even considered yet. The earlier you have a crisp view of the potential of your big data and offering, the better able you will be to build the right thing, in the right way, to exploit the potential of that idea. – Dirk Knemeyer, founder of Involution Studios
  • 12. Careful and Smart Integration with BI Tools: Big Data tools (MapReduce, Hive, etc.) are known for their latency problems, but they are excellent for processing petabytes of data in a distributed computing environment. When integrating with BI or reporting tools, use big data technologies in a way that avoids their weaknesses and leverages their strengths. For example, if you are building an integrated pipeline with BI tools, aggregate as much as you can and use the caching or cube technologies of the BI tools to make the experience faster for the end user. Real-time connectivity with big data sources like Hive/HDFS is not a great end-user experience in the BI space, so it should be avoided. – Ashish Dubey, Solutions Architect at Qubole
  • 13. Invest in Your Pipeline: As a rule of thumb, invest 80% of your time in your data lake and data pipeline (mining, extracting, cleaning, transforming, loading), and 20% in the high-level data science and machine learning effort. Data in the wild is complex, wrong, contradictory, and hard to find and access. Consequently, more, faster, and more accurate data usually has a higher impact than more complex models and makes for a robust system. – Christian Prokopp, Principal Consultant at Big Data Partnership
  • 14. Don’t Rush Into Analysis: Everyone with a Big Data project wants to rush straight into analysis. That is where things usually fall apart, however, because there is simply too much data flowing across the network and it is mostly in a format that current analytics software cannot handle. – Rick Aguirre, president of Cirries Technologies
  • 15. Start with Heavy Lifting: Big Data success requires three steps of heavy lifting before you ever analyze the data. Step 1 is data capture. Most of the Big Data torrent is a big nothing and not relevant. Decide what data you want to analyze and set up algorithms to locate and corral it. Step 2 is data control. You want to capture the data you need as it comes across the network. It may not be relevant in just a few minutes, or you may need to store it for a number of years if, as one example, it is data that might be needed later for law enforcement purposes. Step 3 is data humanization. This is where you convert whatever format the data is in to a format that your analytics software can use. Only now, at this step, do you have the right data in the right format that you can then use for whatever kind of analytics you have in mind. – Rick Aguirre, president of Cirries Technologies
  • 16. Think Wide: Once data is collected, you have easy access for advanced analytics. Don’t stop at analyzing only one log source or one dimension of data; analyze across log sources and multiple entities. For example, in order to discover advanced cyber attacks that leveraged users’ credentials, we profile the behavioral activity of users, including their permissions configuration, their access to files and systems, and their web activity. We analyze their historical activity and also compare them against their peers. – Idan Tendler, CEO of Fortscale
  • 17. Use the ODBC Driver: Perform BI analytics and visualization with the ODBC driver. – Minesh Patel, Qubole
  • 18. Use a Subsample: I always start by looking at a subsample of the data. You often get a very good impression of what the main focus of the data munging or cleaning will be just by looking at some numbers (or characters). – Benedikt Koehler, Data Scientist and Blogger at Beautiful Data
  • 20. Measure Everything: Measure and record everything, and keep an eye on your key metrics. Things change and tests become obsolete, sometimes in surprising ways, especially when you depend on external data. For example, data sources you mine may introduce rolling changes, which are hard to catch as an error but easy to identify in metrics. – Christian Prokopp, Principal Consultant at Big Data Partnership
  • 21. Sharing is Caring – Idan Tendler, CEO of Fortscale
  • 22. Encryption: Encrypting data at rest is a good practice. – Minesh Patel, Qubole
  • 23. Pick the Right Distribution: A common question is whether to go for a distribution from Apache or from a vendor. When there is enough expertise in the organization to know the internals of the different frameworks, integrate them, and resolve any issues quickly, go with the Apache distribution. If that expertise is not available, use a vendor distribution and get commercial support to resolve any issues that may arise. – Praveen Sripati, Hadoop trainer and author of Hadoop Tips
  • 24. Start Small: Developing a Big Data strategy is all about starting small and taking gradual steps toward becoming more data-driven. Start by breaking down the data silos within your organization so that you gain the most insight from your data when you start analyzing it with a variety of tools. – Mark van Rijmenam, CEO and Founder of BigData-Startups
  • 25. Have a Business Intent: There is often a perception that there is gold in an organization’s data, and that if you just look hard enough, you will find it. In reality, this perception can lead to fruitless efforts with no real direction and no payoff. Instead, start with a business intent in mind. What are the actions you would take, and the value to your business, if data can provide the answer to a certain question? – Sean Stauth, Director, Client Services, Silicon Valley Data Science
  • 26. Update Your Strategy: Your data strategy should be a living document that helps you get the most value from your data. As your goals, your technical environment, or the market change, keep it updated to help you follow those changes and stay on course. – Scott Kurth, VP, Advisory Services, Silicon Valley Data Science
  • 27. A Data Science Mindset
  • 28. Data Science Mindset: Have an always-on data science mindset. Successful big data initiatives start with a holistic 360-degree view of the problem space. This includes understanding the inputs (data types, sources, features), the desired outputs (decisions, goals, predictions), and the constraints (model parameters, boundary conditions, optimization constraints). To achieve this perspective, you must think like a scientist from start to finish: collect data, infer a testable hypothesis, design an experiment, test and evaluate the results, refine your hypothesis, and repeat if necessary. – Kirk Borne, Data Scientist, Astrophysicist and Big Data Science Consultant
  • 29. Return on Innovation The most important ROI in Big Data Analytics projects is Return On Innovation. What are you doing that’s different and consequential? What sets you apart from the rest of the multitudes in this space? —Kirk Borne, Data Scientist, Astrophysicist and Big Data Science Consultant
  • 30. Focus on the Users Developing a big data platform requires focusing on the users. Serve a few users well, and let their processing scale up with your capabilities. “Premature platformization” or trying to satisfy too many use cases too early in the project leads to failures. Make the initial users successful, and the ecosystem will thrive and grow. —Owen O’Malley – Sr. Architect and Co-founder of Hortonworks
  • 31. Use the API Using the API: samples for the Java SDK, Python SDK, and REST. —Minesh Patel, Qubole
  • 32. Take Real-Time Action If you cannot take real-time action, you have no need of real-time processing. There will always be batch processing workloads supporting the enterprise, and increasingly dynamic decision areas can be effectively supported by analytical systems because of advances in data architectures. —Sanjay Mathur, CEO, Silicon Valley Data Science
  • 33. Store Denormalized State State—the full context of an event, like a customer visit or the completion of a step in a manufacturing process—can be expensive to reassemble after the fact. This is particularly true with highly relational systems: witness the complex ETL (extract, transform, load) workloads that enterprise data warehouse systems struggle to scale. Storing denormalized state, e.g. rich logs, for analysis has proven highly successful for the web businesses of Silicon Valley, and those techniques can be applied to industries across the economy. —John Akred, CTO, Silicon Valley Data Science
  • 34. Build a Common Platform Whether you are thinking about migrating towards Big Data or are just starting out with data altogether, it helps to focus on building and maintaining a common platform. Like software development platforms, data platforms should include source control, change management, and testing scenarios. This will help reduce future migration costs and will lead to sustainable, competitive data capabilities over the long term. —Ryan Kirk, Sr. Data Scientist at Hipcricket
  • 35. Looking for additional big data tips and advice? Subscribe to Qubole's email newsletter. Sources: http://www.qubole.com/new-series-big-data-tips/ http://www.qubole.com/setup-is-key/ http://www.qubole.com/evaluate-and-adapt/ http://www.qubole.com/data-mindset/