Accelerating Data Science and Real-Time Analytics at Scale
Nadeem Asghar, Hortonworks, Field CTO and Global Head Partner Engineering
Steve Roberts, IBM, Big Data Offering Manager
[Figure: Data over Time. Labels: Available Data, Understood Data, Enterprise Amnesia]
80 million wearable health devices will be available by 2017.
2.5 quintillion bytes of data generated daily by connected machines.
There will be 28 times more sensor-enabled devices than people by the year 2020.
25 gigabytes of data per hour is generated by a connected car. 90% of cars will be connected by 2020.
153 exabytes of healthcare data generated by devices in 2013. Increasing to 2,314 exabytes in 2020.
1.7 megabytes of data per second generated by every human being on the planet by 2020.
[Figure: timeline of computing eras. Labels: Centralized Mainframes; Cognitive Era; E-Business; Distributed Computing; Smarter Planet; Office Productivity; Client/Server; Personal Computer; Data Warehousing; Big Data & Predictive Analytics; Cognitive]
A New Era of Computing Has Emerged
[Figure labels: Data, Insight, Context; Transactional Database; Business Intelligence; Big Data & Analytics; Actionable Insight in context; Reporting; Cloud]
© 2018 IBM Corporation
A recruiting and HR company chose an IBM & Hortonworks full stack solution to support their Hadoop/Spark workloads and accelerate their analytics and AI projects
Business problem
Job matching is their core business, and the accuracy and speed of this matching are critical to their success. This requires the intake and analysis of terabytes of data daily, including recruiter and company information, job listings, hiring histories, and resumes. A future requirement is to apply AI to more complex data such as images, sound and video.
Benefits
• Proven performance
• World class support
• Reliable security for personal data
• Built on open technologies, avoiding vendor lock-in
• Scalable software defined storage proven for analytics
• POWER9 and PowerAI support their AI research and development
From Data to AI
Intelligent Job Matching
accident risk rate
90% inspection times
10X number of inspections
AI at the Edge
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
→ #1 Pure Open Source Hadoop Distribution
→ 1000+ customers and 2100+ ecosystem partners
→ Employs the original architects, developers and operators of Hadoop from Yahoo!
→ Best-in-class 24x7 customer support
→ Leading professional services and training
→ Data Science Leader
→ OpenPOWER performance leadership
→ Flexible, software defined storage
→ #1 SQL Engine for complex, analytical workloads
→ Leader in On-premise and Hybrid Cloud solutions
+
IBM + Hortonworks = Unlocking Actionable Insights
DATA – More Volume and More Types
[Figure: increasing data variety and complexity, with data volumes growing from gigabytes through terabytes and petabytes to exabytes across ERP, CRM, Web, and Big Data sources.]
Data types shown: user generated content, mobile web, SMS/MMS, sentiment, external demographics, HD video, speech to text, product/service logs, social network, business data feeds, user click stream, web logs, offer history, dynamic pricing, A/B testing, affiliate networks, search marketing, behavioral targeting, dynamic funnels, payment record, support contacts, customer touches, purchase detail, purchase record, segmentation, offer details.
Business Analytics Must Evolve To Deal With Data Tipping Point
PROVIDE INSIGHT INTO THE PAST via data aggregation, data mining, business reporting, OLAP, visualization, dashboards, etc.
UNDERSTAND THE FUTURE via statistical models, forecasting techniques, machine learning, etc.
ADVISE ON POSSIBLE OUTCOMES via rules, optimization and simulation algorithms
Data Science and Real-Time Analytics at Scale
End to End Data Science Workflow
Data Engineering: DISCOVER, ACQUISITION, PROCESSING, CURATION
Data Science: DATA WRANGLING; FEATURE ENGINEERING, VISUALIZATION AND ANALYSIS; MODEL BUILDING, TRAINING AND TESTING
Deployment & Operationalize: REPORTS, DASHBOARDS, REAL-TIME SCORING, BATCH SCORING, REST SERVICES, PERFORMANCE MGMT, SCHEDULING
Data Science Experience (DSX)
Enterprise Services: Multi Notebook Support, Versioning, Collaboration, Model Management
Hortonworks Data Platform (HDP)
Enterprise Services: Data, GPU, Deep Learning, Compute, Security, Governance, Metadata, Operations
Hortonworks Data Flow (HDF)
Enterprise Services: Data Ingestion, Schema Registry, CEP
Use Case Deep Dive
Credit Card Fraud Prevention
Building a Model
→ Show of hands: how many have built a “Model”?
→ What are some limitations?
– Condition-based logic: if/else binary decisions
→ If you need a lot of data to build a good model, what tools can you use?
– Data volumes can eliminate the possibility of desktop tools
→ Sampling?
– Well… we had better get an even distribution of true and false positives in each sample, but wait, that requires data munging, which brings us back to what tools we can use.
→ Security concerns?
– Extracting data from its secure resting place and pushing it into other environments, often unsecured files or desktops where Matlab or R can be installed.
→ Collaboration
– Push processing to the data using modern distributed tooling.
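The sampling concern above can be sketched in a few lines. This is a minimal, in-memory illustration that assumes "even distribution" means keeping the fraud and non-fraud share the same in the sample as in the full data; the field names (`id`, `is_fraud`) and the helper `stratified_sample` are hypothetical, and at the data volumes the slide describes the same idea would have to run on distributed tooling rather than on a desktop.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_key, fraction, seed=0):
    """Draw the same fraction from every label group so class balance is preserved."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)

    sample = []
    for members in groups.values():
        # Keep at least one row per group so rare classes never vanish.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical labeled transactions: 20 of 1,000 (2%) are fraudulent.
transactions = [{"id": i, "is_fraud": i % 50 == 0} for i in range(1000)]
sample = stratified_sample(transactions, "is_fraud", fraction=0.1)
fraud_in_sample = sum(1 for t in sample if t["is_fraud"])
print(len(sample), fraud_in_sample)
```

A plain random sample of the same size could easily contain zero or five fraud rows; sampling per label group keeps the 2% share fixed.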
Credit Card Fraud Use Case
→ Requirement: Detect fraudulent transactions.
→ Goal: Save the card company money and build trust among card users. Cut down on fraud.
→ Functional Requirement: Detect fraud in under 2 seconds at point of sale. Learn, adapt and make smarter decisions over time.
→ Design
– Distance: How far can one travel over a period of time before a purchase looks fraudulent?
– Category: How can we detect a purchase that a customer wouldn’t likely make?
– Frequency: How can we detect purchasing patterns that do not resemble the card holder's?
→ Ideas?
– Whiteboard some conditional logic: degree of egregiousness vs. a binary decision
– Backtest against the data
– Build a model per card holder?
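The "Distance" design question can be illustrated with an implied-travel-speed check between two consecutive purchases on the same card. This is a sketch under stated assumptions, not the demo's implementation: the transaction fields (`time`, `lat`, `lon`), the helper names, and the 900 km/h ceiling (roughly airliner cruising speed) are all hypothetical choices.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # assumption: roughly commercial airliner cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev, curr):
    """Speed the card holder would need to make both purchases in person."""
    hours = (curr["time"] - prev["time"]).total_seconds() / 3600.0
    dist = haversine_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    if hours <= 0:
        return float("inf") if dist > 0 else 0.0
    return dist / hours


def is_distance_suspicious(prev, curr, max_kmh=MAX_PLAUSIBLE_KMH):
    return implied_speed_kmh(prev, curr) > max_kmh


# Hypothetical purchases one hour apart.
new_york = {"time": datetime(2017, 1, 1, 12, 0), "lat": 40.7128, "lon": -74.0060}
london = {"time": datetime(2017, 1, 1, 13, 0), "lat": 51.5074, "lon": -0.1278}
newark = {"time": datetime(2017, 1, 1, 13, 0), "lat": 40.7357, "lon": -74.1724}

print(is_distance_suspicious(new_york, london))  # New York to London in one hour
print(is_distance_suspicious(new_york, newark))  # New York to Newark in one hour
```

Returning the implied speed rather than only a yes/no answer fits the "egregiousness vs binary" idea: the further the speed exceeds what is plausible, the stronger the signal.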
Rules, Statistics, Machine Learning
→ Rule-Based Logic
– Great for checking conditions that can be evaluated with 100% accuracy. Easy to build, with no reason to over-engineer.
– Example: Spending limit. Card holder limit = $2,000
• If (currentPurchaseAmount + balance > 2,000) then deny transaction
→ Statistics
– Mean, median, mode, variance, deviation
– Anomaly detection. Outliers. (e.g. women's retail example)
→ Machine Learning
– Supervised
– Unsupervised
– Trainable
– Adapts over time
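The first two approaches above can be sketched side by side. The spending-limit rule follows the slide's own example; the z-score test is one common way to turn mean and deviation into an outlier check, and its threshold of 3 and the sample purchase history are assumptions made for illustration.

```python
import statistics

CARD_LIMIT = 2000.0  # card holder limit from the slide's example


def exceeds_limit(current_purchase_amount, balance, limit=CARD_LIMIT):
    """Rule-based check: deny when the purchase would push the balance over the limit."""
    return current_purchase_amount + balance > limit


def z_score(amount, history):
    """Number of sample standard deviations between amount and the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return 0.0 if amount == mean else float("inf")
    return (amount - mean) / stdev


def is_amount_outlier(amount, history, threshold=3.0):
    """Statistical check: flag amounts far from what this card holder usually spends."""
    return abs(z_score(amount, history)) > threshold


# Hypothetical purchase history for one card holder.
history = [45.00, 8.50, 88.00, 55.00]

print(exceeds_limit(500.0, 1600.0))       # 2,100 is over the 2,000 limit
print(is_amount_outlier(900.0, history))  # far above the usual spend
print(is_amount_outlier(60.0, history))   # within the usual range
```

The rule gives a definite answer and needs no history, while the statistical check only says a purchase is unusual for this card holder, which is why the two are typically combined.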
Discovery
→ Gathered all credit card transactions
– Problem is they didn’t make sense
– No identifiable patterns, no log-normal curves
– Gas $45, Chipotle $8.50, steak dinner $88, Amazon shoes $55
→ Classification
Outlier Detection: identify abnormal patterns
Example: identify anomalies
Features:
- Time frequency
- Category
- Amount
- Distance
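One way to combine the four features into a per-card-holder outlier check is sketched below. It is not the model used in the demo: it scores the numeric features with a median-based modified z-score (the 0.6745 constant and 3.5 cutoff are the usual convention for that statistic) and treats a category as abnormal when it is rare in the card holder's history. All field names, thresholds, and sample data are hypothetical.

```python
import statistics
from collections import Counter

NUMERIC_FEATURES = ("hours_since_last", "amount", "distance_km")


def modified_z(value, values):
    """Median/MAD-based z-score, which is less distorted by earlier outliers than mean/stdev."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad


def abnormal_features(txn, history, z_threshold=3.5, min_category_share=0.05):
    """Return the names of the features on which txn departs from the card holder's history."""
    flags = []
    for name in NUMERIC_FEATURES:
        if abs(modified_z(txn[name], [h[name] for h in history])) > z_threshold:
            flags.append(name)
    counts = Counter(h["category"] for h in history)
    if counts[txn["category"]] / len(history) < min_category_share:
        flags.append("category")
    return flags


# Hypothetical history for one card holder.
history = [
    {"hours_since_last": 24, "category": "gas", "amount": 45.0, "distance_km": 5},
    {"hours_since_last": 20, "category": "restaurant", "amount": 8.5, "distance_km": 3},
    {"hours_since_last": 30, "category": "restaurant", "amount": 88.0, "distance_km": 10},
    {"hours_since_last": 26, "category": "online", "amount": 55.0, "distance_km": 0},
    {"hours_since_last": 22, "category": "gas", "amount": 40.0, "distance_km": 4},
]
typical = {"hours_since_last": 25, "category": "gas", "amount": 50.0, "distance_km": 6}
unusual = {"hours_since_last": 0.1, "category": "jewelry", "amount": 1500.0, "distance_km": 4000}

print(abnormal_features(typical, history))
print(abnormal_features(unusual, history))
```

Reporting which features fired, rather than a single score, gives an analyst something concrete to investigate.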
Fraud Detection Demo Technical Architecture
Real-Time Data Movement (Apache NiFi)
Real-Time Processing (Storm)
Inbound Messaging (Kafka)
DATA IN MOTION
Distributed Storage: HDFS
Many Workloads: YARN
Real-Time Serving (HBase)
Spark (Machine Learning)
UI and HTTP PubSub (Jetty and Tomcat)
Data Science (DSX)
Resource Allocation (Docker)
Interactive Query (Hive)
Authorization (Ranger)
Governance (Atlas)
All Running on Top of IBM Power Hardware
Use Case Demo
Credit Card Fraud Prevention
Credit Fraud Analyst Inbox
Credit Fraud Analyst Investigation
Credit Fraud Analyst Action
Hortonworks Data Flow: Backbone for Bi-Directional Communication
Demo Summary
Problems Solved
• Data scientist teams can collaborate and learn new tools on a common framework.
• Choice of open source tools, notebooks, and languages.
• Run their favorite notebook on all data in their HDP cluster.
• Deploy the model to production.
• Leverage the production model to deliver insights to the business.
• Monitor the health and performance of models in production.
Journey to Fraud Detection
[Figure labels: Improved Experience / Reduced Cost; Immediate Customer Feedback; Years of Customer Transaction Data; Fraud Detection; Complete Customer Profile; Real-time ingest of transactions; Purchase Behavior Insight; Innovate; Renovate]
Proactively identify potentially fraudulent transactions to protect the customer and improve customer experience
• Proactively monitor every credit card transaction using machine learning to catch potential fraud
• Customer Service Analyst reviews flagged transactions in real time via a next-generation application running on the connected platform
• HDF controls the real-time flow of data in and out of the connected platform to the various source and destination points
Data Science Solution
Community
• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects
Open Source
• Code in Scala/Python/R/SQL
• Zeppelin & Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries
Scale & Enterprise Security
• Data Science at Scale
• Run Spark Jobs on HDP Cluster
• Secure Hadoop Support
• Ranger Atlas Support for Data
• Support for ABAC
Model Management
• Data Shaping Pipeline UI
• Auto-data preparation & modeling
• Advanced Visualizations
• Model management & deployment
• Documented Model APIs
Data Science Experience
Freedom: Choose the right tool for your team and business.
Productivity: Make both experienced and novice data scientists more productive.
Trust: Confidently deploy insights generated from the most current data and trends.
enterprise-ready software distribution built on open source
tools for ease of development
performance: faster training times for data scientists
+
IBM Power Systems designed to deliver breakthrough performance for data
MORE vs. x86:
• threads per core MEANS get more work done
• processor cache (L1 ↔ L4) MEANS fastest memory lives on cores
• memory bandwidth MEANS more data than ever is flowing
+ BETTER:
• open innovation (COMMUNITY) MEANS faster innovation and value
+++ availability | scalability | reliability | serviceability
Accelerate Data Science with Power Systems
Test results are based on running a k-means clustering machine learning workload on data sets ranging in size from 1 GB to 15 GB. Test system details: Power Systems S822LC HPC (20 cores, 512 GB RAM, SSD); Power Systems S822LC Big Data (20 cores, 512 GB, HDD); Intel server with Broadwell E5-2640 v4 (20 cores, 512 GB, SSD); Intel server with Broadwell E5-2699 v4 (44 cores, 512 GB, HDD).
• Increase Data Science Team productivity
• Reduce model training time
− 2.5X with S822LC for HPC vs E5-2640 v4 (with SSD)
− 1.5X with S822LC for Big Data vs E5-2699 v4 (with HDD)
• Leverage larger datasets for model training
− 2.5X larger dataset in the same time (1,200 seconds: ~5 GB for x86 server E5-2640 with SSD vs 13 GB for Power server S822LC HPC with SSD)
[Chart: Elapsed time to form 5 clusters in 100 iterations using k-means clustering with one user. X-axis: Data Size (GB), 0 to 16. Y-axis: Elapsed Time (seconds), 0 to 3,600. Series: S822LC HPC with SSD; S822LC BigData with HDD; E5 2699 v4 with HDD; E5 2640 v4 with SSD.]
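For readers unfamiliar with the benchmarked workload, the following is a minimal single-machine sketch of k-means (Lloyd's algorithm) run for a fixed number of iterations, mirroring the "5 clusters in 100 iterations" setup. It is only an illustration of the algorithm, not the distributed job that produced the timings above; the function names and the synthetic 2-D data are assumptions.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k=5, iterations=100, seed=0):
    """Lloyd's algorithm with a fixed iteration count; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Synthetic data: 40 points around each of five hypothetical centers.
data_rng = random.Random(1)
centers = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(40)
]

centroids, assignments = kmeans(points, k=5, iterations=100)
print(len(centroids), len(assignments))
```

Every iteration touches every point once per centroid, so elapsed time grows with data size, which is what the chart's x-axis varies.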
The Perfect Blend of Data Science and an Enterprise Data Lake
Better Together
datascience.ibm.com
Boost Data Science Team Productivity: model training in less than half the time versus x86
Blazing Fast Insights for Line of Business: A 1.7x improvement in time to result
Secure and Reliable Data Access at Scale: Open, comprehensive data lifecycle and security management on the most reliable servers.
For clients building a high performing Data Science practice with a fast, scalable, enterprise Data Lake
A complete solution of Data Science and Hadoop software, hardware and quick start services.
© 2016 IBM Corporation
Hortonworks Preconfigured Images available on IBM POWER8

Image Name | Software Versions | Linux Version
HDP 2.6.2 | HDP 2.6.2 | RHEL 7.3
HDP 2.6.4 | HDP 2.6.4 | RHEL 7.4
HDP/HDF Security Governance Demo | HDP 2.6.3, HDF 3.0.3 | RHEL 7.4
HDP/HDF Credit Card Fraud Detection Demo | HDP 2.6.3, HDF 3.0.3 | RHEL 7.4
HDP/HDF IOT Trucking Demo | HDP 2.6.3, HDF 3.0.3 | RHEL 7.4

Size Flavor | Options Description
Small | 8 vCPUs, 24 GB memory, 50 GB disk
Medium | 16 vCPUs, 32 GB memory, 200 GB disk
Large | 24 vCPUs, 48 GB memory, 500 GB disk
1. Go to IBM Power Development Cloud (PDC): Link
2. Follow the Get Started process via the “Go to Program to Get Started” link and register for IBM PDC as a Partner or Open Source Developer.
3. When you reach the IBM PDC “Make a Reservation” page, click Request promo code.
4. Select Red Hat Linux for the Image Category. Enter the vCPUs and memory using values from the size/flavor options table. In the Other requirements field, enter one of the image names from the image table. Click Submit.
5. Wait for an approval email. Then, follow the instructions in the Create Reservation guide to complete your reservation.
6. On the reservations page, select the company profile that shows VMaaS, enter the promo code received in the email, and click Apply.
7. In the next form, select the desired Flavor and Image name.
How to Get Started with Hortonworks on OpenPOWER Systems
• Learn more about the benefits of IBM Power Systems and OpenPOWER
• Join the Hortonworks Community: https://community.hortonworks.com/
• Learn more about the benefits of Hortonworks: http://hortonworks.com/training/
• Sign up for free Data Science and Cognitive Computing courses: https://cognitiveclass.ai/
• Try the solution: IBM benchmark centers, on the cloud or on your premises
Q&A
IBM Cloud / DOC ID / Month XX, 2017 / © 2017 IBM Corporation
Thank you

Weitere ähnliche Inhalte

Was ist angesagt?

Sprint's Data Modernization Journey
Sprint's Data Modernization JourneySprint's Data Modernization Journey
Sprint's Data Modernization JourneyHortonworks
 
Overcoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onOvercoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onDataWorks Summit
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Machine Learning Everywhere
Machine Learning EverywhereMachine Learning Everywhere
Machine Learning EverywhereDataWorks Summit
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Hortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Big Data Expo 2015 - Hortonworks Common Hadoop Use Cases
Big Data Expo 2015 - Hortonworks Common Hadoop Use CasesBig Data Expo 2015 - Hortonworks Common Hadoop Use Cases
Big Data Expo 2015 - Hortonworks Common Hadoop Use CasesBigDataExpo
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalHortonworks
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Hortonworks
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Precisely
 
3 CTOs Discuss the Shift to Next-Gen Analytic Ecosystems
3 CTOs Discuss the Shift to Next-Gen Analytic Ecosystems3 CTOs Discuss the Shift to Next-Gen Analytic Ecosystems
3 CTOs Discuss the Shift to Next-Gen Analytic EcosystemsHortonworks
 
Global Data Management – a practical framework to rethinking enterprise, oper...
Global Data Management – a practical framework to rethinking enterprise, oper...Global Data Management – a practical framework to rethinking enterprise, oper...
Global Data Management – a practical framework to rethinking enterprise, oper...DataWorks Summit
 

Was ist angesagt? (20)

Sprint's Data Modernization Journey
Sprint's Data Modernization JourneySprint's Data Modernization Journey
Sprint's Data Modernization Journey
 
Overcoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus onOvercoming the AI hype — and what enterprises should really focus on
Overcoming the AI hype — and what enterprises should really focus on
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Machine Learning Everywhere
Machine Learning EverywhereMachine Learning Everywhere
Machine Learning Everywhere
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop
 
OpenPOWER Update
OpenPOWER UpdateOpenPOWER Update
OpenPOWER Update
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Big Data Expo 2015 - Hortonworks Common Hadoop Use Cases
Big Data Expo 2015 - Hortonworks Common Hadoop Use CasesBig Data Expo 2015 - Hortonworks Common Hadoop Use Cases
Big Data Expo 2015 - Hortonworks Common Hadoop Use Cases
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_final
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
 
3 CTOs Discuss the Shift to Next-Gen Analytic Ecosystems
3 CTOs Discuss the Shift to Next-Gen Analytic Ecosystems3 CTOs Discuss the Shift to Next-Gen Analytic Ecosystems
3 CTOs Discuss the Shift to Next-Gen Analytic Ecosystems
 
Global Data Management – a practical framework to rethinking enterprise, oper...
Global Data Management – a practical framework to rethinking enterprise, oper...Global Data Management – a practical framework to rethinking enterprise, oper...
Global Data Management – a practical framework to rethinking enterprise, oper...
 

Ähnlich wie Accelerating Data Science and Real Time Analytics at Scale

Ανδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking ForumΑνδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking ForumStarttech Ventures
 
Capitalize on Big Data Through Hitachi Innovation
Capitalize on Big Data Through Hitachi InnovationCapitalize on Big Data Through Hitachi Innovation
Capitalize on Big Data Through Hitachi InnovationHitachi Vantara
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise DataWorks Summit
 
Big Data - A Real Life Revolution
Big Data - A Real Life RevolutionBig Data - A Real Life Revolution
Big Data - A Real Life RevolutionCapgemini
 
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...Denodo
 
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...Anand Haridass
 
The 10 Best Data Analytics And BI Platforms And Tools In 2020
The 10 Best Data Analytics And BI Platforms And Tools In 2020The 10 Best Data Analytics And BI Platforms And Tools In 2020
The 10 Best Data Analytics And BI Platforms And Tools In 2020Bernard Marr
 
Watson and Cognitive Meetup April 2017
Watson and Cognitive Meetup   April 2017Watson and Cognitive Meetup   April 2017
Watson and Cognitive Meetup April 2017Rick Osowski
 
Smarter planet and mega trends presentation 2012
Smarter planet and mega trends presentation 2012Smarter planet and mega trends presentation 2012
Smarter planet and mega trends presentation 2012Joergen Floes
 
Analyzing Big Data - Jeff Scheel
Analyzing Big Data - Jeff ScheelAnalyzing Big Data - Jeff Scheel
Analyzing Big Data - Jeff ScheelKangaroot
 
Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry  Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry Persontyle
 
Radical Optimization: How the Internet of Things, 3D Printing and Innovative ...
Radical Optimization: How the Internet of Things, 3D Printing and Innovative ...Radical Optimization: How the Internet of Things, 3D Printing and Innovative ...
Radical Optimization: How the Internet of Things, 3D Printing and Innovative ...Sustainable Brands
 
Cisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt onlyCisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt onlyArthur_Hansen
 
Hortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceHortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceThiago Santiago
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationAbdelkrim Hadjidj
 
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"MDS ap
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseHortonworks
 
Dell AI and HPC University Roadshow
Dell AI and HPC University RoadshowDell AI and HPC University Roadshow
Dell AI and HPC University RoadshowBill Wong
 
Analytics as a Service in SL
Analytics as a Service in SLAnalytics as a Service in SL
Analytics as a Service in SLSkylabReddy Vanga
 

Ähnlich wie Accelerating Data Science and Real Time Analytics at Scale (20)

Ανδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking ForumΑνδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking Forum
 
Capitalize on Big Data Through Hitachi Innovation
Capitalize on Big Data Through Hitachi InnovationCapitalize on Big Data Through Hitachi Innovation
Capitalize on Big Data Through Hitachi Innovation
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise
 
Future of Big Data
Future of Big DataFuture of Big Data
Future of Big Data
 
Big Data - A Real Life Revolution
Big Data - A Real Life RevolutionBig Data - A Real Life Revolution
Big Data - A Real Life Revolution
 
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
 
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
 

Accelerating Data Science and Real Time Analytics at Scale

  • 1. Accelerating Data Science and Real-Time Analytics at Scale
       Nadeem Asghar, Hortonworks, Field CTO and Global Head Partner Engineering
       Steve Roberts, IBM, Big Data Offering Manager
  • 2. [Chart: Data over Time, with the labels "Available Data", "Understood Data", and "Enterprise Amnesia"]
       - 80 million wearable health devices will be available by 2017.
       - 2.5 quintillion bytes of data generated daily by connected machines.
       - There will be 28 times more sensor-enabled devices than people by the year 2020.
       - 25 gigabytes of data per hour is generated by a connected car.
       - 90% of cars will be connected by 2020.
       - 153 exabytes of healthcare data generated by devices in 2013, increasing to 2,314 exabytes in 2020.
       - 1.7 megabytes of data per second generated by every human being on the planet by 2020.
  • 3. A New Era of Computing Has Emerged
       [Diagram labels: Centralized Mainframes; Cognitive Era; E-Business; Distributed Computing; Smarter Planet; Office Productivity; Client/Server; Personal Computer; Data Warehousing; Big Data & Predictive Analytics; Cognitive]
       [Diagram labels: Data; Insight; Context; Transactional Database; Business Intelligence; Big Data & Analytics; Actionable Insight in context; Reporting; Cloud]
  • 4. From Data to AI: Intelligent Job Matching
       A recruiting and HR company chose an IBM & Hortonworks full stack solution to support their Hadoop/Spark workloads and accelerate their analytics and AI projects.
       Business problem: Job-matching is their core business, and the accuracy and speed of this matching are critical to their success. This requires the intake and analysis of terabytes of data daily, including recruiter and company information, job listings, hiring histories, and resumes. A future requirement is to apply AI to more complex data such as images, sound and video.
       Benefits:
       - Proven performance
       - World class support
       - Reliable security for personal data
       - Built on open technologies, avoiding vendor lock-in
       - Scalable software defined storage proven for analytics
       - POWER9 and PowerAI support their AI research and development
       © 2018 IBM Corporation
  • 6. IBM + Hortonworks = Unlocking Actionable Insights
       - #1 Pure Open Source Hadoop Distribution
       - 1000+ customers and 2100+ ecosystem partners
       - Employs the original architects, developers and operators of Hadoop from Yahoo!
       - Best-in-class 24x7 customer support
       - Leading professional services and training
       - Data Science Leader
       - OpenPOWER performance leadership
       - Flexible, software defined storage
       - #1 SQL Engine for complex, analytical workloads
       - Leader in On-premise and Hybrid Cloud solutions
       © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 7. DATA: More Volume and More Types
       Increasing data variety and complexity
       - Scale: gigabytes, terabytes, petabytes, exabytes
       - Tiers: ERP, CRM, Web, Big Data
       - Data types: user generated content, mobile web, SMS/MMS, sentiment, external demographics, HD video, speech to text, product/service logs, social network, business data feeds, user click stream, web logs, offer history, dynamic pricing, A/B testing, affiliate networks, search marketing, behavioral targeting, dynamic funnels, payment record, support contacts, customer touches, purchase detail, purchase record, segmentation, offer details
  • 8. Business Analytics Must Evolve To Deal With Data Tipping Point
       - PROVIDE INSIGHT INTO THE PAST via data aggregation, data mining, business reporting, OLAP, visualization, dashboards, etc.
       - UNDERSTAND THE FUTURE via statistical models, forecasting techniques, machine learning, etc.
       - ADVISE ON POSSIBLE OUTCOMES via rules, optimization and simulation algorithms
  • 9. Data Science and Real-Time Analytics at Scale: End to End Data Science Workflow
       - Data Engineering: discover, acquisition, processing, curation
       - Data Science: data wrangling; feature engineering, visualization and analysis; model building, training and testing
       - Deployment & Operationalize: reports, dashboards, real-time scoring, batch scoring, REST services, performance management, scheduling
       - Data Science Experience (DSX) Enterprise Services: Multi Notebook Support, Versioning, Collaboration, Model Management
       - Hortonworks Data Platform (HDP) Enterprise Services: Data, GPU, Deep Learning, Compute, Security, Governance, Metadata, Operations
       - Hortonworks Data Flow (HDF) Enterprise Services: Data Ingestion, Schema Registry, CEP
  • 11. Building a Model
       - Show of hands, how many have built a “Model”?
       - What are some limitations?
           - Conditional based logic: if/else binary decisions
       - If you need a lot of data to build a good model, what tools can you use?
           - Data volumes can eliminate the possibility of desktop tools
       - Sampling?
           - Well… we had better get an even distribution of true and false positives in each sample, but wait, that requires data munging, which takes us back to what tools we can use.
       - Security concerns?
           - Extracting data from its secure resting place and pushing it into other environments, often unsecured files or desktops where Matlab or R can be installed.
       - Collaboration
           - Push processing to the data using modern distributed tooling.
  • 12. Credit Card Fraud Use Case
       - Requirement: Detect fraudulent transactions.
       - Goal: Save the card company money and build trust amongst card users. Cut down on fraudulent crime.
       - Functional Requirement: Detect fraud in under 2 seconds at point of sale. Learn, adapt and make smarter decisions over time.
       - Design
           - Distance: How far can one travel over a period of time before it is fraudulent?
           - Category: How can we detect a purchase that a customer wouldn’t likely make?
           - Frequency: How can we detect purchasing patterns that do not resemble the card holder?
       - Ideas?
           - White board some conditional logic, egregiousness vs binary
           - Back test the data
           - Build a model per card holder?
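The "Distance" design question on this slide can be illustrated with a small sketch. It is not taken from the demo: the `(lat, lon, unix_seconds)` tuple layout, the function names, and the 900 km/h threshold are all assumptions chosen for illustration. The idea is to compute the great-circle distance between two consecutive purchases and flag the pair when the implied travel speed is implausible.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev, curr):
    """Speed needed to get from the previous purchase to the current one.

    Each purchase is a hypothetical (lat, lon, unix_seconds) tuple.
    """
    dist = haversine_km(prev[0], prev[1], curr[0], curr[1])
    # Treat anything under one second apart as one second to avoid dividing by zero.
    hours = max(curr[2] - prev[2], 1) / 3600.0
    return dist / hours


def is_distance_suspicious(prev, curr, max_speed_kmh=900.0):
    """Flag the pair if the card would have had to travel faster than max_speed_kmh.

    The 900 km/h default (roughly airliner speed) is an assumed threshold.
    """
    return implied_speed_kmh(prev, curr) > max_speed_kmh
```

For example, a purchase in New York followed one hour later by a purchase in London would be flagged, while the same pair ten hours apart would not.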
  • 13. Rules, Statistics, Machine Learning
       - Rule Based Logic
           - Great for checking conditions that can prove to be 100% accurate. Easy to build and no reason to over engineer.
           - Example: Spending Limit. Card holder limit = $2,000
               - If (currentPurchaseAmount + balance > 2,000) then deny transaction
       - Statistics
           - Mean, median, mode, variance, deviation
           - Anomaly detection. Outliers. (e.g. women's retail example)
       - Machine Learning
           - Supervised
           - Unsupervised
           - Trainable
           - Adapt over time
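The spending-limit rule on this slide translates almost directly into code. The sketch below is only an illustration: the function name and the default limit parameter are assumptions, and the comparison mirrors the slide's condition (deny when the purchase plus the current balance exceeds the limit).

```python
def approve_purchase(current_purchase_amount, balance, card_limit=2000.0):
    """Rule-based check from the slide: deny when purchase + balance exceeds the limit.

    Returns True to approve and False to deny. The name and signature are
    hypothetical; only the comparison comes from the slide.
    """
    return current_purchase_amount + balance <= card_limit
```

A rule like this is deterministic and cheap to evaluate, which is why the slide recommends it for conditions that can be checked exactly.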
  • 14. Discovery
       - Gathered all Credit Card Transactions
           - Problem is they didn’t make sense
           - No identifiable patterns, no log normal curves
           - Gas $45, Chipotle $8.50, Steak dinner $88, Amazon shoes $55
       - Classification
  • 15. Outlier Detection: identify abnormal patterns
       Example: identify anomalies
       Features:
       - Time frequency
       - Category
       - Amount
       - Distance
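One simple way to turn the "Category" and "Amount" features into an outlier check is a per-category z-score. The following is a minimal sketch under stated assumptions, not the model used in the demo: the `(category, amount)` history format, the function names, the threshold of 3 standard deviations, and the choice to treat a never-seen category as unusual are all illustrative.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_category_profile(history):
    """Build {category: (mean, stdev)} from (category, amount) pairs for one card holder."""
    by_category = defaultdict(list)
    for category, amount in history:
        by_category[category].append(amount)
    # Keep only categories with at least two purchases so stdev is defined.
    return {c: (mean(v), stdev(v)) for c, v in by_category.items() if len(v) >= 2}


def is_amount_outlier(profile, category, amount, z_threshold=3.0):
    """Flag an amount that sits more than z_threshold deviations from the category mean."""
    if category not in profile:
        # Assumption: a category the holder has no history in is itself unusual.
        return True
    mu, sigma = profile[category]
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > z_threshold
```

Grouping by category first reflects the Discovery slide: raw amounts across gas, restaurants, and online shopping show no single pattern, but amounts within one category are easier to compare.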
  • 16. Fraud Detection Demo Technical Architecture
       - Data in motion: Real-Time Data Movement (Apache NiFi), Real Time Processing (Storm), Inbound Messaging (Kafka)
       - Distributed Storage: HDFS
       - Many Workloads: YARN
       - Real-time Serving (HBase)
       - Spark (Machine Learning)
       - UI and HTTP PubSub (Jetty and Tomcat)
       - Data Science (DSX)
       - Resource Allocation (Docker)
       - Interactive Query (Hive)
       - Authorization (Ranger)
       - Governance (Atlas)
       All Running on Top of IBM Power Hardware
  • 19. Credit Fraud Analyst Investigation
  • 21. Hortonworks Data Flow: Backbone for Bi-Directional Communication
  • 22. Demo Summary: Problems Solved
       - Data scientist teams can collaborate and learn new tools on a common framework.
       - Choice of open source tools, notebooks, and languages.
       - Run favorite notebook on all data in their HDP cluster.
       - Deploy the model to production.
       - Leverage the production model to deliver insights to business.
       - Monitor the health and performance of models in production.
  • 23. Journey to Fraud Detection
       Proactively identify potential fraudulent transactions to protect the customer and improve customer experience
       - Proactively monitor every credit card transaction using machine learning to catch potential fraud
       - Customer Service Analyst reviews flagged transactions in real time via a next generation application running on the connected platform
       - HDF controls real time flow of data in and out of the connected platform to the various source and destination points
       [Diagram labels: Innovate; Renovate; Improved Experience / Reduced Cost; Immediate Customer Feedback; Years of Customer Transaction Data; Fraud Detection; Complete Customer Profile; Real time ingest of transactions; Purchase Behavior Insight]
  • 24. Data Science Solution: Data Science Experience
       - Community
           - Find tutorials and datasets
           - Connect with Data Scientists
           - Ask questions
           - Read articles and papers
           - Fork and share projects
       - Open Source
           - Code in Scala/Python/R/SQL
           - Zeppelin & Jupyter Notebooks
           - RStudio IDE and Shiny
           - Apache Spark
           - Your favorite libraries
       - Scale & Enterprise Security
           - Data Science at Scale
           - Run Spark Jobs on HDP Cluster
           - Secure Hadoop Support
           - Ranger Atlas Support for Data
           - Support for ABAC
       - Model Management
           - Data Shaping Pipeline UI
           - Auto-data preparation & modeling
           - Advanced Visualizations
           - Model management & deployment
           - Documented Model APIs
       Freedom: Choose the right tool for your team and business.
       Productivity: Make both experienced and novice data scientists more productive.
       Trust: Confidently deploy insights generated from the most current data and trends.
  • 25. - enterprise-ready software distribution built on open source
       - tools for ease of development
       - performance: faster training times for data scientists
  • 26. IBM Power Systems: designed to deliver breakthrough performance for data
       More / better vs. x86:
       - threads per core
       - processor cache (L1 ↔ L4)
       - memory bandwidth
       - open innovation (community)
       Means:
       - get more work done
       - fastest memory lives on cores
       - more data than ever is flowing
       - faster innovation and value
       availability | scalability | reliability | serviceability
  • 27. Accelerate Data Science with Power Systems
       - Increase Data Science Team productivity
       - Reduce model training time
           - 2.5X with S822LC for HPC vs E5-2640 v4 (with SSD)
           - 1.5X with S822LC for Big Data vs E5-2699 v4 (with HDD)
       - Leverage larger datasets for model training
           - 2.5X larger dataset in the same time (1200 seconds: ~5GB for x86 server E5 2640 with SSD vs 13GB for Power server S822 LC HPC with SSD)
       [Chart: elapsed time to form 5 clusters in 100 iterations using k-means clustering with one user. X axis: Data Size (GB). Y axis: Elapsed Time (seconds). Series: S822LC HPC with SSD; S822LC BigData with HDD; E5 2699 v4 with HDD; E5 2640 v4 with SSD.]
       Test results are based on running a machine learning workload using the k-means clustering algorithm on data sets ranging in size from 1 GB to 15 GB. Test system details:
       - Power Systems S822 LC HPC: 20 cores, 512 GB RAM and SSD
       - Power Systems S822LC Big Data: 20 cores, 512 GB, HDDs
       - Intel server with Broadwell E5 2640 v4: 20 cores, 512 GB and SSD
       - Intel server with Broadwell E5 2699 v4: 44 cores, 512 GB, HDD
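For readers unfamiliar with the benchmark workload, the sketch below shows what "form 5 clusters in 100 iterations using k-means" means in practice. It is a plain, single-machine version of Lloyd's algorithm written for illustration; it is not the benchmark code, and the function name, the seeded random initialization, and the tuple-based point format are assumptions.

```python
import random


def kmeans(points, k=5, iterations=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (centroids, assignments). Initial centroids are k points sampled
    at random (seeded for repeatability), which is an illustrative choice.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments
```

Each iteration touches every point once per centroid, so the running time grows with the data size, the number of clusters, and the iteration count. That is why elapsed time versus data size is a natural way to compare systems on this workload.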
  • 28. The Perfect Blend of Data Science and an Enterprise Data Lake: Better Together
       For clients building a high performing Data Science practice with a fast, scalable, enterprise Data Lake: a complete solution of Data Science and Hadoop software, hardware and quick start services.
       - Boost Data Science Team Productivity: model training in less than half the time versus x86
       - Blazing Fast Insights for Line of Business: a 1.7x improvement in time to result
       - Secure and Reliable Data Access at Scale: open, comprehensive data lifecycle and security management on the most reliable servers
       datascience.ibm.com
  • 29. Hortonworks Preconfigured Images available on IBM POWER8
       1. Go to IBM Power Development Cloud (PDC): Link
       2. Follow the Get Started process via the “Go to Program to Get Started” link and register for IBM PDC as a Partner or Open Source Developer.
       3. When you reach the IBM PDC “Make a Reservation” page, click Request promo code.
       4. Select Red Hat Linux for the Image Category. Enter the vCPUs and memory using values from the size/flavor options in the table below. In the Other requirements field, enter one of the Image names from the table below. Click Submit.
       5. Wait for an approval email. Then follow the instructions in the Create Reservation guide to complete your reservation.
       6. On the reservations page, select the company profile that shows VMaaS, enter the Promo code received in the email, and click Apply.
       7. In the next form, select the desired Flavor and Image name.

       Image Name                                 Software Versions      Linux Version
       HDP 2.6.2                                  HDP 2.6.2              RHEL 7.3
       HDP 2.6.4                                  HDP 2.6.4              RHEL 7.4
       HDP/HDF Security Governance Demo           HDP 2.6.3, HDF 3.0.3   RHEL 7.4
       HDP/HDF Credit Card Fraud Detection Demo   HDP 2.6.3, HDF 3.0.3   RHEL 7.4
       HDP/HDF IOT Trucking Demo                  HDP 2.6.3, HDF 3.0.3   RHEL 7.4

       Size/Flavor Options
       Size     Description
       Small    8 vCPUs, 24GB memory, 50GB disk
       Medium   16 vCPUs, 32GB memory, 200GB disk
       Large    24 vCPUs, 48GB memory, 500GB disk

       © 2016 IBM Corporation
  • 30. How to Get Started with Hortonworks on OpenPOWER Systems
       - Learn more about the benefits of IBM Power Systems and OpenPOWER
       - Join the Hortonworks Community: https://community.hortonworks.com/
       - Learn more about the benefits of Hortonworks: http://hortonworks.com/training/
       - Sign up for Free Data Science and Cognitive Computing courses: https://cognitiveclass.ai/
       - Try the solution: IBM benchmark centers, on the cloud or on your premises
  • 31. Q&A IBM Cloud / DOC ID / Month XX, 2017 / © 2017 IBM Corporation
  • 32. Thank you