DMT 3260
Citizens Bank Data Lake Implementation: Selecting
BigInsights ViON Spark/Hadoop Appliance
Dana Rafiee, Destiny Corporation
John DiFranco, Citizens Bank
Order of Presentation
• Destiny Background
• The Data Scientist
• Client Infrastructure Challenges
• Tools Used at Clients
• Client Architecture Case Studies
– Citizens Bank
– Financial Processing Organization
Abstract
Citizens Bank, formerly part of the Royal Bank of Scotland, is implementing
a BigInsights Hadoop Data Lake with PureData System for Analytics
(Netezza) to support all of its internal data initiatives. The goal is to provide
an improved experience for customers and to grow market share. Along
the ETL journey, we have used Netezza SQL, Hadoop and finally IBM
BigIntegrate and BigInsights. Testing BigIntegrate on BigInsights yielded the
productivity, maintenance and performance that Citizens was looking for,
and this all came prepackaged in the ViON Hadoop Appliance that was
rolled into its data centers, greatly simplifying entry into the Hadoop world.
Destiny Background
• Business and Technology Consulting Firm
• Advising Fortune 500 Corporations for 30 years
• Build Data Lakes, Warehouses, Reporting and
Analytics environments for large corporations and
government
• Business Consultants
• Data Warehouse/Modeling Specialists
• Advanced Analytic Practitioners
• SAS and IBM Business Partner
• Objective Opinions
Copyright © 2016 Destiny Corporation – Business and Technology Consulting - www.destinycorp.com
Who is the Data Scientist?
• Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured.
– Statistics
– Machine learning
– Data mining
– Predictive analytics
• “Data Scientist is the new title for the Analyst”
– Paul Kent, VP of Big Data at SAS Institute
Requirements of the Data Scientist Community
• Immediate access to data no matter where it exists
• Simple access to systems
• Legacy and Open Community Tools
• Ample resources to do their work
• Ability to store analytical results
• Fast Execution
• Access to In-House Data and External Data
• Nimble IT shop or I will find another option (Cloud)
Why is the Playing Field Different Today?
• Legacy Data and Systems
• OLTP Systems of Record
• Mainframes
• Data Warehouses and Marts
• Dark Data (Archived)
• New Data Sources
• Social Media
• Internet of Things
• Streaming Data
• Data Brokers – Search Yourself?
Some Big Data Use Cases
• Macy’s Inc. – Real-time pricing on 73 million items based on demand and inventory.
• Tipp24 AG – Betting on European lotteries with predictive analytics, building models in less than 10% of the time.
• Walmart – Text analytics, machine learning and synonym mining to produce relevant web site search results, increasing conversions by 10-15%.
• Fast Food and Digital Menus – Long drive-through lines display quickly delivered items, while short lines display higher-margin items that take longer to prepare.
• Morton’s Steak House – For a publicity stunt, analyzed tweets about Morton’s, matched the data to a frequent Morton’s diner and then delivered him dinner as he landed at the airport.
• PredPol Inc. – Los Angeles and Santa Cruz police use data about earthquakes and crime to predict where crimes will happen after an earthquake. There is up to a 33% reduction in crimes.
• Tesco PLC – Tracks 70 million refrigerator data points to be more proactive with maintenance and cut down energy costs.
• American Express – Predicting and reducing customer churn through analysis of historical buying patterns.
• Express Scripts Holding Co. – Through analysis, determined that people were forgetting to take their medications. Invented beeping medicine capsules and implemented automated phone calls.
• Infinity Property and Casualty Corp. – Re-analyzing dark data on claims now allows them to recover $12M in subrogation claims.
IT’s Challenges in Supporting the Data Scientists
• Building Proper Infrastructure to Support the Business
– Timely Access to data and systems
– Simple to use
– Open to new technologies and capabilities
– Accurate data
– Current data to support business needs
– Powerful enough to crunch all the data
– Fast or Cheap
– Robust and Reliable in an Open Environment
– On-Premise or Cloud or Hybrid
– Support Mandated Regulations
The Traditional IT Architecture
[Diagram: Data Input, Mainframe, Data Warehouse, Information, Analyst]
Why is it Not Enough?
• Inflexible
• Cannot capture new forms of data
• Cannot easily analyze new forms of data
• Cannot economically handle large data volumes
• Cannot easily integrate with the Open Community
• Long Lead Times for IT Projects
Designing the New Infrastructure
• New Non-Standard Data Sources
• Structured
• Unstructured
• Streaming
• NoSQL forms
• External Sources
• Ability to Land All Data Economically
• Let the business decide what data is required
• New Analytics Requirements
Some IT Infrastructure Considerations
• Limited Budgets and Resources
• Master Data Management
• Hadoop
– Bronze, Silver, Gold
– Single copy of the Data
– Spectrum Scale/GPFS
– Other Options
• Storage Mechanisms
– Elastic Storage Server
– DS8800, XIV
– Flash
• Types of Queries
• Historical Information
• Speed of Processing
– Fast, Expensive
– Slow, Cheap
• Location
– On-Premise
– Cloud
• Mobile Device Requirements
• Virtual Desktop
• Keeping Data In-Sync – Production and DR
– Update Strategies
– Replication Strategies
– Database
– SAN Store Utilities
• Data In-Flight
• Data Lineage
• Appliances
– PDA/Netezza
– SAP HANA on Power
– DB2 BLU – On Premises
– DataAdapt Spark Hadoop Appliance (BigInsights)
• Grid Processing
• Regulatory Compliance
• Data Governance
• In-house maintenance or Managed Service
• IEEE 802.3ba 40GbE, Direct Attached SAN, NAS
• Politics
Data Classifications
[Chart: Volume by data classification (Bronze, Silver, Gold) for Data Scientist, Power User and BI End User]
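The Bronze/Silver/Gold classification on this slide can be pictured as a small tiering scheme. The sketch below is only an illustration under stated assumptions: the meaning attached to each tier (raw, conformed, fit for purpose) and the one-to-one pairing of tier and audience are not spelled out in the deck, and the names `Tier`, `PRIMARY_AUDIENCE`, `audience_for` and `promote` are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical data-lake tiers named after the slide's classifications."""
    BRONZE = 1  # assumption: raw data and history, as landed
    SILVER = 2  # assumption: data conformed to a data model
    GOLD = 3    # assumption: fit-for-purpose data marts


# Assumed pairing of each tier with an audience named on the slide.
PRIMARY_AUDIENCE = {
    Tier.BRONZE: "Data Scientist",
    Tier.SILVER: "Power User",
    Tier.GOLD: "BI End User",
}


def audience_for(tier: Tier) -> str:
    """Look up the audience assumed to work mainly with this tier."""
    return PRIMARY_AUDIENCE[tier]


def promote(tier: Tier) -> Tier:
    """Return the next, more curated tier; Gold stays Gold."""
    return Tier(min(tier.value + 1, Tier.GOLD.value))


if __name__ == "__main__":
    for tier in Tier:
        print(tier.name, "->", audience_for(tier))
```

Keeping the tier as an explicit label on each dataset, rather than a separate copy per audience, is one way to stay close to the "single copy of the data" consideration listed earlier.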
Discovery and Transformation of Data
• Tools to Analyze and Transform Data
– DataStage
– Podium
– Trillium
– DataFlux
– Informatica
– Talend
• User Tools to Gain Insight into the Data
– Watson Explorer
– Attivio
• In-Database
• In Memory and Machine Learning
– Apache Spark – Micro Batches
– Apache Flink – Streaming Data Flow Engine and Memory Management
• Other
Building Analytics Processes and the Challenges
• Three Categories
– Ad Hoc
– Standard Analysis and Reporting
– Statistical Models
• Challenges for IT
– Skill Sets of the Data Scientist and Power Users
– Playing Nicely Together
– Structure of the Data – Data Modeling vs. SQL Tools
– Location and Movement of the Data
Case Studies
• Citizens Bank BigInsights Deployment
• Global Financial Advisors Deployment
• Financial Processing Organization Design
Citizens Bank Original Environment
• Teradata Data Warehouse
• Raw Data and History (Staging from record systems)
• Conformed Data to a Data Model (Mapped to industry standard model)
• Data Marts (Fit for purpose business specific)
Challenges with the Teradata Environment
• Processing on Teradata was slow due to:
• Traditional Teradata Data Warehouse Framework
• Reference Model
• Slow Time to Market
• Extremely Expensive in Labor Costs
• Extremely Expensive to add Additional Computing Capacity
• System and SAS costs increasing
Looking for Alternatives
• Execution of an information Proof of Concept
• IBM
• Oracle
• Cloudera
• Hortonworks
Conclusions and Choices Made
• The IBM BigInsights Appliance is the most cost-effective
• Minimal engagement from internal infrastructure organization
• Delivered fully assembled with hardware and software
• Appliance Model value proposition similar to a Netezza Appliance
Standard Tools at Citizens
• IBM Big SQL
– Assurance that standard tools would work well with it (DB2 LUW V10.5)
– All products support this platform
• Oracle OBI-EE – Operational Reporting
• SAS for Statistical Modeling
• Tableau for Visual Reporting
• DataStage for ETL – centralized application development model
• Spectrum Scale (GPFS) vs. Hadoop for better management of the data and less raw storage
• Fluid Query for connections to BigInsights
POC on BigInsights Appliance
• DataStage processing running on Teradata was moved to BigInsights
• Client connectivity, queries, testing and validation
• Proved that the platform could be used as the server and storage to run enterprise DataStage processing
Results
• Moved analytics processing from Teradata to Netezza (cost/performance)
• Increase in SAS performance by running in the Netezza database
• Repurposed some SAS costs
• Reduced data warehouse admin support costs (Teradata DBAs reallocated)
• Implemented BigInsights Hadoop for a data lake (staging and conformity)
• Avoided large capital outlays for additional Teradata capacity
• Reduction in labor effort to use the new platforms
Future Plans
• Evaluating and planning implementation of dashDB (Bridge to Cloud) to move some items to Cloud
• Instead of paying for another year of S&S, using the funds for Bridge to Cloud
• Attractive price point
• Adding new applications (Risk) to Netezza and the Data Lake
Complimentary Consultation
• Contact us at: info@destinycorp.com
• Discovery Session
• Analysis of Architecture
• Business Process
• Governance
• High Level Recommendations
Questions and Answers
Contact Information
Dana Rafiee
Managing Director
Destiny Corporation
860-721-1684 x2007
drafiee@destinycorp.com
www.destinycorp.com
John DiFranco
SVP - Director of Enterprise Data Management
Citizens Bank
John.difranco@citizensbank.com
www.citizensbank.com
781-655-4489
Thank you for your time

Weitere ähnliche Inhalte

Was ist angesagt?

Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 

Was ist angesagt? (20)

Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 
The System Administrator Role in the Cloud Era: Better Than Ever (ENT212) | A...
The System Administrator Role in the Cloud Era: Better Than Ever (ENT212) | A...The System Administrator Role in the Cloud Era: Better Than Ever (ENT212) | A...
The System Administrator Role in the Cloud Era: Better Than Ever (ENT212) | A...
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 

Andere mochten auch

Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
m_hepburn
 

Andere mochten auch (20)

Medical University of South Carolina: Using Big Data and Predictive Analytics...
Medical University of South Carolina: Using Big Data and Predictive Analytics...Medical University of South Carolina: Using Big Data and Predictive Analytics...
Medical University of South Carolina: Using Big Data and Predictive Analytics...
 
Southwest Power Pool big data case study
Southwest Power Pool big data case study Southwest Power Pool big data case study
Southwest Power Pool big data case study
 
Constant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake JourneyConstant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake Journey
 
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
 
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
 
AddReality company overview
AddReality company overviewAddReality company overview
AddReality company overview
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking Ecosystem
 
Hooduku - Big data analytics - case study
Hooduku - Big data analytics - case studyHooduku - Big data analytics - case study
Hooduku - Big data analytics - case study
 
The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.
 
World of Watson 2016 - Architecting your Analytics House
World of Watson 2016 - Architecting your Analytics HouseWorld of Watson 2016 - Architecting your Analytics House
World of Watson 2016 - Architecting your Analytics House
 
Big Data: Querying complex JSON data with BigInsights and Hadoop
Big Data:  Querying complex JSON data with BigInsights and HadoopBig Data:  Querying complex JSON data with BigInsights and Hadoop
Big Data: Querying complex JSON data with BigInsights and Hadoop
 
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guide
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase
 
Big Data Case study - caixa bank
Big Data Case study - caixa bankBig Data Case study - caixa bank
Big Data Case study - caixa bank
 
Luxury 3.0- a new Retail Scenario for Product Mass Customization and On Deman...
Luxury 3.0- a new Retail Scenario for Product Mass Customization and On Deman...Luxury 3.0- a new Retail Scenario for Product Mass Customization and On Deman...
Luxury 3.0- a new Retail Scenario for Product Mass Customization and On Deman...
 
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
 

Ähnlich wie Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Hadoop Appliance

Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docx
dickonsondorris
 

Ähnlich wie Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Hadoop Appliance (20)

ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Lecture1
Lecture1Lecture1
Lecture1
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
Big data in telecom
Big data in telecomBig data in telecom
Big data in telecom
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docx
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Big_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptxBig_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptx
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 

Mehr von Seeling Cheung (7)

Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
 
Big Fish Games: Democratizing Data Access
Big Fish Games: Democratizing Data AccessBig Fish Games: Democratizing Data Access
Big Fish Games: Democratizing Data Access
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Concept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsConcept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with Telematics
 
BigInsights For Telecom
BigInsights For TelecomBigInsights For Telecom
BigInsights For Telecom
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and Analytics
 
Integrating BigInsights and Puredata system for analytics with query federati...
Integrating BigInsights and Puredata system for analytics with query federati...Integrating BigInsights and Puredata system for analytics with query federati...
Integrating BigInsights and Puredata system for analytics with query federati...
 

Kürzlich hochgeladen

In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 

Kürzlich hochgeladen (20)

Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 

Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Hadoop Appliance

  • 1. DMT 3260 Citizens Bank Data Lake Implementation: Selecting BigInsights ViON Spark/Hadoop Appliance
      • Dana Rafiee, Destiny Corporation
      • John DiFranco, Citizens Bank
  • 2. Order of Presentation
      • Destiny Background
      • The Data Scientist
      • Client Infrastructure Challenges
      • Tools Used at Clients
      • Client Architecture Case Studies
      • Citizens Bank
      • Financial Processing Organization
  • 3. Abstract: Citizens Bank, formerly part of the Royal Bank of Scotland, is implementing a BigInsights Hadoop Data Lake with PureData System for Analytics (Netezza) to support all of its internal data initiatives. The goal is to provide an improved experience for customers and to grow market share. Along their ETL journey, we’ve used Netezza SQL, Hadoop and finally IBM BigIntegrate and BigInsights. Testing BigIntegrate on BigInsights yielded the productivity, maintenance and performance that Citizens was looking for, and this all came prepackaged in the ViON Hadoop Appliance that was rolled into its data centers, greatly simplifying entry into the Hadoop world.
  • 4. Destiny Background
      • Business and Technology Consulting Firm
      • Advising Fortune 500 Corporations for 30 years
      • Build Data Lakes, Warehouses, Reporting and Analytics environments for large corporations and government
      • Business Consultants
      • Data Warehouse/Modeling Specialists
      • Advanced Analytic Practitioners
      • SAS and IBM Business Partner
      • Objective Opinions
      Copyright © 2016 Destiny Corporation – Business and Technology Consulting - www.destinycorp.com
  • 5. Who is the Data Scientist?
      • Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured.
      • Statistics
      • Machine learning
      • Data mining
      • Predictive analytics
      • “Data Scientist is the new title for the Analyst” (Paul Kent, VP of Big Data at SAS Institute)
  • 6. Requirements of the Data Scientist Community
      • Immediate access to data no matter where it exists
      • Simple access to systems
      • Legacy and Open Community Tools
      • Ample resources to do their work
      • Ability to store analytical results
      • Fast Execution
      • Access to In-House Data and External Data
      • Nimble IT shop or I will find another option (Cloud)
  • 7. Why is the Playing Field Different Today?
      • Legacy Data and Systems
          – OLTP Systems of Record
          – Mainframes
          – Data Warehouses and Marts
          – Dark Data (Archived)
      • New Data Sources
          – Social Media
          – Internet of Things
          – Streaming Data
          – Data Brokers – Search Yourself?
  • 8. Some Big Data Use Cases
      • Macy’s Inc. – Real-time pricing on 73 million items based on demand and inventory.
      • Tipp24 AG – Betting on European lotteries with predictive analytics, building models in less than 10% of the time.
      • Walmart – Text analytics, machine learning and synonym mining to produce relevant web site search results, increasing conversions by 10-15%.
      • Fast Food and Digital Menus – Long drive-through lines display quickly delivered items, while short lines display higher-margin items that take longer to prepare.
      • Morton’s Steak House – For a publicity stunt, analyzed tweets about Morton’s, matched data to a frequent Morton’s diner and then delivered him dinner as he landed at the airport.
      • PredPol Inc. – Los Angeles and Santa Cruz Police use data about earthquakes and crime to predict where crimes will happen after an earthquake. There is up to a 33% reduction in crimes.
      • Tesco PLC – Track 70 million refrigerator data points to be more proactive with maintenance and cut down energy costs.
      • American Express – Predicting and reducing customer churn through analysis of historical buying patterns.
      • Express Scripts Holding Co. – Through analysis, determined people were forgetting to take their medications. Invented beeping medicine capsules and implemented automated phone calls.
      • Infinity Property and Casualty Corp. – Re-analyzing dark data on claims now allows them to recover $12M in subrogation claims.
  • 9. IT’s Challenges in Supporting the Data Scientists
      • Building Proper Infrastructure to Support the Business
          – Timely Access to data and systems
          – Simple to use
          – Open to new technologies and capabilities
          – Accurate data
          – Current data to support business needs
          – Powerful enough to crunch all the data
          – Fast or Cheap
          – Robust and Reliable in an Open Environment
          – On-Premise or Cloud or Hybrid
          – Support Mandated Regulations
  • 10. The Traditional IT Architecture (diagram with the labels: Mainframe, Data Warehouse, Data Input, Analyst, Information)
  • 11. Why is it Not Enough?
      • Inflexible
      • Cannot capture new forms of data
      • Cannot easily analyze new forms of data
      • Cannot economically handle large data volumes
      • Cannot easily integrate with the Open Community
      • Long Lead Times for IT Projects
  • 12. Designing the New Infrastructure
      • New Non-Standard Data Sources
          – Structured
          – Unstructured
          – Streaming
          – NoSQL forms
          – External Sources
      • Ability to Land All Data Economically
      • Let the business decide what data is required
      • New Analytics Requirements
  • 13. Some IT Infrastructure Considerations
      • Limited Budgets and Resources
      • Master Data Management
      • Hadoop
          – Bronze, Silver, Gold
          – Single copy of the Data
          – Spectrum Scale/GPFS
          – Other Options
      • Storage Mechanisms
          – Elastic Storage Server
          – DS8800, XIV
          – Flash
      • Types of Queries
      • Historical Information
      • Speed of Processing
          – Fast, Expensive
          – Slow, Cheap
      • Location
          – On-Premise
          – Cloud
      • Mobile Device Requirements
      • Virtual Desktop
      • Keeping Data In-Sync
          – Production and DR
          – Update Strategies
          – Replication Strategies
          – Database
          – SAN Store Utilities
      • Data In-Flight
      • Data Lineage
      • Appliances
          – PDA/Netezza
          – SAP HANA on Power
          – DB2 BLU
          – On Premises
          – DataAdapt Spark Hadoop Appliance (BigInsights)
      • Grid Processing
      • Regulatory Compliance
      • Data Governance
      • In-house maintenance or Managed Service
      • IEEE 802.3ba 40GbE, Direct Attached SAN, NAS
      • Politics
  • 14. Data Classifications (chart comparing the Bronze, Silver and Gold data tiers; legend: Volume, Data Scientist, Power User, BI End User)
  • 15. Discovery and Transformation of Data
      • Tools to Analyze and Transform Data
          – DataStage
          – Podium
          – Trillium
          – DataFlux
          – Informatica
          – Talend
      • User Tools to Gain Insight into the Data
          – Watson Explorer
          – Attivio
      • In-Database
      • In Memory and Machine Learning
          – Apache Spark: Micro Batches
          – Apache Flink: Streaming Data Flow Engine and Memory Management
      • Other
  • 16. Building Analytics Processes and the Challenges
      • Three Categories
          – Ad Hoc
          – Standard Analysis and Reporting
          – Statistical Models
      • Challenges for IT
          – Skill Sets of the Data Scientist and Power Users
          – Playing Nicely Together
          – Structure of the Data
          – Data Modeling vs. SQL Tools
          – Location and Movement of the Data
  • 17. Case Studies
      • Citizens Bank BigInsights Deployment
      • Global Financial Advisors Deployment
      • Financial Processing Organization Design
  • 18. Citizens Bank Original Environment
      • Teradata Data Warehouse
      • Raw Data and History (Staging from record systems)
      • Conformed Data to a Data Model (Mapped to industry standard model)
      • Data Marts (Fit for purpose business specific)
  • 19. Challenges with the Teradata Environment
      • Processing on Teradata was slow due to:
      • Traditional Teradata Data Warehouse Framework
      • Reference Model
      • Slow Time to Market
      • Extremely Expensive in Labor Costs
      • Extremely Expensive to add Additional Computing Capacity
      • System and SAS costs increasing
  • 20. Looking for Alternatives
      • Execution of an information Proof of Concept
          – IBM
          – Oracle
          – Cloudera
          – Hortonworks
  • 21. Conclusions and Choices Made
      • The IBM BigInsights Appliance is the most cost-effective
      • Minimal engagement from internal infrastructure organization
      • Delivered fully assembled with hardware and software
      • Appliance Model value proposition similar to a Netezza Appliance
  • 22. Standard Tools at Citizens
      • IBM Big SQL
          – Assurance that standard tools would work well with it (DB2 LUW V10.5)
          – All products support this platform
      • Oracle OBI-EE – Operational Reporting
      • SAS for Statistical Modeling
      • Tableau for Visual Reporting
      • DataStage for ETL – centralized application development model
      • Spectrum Scale (GPFS) vs. Hadoop for better management of the data and less raw storage
      • Fluid Query for connections to BigInsights
  • 23. POC on BigInsights Appliance
      • DataStage processing running on Teradata was moved to BigInsights
      • Client connectivity, queries, testing and validation
      • Proved that the platform could be used as the server and storage to run enterprise DataStage processing
  • 24. Results
      • Moved Analytics processing from Teradata to Netezza (cost/performance)
      • Increase in SAS performance by running in Netezza database
      • Repurposed some SAS costs
      • Reduced data warehouse admin support costs (Teradata DBAs reallocated)
      • Implemented BigInsights Hadoop for a data lake (staging and conformity)
      • Avoided large capital outlays for additional Teradata capacity
      • Reduction in Labor Effort to use the new platforms
  • 25. Future Plans
      • Evaluating and Planning Implementation of dashDB (Bridge to Cloud) to move some items to Cloud
      • Instead of paying for another year of S&S, using the funds for Bridge to Cloud
      • Attractive price point
      • Adding new applications (Risk) to Netezza and the Data Lake
  • 26. Complimentary Consultation
      • Contact Us at: info@destinycorp.com
      • Discovery Session
      • Analysis of Architecture
      • Business Process
      • Governance
      • High Level Recommendations
  • 28. Contact Information
      • Dana Rafiee, Managing Director, Destiny Corporation
          – 860-721-1684 x2007
          – drafiee@destinycorp.com
          – www.destinycorp.com
      • John DiFranco, SVP - Director of Enterprise Data Management, Citizens Bank
          – John.difranco@citizensbank.com
          – www.citizensbank.com
          – 781-655-4489
      • Thank you for your time