SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Big Data in the Cloud
S&P Capital IQ
S&P Capital IQ combines two of our
strongest brands - S&P, with its long
history and experience in the financial
markets and Capital IQ, which is known
among professionals globally for its
comprehensive company and financial
information and powerful analytical
tools.
Agenda

• Creation of Excel Plug-in with Global
  Data, Global Sales and US based servers
• High Performance data gets for Big
  Historical Time Series Data.
• QA
S&P Capital IQ Excel Plug-in




• Excel Plug-in provides thousands of data points
  on Demand
• Allows customers anywhere in the world to use
  our data assets on their desktops on demand
• It needs to be a fast user experience
  everywhere in the world
Global Customers US Data Center




                                 Average Response Time Milliseconds
From: London To: New Jersey                     400
From: New York To: New Jersey                    30
From: Melbourne To: New Jersey                  800
                                                      Response times rounded
Global Customers Global Data Center




                                Average Response Time Milliseconds
From: London To: Ireland                     400 to 40
From: New York To: New Jersey                30 to 30
From: Melbourne To: Singapore                800 to 60
                                                    Response times rounded
Cloud Architecture
                              HTTPS




                               HTTPS            HTTPS




New Jersey DC                                  HTTPS




                                       HTTPS



                                  HTTPS
How do we make it even faster?
                                                      Smart Cache
                                      Pre-send data



                                                        Router




- Move data the customers uses the most to their
  desktop.
- Automatically get the data for the customer.
- Learn to send the right data to the customer.
Smart Cache
                2.
                                     3.   Smart Cache
                                                            1.


       5.                       4.                      a

                                            Router
                            b


1. User Opts into Smart Cache
2. The system pre-sends data package to customer
3. User makes a request for data
    a. Smart Cache Checks Locally first
    b. Not local grab data from the cloud
4. Smart Cache sends usage logs
5. Pre-sent data package is altered for the customer
Smart Caching Data
                                                              Smart Cache
                                                   1.

                          4.                2.
                                  3.
                                                               Router
5.                   6.
                                                         7.




1.   Collect logs from smart cache
2.   Collect and decrypt cloud and local usage logs
3.   Apply logs to Mahout
4.   Use customer profile
5.   Mahout comes out with an update suggestion list
6.   Customer specific package is created
7.   Prepared package is ready for pick by smart cache
Smart Caching Data
             Lessons Learned
• The algorithm works similar to a website matching engine for
  shopping.
• Different in that the customer does not see the
  recommendations they just have a faster experience
• All data sets are used to learn but only large data sets are
  custom packaged for delivery
• Sometimes it is easier to just send the entire package when
  the data set is small enough and used by the customer.
• Don’t expect success day 1 or day 30 the longer you learn the
  more accurate it should become
• Not a replacement for simple logic
• Algorithm requires constant feeding and attention.
• There are cases where you can’t learn about your user such
  as when they share ID’s.
High Performance Data Gets
High Performance Data Gets




• Some data assets due to size are still routed back to the
  US
• Big Data sets ~10T of time series data
• As those data assets became more popular we needed
  to move the right data to the cloud
• Cannot synchronize the data so fast loads are required
• Single Milliseconds get times
High Performance Data Gets




• Using Hadoop learn what are the most used
  large data assets.
• Move the subset of data identified as the most
  used data to the cloud.
• Fast loading of millions of records
• Allow for Single Milliseconds data retrieval times
High Performance Data Gets

Cassandra                                                            Hbase
http://cassandra.apache.org/                                         http://hbase.apache.org/

Apache Cassandra is a highly scalable, eventually consistent,        HBase is an open-source, distributed, versioned, column-oriented
distributed, structured key-value store. Cassandra brings together   store modeled after Google's Bigtable: A Distributed Storage
the distributed systems technologies from Dynamo and the data        System for Structured Data by Chang et al. Just as Bigtable
model from Google's BigTable. Like Dynamo, Cassandra                 leverages the distributed data storage provided by the Google File
is eventually consistent. Like BigTable, Cassandra provides a        System, HBase provides Bigtable-like capabilities on top of Hadoop
ColumnFamily-based data model richer than typical key/value          and HDFS.
systems.
Cassandra was open sourced by Facebook in 2008, where it was         Hbase is similar to an RDBMS in that it has the concept of tables;
designed by Avinash Lakshman (one of the authors of Amazon's         however, columns in Hbase tables are not fixed in number or data
Dynamo) and Prashant Malik ( Facebook Engineer ). In a lot of        type and can have any data type which varies from one row to the
ways you can think of Cassandra as Dynamo 2.0 or a marriage of       other.
Dynamo and BigTable. Cassandra is in production use at Facebook
but is still under heavy development.
We tried to do a similar POC using Cassandra with a smaller subset
of data because of above mentioned hardware restrictions.
Unlike Hbase and RDBMS, there is no concept of a table. Instead
we have columns, column families and Keypsaces.
High Performance Data Gets
                 Cassandra          Hbase            Oracle
Data Get         400 Microseconds   1 Milliseconds   5 Seconds
Data Load        10 Minutes         10 Minutes       10 Minutes




       • Data Get
            – Time to pull 1 security and 1 data point
       • Data Load
            – Time take to load 6 million securities
High Performance Data Gets

• Virtual Oracle instances did not meet our
  performance needs.
• EMR needed for Hbase was not cost effective for
  data gets.
• Hbase is difficult to implement in AWS due to the
  hardware requirements of Hadoop
• Cassandra can be segmented logically for Big
  Data Assets with minimal to no performance
  degradation in AWS
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

IBM Big Data Platform, 2012
IBM Big Data Platform, 2012IBM Big Data Platform, 2012
IBM Big Data Platform, 2012Rob Thomas
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceMongoDB
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Ontico
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013IntelAPAC
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinarCloudera, Inc.
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...Mark Rittman
 
Azure data lakes
Azure data lakesAzure data lakes
Azure data lakesVishwas N
 
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, SuccessesSQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, SuccessesArnon Shimoni
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationCesare Cugnasco
 
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Spark Summit
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing DataWorks Summit
 
GPU Acceleration for Financial Services
GPU Acceleration for Financial ServicesGPU Acceleration for Financial Services
GPU Acceleration for Financial ServicesKinetica
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshopFang Mac
 
August meetup - All about Apache Druid
August meetup - All about Apache Druid August meetup - All about Apache Druid
August meetup - All about Apache Druid Imply
 

Was ist angesagt? (20)

IBM Big Data Platform, 2012
IBM Big Data Platform, 2012IBM Big Data Platform, 2012
IBM Big Data Platform, 2012
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Azure data lakes
Azure data lakesAzure data lakes
Azure data lakes
 
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, SuccessesSQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
 
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Big Data with Azure
Big Data with AzureBig Data with Azure
Big Data with Azure
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing
 
GPU Acceleration for Financial Services
GPU Acceleration for Financial ServicesGPU Acceleration for Financial Services
GPU Acceleration for Financial Services
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
August meetup - All about Apache Druid
August meetup - All about Apache Druid August meetup - All about Apache Druid
August meetup - All about Apache Druid
 

Ähnlich wie Big data cloud architecture

SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Alluxio, Inc.
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptxbetalab
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Cloudian
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseDataStax
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3Simon Ambridge
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Cloud Native Day Tel Aviv
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon RedshiftAmazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandraScyllaDB
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Rajit Saha
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 

Ähnlich wie Big data cloud architecture (20)

SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptx
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Kafka & Hadoop in Rakuten
Kafka & Hadoop in RakutenKafka & Hadoop in Rakuten
Kafka & Hadoop in Rakuten
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from Cassandra
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 

Kürzlich hochgeladen

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Kürzlich hochgeladen (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Big data cloud architecture

  • 1.
  • 2. Big Data in the Cloud
  • 3. S&P Capital IQ S&P Capital IQ combines two of our strongest brands - S&P, with its long history and experience in the financial markets and Capital IQ, which is known among professionals globally for its comprehensive company and financial information and powerful analytical tools.
  • 4. Agenda • Creation of Excel Plug-in with Global Data, Global Sales and US based servers • High Performance data gets for Big Historical Time Series Data. • QA
  • 5. S&P Capital IQ Excel Plug-in • Excel Plug-in provides thousands of data points on Demand • Allows customers anywhere in the world to use our data assets on their desktops on demand • It needs to be a fast user experience everywhere in the world
  • 6. Global Customers US Data Center Average Response Time Milliseconds From: London To: New Jersey 400 From: New York To: New Jersey 30 From: Melbourne To: New Jersey 800 Response times rounded
  • 7. Global Customers Global Data Center Average Response Time Milliseconds From: London To: Ireland 400 to 40 From: New York To: New Jersey 30 to 30 From: Melbourne To: Singapore 800 to 60 Response times rounded
  • 8. Cloud Architecture HTTPS HTTPS HTTPS New Jersey DC HTTPS HTTPS HTTPS
  • 9. How do we make it even faster? Smart Cache Pre-send data Router - Move data the customers uses the most to their desktop. - Automatically get the data for the customer. - Learn to send the right data to the customer.
  • 10. Smart Cache 2. 3. Smart Cache 1. 5. 4. a Router b 1. User Opts into Smart Cache 2. The system pre-sends data package to customer 3. User makes a request for data a. Smart Cache Checks Locally first b. Not local grab data from the cloud 4. Smart Cache sends usage logs 5. Pre-sent data package is altered for the customer
  • 11. Smart Caching Data Smart Cache 1. 4. 2. 3. Router 5. 6. 7. 1. Collect logs from smart cache 2. Collect and decrypt cloud and local usage logs 3. Apply logs to Mahout 4. Use customer profile 5. Mahout comes out with an update suggestion list 6. Customer specific package is created 7. Prepared package is ready for pick by smart cache
  • 12. Smart Caching Data Lessons Learned • The algorithm works similar to a website matching engine for shopping. • Different in that the customer does not see the recommendations they just have a faster experience • All data sets are used to learn but only large data sets are custom packaged for delivery • Sometimes it is easier to just send the entire package when the data set is small enough and used by the customer. • Don’t expect success day 1 or day 30 the longer you learn the more accurate it should become • Not a replacement for simple logic • Algorithm requires constant feeding and attention. • There are cases where you can’t learn about your user such as when they share ID’s.
  • 14. High Performance Data Gets • Some data assets due to size are still routed back to the US • Big Data sets ~10T of time series data • As those data assets became more popular we needed to move the right data to the cloud • Cannot synchronize the data so fast loads are required • Single Milliseconds get times
  • 15. High Performance Data Gets • Using Hadoop learn what are the most used large data assets. • Move the subset of data identified as the most used data to the cloud. • Fast loading of millions of records • Allow for Single Milliseconds data retrieval times
  • 16. High Performance Data Gets Cassandra Hbase http://cassandra.apache.org/ http://hbase.apache.org/ Apache Cassandra is a highly scalable, eventually consistent, HBase is an open-source, distributed, versioned, column-oriented distributed, structured key-value store. Cassandra brings together store modeled after Google's Bigtable: A Distributed Storage the distributed systems technologies from Dynamo and the data System for Structured Data by Chang et al. Just as Bigtable model from Google's BigTable. Like Dynamo, Cassandra leverages the distributed data storage provided by the Google File is eventually consistent. Like BigTable, Cassandra provides a System, HBase provides Bigtable-like capabilities on top of Hadoop ColumnFamily-based data model richer than typical key/value and HDFS. systems. Cassandra was open sourced by Facebook in 2008, where it was Hbase is similar to an RDBMS in that it has the concept of tables; designed by Avinash Lakshman (one of the authors of Amazon's however, columns in Hbase tables are not fixed in number or data Dynamo) and Prashant Malik ( Facebook Engineer ). In a lot of type and can have any data type which varies from one row to the ways you can think of Cassandra as Dynamo 2.0 or a marriage of other. Dynamo and BigTable. Cassandra is in production use at Facebook but is still under heavy development. We tried to do a similar POC using Cassandra with a smaller subset of data because of above mentioned hardware restrictions. Unlike Hbase and RDBMS, there is no concept of a table. Instead we have columns, column families and Keypsaces.
  • 17. High Performance Data Gets Cassandra Hbase Oracle Data Get 400 Microseconds 1 Milliseconds 5 Seconds Data Load 10 Minutes 10 Minutes 10 Minutes • Data Get – Time to pull 1 security and 1 data point • Data Load – Time take to load 6 million securities
  • 18. High Performance Data Gets • Virtual Oracle instances did not meet our performance needs. • EMR needed for Hbase was not cost effective for data gets. • Hbase is difficult to implement in AWS due to the hardware requirements of Hadoop • Cassandra can be segmented logically for Big Data Assets with minimal to no performance degradation in AWS

Hinweis der Redaktion

  1. Maybe Remove