Data Works
Adrian Waddy, Nick Vaughan and Eloise Hindes
The Building Blocks of Big Data
The Bank of England's journey to delivering a big data capability
Agenda
1. The Bank
Who we are and what we do
2. Historic IT
Where are we starting from?
3. Data Warehouse
Initial progress
4. Hub 1 and 2
First adventures in Big Data
5. The Future
Where next? Scaling up
Any questions
The Bank
Who we are and what we do
“Arguably we are now the most powerful, unelected institution our country has ever seen. We need to respond to that by becoming more open, more accountable and more transparent.”
Spencer Dale
1694: ‘Promote the publick Good and Benefit of our People…’
Current: ‘Promoting the good of the people of the United Kingdom by maintaining monetary and financial stability’
Operations, Regulation, Policy
Historic IT
Where are we starting from?
1980s-1990s
1990s-2000s
Data Warehouse
Initial Progress
• Growing demand
• Automated processing
• High availability
• Improved capabilities
• Affordable scaling
• Fewer silos
• Significant volumes
• Given that:
• A step change in capability was realised
• The progress made could only be described as a success
• Why the need for a change of direction?
• Operations: Complexity of estate
• Regulation: EMIR
• Policy: Changing nature of roles
• Data is stored across databases, shared drives and a document management solution, which makes it difficult to search, retrieve, combine and analyse
• Many individuals rely on their experience and internal network to determine what data exists
• Analytical communities in the Bank would like to collaborate more and to use new tools and techniques that are becoming standard in highly analytical data environments
• Not all individuals have access to the right tools or environment to be able to run analysis
• Economics publications moved gradually from qualitative to quantitative through the second half of the 20th century
• In the 21st century, and in particular in response to the Financial Crisis, there was a marked acceleration in this process
• The variety of mathematical and statistical operations increasingly appearing in Economics publications needs data on which to operate!
http://www.istl.org/12-fall/refereed4.html
European Market Infrastructure Regulation (EMIR)
• European Parliament & Council of the EU
• Implementation of G20 commitment
• Risk management regulation
• Avoidance of systemic risk
• Reduce likelihood and severity of future shocks
• Applies to…
• Over-the-counter derivatives (OTC) *
• Central counterparties (CCP)
• Trade Repositories (TR)
• What this meant for the Bank of England
• Oversight of OTC & exchange trades
• For UK entities supervised by the PRA
• 85 million transactions from 6 TRs
• 80 files of varying schemas (up to 20 GB per file)
• 200+ columns per file
• A new data architecture to collect, store and process!
* $595 trillion market – Bank for International Settlements data, end of June 2018
Central Banks & Granular Data – 2013
• ‘The Future of Regulatory Data and Analytics’
• A new data strategy?
• Micro-prudential data with macro–financial statistics?
• Storing and making use of granular datasets?
• Can heterogeneous data be harmonised?
• Who pays the costs for larger, faster and more accurate data?
• Individual privacy vs public transparency?
• Prudential Regulation Authority
• A new legal subsidiary of the BoE
• Supervisory & regulatory responsibilities
• Promote the safety & soundness of regulated firms
• Contribute to securing protection for policyholders
• A requirement to collect, store and process more data
Centre for Central Banking Studies – July 2014
• ‘Big Data and Central Banks’
• Diversification of data sources
• Legalities of enabling / constraining scope of granular data collections
• Development of inductive analytical approaches
• Advancement of data analysis capabilities, ML & AI
• Open Source tooling
• Importance of ‘Big Data’ to Central Banks in the years ahead
• Could Big Data…..
• Change the way that central banks operate?
• Transform how financial firms and other economic agents do business?
• Change the economy in ways that impact monetary and financial stability?
• Have implications for economic growth and employment?
https://www.aboveallimages.co.uk/wp-content/gallery/london/london_07.jpg
Bank of England Strategic Review – ‘One Mission, One Bank’
• ‘One Bank Data Architecture’
• Ability to share data across the Bank
• Reduce data silos
• Reduce the numbers of systems
• Improve discoverability
• Improve analytical capabilities via shared tooling
• Support genuine Big Data use cases
• Strategic data themes
• Management [Governance & Security]
• Collaboration [Sharing of Data]
• Standardisation [More robust processing]
• Exploitation [Tooling for gaining data insight]
Stage 1: The Appliance / Data Hub 1…
Low Level Design: Raw Zone
Description: Stores historical data of source files in HDFS in its raw uncompressed format
Diagram: Landing Zone (zip, via FTP) → Raw Zone (unzip to csv) → Structured Zone → Refined Zone → Consume Zone
Source archives shown in the diagram: DTCC zip x20, UnaVista zip x12, CME zip x8, ICE zip x9, RegisTR zip x9
Note: The source file format will change, but the change will not affect the ingestion and unzip processes in the Raw Zone
1. FTP process to load zip files into the Data Hub cluster
Keep the existing process that moves zip files, provided by the business, from the Landing Zone into the Raw Zone.
2. Unzip process to extract raw data files
Keep the existing process that unzips files to their raw format. The unzipped csv file is placed temporarily in an HDFS directory. An external Hive table is created over this directory, allowing the csv file to be queried using Hive or SparkSQL. At the end of the process, this file is removed.
Benefits
• Standard ETL process, in line with market best practice, for loading and storing data in its raw format
Limitations
• N/A
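The unzip step of the Raw Zone can be illustrated with a minimal sketch. It is not the Bank's implementation: a local directory stands in for the HDFS directory, and the function name unzip_to_raw and the directory layout are hypothetical. It only shows the idea of extracting the csv members of one landing-zone archive into a per-archive raw directory.

```python
import zipfile
from pathlib import Path


def unzip_to_raw(landing_zip: Path, raw_root: Path) -> list[Path]:
    """Extract every csv member of one landing-zone archive into the raw zone.

    Hypothetical layout: raw_root/<archive name without .zip>/<member>.csv
    A local directory stands in for the HDFS directory used on the cluster.
    """
    target_dir = raw_root / landing_zip.stem
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(landing_zip) as archive:
        for member in archive.namelist():
            # Skip directory entries and anything that is not a csv data file.
            if member.endswith("/") or not member.lower().endswith(".csv"):
                continue
            # Keep only the base name so a member path cannot escape target_dir.
            destination = target_dir / Path(member).name
            destination.write_bytes(archive.read(member))
            extracted.append(destination)
    return sorted(extracted)
```

On a cluster, the extracted files would then sit under the directory that the external Hive table points at; that part is outside this sketch.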
Low Level Design: Structured Zone
Description: Converts raw files into ORC and applies data type conversion and mapping rules to store information in a single table
Diagram: Raw Zone (csv) → Structured Zone (one orc table per file, then one mapped orc table) → Refined Zone → Consume Zone
3. Spark jobs that insert each source file into an individual structured table
Direct data ingestion from the source file into an ORC Hive table. Each TR file is ingested into a different structured ORC table, avoiding any mapping at this stage. Having one table per file also adds flexibility to the process, in terms of change requests (if a file is added or an existing one is altered, changes are limited to the specific table and its rules in the mapping table) and the reprocessing workflow (only the partition of the given file has to be rerun up to the mapping stage, reducing overall workload).
4. Spark jobs that map each source file schema to a normalized schema for state information
Simplify the mapping process, on both the query complexity and performance axes, by having individual Spark mapping jobs to a normalized state TR schema, covering both table structure and data types.
Table name | Storage format | Partitions | Data sorted by | Description
**_**_****_****_****** | ORC | year, month, day | - | One ORC table per TR, file and version that stores data in Hive without column mapping
********_***** | ORC | year, month, day, filetype | - | One ORC table for state TR data to store mapped columns in a normalized schema
Benefits
• Allows easy access to the raw data, without any changes to its underlying structure or format, with efficient compression for storage and query efficiency
• Having individual tables for each file simplifies the mapping process and reduces the reprocessing workload
Limitations
• File sizes in the tables will be suboptimal, although this is mitigated by the simplicity of the mapping process and the flexibility to schema changes
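The mapping step of the Structured Zone (one set of mapping rules per source file, all targeting one normalized schema with consistent data types) can be sketched in plain Python. This is an illustration under assumptions, not the Spark jobs themselves: the source names tr_a and tr_b, the column names and the three-column target schema are all hypothetical, whereas the real files carry 200+ columns.

```python
from datetime import datetime
from decimal import Decimal


def parse_iso_date(value):
    return datetime.strptime(value, "%Y-%m-%d").date()


def parse_uk_date(value):
    return datetime.strptime(value, "%d/%m/%Y").date()


# Hypothetical mapping table: per source, source column -> (normalized column, converter).
MAPPINGS = {
    "tr_a": {
        "Trade ID": ("trade_id", str),
        "Notional Amount": ("notional", Decimal),
        "Execution Date": ("execution_date", parse_iso_date),
    },
    "tr_b": {
        "UTI": ("trade_id", str),
        "NOTIONAL": ("notional", Decimal),
        "EXEC_DT": ("execution_date", parse_uk_date),
    },
}


def map_row(source, row):
    """Rename and convert one raw row into the normalized schema."""
    normalized = {"source": source}
    for source_col, (target_col, convert) in MAPPINGS[source].items():
        normalized[target_col] = convert(row[source_col].strip())
    return normalized
```

Because each source has its own entry in the mapping table, a schema change in one file only touches that entry, which mirrors the flexibility argument made on the slide.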
Low Level Design: Refined Zone
Description: Creates materialized views for business consumption that are optimized for system performance
Diagram: external data (zip) enters through the Landing Zone and Raw Zone; in the Structured Zone, TRStateData and ExternalReferenceData (orc) feed the Refined Zone (orc), which feeds the Consume Zone
External data: Extracts external data sources in order to enrich and validate TR data, maintaining historical data for reprocessing purposes
5. Load external data sources
Process that loads, unzips and inserts external data into Hive tables for use in the data preparation step.
6. Spark job that applies business rules and enriches source data with external table information
Calculates additional business columns and enriches with external reference data. Applies the de-duplication rule set and Contract Continuity specifics.
Table name | Storage format | Partitions | Data sorted by | Description
****_****_***** | ORC | year, month, day | assetclass, counterparty | Stores TR data enriched with external data sources and additional columns calculated based on business rules. These columns include the de-duplication rule set.
Benefits
• Centralized table that aggregates all TR state information in a single point of access
• Segregation of concepts, by calculating business logic rules and enriching source data with external sources on a separate layer
Limitations
• Late arrival of files requires reprocessing of the daily partition
• Changes in business transformation requirements require reprocessing of the full table
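The enrichment and de-duplication step of the Refined Zone can be sketched as follows. The slides do not state the actual rule set, so the rule used here (keep the most recently reported record per trade identifier and flag the others) is purely an assumption for illustration, as are the field names trade_id, counterparty and reported_at. The sketch adds columns rather than dropping rows, in line with the slide's note that the calculated columns include the de-duplication rule set.

```python
def enrich(trades, reference):
    """Left-join each trade to reference data keyed by counterparty identifier."""
    enriched = []
    for trade in trades:
        ref = reference.get(trade["counterparty"])
        enriched.append({
            **trade,
            "counterparty_name": ref["name"] if ref else None,
            # Keeps unmatched rows visible instead of silently dropping them.
            "reference_found": ref is not None,
        })
    return enriched


def flag_duplicates(trades):
    """Add an is_duplicate column.

    Assumed rule: per trade_id, the record with the latest reported_at is kept
    and every other record is flagged. reported_at values are ISO-8601 strings,
    so plain string comparison orders them chronologically.
    """
    latest_index = {}
    for index, trade in enumerate(trades):
        best = latest_index.get(trade["trade_id"])
        if best is None or trade["reported_at"] > trades[best]["reported_at"]:
            latest_index[trade["trade_id"]] = index
    keep = set(latest_index.values())
    return [{**trade, "is_duplicate": i not in keep} for i, trade in enumerate(trades)]
```

Keeping the flag as a column means a later change to the rule only requires recomputing that column, although, as the limitations note, this still means reprocessing the full table.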
Low Level Design: Consume Zone
Description: Creates materialized views for business consumption that are optimized for system performance
Diagram: Raw Zone → Structured Zone → Refined Zone (orc) → Consume Zone (orc materialized views)
7. Spark job that creates materialized views physically optimized for standard in-house entry points of analysis
Replicates data from the Refined Zone into the Consume Zone, with optimized technical partitions, to allow fast querying and data exploration for different analysis use cases. The process can easily be replicated to accommodate new use cases by creating new partition keys.
Table name | Storage format | Partitions | Data sorted by | Use cases
**_*****_****_*****_***_**** | ORC | year, month, day, assetclass | otc_or_etd, c1, c2 | *****; *****; Contractual Continuity; *****
**_*****_****_*****_***_**** | ORC | year, month, day | c1, c2 | *****
**_*****_****_***** | ORC | year, month, assetclass | c1, c2 | Monthly time series
Benefits
• Captures generic entry points of analysis
• Optimized to accommodate different analytical workloads based on requirements
• Improves query performance through physical partitioning of data
Limitations
• Data is duplicated, and the onus of choosing the correct materialised view is on the user. This could be mitigated by including an OLAP cube, such as Apache Druid
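The idea behind the Consume Zone views (the same refined data written several times, each copy with different partition keys and sort order) can be sketched in plain Python. This is a toy model, not the Spark job: rows are dictionaries, a view is a dictionary from partition path to sorted rows, and the function names are hypothetical. The key=value directory naming follows the usual Hive partition layout; c1 and c2 are the masked sort columns from the table.

```python
from collections import defaultdict


def partition_path(row, partition_keys):
    """Build a Hive-style partition path such as year=2018/month=6/day=29."""
    return "/".join(f"{key}={row[key]}" for key in partition_keys)


def materialise(rows, partition_keys, sort_keys):
    """Group rows by partition path and sort the rows inside each partition."""
    view = defaultdict(list)
    for row in rows:
        view[partition_path(row, partition_keys)].append(row)
    for partition_rows in view.values():
        partition_rows.sort(key=lambda r: tuple(r[k] for k in sort_keys))
    return dict(view)
```

Calling materialise twice on the same rows, once with year, month, day, assetclass and once with year, month, assetclass, gives a daily and a monthly view. A query that filters on the partition keys only needs to read the matching paths, which is where the performance benefit comes from, at the cost of storing the data more than once.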
EMIR Trade Repositories framework
Diagram: TR data and reference data (zip) flow from the Landing Zone through the Raw Zone (csv), the Structured Zone (orc, with mappings, producing TRStateData), the Refined Zone (orc) and the Consume Zone (orc), with Data Governance spanning all zones
EMIR Project benefits for the wider Data Programme
Designed to set the right path for the Data Programme in 4 key aspects, aligned with the One Bank Value:
• Architecture: Set the right technical architecture to serve as a standard for BoE Big Data projects
• Self-service TOM: Provide the drivers for a more self-service Operating Model
• Data Science skills: Pair programming sessions for on-the-job training and coaching
• Data Plausibility: Deliver a Data Quality and plausibility Management solution to be used across the Data Programme
Demonstrate Data Science knowledge can be upskilled
A senior data scientist will deliver on-the-job training and coaching to FMID in order to upskill the existing team. From this, we expect users to gain the autonomy to develop new data analysis and ad hoc data exploration on existing datasets in Data Hub.
How can Data Science skills be attained?
• Training: Provide core skills and understand how to use Big Data tools
• On-the-job coaching: Pair programming and advisory work to provide experience using Big Data tools with R
Questions still open:
1. How will training be delivered to business areas?
2. What skills should be centralised and what should stay in each business team?
3. Upskill the current team or expand resources?
Data Hub 2
• Automation
• Dynamic provisioning
• Flexibility
• VMware VxRack HCI offering
• EMC’s Isilon storage
• 392 cores per site, and circa 10 TB RAM
• 320 TB of “usable” storage
• Storage: the equivalent of 7,500 standard iPhone Xs (1.32 tonnes of iPhones!)
• Processing: the equivalent “cores” of 84 standard iPhone Xs
• Memory: the equivalent RAM of 4,608 standard iPhone Xs (a pile of phones 35 ½ metres high)
Lessons Learned
People and Processes
Technology
Governance and MetaData
Creativity
Tenacity
Experience
Questions

Weitere ähnliche Inhalte

Was ist angesagt?

Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...DataWorks Summit
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...DataWorks Summit
 
Securing and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industrySecuring and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industryDataWorks Summit
 
Journey to Big Data: Main Issues, Solutions, Benefits
Journey to Big Data: Main Issues, Solutions, BenefitsJourney to Big Data: Main Issues, Solutions, Benefits
Journey to Big Data: Main Issues, Solutions, BenefitsDataWorks Summit
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Seeling Cheung
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...DataWorks Summit
 
The Curse of the Data Lake Monster
The Curse of the Data Lake MonsterThe Curse of the Data Lake Monster
The Curse of the Data Lake MonsterThoughtworks
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseDataWorks Summit
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...DataWorks Summit
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle TechnologiesOleksii Movchaniuk
 
Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...DataWorks Summit
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...DataWorks Summit/Hadoop Summit
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSDataWorks Summit
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationDataWorks Summit
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...DataWorks Summit
 
Inside open metadata—the deep dive
Inside open metadata—the deep diveInside open metadata—the deep dive
Inside open metadata—the deep diveDataWorks Summit
 
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...DataWorks Summit
 
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...DataWorks Summit
 

Was ist angesagt? (20)

Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
 
Securing and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industrySecuring and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industry
 
Journey to Big Data: Main Issues, Solutions, Benefits
Journey to Big Data: Main Issues, Solutions, BenefitsJourney to Big Data: Main Issues, Solutions, Benefits
Journey to Big Data: Main Issues, Solutions, Benefits
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
 
The Curse of the Data Lake Monster
The Curse of the Data Lake MonsterThe Curse of the Data Lake Monster
The Curse of the Data Lake Monster
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the Enterprise
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle Technologies
 
Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFS
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data Transformation
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
 
Inside open metadata—the deep dive
Inside open metadata—the deep diveInside open metadata—the deep dive
Inside open metadata—the deep dive
 
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
 
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
 

Ähnlich wie Promote the Good of the People of the United Kingdom by Maintaining Monetary and Financial Stability

Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016Kent Graziano
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationSunderland City Council
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALASaikiran Panjala
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeDATAVERSITY
 
Modernizing Data Architecture using Data Virtualization for Agile Data Delivery
Modernizing Data Architecture using Data Virtualization for Agile Data DeliveryModernizing Data Architecture using Data Virtualization for Agile Data Delivery
Modernizing Data Architecture using Data Virtualization for Agile Data DeliveryDenodo
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle DatabaseBest Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle DatabaseEdgar Alejandro Villegas
 
Business intelligence an Overview
Business intelligence an OverviewBusiness intelligence an Overview
Business intelligence an OverviewZahra Mansoori
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationDenodo
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biA P
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Soujanya V
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)Denodo
 
MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...
MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...
MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...MongoDB
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousingEr. Nawaraj Bhandari
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItDenodo
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?Denodo
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 

Ähnlich wie Promote the Good of the People of the United Kingdom by Maintaining Monetary and Financial Stability (20)

Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data Visualisation
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-Purpose
 
Modernizing Data Architecture using Data Virtualization for Agile Data Delivery

Promote the Good of the People of the United Kingdom by Maintaining Monetary and Financial Stability

  • 1. Data Works. Adrian Waddy, Nick Vaughan and Eloise Hindes. The Building Blocks of Big Data: The Bank of England's journey to delivering a big data capability
  • 2. Agenda
      • 1. The Bank: Who we are and what we do
      • 2. Historic IT: Where are we starting from?
      • 3. Data Warehouse: Initial progress
      • 4. Hub 1 and 2: First adventures in Big Data
      • 5. The Future: Where next? Scaling up
      • Any questions
  • 3. The Bank: Who we are and what we do
  • 4. The Bank: “Arguably we are now the most powerful, unelected institution our country has ever seen. We need to respond to that by becoming more open, more accountable and more transparent.” Spencer Dale
  • 5. The Bank. 1694: ‘Promote the publick Good and Benefit of our People…’ Current: ‘Promoting the good of the people of the United Kingdom by maintaining monetary and financial stability’
  • 9. Historic IT: Where are we starting from?
  • 14. Data Warehouse
      • Affordable scaling
      • Fewer silos
      • Significant volumes
  • 15. Data Warehouse
      • Given that:
          • A step change in capability was realised
          • The progress made could only be described as a success
      • Why the need for a change of direction?
  • 17. Data Warehouse
      • Data is being stored in databases, shared drives and a document management solution, which makes it difficult to search, retrieve, combine and analyse data
      • Many individuals are reliant on their experience and internal network to determine what data exists
      • Analytical communities in the Bank would like to collaborate more and to use new tools and techniques that are becoming standard in highly analytical data environments
      • Not all individuals have access to the right tools or environment to be able to run analysis
  • 18. Data Warehouse
      • The nature of economics publications gradually moved from qualitative to quantitative through the second half of the 20th century
      • In the 21st century, and in particular in response to the Financial Crisis, there was a marked acceleration in this process
      • The variety of mathematical and statistical operations increasingly appearing in economics publications needs data on which to operate!
      • Source: http://www.istl.org/12-fall/refereed4.html
  • 19. European Market Infrastructure Regulation (EMIR)
      • European Parliament & Council of the EU
      • Implementation of G20 commitment
      • Risk management regulation
      • Avoidance of systemic risk
      • Reduce likelihood and severity of future shocks
      • Applies to:
          • Over-the-counter (OTC) derivatives *
          • Central counterparties (CCP)
          • Trade Repositories (TR)
      • What this meant for the Bank of England:
          • Oversight of OTC & exchange trades
          • For UK entities supervised by the PRA
          • 85 million transactions from 6 TRs
          • 80 files of varying schemas (up to 20 GB per file)
          • 200+ columns per file
          • A new data architecture to collect, store and process!
      • * $595 trillion market (Bank for International Settlements data, end of June 2018)
  • 20. Central Banks & Granular Data – 2013
      • ‘The Future of Regulatory Data and Analytics’
          • A new data strategy?
          • Micro-prudential data with macro-financial statistics?
          • Storing and making use of granular datasets?
          • Can heterogeneous data be harmonised?
          • Who pays the costs for larger, faster and more accurate data?
          • Individual privacy vs public transparency?
      • Prudential Regulation Authority
          • A new legal subsidiary of the BoE
          • Supervisory & regulatory responsibilities
          • Promote the safety & soundness of regulated firms
          • Contribute to securing protection for policyholders
          • A requirement to collect, store and process more data
  • 21. Centre for Central Banking Studies – July 2014
      • ‘Big Data and Central Banks’
          • Diversification of data sources
          • Legalities of enabling / constraining scope of granular data collections
          • Development of inductive analytical approaches
          • Advancement of data analysis capabilities, ML & AI
          • Open Source tooling
          • Importance of ‘Big Data’ to Central Banks in the years ahead
      • Could Big Data…
          • Change the way that central banks operate?
          • Transform how financial firms and other economic agents do business?
          • Change the economy in ways that impact monetary and financial stability?
          • Have implications for economic growth and employment?
      • Image: https://www.aboveallimages.co.uk/wp-content/gallery/london/london_07.jpg
  • 22. Bank of England Strategic Review – ‘One Mission, One Bank’
      • ‘One Bank Data Architecture’
          • Ability to share data across the Bank
          • Reduce data silos
          • Reduce the numbers of systems
          • Improve discoverability
          • Improve analytical capabilities via shared tooling
          • Support genuine Big Data use cases
      • Strategic data themes
          • Management [Governance & Security]
          • Collaboration [Sharing of Data]
          • Standardisation [More robust processing]
          • Exploitation [Tooling for gaining data insight]
  • 23. Stage 1: The Appliance / Data Hub 1…
  • 24. Low Level Design: Raw Zone
      • Diagram: zip files arrive by FTP in the Landing Zone (DTCC x20, UnaVista x12, CME x8, ICE x9, RegisTR x9) and are unzipped to csv in the Raw Zone; the Structured, Refined and Consume Zones follow
      • Description: Stores historical data of source files in HDFS in its raw uncompressed format. The source file format will change, although the change will not affect the ingestion and unzip processes on the Raw Zone
      • Step 1: FTP process to load zip files into the Data Hub cluster. Keep the existing process that moves the zip files, provided by the business, from the Landing Zone into the Raw Zone
      • Step 2: Unzip process to extract raw data files. Keep the existing process that unzips files to their raw format. The unzipped csv file is placed temporarily in an HDFS directory. An external Hive table is created at this directory, allowing the csv file to be queried using Hive or SparkSQL. At the end of the process, this file is removed
      • Benefits: Standard ETL process within market best practices for loading and storage of data in its raw format
      • Limitations: N/A
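The unzip step on slide 24 (extract the csv to a temporary directory, query it, then remove it) can be sketched as follows. This is a minimal local-filesystem stand-in, not the Bank's implementation: the real process targets HDFS and an external Hive table, and the function names `unzip_to_staging`, `read_staged_rows` and `process_zip` are hypothetical.

```python
import csv
import shutil
import tempfile
import zipfile
from pathlib import Path
from typing import Dict, List


def unzip_to_staging(zip_path: Path, staging_root: Path) -> Path:
    """Extract one landing-zone zip into its own temporary staging directory."""
    staging_dir = Path(tempfile.mkdtemp(prefix=zip_path.stem + "_", dir=staging_root))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(staging_dir)
    return staging_dir


def read_staged_rows(staging_dir: Path) -> List[Dict[str, str]]:
    """Stand-in for querying the external table: read every csv in the directory."""
    rows: List[Dict[str, str]] = []
    for csv_file in sorted(staging_dir.glob("*.csv")):
        with csv_file.open(newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def process_zip(zip_path: Path, staging_root: Path) -> List[Dict[str, str]]:
    """Unzip, read, then remove the temporary files; the source zip is left untouched."""
    staging_dir = unzip_to_staging(zip_path, staging_root)
    try:
        return read_staged_rows(staging_dir)
    finally:
        shutil.rmtree(staging_dir)  # the temporary csv is removed at the end
```

Removing the staging directory in a `finally` block means a failed read does not leave uncompressed files behind, which is one way to honour the "placed temporarily" wording on the slide.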
  • 25. Low Level Design: Structured Zone
      • Diagram: csv files in the Raw Zone are converted to orc tables in the Structured Zone (steps 3 and 4), ahead of the Refined and Consume Zones
      • Description: Converts raw files into ORC and applies data type conversion and mapping rules to store information on a single table
      • Step 3: Spark jobs that insert each source file into an individual structured file table. Direct data ingestion from the source file into an ORC Hive table. Each TR file's data is ingested into a different structured ORC table, avoiding any mapping at this stage. Having one table per file also adds flexibility to the process, both for change requests (changes are limited to the specific table, and to the mapping rules in the mapping table if a file is added or an existing one is altered) and for the reprocessing workflow (only the partition of the given file has to be re-run up to the mapping stage, reducing overall workload)
      • Step 4: Spark jobs that map each source file schema to a normalized schema for state information. Simplifies the mapping process, in terms of both query complexity and performance, by having individual Spark mapping jobs to a normalized state TR schema, both on table structure and on data types
      • Tables (table name; storage format; partitions; data sorted by; description):
          • **_**_****_****_******; ORC; year, month, day; -; One ORC table per TR, file and version that stores data in Hive without column mapping
          • ********_*****; ORC; year, month, day, filetype; -; One ORC table for state TR data to store mapped columns in a normalized schema
      • Benefits:
          • Allows easy access to the raw data, without any changes to its underlying structure or format, with efficient compression for storage and query efficiency
          • Having individual tables for each file simplifies the mapping process and diminishes the reprocessing workload
      • Limitations:
          • File sizes on tables will be suboptimal, although this is mitigated by the simplicity of the mapping process and the flexibility to schema changes
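The mapping in step 4 of slide 25, where each source file layout is renamed and type-converted into one normalized schema, can be illustrated in plain Python. The slide describes Spark jobs; this sketch only shows the rule-table idea, and every file key, column name and converter below is invented for illustration.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Callable, Dict, Tuple

# Hypothetical rule format: source column -> (normalized column, type converter).
MappingRules = Dict[str, Tuple[str, Callable[[str], object]]]


def parse_date(value: str) -> date:
    """Convert an ISO-style date string into a date object."""
    return datetime.strptime(value, "%Y-%m-%d").date()


# One rule set per source file layout; both layouts are made up.
RULES_BY_FILE: Dict[str, MappingRules] = {
    "tr_a_v1": {
        "TradeID": ("trade_id", str),
        "Notional Amount": ("notional", Decimal),
        "Execution Date": ("execution_date", parse_date),
    },
    "tr_b_v2": {
        "uti": ("trade_id", str),
        "notional": ("notional", Decimal),
        "exec_dt": ("execution_date", parse_date),
    },
}


def map_row(file_key: str, source_row: Dict[str, str]) -> Dict[str, object]:
    """Rename and type-convert the mapped columns of one row; unmapped columns are dropped."""
    rules = RULES_BY_FILE[file_key]
    return {
        target: convert(source_row[source])
        for source, (target, convert) in rules.items()
    }
```

Keeping one rule set per file layout mirrors the slide's point that a schema change touches only the affected file's rules, not the normalized table.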
  • 26. Low Level Design: Refined Zone
      • Diagram: external data arrives as zip in the Landing Zone and passes through the Raw Zone into orc tables in the Structured Zone (steps 5 and 6); TR State Data and External Reference Data are combined into orc in the Refined Zone, ahead of the Consume Zone
      • Description: Extracts external data sources in order to enrich and validate TR data, maintaining historical data for reprocessing purposes. Creates materialized views for business consumption that are optimized for system performance
      • Step 5: Load external data sources. Process that loads, unzips and inserts external data into Hive tables for use in the data preparation step
      • Step 6: Spark job that applies business rules and enriches source data with external table information. Calculates additional business columns and enriches with external reference data. Applies the de-duplication rule set and Contract Continuity specifics
      • Tables (table name; storage format; partitions; data sorted by; description):
          • ****_****_*****; ORC; year, month, day; assetclass, counterparty; Stores TR data enriched with external data sources and additional columns calculated based on business rules. These columns include the de-duplication rule set
      • Benefits:
          • Centralized table that aggregates all TR state information in a single point of access
          • Segregation of concepts by calculating business logic rules and enriching source data with external sources on a separate layer
      • Limitations:
          • Late arrival of files requires reprocessing of the daily partition
          • Changes in business transformation requirements require reprocessing of the full table
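Step 6 of slide 26 combines enrichment from external reference data with a de-duplication rule set. The slide does not state the actual rules, so the sketch below assumes a simple one (keep the most recently reported row per trade) purely for illustration; the field names `trade_id`, `counterparty`, `reported_at` and `counterparty_name` are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def enrich(rows: List[Row], reference: Dict[str, Dict[str, str]]) -> List[Row]:
    """Left-join style enrichment: add a reference attribute keyed on counterparty."""
    enriched = []
    for row in rows:
        extra = reference.get(row["counterparty"], {})
        # Rows with no reference match are kept, with the new column set to None.
        enriched.append({**row, "counterparty_name": extra.get("name")})
    return enriched


def deduplicate(rows: List[Row]) -> List[Row]:
    """Assumed rule: per trade_id, keep the row with the greatest reported_at."""
    latest: Dict[str, Row] = {}
    for row in rows:
        current = latest.get(row["trade_id"])
        if current is None or row["reported_at"] > current["reported_at"]:
            latest[row["trade_id"]] = row
    return list(latest.values())
```

Because both functions work on a whole batch, a late-arriving file means re-running them over the affected day's rows, which matches the reprocessing limitation listed on the slide.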
  • 27. Low Level Design: Consume Zone
      • Diagram: orc data in the Refined Zone is replicated (step 7) into several orc materialized views in the Consume Zone, downstream of the Raw and Structured Zones
      • Description: Creates materialized views for business consumption that are optimized for system performance
      • Step 7: Spark job that creates materialized views physically optimized for standard in-house entry points of analysis. Replicates data in the Refined Zone into the Consume Zone, with optimized technical partitions, to allow fast performance while querying and exploring data for different analytical use cases. The process can be easily replicated to accommodate different use cases by creating new partition keys
      • Tables (table name; storage format; partitions; data sorted by; use cases):
          • **_*****_****_*****_***_****; ORC; year, month, day, assetclass; otc_or_etd, c1, c2; *****, *****, Contractual Continuity, *****
          • **_*****_****_*****_***_****; ORC; year, month, day; c1, c2; *****
          • **_*****_****_*****; ORC; year, month, assetclass; c1, c2; Monthly time series
      • Benefits:
          • Captures generic entry points of analysis
          • Optimized to accommodate different analytical workloads based on requirements
          • Improves query performance due to physical partitioning of data
      • Limitations:
          • Data is duplicated, and the onus of choosing the correct materialised view is on the user. This could be mitigated by including an OLAP cube, such as Apache Druid
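The idea on slide 27, the same refined data laid out several times under different partition and sort keys, can be sketched without Spark. The partition and sort keys below are the ones listed in the slide's table; the `key=value` directory naming follows the usual Hive partition convention, and the helper names `partition_path` and `materialise` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Row = Dict[str, object]


def partition_path(row: Row, keys: Sequence[str]) -> str:
    """Build a Hive-style partition path such as 'year=2018/month=6'."""
    return "/".join(f"{key}={row[key]}" for key in keys)


def materialise(
    rows: List[Row], partition_keys: Sequence[str], sort_keys: Sequence[str]
) -> Dict[str, List[Row]]:
    """Group rows by partition path and sort the rows inside each partition."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[partition_path(row, partition_keys)].append(row)
    return {
        path: sorted(group, key=lambda r: tuple(r[k] for k in sort_keys))
        for path, group in partitions.items()
    }
```

Calling `materialise(rows, ["year", "month", "day", "assetclass"], ["otc_or_etd", "c1", "c2"])` and `materialise(rows, ["year", "month", "assetclass"], ["c1", "c2"])` on the same rows yields a daily view and a monthly view. The rows are stored twice, which is the duplication trade-off the slide lists as a limitation.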
  • 28. EMIR Trade Repositories framework
      Diagram: Landing Zone, Raw Zone, Structured Zone, Refined Zone and Consume Zone, with Data Governance alongside. TR data and reference data arrive as zip/csv files; TRStateData, the mappings and the downstream zones are stored as ORC.
  • 30. EMIR Project benefits for the wider Data Programme
      Designed to set the right path for the Data Programme in 4 key aspects, aligned with the One Bank Value:
        • Architecture: set the right technical architecture to serve as a standard for BoE Big Data projects
        • Self-service TOM: provide the drivers for a more self-service Operating Model
        • Data Science skills: pair programming sessions for on-the-job training and coaching
        • Data Plausibility: deliver a Data Quality and Plausibility Management solution to be used across the Data Programme
  • 31. Demonstrate that Data Science knowledge can be upskilled
      A Sr. Data Scientist will deliver on-the-job training and coaching to FMID in order to upskill the existing team. From this, we expect users to gain the autonomy to develop new data analysis and ad hoc data exploration on existing datasets in Data Hub.
      How can Data Science skills be attained?
        • Training: provide core skills and an understanding of how to use Big Data tools
        • On-the-job coaching: pair programming and advisory work to provide experience using Big Data tools with R
      Questions still open:
        1. How will training be delivered to business areas?
        2. What skills should be centralised and what should stay in each business team?
        3. Upskill the current team or expand resources?
  • 32. Data Hub 2: Automation, Dynamic Provisioning, Flexibility
  • 34. Data Hub 2
        • VMware VxRack HCI offering
        • EMC's Isilon storage
        • 392 cores per site, and circa 10 TB RAM
        • 320 TB of "usable" storage
        • Storage: the equivalent of 7500 standard iPhone Xs (1.32 tonnes of iPhones!)
        • Processing: the equivalent in "cores" of 84 standard iPhone Xs
        • Memory: the equivalent in RAM of 4608 standard iPhone Xs (a pile of phones 35½ metres high)
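As a rough sanity check on these comparisons, the per-phone figures they imply can be backed out from the slide's own numbers. This sketch assumes decimal units (1 TB = 1000 GB), that the 84-phone comparison refers to the 392 cores per site, and that the "circa 10 TB" of RAM is exactly 10 TB; none of that is stated on the slide.

```python
# Figures quoted on the slide.
STORAGE_TB = 320            # "usable" storage
STORAGE_PHONES = 7500
PHONES_WEIGHT_TONNES = 1.32
CORES_PER_SITE = 392
CORE_PHONES = 84
RAM_TB = 10                 # "circa 10 TB", taken as exact here
RAM_PHONES = 4608
PILE_HEIGHT_M = 35.5

# Implied per-phone values, assuming 1 TB = 1000 GB.
storage_gb_per_phone = STORAGE_TB * 1000 / STORAGE_PHONES
weight_g_per_phone = PHONES_WEIGHT_TONNES * 1_000_000 / STORAGE_PHONES
cores_per_phone = CORES_PER_SITE / CORE_PHONES
ram_gb_per_phone = RAM_TB * 1000 / RAM_PHONES
thickness_mm_per_phone = PILE_HEIGHT_M * 1000 / RAM_PHONES

print(f"storage per phone:   {storage_gb_per_phone:.1f} GB")
print(f"weight per phone:    {weight_g_per_phone:.1f} g")
print(f"cores per phone:     {cores_per_phone:.2f}")
print(f"RAM per phone:       {ram_gb_per_phone:.2f} GB")
print(f"thickness per phone: {thickness_mm_per_phone:.1f} mm")
```

Under those assumptions the slide implies roughly 42.7 GB of storage, 176 g, 4.7 cores, 2.2 GB of RAM and 7.7 mm of thickness per phone, so the comparisons are best read as order-of-magnitude illustrations rather than exact equivalences.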
  • 36. Lessons Learned: People and Processes; Technology; Governance and Metadata; Creativity; Tenacity; Experience