Architecture Overview
[Diagram: Web Client (A), Web Server / API (1), Routing & Queuing (2), Metadata (3), Dynamic Query Engine (4), Processing & Analytics (5), File Backend (B)]
• The architecture consists of five basic components, an HTML5 client, and a file backend
• Each instance of a component auto-registers in the metadata master
• Every component defined here:
  • Is horizontally scalable
  • Has load balancing
  • Has failover capabilities
• All external communication goes through the fully RESTful API, where each request is checked against a role-based security system
• Besides the RESTful interface, the system can also deliver and retrieve results and data through indirect methods (mail, SFTP)
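The auto-registration and failover bullets can be illustrated with a small sketch. It assumes, hypothetically, that each instance sends a periodic heartbeat and that the metadata master ignores instances whose last heartbeat is older than a time-to-live. The slides do not say how registration is actually implemented, and `Registry`, `register`, `live_instances`, and `ttl_seconds` are illustrative names only.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Registry:
    """Hypothetical stand-in for the instance list kept by the metadata master."""
    ttl_seconds: float = 30.0
    # Maps (component, instance_id) to the time of the last heartbeat.
    _seen: dict = field(default_factory=dict)

    def register(self, component: str, instance_id: str,
                 now: Optional[float] = None) -> None:
        # Called once at startup and then periodically as a heartbeat.
        self._seen[(component, instance_id)] = time.time() if now is None else now

    def live_instances(self, component: str, now: Optional[float] = None) -> list:
        # Instances whose heartbeat is older than the TTL are treated as failed.
        now = time.time() if now is None else now
        return sorted(
            inst for (comp, inst), ts in self._seen.items()
            if comp == component and now - ts <= self.ttl_seconds
        )
```

With a registry like this, scaling out amounts to starting another instance, and failover amounts to routing only to the instances that `live_instances` returns.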
1) Web Server
• The web server receives all requests and checks them against the security model and metadata, after which it places the resulting actions in the queuing system
• The setup of the security model, the metadata (including the data descriptions it holds), and the entire API (calls and actions) are proprietary code
• Dependencies:
  • Nginx, for the scalable HTTP server
  • uWSGI, for running Python code behind Nginx
  • Flask, a web framework for handling sockets and sessions
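The request flow described above (security check, metadata check, then queuing) might look roughly like the following. This is a standard-library sketch, not the proprietary API: the role table, the table catalogue, the status codes, and the `handle_request` function are all assumptions made for illustration, and a plain `deque` stands in for the queuing system.

```python
from collections import deque

# Hypothetical role table and table catalogue; the real security model and
# metadata are proprietary and not shown in the slides.
ROLE_PERMISSIONS = {
    "analyst": {"run_report"},
    "admin": {"run_report", "import_file"},
}
KNOWN_TABLES = {"sales", "stores"}

action_queue = deque()  # stand-in for the queuing system


def handle_request(role, action, table):
    """Return an (HTTP status, message) pair for one API call."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return 403, "forbidden"        # fails the security check
    if table not in KNOWN_TABLES:
        return 404, "unknown table"    # fails the metadata check
    action_queue.append({"action": action, "table": table})
    return 202, "queued"               # accepted for asynchronous execution
```

Returning 202 rather than 200 reflects the assumption that the web server only schedules work and that the result is produced later by another component.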
2) Routing & Queuing
• The queue server receives all action requests from the API, finds where it can execute them, and load-balances the requests over these resources
• We created the queues and the auto-registering setup to provide the generic framework functionality and to ensure load balancing and failover capabilities
• Dependencies:
  • Celery, the Python task-queue library
  • RabbitMQ, the message broker that distributes the tasks
  • Redis, for exchanging results between the processes
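Load balancing with failover can be sketched in a few lines. In the system described here, that job is delegated to Celery and RabbitMQ; the code below does not use their APIs and only illustrates the idea with a round-robin policy, which is an assumption. `RoundRobinRouter`, `mark_down`, and `pick` are hypothetical names.

```python
from itertools import cycle


class RoundRobinRouter:
    """Hand out workers in turn, skipping any that are marked as failed."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.down = set()
        self._ring = cycle(self.workers)

    def mark_down(self, worker):
        self.down.add(worker)

    def pick(self):
        # Try each worker at most once per call so a fully failed pool
        # raises instead of looping forever.
        for _ in range(len(self.workers)):
            worker = next(self._ring)
            if worker not in self.down:
                return worker
        raise RuntimeError("no live workers")
```

For example, with workers `dqe-1`, `dqe-2`, and `dqe-3`, marking `dqe-2` as down makes subsequent picks alternate between the two remaining workers.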
3) Metadata
• The metadata server contains all general data on users, databases, and security, as well as the metadata on the data available to users (measures, dimensions, tables, and how these all relate to each other)
• Dependencies:
  • MongoDB, for storing the metadata
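To make the kind of metadata described above more concrete, here is a sketch of what such documents could look like, written as plain Python dictionaries rather than MongoDB collections. The field names (`dimensions`, `measures`, `tables`, `role`) and the sample tables and users are invented for illustration; the actual schema is not given in the slides.

```python
# Hypothetical metadata documents; field names are illustrative only.
TABLES = {
    "sales": {"dimensions": ["store_id", "product_id", "date"],
              "measures": ["units", "revenue"]},
    "stores": {"dimensions": ["store_id", "region"], "measures": []},
}
USERS = {
    "alice": {"role": "analyst", "tables": ["sales", "stores"]},
    "bob": {"role": "analyst", "tables": ["stores"]},
}


def shared_dimensions(left, right):
    """Dimensions two tables have in common, i.e. candidate join keys."""
    right_dims = set(TABLES[right]["dimensions"])
    return [d for d in TABLES[left]["dimensions"] if d in right_dims]


def visible_measures(user):
    """All measures in the tables a user is allowed to see."""
    names = set()
    for table in USERS[user]["tables"]:
        names.update(TABLES[table]["measures"])
    return sorted(names)
```

Recording how tables relate to each other in one place lets other components look up join keys and access rights instead of hard-coding them.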
4) Dynamic Query Engine
• The dynamic query engine server holds a number of data files (which it automatically downloads and synchronizes from the backend) and can analyze and aggregate them
• It can also auto-join tables on commonalities, perform a wide range of calculations, and run several distributed analytics operations at row level
• Dependencies:
  • Bcolz, for storing the data files in a compressed, columnar format
  • Pandas, for higher-level operations on the result data set (joins, sorts, etc.)
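The auto-join and aggregation steps can be sketched on a toy columnar layout, where each table is a dictionary of equally long column lists. This uses only the standard library, not bcolz or pandas, and it assumes that "joining on commonalities" means an inner join on every column name the two tables share. The sample data and the function names are hypothetical.

```python
from collections import defaultdict

# Toy columnar tables: one list per column, all of equal length.
sales = {"store_id": [1, 1, 2, 3], "units": [5, 7, 2, 4]}
stores = {"store_id": [1, 2, 3], "region": ["north", "north", "south"]}


def common_columns(left, right):
    """Column names present in both tables, used as the join key."""
    return [name for name in left if name in right]


def join_and_sum(fact, lookup, group_by, measure):
    """Inner-join fact to lookup on shared columns, then sum measure per group."""
    keys = common_columns(fact, lookup)
    n_lookup = len(next(iter(lookup.values())))
    # Map each join-key tuple in the lookup table to its row position.
    index = {tuple(lookup[k][i] for k in keys): i for i in range(n_lookup)}
    totals = defaultdict(int)
    n_fact = len(next(iter(fact.values())))
    for row in range(n_fact):
        match = index.get(tuple(fact[k][row] for k in keys))
        if match is None:
            continue  # inner join: rows without a match are dropped
        totals[lookup[group_by][match]] += fact[measure][row]
    return dict(totals)
```

Calling `join_and_sum(sales, stores, "region", "units")` returns `{"north": 14, "south": 4}`.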
5) Processing & Analytics
• The processing & analytics server handles (asynchronous) calls for file loading, exporting, and analytics
• This includes the creation and execution of machine learning and statistical models
• It also handles the conversion of raw data files into the binary files and the updating of relevant metadata
• Dependencies:
  • Scikit-learn, for machine learning
  • Statsmodels, for statistical models
  • Pandas, for data manipulation
  • Bcolz, for converting the data files into a compressed, columnar format
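The conversion step (raw file in, columnar data and updated metadata out) can be sketched with the standard library alone. The real system writes compressed bcolz files; here a dictionary of column lists stands in for the columnar format, and `csv_to_columns` and `describe` are hypothetical helper names. Values are kept as strings because the slides do not describe type inference.

```python
import csv
import io


def csv_to_columns(raw_text):
    """Pivot row-oriented CSV text into one list of values per column."""
    reader = csv.DictReader(io.StringIO(raw_text))
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def describe(columns):
    """Summary that could be written back to the metadata server."""
    n_rows = len(next(iter(columns.values()))) if columns else 0
    return {"columns": list(columns), "rows": n_rows}
```

Storing each column contiguously is what makes per-column compression and column-wise aggregation cheap, which is the motivation for this kind of conversion.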
A) Web Client
• The web client is a full, web-based HTML5 client that gives access to all functionality:
  • Reporting
  • Analytics
  • File import
  • User and security management
  • Server management
• The files are served by the web server as static content, and all calls go through the standard API
• Dependencies:
  • jQuery, for cross-browser JavaScript simplification and UI
  • Bootstrap, for layout
  • D3.js, a library for visualizations
B) File Backend
• The file backend contains all raw files and the processed (compressed, columnar) files
• DQE (Dynamic Query Engine) instances automatically retrieve their assigned files from the backend when a file has been updated
• Dependencies:
  • AWS S3, for storing files
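How an instance decides which files to retrieve is not specified in the slides. One plausible approach, shown here purely as an assumption, is to compare a version tag per file between the local copy and the backend. `files_to_fetch` and the version strings are hypothetical, and no S3 calls are made.

```python
def files_to_fetch(assigned, local_versions, backend_versions):
    """Names of assigned files whose backend version differs from the local one."""
    return sorted(
        name for name in assigned
        if name in backend_versions
        and local_versions.get(name) != backend_versions[name]
    )
```

A file that is assigned but missing locally has no local version, so it is fetched as well; files that are not assigned to the instance are ignored even if they changed.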
Architecture Comparison
| Area | Hadoop | Cassandra | Best in class | visualfabriq | Difference |
|---|---|---|---|---|---|
| Data | Non-structured & structured | Structured, wide-column | Teradata (structured, columnar) | Structured, columnar, compressed | Optimized for numerical data (that is, no text analytics etc.) |
| Architecture | Rack-aware, daemon-based cluster | Peer-to-peer cluster | | Horizontally scaling, container-based microservices communicating through RabbitMQ queues | Easier to monitor & scale |
| Setup | Complex | Complex | | Up & running in one minute | Much, much easier to set up and roll out |
| Cluster maintenance | Node creation and assignment usually through commercial cluster mgmt software | Peer-to-peer network; auto-configures | | Self-registering nodes that can be assigned specific tasks and data in a web interface | |
| ETL | Flume, Sqoop | Bulk Loader | Informatica, Talend | Web based, drag & drop with wizards | Web based, easy to use |
| Query language | Map/Reduce; add-ons for SQL (Pig, Hive, Impala, etc.) | CQL | SQL | MOLAP-like; SQL interface to be built | SQL is the standard, but because of the built-in reporting and analytics this is not something users will need |
| Compression | No | No | MongoDB/WiredTiger | Blosc-based | Saves on average 20x in disk space while speeding up reads |
| Performance | Slow, batch based; Spark can add in-memory capability (speeds up 100x) | High, in-memory options | | High, disk-based with compression delivering in 2-3x range of in-memory | Out-of-the-box near in-memory performance with file-based scaling; with advances in CPU speed, this might even surpass traditional in-memory performance |
| Interface | RESTful API | RESTful API | RESTful API | RESTful API | |
| Reporting | Only in external tools (that connect to sql-connector) | Only in external tools (that connect to 3rd party connectors) | Tableau (HTML5, interactive, beautiful) | Built-in HTML5, interactive, extensible (D3.js based) | Only solution with out-of-the-box reporting with an easy-to-use, modern web-based interface |
| Analytics | Distributed map/reduce analytics through Mahout | Only as optional, paid-for module | SAS, SPSS | Built-in HTML5, interactive environment that incorporates leading open-source machine learning (scikit-learn), statistics (statsmodels), and proprietary (POS-analytics) functionality; NB: the analytics load is not fully distributed yet | Only solution with out-of-the-box analytics with an easy-to-use, modern web-based interface |
| Security | Kerberos-based security | Data object security | | General, role-based security | One point to manage all security from data access to functionality (reporting, accessibility, etc.) |
| Open source | Core is open source; several performance acceleration & mgmt tools are paid | Core is open source; analytics, backup and other options are paid | | Core is open source; large cluster mgmt tools and vertical-specific analytics options are paid | |
| Implementation language | Java | Java | | Python (and Cython & C) | |

Bquery Reporting & Analytics Architecture

  • 2. Architecture Overview Web Server API Routing & Queuing Metadata Dynamic Query Engine Processing & Analytics File Backend • The architecture consists of 5 basic components, a HTML5 Client and a file backend • Each instance of a component auto- registers in the metadata master • Every component defined here • Is horizontally scalable • Has load balancing • And has failover capabilities • All external communication goes through the fully REST-ful api, where each request is checked against a role- based security system • Next to the restful interface, it can also deliver and retrieve results and data through indirect methods (mail, sftp) 1 2 4 B 3 5 Web ClientA
  • 3. 1) Web Server Web Server API Routing & Queuing Metadata Dynamic Query Engine Processing & Analytics File Backend Web Client • The web server receives all requests, checks them against the security model and metadata, after which it sets out the actions in the queuing system • The setup of the security model, metadata (including data descriptions there) and the entire API (calls and actions) are proprietary code • Dependencies: • Nginx, for the scalable http server • uWSGI, for running python code behind nginx • Flask, a web framework for handling sockets and sessions 1 2 4 B A 3 5
  • 4. 2) Routing & Queuing Web Server API Routing & Queuing Metadata Dynamic Query Engine Processing & Analytics File Backend Web Client • The queue server receives all action requests from the API, finds where it can execute them and load balances requests over these resources • We have created the queues and auto- registering setup to create the generic framework functionality and to ensure load balancing and fail over capabilities • Dependencies: • Celery, for the Python library • RabbitMQ, the distribution broker • Redis, for exchanging results between the processes 1 2 4 B A 3 5
  • 5. 3) Metadata Web Server API Routing & Queuing Metadata Dynamic Query Engine Processing & Analytics File Backend Web Client • The metadata server contains all general data on users, databases and security, as well the metadata on available data for users (measures, dimensions, tables and how these all related to each other) • Dependencies: • MongoDB, for containing the metadata 1 2 4 B A 3 5
  • 6. 4) Dynamic Query Engine Web Server API Routing & Queuing Metadata Dynamic Query Engine Processing & Analytics File Backend Web Client • The dynamic query engine server contains a number of data files (which it automatically downloads and synchronizes from the backend) and can analyze and aggregate • It can also auto-join tables on commonalities, perform a wide range of calculations and do several distributed analytics operations on row-level • Dependencies: • Bcolz, for containing the data files in a compressed, columnar format • Pandas, for higher end operations for the result data set (joins, sorts, etc.) 1 2 4 B A 3 5
  • 7. 5) Processing & Analytics
    • The processing & analytics server handles (asynchronous) calls for file loading, exporting and analytics
    • This includes the creation and execution of machine learning and statistical models
    • It also converts raw data files into the binary files and updates the relevant metadata
    • Dependencies:
      • Scikit-learn, for machine learning
      • Statsmodels, for statistical models
      • Pandas, for data manipulation
      • Bcolz, for converting the data files into a compressed, columnar format
  • 8. A) Web Client
    • The web client is a full, web-based HTML5 client that gives access to all:
      • Reporting
      • Analytics
      • File import
      • User and Security Mgmt
      • Server Mgmt
    • The files are served by the web server as static content, and all calls go through the standard API
    • Dependencies:
      • jQuery, for cross-browser JavaScript simplification and UI
      • Bootstrap, for layout
      • D3.js, a library for visualizations
  • 9. B) File Backend
    • The file backend contains all raw files and the processed (compressed, columnar) files
    • DQE instances automatically retrieve their assigned files from the backend when a file has been updated
    • Dependencies:
      • AWS S3, for storing files
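The slides do not say how a DQE instance notices that a file has been updated. One plausible mechanism, assumed here purely for illustration, is to compare a per-file version token from the backend (for S3 this could be the object's ETag) with the token of the local copy. The function name, file names and token values below are hypothetical.

```python
def files_to_refresh(assigned, backend_versions, local_versions):
    """Return the assigned files whose backend version differs from the local copy."""
    stale = []
    for name in assigned:
        remote = backend_versions.get(name)
        if remote is None:
            # File is not in the backend (yet), so there is nothing to fetch.
            continue
        if local_versions.get(name) != remote:
            # Either never downloaded or updated since the last sync.
            stale.append(name)
    return stale


assigned = ["sales.bcolz", "stores.bcolz", "promo.bcolz"]
backend = {"sales.bcolz": "v2", "stores.bcolz": "v1"}
local = {"sales.bcolz": "v1", "stores.bcolz": "v1"}
to_download = files_to_refresh(assigned, backend, local)
```

Only files assigned to the instance are considered, which matches the statement that each DQE instance retrieves its own assigned files.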
  • 10. Architecture Comparison

| Area | Hadoop | Cassandra | Best In Class | visualfabriq | Difference |
| --- | --- | --- | --- | --- | --- |
| Data | Non-structured & structured | Structured, wide-column | Teradata (structured, columnar) | Structured, columnar, compressed | Optimized for numerical data (means: no text analytics etc.) |
| Architecture | Rack-aware, daemon-based cluster | Peer-to-peer cluster | | Horizontally scaling, container-based microservices communicating through RabbitMQ queues | Easier to monitor & scale |
| Setup | Complex | Complex | | Up & running in one minute | Much easier to set up and roll out |
| Cluster Maintenance | Node creation and assignment usually through commercial cluster mgmt software | Peer-to-peer network; auto-configures | | Self-registering nodes that can be assigned specific tasks and data in a web interface | |
| ETL | Flume, Sqoop | Bulk Loader | Informatica, Talend | Web based, drag & drop with wizards | Web based, easy to use |
| Language | Map/Reduce; add-ons for SQL (Pig, Hive, Impala, etc.) | CQL | SQL | MOLAP-like; SQL interface to be built | SQL is the standard, but because of the built-in reporting and analytics this is not something users will need |
| Compression | No | No | MongoDB/WiredTiger | Blosc-based | Saves on average 20x in disk space while speeding up reads |
| Performance | Slow, batch based; Spark can add in-memory capability (speeds up 100x) | High, in-memory options | | High, disk-based with compression delivering in 2-3x range of in-memory | Out-of-the-box near in-memory performance with file-based scaling; with advances of CPU speed, this might even surpass traditional in-memory performance |
| Interface | Restful API | Restful API | Restful API | Restful API | |
| Reporting | Only in external tools (that connect to sql-connector) | Only in external tools (that connect to 3rd party connectors) | Tableau (HTML5, interactive, beautiful) | Built-in HTML5, interactive, extensible (d3.js based) | Only solution with out-of-the-box reporting with an easy-to-use, modern web-based interface |
| Analytics | Distributed map/reduce analytics through Mahout | Only as optional, paid-for module | SAS, SPSS | Built-in HTML5, interactive environment that incorporates leading open-source machine learning (scikit-learn), statistics (statsmodels) and proprietary (POS-analytics) functionality; nb: the analytics load is not fully distributed yet | Only solution with out-of-the-box analytics with an easy-to-use, modern web-based interface |
| Security | Kerberos-based security | Data object security | | General, role-based security | One point to manage all security from data access to functionality (reporting, accessibility, etc.) |
| Open source | Core is open source; several performance acceleration & mgmt tools are paid | Core is open source; analytics, backup and other options are paid | | Core is open source; large cluster mgmt tools and vertical-specific analytics options are paid | |
| Language | Java | Java | | Python (and Cython & C) | |