A short introduction to Vertica 
Tommi Siivola, Software Engineer 
RedHat Software Developer Meetup 10.09.2014
AGENDA
- Quick orientation
- Columns
- Projections
- Clustering
- Hybrid storage
- Special features
Quick orientation to Vertica 
- Big data database product from HP 
- For handling terabytes/petabytes of data 
- Column-oriented
Quick orientation to Vertica 
- What does that mean in practice? 
– Vertica is a relational database 
– Supports a subset of ANSI SQL-99 standard 
– JDBC/ODBC drivers 
– A command line client (vsql)
Quick orientation to Vertica 
- Runs on major Linux distros (RHEL, Suse, Debian, Ubuntu) 
- Amazon AMI available for running Vertica in the cloud
- Up to 1 TB of data and a cluster of 3 nodes without a license (the so-called "Community Edition" mode)
- Larger setups require a license from HP
Concepts: column-oriented 
- Vertica stores data column by column, instead of storing each row as a unit
– Allows for efficient data compression 
– Can skip unwanted columns when querying 
– More efficient aggregate value calculations
Concepts: column-oriented 
ROWS VS. COLUMNS
(The slide shows the same sample table twice: once stored row by row, once stored column by column.)
date       value id
2014-03-15 23.43 3
2014-03-15 23.97 4
2014-03-15 24.51 7
2014-03-15 25.05 6
2014-03-15 25.59 7
2014-03-16 26.13 7
2014-03-16 26.67 4
2014-03-16 27.21 2
2014-03-16 27.75 3
2014-03-16 28.29 7
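The difference between the two layouts can be sketched in a few lines of Python. This is only an in-memory illustration using the sample rows from the slide, not a description of Vertica's actual on-disk format; the names `rows` and `columns` are made up for the example.

```python
# Ten sample rows from the slide: (date, value, id).
rows = [
    ("2014-03-15", 23.43, 3),
    ("2014-03-15", 23.97, 4),
    ("2014-03-15", 24.51, 7),
    ("2014-03-15", 25.05, 6),
    ("2014-03-15", 25.59, 7),
    ("2014-03-16", 26.13, 7),
    ("2014-03-16", 26.67, 4),
    ("2014-03-16", 27.21, 2),
    ("2014-03-16", 27.75, 3),
    ("2014-03-16", 28.29, 7),
]

# Row-oriented layout: one tuple per row (what `rows` already is).
# Column-oriented layout: one list per column.
columns = {
    "date": [r[0] for r in rows],
    "value": [r[1] for r in rows],
    "id": [r[2] for r in rows],
}

# An aggregate over one column only has to touch that column's list.
max_id = max(columns["id"])
rows_on_first_day = columns["date"].count("2014-03-15")
```

In the column layout, `max_id` is computed without ever reading the `date` or `value` lists, which is the idea behind skipping unwanted columns.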
Concepts: column-oriented 
RUN LENGTH ENCODING
Date column run-length encoded:
2014-03-15 (5 times) 23.43 3
                     23.97 4
                     24.51 7
                     25.05 6
                     25.59 7
2014-03-16 (5 times) 26.13 7
                     26.67 4
                     27.21 2
                     27.75 3
                     28.29 7
Original data:
2014-03-15 23.43 3
2014-03-15 23.97 4
2014-03-15 24.51 7
2014-03-15 25.05 6
2014-03-15 25.59 7
2014-03-16 26.13 7
2014-03-16 26.67 4
2014-03-16 27.21 2
2014-03-16 27.75 3
2014-03-16 28.29 7
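Run-length encoding of the date column can be sketched as follows. This is a minimal illustration of the general technique, assuming a simple list of (value, run length) pairs; the helper names `rle_encode` and `rle_decode` are hypothetical and say nothing about how Vertica implements its encoding internally.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]


# The date column from the slide: ten values, two distinct runs.
dates = ["2014-03-15"] * 5 + ["2014-03-16"] * 5
encoded = rle_encode(dates)
```

Here `encoded` holds two pairs instead of ten strings, and `rle_decode(encoded)` restores the original column.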
Concepts: column-oriented 
SKIP UNWANTED COLUMNS
date value id
2014-03-15 23.97 4 
2014-03-15 24.51 7 
2014-03-15 25.05 6 
2014-03-15 25.59 7 
2014-03-16 26.13 7 
2014-03-16 26.67 4 
2014-03-16 27.21 2 
2014-03-16 27.75 3 
2014-03-16 28.29 7 
SELECT value, id FROM table
Concepts: projections 
- Data physically stored in projections 
- Projections similar to materialized views 
– Data is optimized for querying at insert time
- Table has one or more projections 
- Projection contains one or more columns 
- Data can be duplicated in projections for query efficiency
Concepts: projections 
ONE DATA, MANY PROJECTIONS
Sorted by id:
2014-03-16 27.21 2
2014-03-15 23.43 3
2014-03-16 27.75 3
2014-03-15 23.97 4
2014-03-16 26.67 4
2014-03-15 25.05 6
2014-03-15 24.51 7
2014-03-15 25.59 7
2014-03-16 26.13 7
2014-03-16 28.29 7
Sorted by date:
2014-03-15 23.43 3
2014-03-15 23.97 4
2014-03-15 24.51 7
2014-03-15 25.05 6
2014-03-15 25.59 7
2014-03-16 26.13 7
2014-03-16 26.67 4
2014-03-16 27.21 2
2014-03-16 27.75 3
2014-03-16 28.29 7
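One way to see why the same data might be kept in several sort orders is to count the runs of equal values each column has under each ordering, since fewer runs means better run-length encoding. The sketch below is a toy model using the slide's sample rows; `run_count`, `by_date` and `by_id` are illustrative names, not Vertica concepts.

```python
from itertools import groupby

# Sample rows from the slide: (date, value, id).
rows = [
    ("2014-03-15", 23.43, 3),
    ("2014-03-15", 23.97, 4),
    ("2014-03-15", 24.51, 7),
    ("2014-03-15", 25.05, 6),
    ("2014-03-15", 25.59, 7),
    ("2014-03-16", 26.13, 7),
    ("2014-03-16", 26.67, 4),
    ("2014-03-16", 27.21, 2),
    ("2014-03-16", 27.75, 3),
    ("2014-03-16", 28.29, 7),
]


def run_count(values):
    """Number of runs of consecutive equal values."""
    return sum(1 for _ in groupby(values))


# Two "projections" of the same rows, differing only in sort order.
by_date = sorted(rows, key=lambda r: r[0])
by_id = sorted(rows, key=lambda r: r[2])

date_runs = {
    "by_date": run_count(r[0] for r in by_date),
    "by_id": run_count(r[0] for r in by_id),
}
id_runs = {
    "by_date": run_count(r[2] for r in by_date),
    "by_id": run_count(r[2] for r in by_id),
}
```

Each column compresses best in the projection that is sorted on it, so a query filtering on `id` and a query filtering on `date` may each prefer a different projection.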
Concepts: clustering 
- Parallel processing 
– Data segments distributed across cluster nodes 
– Performance can be increased by adding hardware 
- Reliability (K-safety) 
– Tolerates nodes going offline 
- All nodes can respond to queries → queries can be load balanced between nodes
Concepts: clustering 
SEGMENTATION
Node 1: SEGMENT1
Node 2: SEGMENT2
Node 3: SEGMENT3
Node 4: SEGMENT4
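Distributing rows across four nodes can be sketched as below. The slides do not say how rows are assigned to segments; this example assumes the assignment is derived from the `id` column and uses a plain modulo as a stand-in for a real hash function. `NODE_COUNT` and `node_for` are hypothetical names.

```python
from collections import defaultdict

NODE_COUNT = 4

# Sample rows from the slides: (date, value, id).
rows = [
    ("2014-03-15", 23.43, 3),
    ("2014-03-15", 23.97, 4),
    ("2014-03-15", 24.51, 7),
    ("2014-03-15", 25.05, 6),
    ("2014-03-15", 25.59, 7),
    ("2014-03-16", 26.13, 7),
    ("2014-03-16", 26.67, 4),
    ("2014-03-16", 27.21, 2),
    ("2014-03-16", 27.75, 3),
    ("2014-03-16", 28.29, 7),
]


def node_for(row_id, node_count=NODE_COUNT):
    """Map a row id to a node number 1..node_count.

    Assumption: modulo stands in for a proper hash function.
    """
    return row_id % node_count + 1


# Each node ends up with its own segment of the table.
segments = defaultdict(list)
for row in rows:
    segments[node_for(row[2])].append(row)
```

With only ten rows and five distinct ids the segments come out uneven; the point is only that every row lands on exactly one node and rows with the same id land together.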
Concepts: clustering 
K-SAFETY
Node 1: SEGMENT1, SEGMENT2
Node 2: SEGMENT2, SEGMENT3
Node 3: SEGMENT3, SEGMENT4
Node 4: SEGMENT4, SEGMENT1
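The layout in the diagram, where every segment lives on two neighbouring nodes, can be modelled to check which failures it survives. This is a sketch of the placement shown on the slide only; `placement` and `available_segments` are illustrative helpers, and real recovery behaviour is not modelled.

```python
def placement(node_count):
    """Node n holds segment n plus the next segment, wrapping around."""
    return {n: {n, n % node_count + 1} for n in range(1, node_count + 1)}


def available_segments(layout, down_nodes):
    """Segments still readable when the given nodes are offline."""
    segments = set()
    for node, held in layout.items():
        if node not in down_nodes:
            segments |= held
    return segments


layout = placement(4)
all_segments = {1, 2, 3, 4}

# Any single node can go offline without losing a segment.
survives_single_failure = all(
    available_segments(layout, {node}) == all_segments for node in layout
)

# Losing two neighbouring nodes loses the segment they shared.
after_double_failure = available_segments(layout, {1, 2})
```

In this model one failed node is always tolerated, while losing nodes 1 and 2 together makes segment 2 unavailable.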
Concepts: Hybrid storage 
- Read-optimized storage (ROS) 
– On disk 
– Heavily encoded & compressed 
- Write-optimized storage (WOS) 
– In memory 
– No encoding or compression
Concepts: Hybrid storage 
- Inserted data is first accumulated in WOS
– Inserting into WOS is faster because it avoids compression and disk-write overhead
- Background job moves data in batches from WOS to ROS 
– Writing to ROS is more efficient in batches 
– Querying is more efficient from ROS
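The flow from WOS to ROS can be pictured with a toy model. Everything here is a simplification under stated assumptions: plain Python lists stand in for memory and disk, sorting stands in for encoding and compression, and the class and method names (`HybridStore`, `moveout`, `scan`) are invented for the example.

```python
class HybridStore:
    """Toy model of a write-optimized buffer in front of read-optimized batches."""

    def __init__(self):
        self.wos = []  # unsorted rows, cheap to append to
        self.ros = []  # list of sorted, immutable batches

    def insert(self, row):
        # Inserts only append to the in-memory buffer.
        self.wos.append(row)

    def moveout(self):
        # The background job: sort the buffered rows and store them as one batch.
        if self.wos:
            self.ros.append(tuple(sorted(self.wos)))
            self.wos = []

    def scan(self):
        # Queries must see both the stored batches and the rows still buffered.
        for batch in self.ros:
            yield from batch
        yield from self.wos


store = HybridStore()
for row in [("2014-03-16", 26.13, 7), ("2014-03-15", 23.43, 3), ("2014-03-15", 23.97, 4)]:
    store.insert(row)
store.moveout()
store.insert(("2014-03-16", 26.67, 4))
visible_rows = list(store.scan())
```

After the moveout, three rows sit in one sorted batch and the late insert is still in the buffer, yet a scan returns all four.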
Vertica feature: Pattern matching 
- Example: Finding sequences in web site log data
- Find all sequences where user enters the site, browses and finally makes a purchase
- Difficult to express in SQL
- Vertica has SQL extension for finding patterns
PATTERNS IN DATA
user action
1 enter
1 browse
1 browse
1 purchase
2 enter
2 browse
3 enter
3 browse
3 purchase
Vertica feature: Pattern matching 
- Example: find sequences where user enters a site, browses and makes a purchase
SELECT uid, sid, ts, refurl, pageurl, action,
       event_name(), pattern_id(), match_id()
FROM clickstream_log
MATCH
  (PARTITION BY uid, sid ORDER BY ts
   DEFINE
     Entry    AS refurl NOT ILIKE '%site.com%' AND pageurl ILIKE '%site.com%',
     Onsite   AS pageurl ILIKE '%site.com%' AND action = 'V',
     Purchase AS pageurl ILIKE '%site.com%' AND action = 'P'
   PATTERN
     P AS (Entry Onsite* Purchase)
   ROWS MATCH FIRST EVENT);
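To make the pattern `Entry Onsite* Purchase` more concrete, the same idea can be applied by hand to the small user/action table from the previous slide. The sketch below is not how Vertica evaluates MATCH; it simply encodes each user's actions as one letter per event and runs an ordinary regular expression over the result. `SYMBOL`, `PATTERN` and `matching_users` are names made up for this example.

```python
import re
from itertools import groupby

# The user/action table from the slide, already ordered by user and time.
events = [
    (1, "enter"), (1, "browse"), (1, "browse"), (1, "purchase"),
    (2, "enter"), (2, "browse"),
    (3, "enter"), (3, "browse"), (3, "purchase"),
]

# One letter per event type, so a session becomes a short string.
SYMBOL = {"enter": "E", "browse": "B", "purchase": "P"}

# Enter, any number of browse events, then a purchase.
PATTERN = re.compile(r"EB*P")


def matching_users(rows):
    """Return the users whose event sequence contains the pattern."""
    matches = []
    for user, group in groupby(rows, key=lambda row: row[0]):
        encoded = "".join(SYMBOL[action] for _, action in group)
        if PATTERN.search(encoded):
            matches.append(user)
    return matches


buyers = matching_users(events)
```

Users 1 and 3 match because their sessions end in a purchase; user 2 enters and browses but never buys.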
Extending Vertica 
- Custom SQL functions can be created with R, Java or C++
- R: scalar and transform functions
- Java: all of the above, plus load functions
- C++: all of the above, plus aggregate and analytic functions
Find out more 
- Vertica free downloads available at (requires registration) 
– my.vertica.com 
- Vertica documentation available at (no registration) 
– www.vertica.com/documentation 
- C-Store research project (Vertica predecessor) 
– db.csail.mit.edu/projects/cstore/
THANKS! 
Tommi Siivola, Software Engineer 
tommi.siivola@eficode.com 
+358 (0)50 371 9308 
eficode.fi 
"Automatisoi tai näivety" ("Automate or wither") and other posts on the Eficode blog.
EFICODE.FI/BLOGI

A short introduction to Vertica

  • 1. A short introduction to Vertica Tommi Siivola, Software Engineer RedHat Software Developer Meetup 10.09.2014
  • 2. - Quick orientation - Columns - Projections - Clustering - Hybrid storage - Special features AGENDA
  • 3. Quick orientation to Vertica - Big data database product from HP - For handling terabytes/petabytes of data - Column-oriented
  • 4. Quick orientation to Vertica - What does that mean in practice? – Vertica is a relational database – Supports a subset of ANSI SQL-99 standard – JDBC/ODBC drivers – A command line client (vsql)
  • 5. Quick orientation to Vertica - Runs on major Linux distros (RHEL, Suse, Debian, Ubuntu) - Amazon AMI available for running in Vertica in the cloud - Up to 1 TB of data and a cluster of 3 nodes without license (so called ”Community Edition” mode) - Larger setups require a license from HP
  • 6. Concepts: column-oriented - Vertica stores data as columns, instead of each row as unit – Allows for efficient data compression – Can skip unwanted columns when querying – More efficient aggregate value calculations
  • 7. Concepts: column-oriented
      ROWS VS. COLUMNS (the slide shows the same ten rows twice, once per layout)
        2014-03-15  23.43  3
        2014-03-15  23.97  4
        2014-03-15  24.51  7
        2014-03-15  25.05  6
        2014-03-15  25.59  7
        2014-03-16  26.13  7
        2014-03-16  26.67  4
        2014-03-16  27.21  2
        2014-03-16  27.75  3
        2014-03-16  28.29  7
  • 8. Concepts: column-oriented
      RUN LENGTH ENCODING
      Date column run-length encoded:
        2014-03-15 (5 times)  23.43  3
                              23.97  4
                              24.51  7
                              25.05  6
                              25.59  7
        2014-03-16 (5 times)  26.13  7
                              26.67  4
                              27.21  2
                              27.75  3
                              28.29  7
      Unencoded:
        2014-03-15  23.43  3
        2014-03-15  23.97  4
        2014-03-15  24.51  7
        2014-03-15  25.05  6
        2014-03-15  25.59  7
        2014-03-16  26.13  7
        2014-03-16  26.67  4
        2014-03-16  27.21  2
        2014-03-16  27.75  3
        2014-03-16  28.29  7
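The run-length idea on this slide can be sketched in a few lines of Python. This is only an illustration of the encoding principle applied to the slide's date column, not Vertica's actual storage format; the function names `rle_encode` and `rle_decode` are made up for the example.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# The date column from the slide: five rows per day, already sorted by date.
dates = ["2014-03-15"] * 5 + ["2014-03-16"] * 5

encoded = rle_encode(dates)
print(encoded)  # [('2014-03-15', 5), ('2014-03-16', 5)]
```

Ten stored values shrink to two pairs. The encoding only pays off when equal values sit next to each other, which is why the sort order of the stored data matters.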
  • 9. Concepts: column-oriented
      SKIP UNWANTED COLUMNS
        date        value  id
        2014-03-15  23.97  4
        2014-03-15  24.51  7
        2014-03-15  25.05  6
        2014-03-15  25.59  7
        2014-03-16  26.13  7
        2014-03-16  26.67  4
        2014-03-16  27.21  2
        2014-03-16  27.75  3
        2014-03-16  28.29  7
      SELECT value, id FROM table
  • 10. Concepts: projections
      - Data is physically stored in projections
      - Projections are similar to materialized views
        – Data is optimized for querying at insert time
      - A table has one or more projections
      - A projection contains one or more columns
      - Data can be duplicated across projections for query efficiency
  • 11. Concepts: projections
      ONE DATA, MANY PROJECTIONS
      Sorted by date:
        2014-03-15  23.43  3
        2014-03-15  23.97  4
        2014-03-15  24.51  7
        2014-03-15  25.05  6
        2014-03-15  25.59  7
        2014-03-16  26.13  7
        2014-03-16  26.67  4
        2014-03-16  27.21  2
        2014-03-16  27.75  3
        2014-03-16  28.29  7
      Sorted by id:
        2014-03-16  27.21  2
        2014-03-15  23.43  3
        2014-03-16  27.75  3
        2014-03-15  23.97  4
        2014-03-16  26.67  4
        2014-03-15  25.05  6
        2014-03-15  24.51  7
        2014-03-15  25.59  7
        2014-03-16  26.13  7
        2014-03-16  28.29  7
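As a rough sketch of why the same data might be kept in more than one sort order, the Python below builds two "projections" of the slide's rows and answers an id lookup with a binary search on the id-sorted copy. It is a toy model under the assumption that a projection is simply a sorted copy of the rows; the helper `rows_with_id` is hypothetical and says nothing about how Vertica implements projections internally.

```python
from bisect import bisect_left, bisect_right

# (date, value, id) rows as shown on the slide.
rows = [
    ("2014-03-15", 23.43, 3), ("2014-03-15", 23.97, 4),
    ("2014-03-15", 24.51, 7), ("2014-03-15", 25.05, 6),
    ("2014-03-15", 25.59, 7), ("2014-03-16", 26.13, 7),
    ("2014-03-16", 26.67, 4), ("2014-03-16", 27.21, 2),
    ("2014-03-16", 27.75, 3), ("2014-03-16", 28.29, 7),
]

# Two copies of the same data, each sorted for a different kind of query.
by_date = sorted(rows, key=lambda r: r[0])
by_id = sorted(rows, key=lambda r: r[2])


def rows_with_id(projection, wanted_id):
    """Find all rows with the given id in an id-sorted projection."""
    ids = [r[2] for r in projection]
    lo = bisect_left(ids, wanted_id)
    hi = bisect_right(ids, wanted_id)
    return projection[lo:hi]  # matching rows are contiguous


print(rows_with_id(by_id, 7))
```

The trade-off is the one named on the previous slide: the data is stored twice, and in exchange each query can read from the copy whose order suits it.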
  • 12. Concepts: clustering
      - Parallel processing
        – Data segments distributed across cluster nodes
        – Performance can be increased by adding hardware
      - Reliability (K-safety)
        – Tolerates nodes going offline
      - All nodes can respond to queries → queries can be load balanced between nodes
  • 13. Concepts: clustering
      SEGMENTATION
        Node 1: SEGMENT1
        Node 2: SEGMENT2
        Node 3: SEGMENT3
        Node 4: SEGMENT4
  • 14. Concepts: clustering
      K-SAFETY
        Node 1: SEGMENT1, SEGMENT2
        Node 2: SEGMENT2, SEGMENT3
        Node 3: SEGMENT3, SEGMENT4
        Node 4: SEGMENT4, SEGMENT1
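The segment layout on the K-safety slide can be reproduced with a small Python sketch: each node keeps its own segment plus a copy of the next one, wrapping around at the end. The functions `place_segments` and `available_segments` are illustrative names, and the model deliberately ignores everything else a real cluster does when nodes fail.

```python
def place_segments(node_count, k=1):
    """Node i holds segment i plus copies of the next k segments (wrapping)."""
    return {
        node: [(node - 1 + offset) % node_count + 1 for offset in range(k + 1)]
        for node in range(1, node_count + 1)
    }


def available_segments(placement, down_nodes):
    """Segments still readable when the given nodes are offline."""
    return {
        segment
        for node, segments in placement.items()
        if node not in down_nodes
        for segment in segments
    }


placement = place_segments(4)
print(placement)

# With one copy of each segment on a neighbour, any single node can go offline.
for node in placement:
    assert available_segments(placement, {node}) == {1, 2, 3, 4}
```

In this toy layout, losing two neighbouring nodes (for example nodes 1 and 4) makes segment 1 unavailable, which is the limit the extra copy is meant to push back.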
  • 15. Concepts: Hybrid storage
      - Read-optimized storage (ROS)
        – On disk
        – Heavily encoded & compressed
      - Write-optimized storage (WOS)
        – In memory
        – No encoding or compression
  • 16. Concepts: Hybrid storage
      - Inserted data is first accumulated in WOS
        – Inserting into WOS is faster, because there is no compression or disk write overhead
      - A background job moves data in batches from WOS to ROS
        – Writing to ROS is more efficient in batches
        – Querying is more efficient from ROS
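A minimal Python sketch of the two-tier idea follows: inserts land in an unsorted in-memory buffer, and a separate step moves them in one sorted batch to the read-optimized side. The class `HybridStore` and its method names are invented for this illustration, sorting stands in for the encoding and compression that ROS applies, and in Vertica the move is done by a background job rather than called by hand.

```python
class HybridStore:
    """Toy model of write-optimized (WOS) and read-optimized (ROS) storage."""

    def __init__(self):
        self.wos = []  # in memory: rows appended as they arrive, no sorting
        self.ros = []  # "on disk": a list of sorted batches

    def insert(self, row):
        # Cheap: no sorting, encoding or disk write at insert time.
        self.wos.append(row)

    def move_to_ros(self):
        # Done in batches: sort once, then store the whole batch together.
        if self.wos:
            self.ros.append(sorted(self.wos))
            self.wos = []

    def scan(self):
        # A query has to see both the batched and the not-yet-moved rows.
        for batch in self.ros:
            yield from batch
        yield from self.wos


store = HybridStore()
for row in [("2014-03-16", 26.13, 7), ("2014-03-15", 23.43, 3)]:
    store.insert(row)

store.move_to_ros()
store.insert(("2014-03-16", 26.67, 4))
print(list(store.scan()))
```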
  • 17. Vertica feature: Pattern matching
      - Example: finding sequences in web site log data
      - Find all sequences where a user enters the site, browses and finally makes a purchase
      - Difficult to express in standard SQL
      - Vertica has an SQL extension for finding patterns
      PATTERNS IN DATA
        user  action
        1     enter
        1     browse
        1     browse
        1     purchase
        2     enter
        2     browse
        3     enter
        3     browse
        3     purchase
  • 18. Vertica feature: Pattern matching
      - Example: find sequences where a user enters a site, browses and makes a purchase

        SELECT uid, sid, ts, refurl, pageurl, action,
               event_name(), pattern_id(), match_id()
        FROM clickstream_log
        MATCH (
          PARTITION BY uid, sid ORDER BY ts
          DEFINE
            Entry    AS refurl NOT ILIKE '%site.com%' AND pageurl ILIKE '%site.com%',
            Onsite   AS pageurl ILIKE '%site.com%' AND action = 'V',
            Purchase AS pageurl ILIKE '%site.com%' AND action = 'P'
          PATTERN P AS (Entry Onsite* Purchase)
          ROWS MATCH FIRST EVENT
        );
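To make the `Entry Onsite* Purchase` pattern concrete, here is a small Python sketch that applies the same idea to the user/action table from the previous slide. It is an analogy rather than a reimplementation of Vertica's MATCH clause: each action is mapped to a single letter and an ordinary regular expression plays the role of the pattern. The `SYMBOL` mapping and the function name are assumptions made for the example.

```python
import re
from itertools import groupby

# (user, action) events in time order, as listed on the slide.
events = [
    (1, "enter"), (1, "browse"), (1, "browse"), (1, "purchase"),
    (2, "enter"), (2, "browse"),
    (3, "enter"), (3, "browse"), (3, "purchase"),
]

# One letter per event type, so a user's session becomes a short string.
SYMBOL = {"enter": "E", "browse": "B", "purchase": "P"}

# Entry, then any number of Onsite events, then Purchase.
PATTERN = re.compile(r"EB*P")


def users_with_purchase_path(events):
    """Return users whose events contain enter, browse*, purchase in order."""
    matched = []
    # groupby assumes the events are already grouped by user, as they are here.
    for user, group in groupby(events, key=lambda e: e[0]):
        encoded = "".join(SYMBOL[action] for _, action in group)
        if PATTERN.search(encoded):
            matched.append(user)
    return matched


print(users_with_purchase_path(events))  # [1, 3]
```

User 2 enters and browses but never purchases, so only users 1 and 3 are reported.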
  • 19. Extending Vertica
      - Custom SQL functions can be created with R, Java or C++
      - R: scalar and transform functions
      - Java: all of the above, plus load functions
      - C++: all of the above, plus aggregate and analytic functions
  • 20. Find out more
      - Vertica free downloads (requires registration)
        – my.vertica.com
      - Vertica documentation (no registration)
        – www.vertica.com/documentation
      - C-Store research project (Vertica predecessor)
        – db.csail.mit.edu/projects/cstore/
  • 21. THANKS!
      Tommi Siivola, Software Engineer
      tommi.siivola@eficode.com
      +358 (0)50 371 9308
      eficode.fi
      "Automatisoi tai näivety" ("Automate or wither") and other posts on the Eficode blog: EFICODE.FI/BLOGI