Ingres VectorWise’s differentiator is to unlock the power of modern chips. This must be the focus. No other relational database vendor does this.
This slide compares Ingres VectorWise data processing performance against Oracle and HP Neoview. HP Neoview is not a direct Ingres VectorWise competitor, but it is included in the comparison because a performance number was publicly available (most vendors don’t publish any such information). The links to the resources are included on the slide. Make an assertion that other relational databases will be in the same range as Oracle and HP Neoview.
This is an imaginary example: 1 table with (only) 10 columns, where every column is assumed to be 1/10th of the table size. The performance difference becomes even more staggering as the table gets more columns (and wide tables are common in a data warehouse).
Both Oracle and HP Neoview use a row-based architecture (see below for Oracle Exadata), i.e. both have to load all table data in order to process this simple query. Ingres VectorWise only accesses the 2 columns it needs to answer the query, so it reads only 2/10ths of the data.
The per-core processing rates speak for themselves: if we assume there is no parallelism, data processing is dramatically faster with Ingres VectorWise (~38 times faster than Oracle).
HP Neoview will not run a query without using parallelism. However, you need to run at about 50x parallelism to get down to 13 seconds. This requires a lot more hardware and introduces complexity. Similarly, Oracle can run in parallel. Alternatively, in Oracle you could create a materialized view that happens to contain only the 2 columns you need, and scan only 20 GB rather than 100 GB. However, the query would then still take 100 seconds, and someone has to tune the database to make this happen.
Oracle Exadata implicitly provides the same benefits as columnar access if you run the query in parallel. I.e. it is certainly possible to get the query in Oracle (or HP Neoview, for that matter) down to 13 seconds, but it takes a lot more hardware and/or tuning effort. Ingres VectorWise provides this performance out of the box, with no tuning.
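The arithmetic behind these figures can be sketched as follows. The per-core scan rates are not stated in these notes; they are back-derived from the 20 GB, 100 GB, 100 second and 13 second figures above and should be treated as assumptions, not vendor numbers.

```python
# Back-of-the-envelope scan times for the 10-column example.
# ASSUMPTION: both per-core rates are back-derived from the notes
# (20 GB in 100 s, and 20 GB in roughly 13 s); they are not measured values.

TABLE_GB = 100.0        # whole table, 10 equally sized columns
COLUMNS_TOTAL = 10
COLUMNS_NEEDED = 2

ROW_STORE_GB_PER_S = 0.2    # assumed per-core rate: 20 GB / 100 s
VECTORWISE_GB_PER_S = 1.5   # assumed per-core rate: roughly 20 GB / 13 s


def scan_seconds(gigabytes, gb_per_s):
    """Single-core time to process `gigabytes` at `gb_per_s`."""
    return gigabytes / gb_per_s


# A column store only reads the columns the query references.
needed_gb = TABLE_GB * COLUMNS_NEEDED / COLUMNS_TOTAL

row_full_scan = scan_seconds(TABLE_GB, ROW_STORE_GB_PER_S)     # row store, all columns
row_mat_view = scan_seconds(needed_gb, ROW_STORE_GB_PER_S)     # row store, 2-column materialized view
column_scan = scan_seconds(needed_gb, VECTORWISE_GB_PER_S)     # column store, 2 columns

print(f"data needed by the query : {needed_gb:.0f} GB")
print(f"row store, full scan     : {row_full_scan:.0f} s")
print(f"row store, 2-column view : {row_mat_view:.0f} s")
print(f"column store             : {column_scan:.1f} s")
print(f"speed-up vs full scan    : {row_full_scan / column_scan:.1f}x")
```

Under these assumed rates the speed-up comes from two independent factors: reading a fifth of the data, and processing each gigabyte faster per core.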
This slide shows an example of traditional database processing at the CPU level, tuple by tuple, versus Ingres VectorWise’s more efficient vector-based processing: apply the same operation to a set (vector) of data at a time.
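The difference in control flow can be illustrated with a minimal sketch. Plain Python lists stand in for column vectors, and the function names and `VECTOR_SIZE` are hypothetical; this shows only the shape of the two approaches, not Ingres VectorWise internals or their actual speed.

```python
# Tuple-at-a-time versus vector-at-a-time evaluation of price * quantity.
# ASSUMPTION: VECTOR_SIZE and all function names are illustrative only.

VECTOR_SIZE = 1024


def multiply_tuple_at_a_time(rows):
    """Evaluate price * quantity with one interpreted step per row."""
    result = []
    for price, quantity in rows:
        result.append(price * quantity)
    return result


def multiply_primitive(prices, quantities):
    """A 'primitive': one call applies the same operation to a whole vector."""
    return [p * q for p, q in zip(prices, quantities)]


def multiply_vector_at_a_time(prices, quantities, vector_size=VECTOR_SIZE):
    """Evaluate price * quantity one vector (a slice of each column) at a time."""
    result = []
    for start in range(0, len(prices), vector_size):
        end = start + vector_size
        result.extend(multiply_primitive(prices[start:end], quantities[start:end]))
    return result


prices = [float(i % 50) for i in range(5000)]
quantities = [i % 7 for i in range(5000)]

per_tuple = multiply_tuple_at_a_time(zip(prices, quantities))
per_vector = multiply_vector_at_a_time(prices, quantities)
print(per_tuple == per_vector)
```

Both paths produce the same result. In the vector-based version the per-call overhead is paid once per vector rather than once per tuple, and the inner loop is a tight loop over a small array.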
Access to data on disks requires millions of CPU cycles, and can be achieved at 40 to 100 MB/s (for spinning disks; Solid State Disks (SSD) will deliver higher throughput). Access to RAM is a lot more efficient than access to disk. It requires a few hundred CPU cycles to access data in RAM, and a throughput of 2-3 GB/s can be achieved when data is read out of RAM. Access to chip cache, however, is by far the most efficient way to get access to data. It takes a handful of CPU cycles to access the data, and throughput of up to 10 GB/s can be achieved in the chip cache. Ingres VectorWise pre-fetches compressed data from disk to load it into RAM (compressed) and uses the chip cache as the only true processing memory. As a result, data processing with Ingres VectorWise is a lot more efficient than with other relational database technologies.
Ingres VectorWise uses column-based storage. Analytic queries rarely access all columns of the tables accessed in the query. Column-based storage ensures that only relevant data is accessed. Ingres VectorWise features an innovative approach to incremental DML called Positional Delta Trees (PDTs). The PDTs enable efficient updates to the column-based store. Traditionally, incremental DML has been an Achilles’ heel for column-based stores.
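The effect of the storage layout on the amount of data touched can be shown with a small sketch. The table, the query (a sum of products over 2 of 10 columns) and all names are hypothetical, and counting values read is only a stand-in for IO volume.

```python
# Row layout versus column layout for a query that needs 2 of 10 columns.
# ASSUMPTION: the data, the query and the "values read" counter are illustrative.

NUM_COLUMNS = 10
NUM_ROWS = 1000

# Row store: one tuple per row, all 10 fields stored together.
rows = [tuple(r * NUM_COLUMNS + c for c in range(NUM_COLUMNS)) for r in range(NUM_ROWS)]
# Column store: the same data, one list per column.
columns = [list(col) for col in zip(*rows)]


def sum_product_row_store(rows, a, b):
    """Sum of row[a] * row[b]; every field of every row is read."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)   # the whole row comes along, used or not
        total += row[a] * row[b]
    return total, values_read


def sum_product_column_store(columns, a, b):
    """Sum of col_a[i] * col_b[i]; only the two referenced columns are read."""
    col_a, col_b = columns[a], columns[b]
    values_read = len(col_a) + len(col_b)
    total = sum(x * y for x, y in zip(col_a, col_b))
    return total, values_read


row_total, row_read = sum_product_row_store(rows, 2, 7)
col_total, col_read = sum_product_column_store(columns, 2, 7)
print(row_total == col_total, row_read, col_read)
```

The answers match, while the column layout reads 2/10ths of the values. The sketch does not cover updates, which is the part PDTs address.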
Ingres VectorWise automatically compresses all data that is stored in the column store. The compression algorithm varies per column depending on the data type, but is automatically chosen by Ingres VectorWise. The algorithms Ingres VectorWise uses are optimized for high-speed decompression in order to support the high throughput requirements. Decompression is vectorised just like other functions that operate on the data. In order to obtain maximum throughput, Ingres VectorWise pre-fetches compressed data blocks from disks, loads them compressed into memory (into the so-called Column Buffer Manager) and only decompresses the data when it is ready to be processed. As mentioned earlier, the chip cache is used as the only true random access memory, delivering optimum throughput.
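The "keep it compressed until it is processed" idea can be sketched as below. These notes do not name the compression algorithms, so run-length encoding is used purely as a stand-in; `BLOCK_SIZE` and the function names are hypothetical as well.

```python
# Blocks stay compressed in the buffer and are decompressed one at a time,
# just before they are processed.
# ASSUMPTION: run-length encoding is a stand-in, not the product's algorithm.

BLOCK_SIZE = 4  # hypothetical number of values per block


def rle_compress(values):
    """Run-length encode a list into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decompress(runs):
    """Expand [value, run_length] pairs back into a vector of values."""
    vector = []
    for value, length in runs:
        vector.extend([value] * length)
    return vector


def load_blocks(column, block_size=BLOCK_SIZE):
    """Simulate pre-fetching: the buffer holds compressed blocks only."""
    return [rle_compress(column[i:i + block_size])
            for i in range(0, len(column), block_size)]


def scan_sum(compressed_blocks):
    """Decompress each block only at the moment it is processed."""
    total = 0
    for block in compressed_blocks:
        vector = rle_decompress(block)   # short-lived, small working set
        total += sum(vector)
    return total


column = [7, 7, 7, 7, 3, 3, 9, 9, 9, 9, 9, 1]
buffered = load_blocks(column)
print(scan_sum(buffered) == sum(column))
```

Only one decompressed vector exists at any time, which is what allows the working set to stay small enough for a cache.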
As data is loaded into Ingres VectorWise, the database automatically creates and maintains a storage index. The storage index is very small relative to the table size and stores minimum and maximum information for data blocks. Based on the information in the storage index, the database can very quickly identify candidate data blocks. This is another way Ingres VectorWise minimizes the IO necessary to answer queries.
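A minimal sketch of a min/max index and how it narrows a range query down to candidate blocks. The block size, data and function names are hypothetical; the real storage index format is not described in these notes.

```python
# Min/max storage index: skip blocks whose value range cannot match the predicate.
# ASSUMPTION: BLOCK_SIZE, the sample data and all names are illustrative.

BLOCK_SIZE = 4


def build_storage_index(column, block_size=BLOCK_SIZE):
    """Record (block_start, min, max) for every block of the column."""
    index = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        index.append((start, min(block), max(block)))
    return index


def candidate_blocks(index, low, high):
    """Start offsets of blocks whose [min, max] range overlaps [low, high]."""
    return [start for start, bmin, bmax in index if bmax >= low and bmin <= high]


def range_scan(column, index, low, high, block_size=BLOCK_SIZE):
    """Return values in [low, high], reading candidate blocks only."""
    hits = []
    for start in candidate_blocks(index, low, high):
        for value in column[start:start + block_size]:
            if low <= value <= high:
                hits.append(value)
    return hits


column = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
index = build_storage_index(column)

print(candidate_blocks(index, 11, 12))
print(range_scan(column, index, 11, 12))
```

For this data only one of the three blocks is a candidate for the range 11 to 12, so the other two are never read. The index helps most when values are clustered within blocks, as in this sample.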