Concepts of NonStop SQL/MX: Part 4 - Storage.
Technical white paper
Concepts of NonStop SQL/MX
Part 4. Introduction to SQL/MX Storage
Table of Contents
Introduction
  Intended audience
  Examples
Database storage
  Volume managers
Introduction to SQL/MX storage
  Oracle tablespaces compared to NonStop volumes
    NonStop SQL subvolumes
    Similar but not equal
    Database blocks
    Extents are the allocation units
  Creating tables and indexes
  Table and index partitions
    Tables
    Indexes
Conclusion
References
For more information
  Other articles in this series
  Other interesting reads
Introduction
When customers migrate Oracle applications to NonStop SQL/MX, DBAs with an Oracle background may feel a bit lost, because
each product has its own approach to common tasks such as disk storage and user access. This
article is one in a series inspired by the Oracle Database: Concepts 11g Release 2 manual; it tries to explain some of those
differences in implementation. Other articles address other differences.
Intended audience
This article is written for DBAs and developers who know Oracle and want to learn about NonStop SQL/MX. It may also be
useful for people who know NonStop, and would like to know about similarities and differences between the two products.
Examples
In this article, the Oracle examples use the SQL*Plus command interface, and the NonStop SQL/MX examples use its
equivalent, mxci. SQL*Plus commands are indicated by the “SQL>” prompt; those in mxci are indicated by “>>”.
Database storage
In Oracle there are different ways to manage database storage. It has its own file system, called Oracle Automatic Storage
Management (ASM), which can be used to manage database files in a single machine or in a cluster environment. As an
alternative to using ASM, which is an extra layer of software, customers have used the Operating System (OS) file system to
store the data on a single system or a clustered file system to do the same in a cluster. “Raw partitions” are used to bypass
the operating system buffers to provide Oracle full access to the disk.
Volume managers
The software that manages mass-storage devices is called a Logical Volume Manager (LVM) because it allows a system
administrator to view all physical hard disk drives (HDD) or HDD partitions as a single storage source. Oracle ASM is a volume
manager for Oracle databases. Other LVM implementations can be found in HP-UX, Linux, IBM AIX, and the VERITAS file
system. On NonStop servers, the volume manager is integrated into the OS and the term “volume” refers to a logical disk. A
logical disk is a physical disk (RAID 0) or a mirrored pair (RAID 1). Starting with OS releases J06.12 and H06.23 physical disks
can be partitioned into multiple logical volumes. When a Storage Area Network (SAN) is connected to a NonStop server, a
volume is mapped to a Logical Unit which can be configured with the SAN as a RAID-n volume. A NonStop volume is
managed by a fault-tolerant process pair, the Disk Access Manager (DAM) also known as DP2. All access to data on a volume
is performed by the DAM; it contains part of the SQL execution engine, it is responsible for the caching of data for the
volume and for maintaining row-locks.
Introduction to SQL/MX storage
This paper may well have been called: “Where are the tablespaces?”, because, unlike Oracle, SQL/MX does not use
tablespaces for managing storage. An Oracle tablespace is a logical grouping of one or more data files. When Oracle tables
and indexes are created, they are assigned to a tablespace and physically placed inside the files that make up the
tablespace. Conversely, in NonStop SQL/MX, tables and indexes are directly associated with files. The NonStop DBA uses a
“LOCATION” clause to assign a database object such as a table, an index, or a partition to a disk volume, and optionally to a
physical file. Both products isolate logical data structures, such as tables and indexes from the physical placement of the
data on disk. That way, changes in the physical structures, such as accommodation of growth, can be supported without
changing the business applications. The next sections describe the differences in more detail.
Oracle tablespaces compared to NonStop volumes
Tables in Oracle are created within tablespaces. A tablespace is a collection of one or more OS files. A tablespace is also a
container of segments; and segments are a logical way of defining Oracle objects that contain data. Different types of
segments exist, including table, index, undo, and temporary segments. To facilitate database management, the
undo and temporary segments are usually placed in separate tablespaces: a typical Oracle database contains the SYSTEM
and SYSAUX tablespaces, one or more user tablespaces, and the UNDO and TEMP tablespaces. Figure 1 shows the
relationship between tablespace, OS files, and database objects. The figure shows that a table, T02, can be split over
multiple OS files, if the tablespace also spans multiple files.
Figure 1. Oracle tablespace containing two data files and three tables
NonStop SQL does not use tablespaces and segments to contain database objects. Instead it refers to volumes, and these
map to physical, optionally mirrored, disks. A volume is a container of files, and in NonStop SQL, files are used to store table
and index data. The equivalents to Oracle UNDO tablespaces are the AUDIT volumes that are dedicated to the AUDITTRAIL
data, which is used for file and volume recovery. These AUDIT volumes contain recovery data not only for SQL/MX databases
but also for SQL/MP and Enscribe files. Data recovery on a NonStop system is based on the data volume, independent of the
databases that are protected by the Transaction Management Facility (TMF). TMF is the system-wide, integrated transaction
monitor and resource manager that protects the data integrity of all audited files, regardless of the database product.
NonStop SQL does not require dedicated temporary space; by default, unallocated space on any volume can be used for
temporary scratch space. However, the system manager can dedicate certain volumes to, or explicitly exclude volumes from
scratch space usage.
Every NonStop node has a SYSTEM volume, called $SYSTEM, that is used by the Operating System for all users of the
system. By common practice, no user data is stored on this volume. Every Oracle database has a SYSTEM tablespace, which
contains metadata that is used by all users of the database. It is possible, but not common practice to store user data in
this tablespace.
Figure 2 shows the same three tables placed on volumes on a NonStop server. A table (T02) can span multiple volumes.
Note that a volume may also contain objects that belong to another database; it can also contain non-database files.
Figure 2. Volumes containing three NonStop SQL tables
NonStop SQL subvolumes
Part 2 of this series—“Introduction to Catalogs and other Objects”—provides additional information about table locations,
such as the role of a subvolume. Subvolumes are logical subdivisions of a volume, used to group database objects by
schema, similar to UNIX® subdirectories or Microsoft® Windows® folders. When a schema is created, the name of the
subvolume can be defined by the DBA, or the system will assign one. Subvolumes allow the system manager to organize all
the data objects of a schema regardless of the volume on which they are placed. Figure 3 shows how a test schema and a
production schema can share volumes based on explicitly named subvolumes.
Figure 3. Subvolume grouping of NonStop SQL tables
If the subvolume names are explicitly specified with meaningful names when a schema is created, a system administrator
will be able to identify all data objects by their OS subvolume name without having to query database metadata. Note,
however, that these meaningful names must start with “ZSD”, which identifies the subvolume to the OS as SQL/MX-specific.
The example uses subvolume “ZSDTEST1” for a test schema and “ZSDPROD1” for a production schema. In this example, all
data objects for the test schema can be located using the following shell command. Note the use of the capital G to direct
the shell to the file system that is used for database files. [1]
/bin/ls /G/*/zsdtest1/*
Similar but not equal
Reading the above, one might think that NonStop volumes correspond closely to Oracle tablespaces. However, there are
significant differences.
NonStop volumes are similar to tablespaces, because Oracle tables and indexes are located in one or more tablespaces just
like NonStop tables and indexes are placed on one or more volumes.
NonStop volumes are different, because one volume can hold data from multiple databases, even owned by different users.
For example, the SALES database can have a table created on a volume named $DATA01 and this volume can also hold
tables from the INVENTORY database. Volume names are unique for a NonStop node; Oracle tablespaces are unique
per database. For example, every Oracle database has a SYSTEM tablespace, but tablespaces cannot be shared
between databases.
Each NonStop node has its own volume names that are unique to the node. For example, a node called EAST can have
volumes called $SYSTEM and $DATA01 and another node, called WEST can use these same volume names: $SYSTEM and
$DATA01. When databases outgrow a single node, the volume names remain unique, because the NonStop OS (and NonStop
SQL) automatically prefix volume names with the node name in a distributed environment. This way, a table called
INVENTORY can be located (in fact, partitioned) on EAST.$DATA01 and WEST.$DATA01, transparently to the application.
A major difference between Oracle tablespaces and NonStop SQL volumes is how they are managed by the system. NonStop
SQL is a shared-nothing architecture: a server consists of independently operating processors, each with its own memory
and copy of the OS and each with its own storage devices, the volumes. A NonStop server is in fact a cluster providing a
single-system image to the user. Every volume manager, called the Disk Access Manager (DAM), owns all data of its
volume, including the locks and the cache (database buffers); nothing is shared with other DAMs.
In contrast, Oracle is a shared-everything architecture: multiple processors share a single memory. There is only one copy of
the OS and it schedules the threads and processes to run, and they share the memory. Oracle maintains one single System
Global Area (SGA), and it contains the database buffers and the locks for all of the data that belongs to a database. When
Oracle uses multiple nodes in a RAC cluster, the data is located on a storage device that is shared by all the nodes in the
cluster. The tablespaces on each node map to the same data. There will be multiple SGAs, one on each node, and because
each SGA caches blocks of that shared data, cached data for a table can be in memory on different nodes, and
Oracle must maintain data consistency across the nodes when this data is updated.
Database blocks
Before Oracle9i, all blocks in a database had to be of the same size. Beginning with this release, multiple block sizes are
possible within a database; however, all blocks in a tablespace must be of equal size. NonStop SQL/MX allows tables and
indexes to have blocks of either 4 KB or 32 KB. All partitions of a table or index must have the same block size, but the size
of the cache is configurable at the volume or DAM level, as needed, without application interruption. It is not uncommon to
define a table with small rows to have a block size of 4 KB while its index has a block size of 32 KB.
Extents are the allocation units
Both systems use extents as the basic, contiguous allocation unit. An Oracle segment or a NonStop SQL table or index
consists of an INITIAL or PRIMARY extent and optionally multiple secondary extents. Initially, a contiguous amount of space,
the INITIAL (Oracle) or PRIMARY (NonStop SQL) extent, is allocated on the disk. When this first extent is full, more extents
can be allocated as new rows are inserted, but these extents may not reside next to the other extents on the disk. Having
many extents may therefore cause slowdown of data access; especially in the case of full-table scans. Oracle and NonStop
SQL use the value of MAXEXTENTS to define a maximum size for a data object. However, the default value in Oracle is
“unlimited”; an SQL/MX DBA must pay attention to setting the values for the extent sizes as well as MAXEXTENTS to prevent
files from becoming full.
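As a rough illustration of why extent sizes and MAXEXTENTS have to be considered together, the upper bound on the size of a file can be estimated from the three values. This is a minimal sketch rather than a description of either product's allocator: the function name and the sample figures are hypothetical, and it assumes that a file consists of one primary extent plus at most MAXEXTENTS - 1 secondary extents, all expressed in the same unit.

```python
def max_object_size(primary_extent, secondary_extent, max_extents):
    """Upper bound on the size of a file or partition, in the extents' unit.

    Assumption: one primary extent plus up to (max_extents - 1)
    secondary extents can be allocated before the file is full.
    """
    if max_extents < 1:
        raise ValueError("max_extents must be at least 1")
    return primary_extent + (max_extents - 1) * secondary_extent


# Hypothetical values: a 100-unit primary extent and 50-unit secondary extents.
small = max_object_size(100, 50, 16)    # 100 + 15 * 50
large = max_object_size(100, 50, 160)   # 100 + 159 * 50
print(small, large)
```

Under this assumption, raising MAXEXTENTS increases capacity but also the number of possibly scattered extents, whereas larger extents increase capacity with fewer allocations.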
Creating tables and indexes
The paper “Introduction to catalogs and other objects” introduces NonStop SQL tables and indexes. This section expands
that discussion to aspects of data storage.
When a DBA creates tables and indexes, he or she must also decide where to place them; there must be space to store the
data, and access to this data should not cause bottlenecks in the system.
[1] The /G filesystem is mapped to the traditional Guardian operating environment. All TMF-protected files (NonStop SQL/MP, NonStop SQL/MX, and Enscribe) are
located in this filesystem, as are other objects such as control files, source code, Guardian scripts, and program files.
In Oracle, the DBA must first create tablespaces and map them onto OS files in the file system. In SQL/MX, when tables and
indexes are created, they are mapped directly to OS files.
An Oracle DBA needs to decide in which tablespace(s) the table or index will be created, and a NonStop DBA needs to decide
on which volume(s) a table or index will be created. Both DBMSs will use a default location when nothing is specified at
creation time: Oracle uses the user’s default tablespace [2]; NonStop SQL will create the table on the user’s default volume [3].
Table and index partitions
Database management systems use partitions to spread the data over more locations, either to overcome the size
limitations of the storage hardware (to allow for very large tables), or to prevent disks from getting overloaded by a high
volume of requests in OLTP systems. A DBA may also choose to partition the data because maintenance operations such as
backup, reorganize, and relocate can often be performed at the partition level.
NonStop SQL has supported partitions since its first implementation in 1987, allowing a database file to exceed the size of a
single volume.
Oracle introduced partitions in 1997 with Oracle 8 to accommodate very large tables. Before the partitioning option existed,
the size of an Oracle table was limited to that of the tablespace; although its size could be increased by adding more data
files. With the partitioning option, a table can be partitioned over multiple tablespaces. The Oracle Partitioning Option is
only available in the Enterprise Edition and is a separately priced product. Besides range and hash partitioning, the
Oracle Partitioning Option allows many more ways to partition data, including list partitioning, subpartitions, and
interval partitioning.
Both systems, however, provide range and hash partitioning. Based on the partitioning key column values, the system
determines where (in which partition) to store or retrieve a row.
In Oracle, the partitioning key can be any column (or combination of columns). SQL/MX requires the partition key columns to
be part of the clustering key, and the clustering key columns must have the NOT NULL and NOT DROPPABLE constraints.
Tables
Oracle’s heap-organized tables do not have an equivalent in SQL/MX because all tables are index-organized using B-tree
indexing. SQL/MX differentiates between primary keys and storage, or clustering, keys, although DBAs frequently choose to
store SQL/MX tables by their primary keys.
Primary key
Every row in an SQL/MX table has a unique identifier, the Primary Key. However, if no primary key is specified when a table is
created, the system adds a column, called the SYSKEY, to uniquely identify each row. When rows are inserted, SQL/MX
automatically generates unique values for this SYSKEY.
Clustering and partition keys
The clustering key determines the order in which the rows are stored in the table or a partition and the structure of the
B-tree index through which they are accessed. In most cases, the clustering key will be the primary key. If no primary key is
defined, the STORE BY clause of the CREATE TABLE command allows the rows to be stored together (clustered). In this case,
SQL/MX adds a SYSKEY column for uniqueness. The clustering key also determines the partitioning key: the columns that
are used to partition the data must be part of the clustering key, but can be in a different order. In Figure 5, the table is
stored by its primary key (order_id, prod_id, time_id), but the partitioning key is time_id.
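The key rules described in this section can be written down as a small check. The sketch below is only an illustration of the rules as stated in the text, not of how SQL/MX validates DDL; the function name and the way columns are represented are hypothetical.

```python
def check_partition_key(clustering_key, partition_key, not_null_columns):
    """Return a list of rule violations; an empty list means the design is valid."""
    problems = []
    # Rule 1: every partitioning column must be part of the clustering key
    # (the order of the columns does not matter).
    for col in partition_key:
        if col not in clustering_key:
            problems.append(f"{col}: partition column is not in the clustering key")
    # Rule 2: clustering key columns must be declared NOT NULL.
    for col in clustering_key:
        if col not in not_null_columns:
            problems.append(f"{col}: clustering key column must be NOT NULL")
    return problems


clustering = ["order_id", "prod_id", "time_id"]
not_null = {"order_id", "prod_id", "time_id"}
print(check_partition_key(clustering, ["time_id"], not_null))  # valid design
print(check_partition_key(clustering, ["cust_id"], not_null))  # cust_id is not a key column
```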
Range partitions
In a range-partitioned table, rows are mapped to partitions based on the range of values of the partitioning key. In Oracle,
the boundary of a partition is based on the highest value that a partition can contain; SQL/MX determines the partition
boundary based on the lowest (or FIRST KEY) value. Both Oracle and SQL/MX allow additional storage parameters, such as
the partition name, the location (tablespace or volume), and extents parameters.
The examples show how a range-partitioned sales order table might be defined in Oracle [4] (Figure 4) and in SQL/MX
(Figure 5). Note that the partition boundaries and the partition keys are defined differently. Also, note that the Oracle
definition places all partitions of the table in the user’s default tablespace.
[2] When a user is created in Oracle, a default tablespace can be assigned.
[3] When a user is created in the NonStop OS, a default volume can be assigned.
[4] This is a modified version of the example listed in the Oracle Concepts manual.
Figure 4. DDL for an Oracle partitioned table
SQL> CREATE TABLE sales_orders
( prod_id NUMBER(6)
, order_id NUMBER(6)
, cust_id NUMBER
, time_id DATE
, channel_id CHAR(1)
, promo_id NUMBER(6)
, quantity_sold NUMBER(3)
, amount_sold NUMBER(10,2)
)
PARTITION BY RANGE (time_id)
(PARTITION SALES_1998 VALUES LESS THAN (TO_DATE('01-JAN-1999','DD-MON-YYYY')),
PARTITION SALES_1999 VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')),
PARTITION SALES_2000 VALUES LESS THAN (TO_DATE('01-JAN-2001','DD-MON-YYYY')),
PARTITION SALES_2001 VALUES LESS THAN (MAXVALUE)
);
Table created.
SQL>
Figure 5 shows how the table is defined in NonStop SQL/MX. The LOCATION clause is used to assign partitions to volumes
and to give partitions a meaningful name (SALES_1998 through SALES_2001). While it is possible to store all partitions on
the same volume, typically, separate volumes are defined. In SQL/MX, the first partition is implicitly defined (in the example,
this is SALES_1998, located on $DATA01) and the other partitions are added using the ADD FIRST KEY clauses.
Figure 5. DDL for an SQL/MX partitioned table
>> CREATE TABLE sales_orders
( prod_id NUMERIC(6) NOT NULL
, order_id NUMERIC (6) NOT NULL
, cust_id NUMERIC
, time_id DATE NOT NULL
, channel_id CHAR(1)
, promo_id NUMERIC(6)
, quantity_sold NUMERIC(3)
, amount_sold NUMERIC(10,2)
, primary key (order_id, prod_id, time_id)
)
STORE BY PRIMARY KEY
LOCATION $DATA01 NAME SALES_1998
RANGE PARTITION BY (time_id)
(
ADD FIRST KEY (DATE '1999-01-01') LOCATION $DATA02 NAME SALES_1999 ,
ADD FIRST KEY (DATE '2000-01-01') LOCATION $DATA03 NAME SALES_2000 ,
ADD FIRST KEY (DATE '2001-01-01') LOCATION $DATA04 NAME SALES_2001
);
--- SQL operation complete.
>>
In both tables the time_id column determines in which partition the row is located; however, SQL/MX requires that the
partition key is also part of the storage key. In SQL/MX, there are different ways to define the table such that it can be
partitioned on the time_id column. One can use the STORE BY (time_id) clause, which adds a SYSKEY to the row. Another
way is to define a set of columns as a primary key. In this example, the combination of order_id, prod_id, and time_id is
unique, and therefore these columns are used to form the storage and primary keys.
Note that in Oracle you will specify four partitions using the PARTITION clauses, whereas in NonStop SQL, the table
definition already includes the first partition and three more are specified using the ADD FIRST KEY clauses. Because
NonStop SQL tables are always index organized, equivalent to Index Organized Tables (IOT) in Oracle, every partition has its
own index structure included.
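The two boundary conventions can be compared with a short sketch that maps a time_id value to a partition name, using the boundaries from Figures 4 and 5. This is a conceptual model only: the lists and function names are hypothetical, and neither product necessarily resolves partitions this way internally.

```python
from bisect import bisect_right
from datetime import date

# Oracle style: each partition lists the first value it does NOT contain
# (VALUES LESS THAN); None stands in for MAXVALUE.
ORACLE_PARTS = [
    ("SALES_1998", date(1999, 1, 1)),
    ("SALES_1999", date(2000, 1, 1)),
    ("SALES_2000", date(2001, 1, 1)),
    ("SALES_2001", None),
]

# SQL/MX style: the first partition is implicit; every other partition
# lists the lowest value it contains (FIRST KEY).
MX_FIRST_PARTITION = "SALES_1998"
MX_ADDED_PARTS = [
    (date(1999, 1, 1), "SALES_1999"),
    (date(2000, 1, 1), "SALES_2000"),
    (date(2001, 1, 1), "SALES_2001"),
]


def oracle_partition(time_id):
    """Pick the first partition whose upper bound is greater than time_id."""
    for name, upper in ORACLE_PARTS:
        if upper is None or time_id < upper:
            return name


def mx_partition(time_id):
    """Pick the last partition whose FIRST KEY is less than or equal to time_id."""
    first_keys = [key for key, _ in MX_ADDED_PARTS]
    pos = bisect_right(first_keys, time_id)
    return MX_FIRST_PARTITION if pos == 0 else MX_ADDED_PARTS[pos - 1][1]


for d in (date(1998, 6, 30), date(1999, 1, 1), date(2003, 5, 5)):
    print(d, oracle_partition(d), mx_partition(d))
```

Both definitions place every date in the same partition; only the way the boundary is written differs.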
Hash partitions
When a table is hash-partitioned, the DBMS applies an internal hash algorithm [5] to the partitioning key, and the outcome of
the function determines the number of the partition in which to store the data. The purpose of the hashing algorithm is to
evenly distribute the rows across the partitions, such that each partition will contain about the same number of rows.
Defining a hash-partitioned table is easier for a DBA, who does not need to consider the distribution of values of the
partitioning key that could lead to unequal partition sizes. However, when a new partition is added to a hash-partitioned
table, the content of the table must be redistributed over all partitions. When a partition is added to a range-partitioned
table, the effect on the existing partitions is limited. For example, adding a new date-range to an existing range might only
be a metadata change that involves no data movement.
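The redistribution effect can be made visible with a toy model. The sketch below uses a simple modulo as a stand-in for the internal hash algorithm, which is an assumption made purely for illustration; the key values are hypothetical and real hash functions behave differently in detail.

```python
from collections import Counter


def partition_of(key, num_partitions):
    """Stand-in hash: a real DBMS uses its own internal algorithm."""
    return key % num_partitions


prod_ids = range(1000)  # hypothetical partitioning-key values

# Rows spread evenly over four partitions.
before = Counter(partition_of(k, 4) for k in prod_ids)

# Adding a fifth partition changes the target partition of most rows.
moved = sum(1 for k in prod_ids if partition_of(k, 4) != partition_of(k, 5))

print(dict(before))
print(f"{moved} of {len(prod_ids)} rows change partition")
```

In this model most rows land in a different partition once the partition count changes, which is why adding a hash partition involves moving data, while extending a range at the end of a range-partitioned table does not.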
Figure 6. Hash-partitioned table DDL for Oracle and SQL/MX
Oracle hash-partitioned table
SQL> CREATE TABLE hash_sales_orders
( prod_id NUMBER(6)
, order_id number(6)
, cust_id NUMBER
, time_id DATE
, channel_id CHAR(1)
, promo_id NUMBER(6)
, quantity_sold NUMBER(3)
, amount_sold NUMBER(10,2)
)
PARTITION BY hash (prod_id)
partitions 4
;
Table created.

SQL/MX hash-partitioned table
>> CREATE TABLE hash_sales_orders
( prod_id NUMERIC(6) NOT NULL
, order_id NUMERIC (6) NOT NULL
, cust_id NUMERIC
, time_id DATE
, channel_id CHAR(1)
, promo_id NUMERIC(6)
, quantity_sold NUMERIC(3)
, amount_sold NUMERIC(10,2)
, primary key (order_id, prod_id)
)
LOCATION $DATA01 NAME PART01
STORE BY PRIMARY KEY
HASH PARTITION BY (prod_id)
(
ADD LOCATION $DATA02 NAME PART02 ,
ADD LOCATION $DATA03 NAME PART03 ,
ADD LOCATION $DATA04 NAME PART04
);
--- SQL operation complete.
The table in Figure 6 shows how the earlier sales_orders table is defined in Oracle and in SQL/MX. In this example, Oracle will
generate partition names and store them all in the tablespace that has been set up as the user’s default (or the SYSTEM
tablespace). In the NonStop example, partitions are added using the ADD partition clause, which requires a LOCATION clause.
Notes:
• The partition name is optional.
• Multiple partitions may be placed on a single volume. This is usually only done to simulate a large database on a
small system.
• For convenience to Java, ODBC, and mxci users, NonStop SQL/MX supports a feature called Partition Overlay Support (POS), which allows the DBA to use
CQDs [6] to define the number of partitions and a range of volumes to choose from.
If these CQDs are set, SQL/MX will create a hash-partitioned table automatically.
[5] The SQL/MX function HASHPARTFUNC (partitioning_key , num_partitions) can be used to determine the partition number where a row would be stored if the
partitioning_key would be the hash key for a table with num_partitions partitions.
[6] These Control Query Defaults (CQDs) are POS_NUMBER_OF_PARTNS and POS_LOCATIONS.
Indexes
An index is a data structure that improves the speed of data retrieval on a table. The most commonly used type of index
structure in database management systems is the B-tree. Oracle supports other index types besides B-trees; all NonStop
SQL indexes are B-trees. At the bottom of the tree are the leaf nodes that contain a pointer to the data or the data itself. In
Oracle, the leaf nodes in an index contain the ROWID, which is a pseudo column that returns the address of a row. NonStop
SQL does not use ROWIDs; every table must have a primary key (either user- or system-supplied) and the leaf nodes of a
base table index (the Primary Key index) contain the base table data. Alternate indexes in NonStop SQL are separate data
structures that ultimately contain the primary key columns of the underlying base table.
Structurally, SQL/MX indexes are the same as SQL/MX tables. A query that accesses base table data using an index key in
fact performs a SQL join of the index and its table on the value of the primary key. This implies that, if all base table fields
that are required in a query are present in the index, the base table does not need to be accessed and an index-only query is
performed. An index can also be joined with another base table, provided the index columns cover the query.
Indexes can be hash- or range partitioned, just like ordinary tables. However, index key columns may contain NULL values,
whereas table keys may not be NULL.
The primary key data structure of any table is included in the table itself, just like in Oracle’s Index Organized tables. It is
therefore not necessary to create separate indexes for the primary keys in SQL/MX tables. In many cases, even creation of
indexes on parts of a multi-column primary key is not required because the Multi Dimensional Access Method (MDAM) can be
used on all indexes including the primary key.
Unique indexes
In fact, all index entries are unique. If a DBA creates a non-unique index on a table, Oracle adds the ROWID and NonStop SQL
adds the primary key columns to enforce index uniqueness. Conceptually, NonStop SQL creates a non-unique index on
(COL1, COL2) as:
CREATE UNIQUE INDEX IX on TAB(COL1, COL2, PKEY);
Oracle conceptually creates the same index as:
CREATE UNIQUE INDEX IX on TAB(COL1, COL2, ROWID);
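The idea that appending a row identifier makes every index entry unique can be shown with a few rows of made-up data. This is a conceptual sketch only; the dictionary layout and the lookup function are hypothetical and do not reflect the on-disk format of either product.

```python
# Base table rows keyed by primary key (pkey); values are hypothetical.
rows = {
    101: {"col1": "A", "col2": 1},
    102: {"col1": "A", "col2": 1},   # same (col1, col2) as row 101
    103: {"col1": "B", "col2": 2},
}

# Non-unique index on (col1, col2): the primary key is appended to each
# entry, so no two index entries are equal even when the indexed values are.
index_entries = sorted((r["col1"], r["col2"], pkey) for pkey, r in rows.items())


def lookup(col1, col2):
    """Return the primary keys of all rows matching the indexed columns."""
    return [pkey for c1, c2, pkey in index_entries if (c1, c2) == (col1, col2)]


print(index_entries)
print(lookup("A", 1))
```

The two rows with equal (col1, col2) values produce two distinct entries, and the trailing primary key is also what leads back to the base table row.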
Partitioned indexes
In Oracle terms, all indexes in NonStop SQL/MX are “global indexes”; they are partitioned independently of
the table on which they are created. The number of partitions can be different from that of the base table and they do not
have to reside on the same volumes, but if so desired, they can be co-located on the same volumes and even partitioned the
same way as the base table. Defining index partitions depends on how the index is used by queries, just as the reason to
create an index depends on the use of the data by queries.
For example, in most cases an index is partitioned on the index key. The database can read one or more index blocks and
knows all the primary key values of the base table. However, if an index is partitioned on the same columns as the base
table, SQL/MX will read multiple partitions in parallel to find the primary key values of the base table.
Consider a table with primary key Col1 and an index on Col2 with contents, as pictured in Figure 7. If the index is partitioned
the same way as the base table, SQL/MX must read both index partitions to retrieve a given index key. However, if the index
is partitioned based on the values of the index key Col2, as is shown in Figure 8, it only needs to access a single partition to
find all the matching rows.
Figure 7. Index partitioned on storage key of base table
If the index is partitioned the same way as the base table, each index partition will contain as many rows as each base table
partition. This could benefit a parallel scan that accesses all the index partitions. If the index is partitioned on the index key,
OLTP-type queries will benefit because all qualifying index rows are found together in one partition, which minimizes the
I/O required.
Figure 8. Index partitioned on index key
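The difference between the two index partitioning schemes can be sketched with a handful of rows. The data, the split points, and the function names below are hypothetical and only model where index entries end up; they do not describe the SQL/MX executor.

```python
# Hypothetical rows as (col1, col2); col1 is the primary key, col2 is indexed.
rows = [(1, "red"), (2, "blue"), (3, "red"), (4, "green"), (5, "blue"), (6, "red")]


def by_base_key(col1, col2):
    """Index partitioned like the base table: split on the primary key."""
    return 0 if col1 <= 3 else 1


def by_index_key(col1, col2):
    """Index partitioned on the index key: split on col2."""
    return 0 if col2 < "h" else 1


def build_index(partition_fn, num_partitions=2):
    """Place each index entry (col2, col1) in the partition chosen by partition_fn."""
    parts = [[] for _ in range(num_partitions)]
    for col1, col2 in rows:
        parts[partition_fn(col1, col2)].append((col2, col1))
    return [sorted(p) for p in parts]


def partitions_holding(index, value):
    """Partitions that contain at least one entry for the given col2 value."""
    return [i for i, part in enumerate(index) if any(c2 == value for c2, _ in part)]


print(partitions_holding(build_index(by_base_key), "red"))
print(partitions_holding(build_index(by_index_key), "red"))
```

When the index follows the base table's partitioning, entries for one col2 value are scattered, so every index partition has to be probed; when it is partitioned on col2, all matching entries sit in a single partition.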