Netezza Fundamentals
Introduction to Netezza for Application Developers
Biju Nair
04/29/2014
Version: Draft 1.5
Document provided for information purposes only.
Preface
As with any subject, one may ask why we need a new document when there is so much information
available online. The question is even more pronounced in a case like this, where the product vendor
publishes detailed documentation. I agree, and for that matter this document in no way replaces the
documentation and knowledge already available on the Netezza appliance. The primary objectives of
this document are
- To be a starting guide for anyone who is looking to understand the appliance, so that they can be
productive in a short duration
- To be a transition guide for professionals who are familiar with other database management
systems and would like to, or need to, start using the appliance
- To be a quick reference on the fundamentals for professionals who have some experience with the
appliance
With these simple objectives in focus, the book covers the Netezza appliance broadly so that the reader
can quickly become productive in using it. References to other documents are provided for readers
interested in gaining more thorough knowledge. Joining the Netezza developer community on the web,
which is very active, is also highly recommended. If you find any errors or want to provide feedback,
please notify bnair at asquareb dot com with the subject "Netezza"; thank you in advance for your
feedback.
Table of Contents
1. Netezza Architecture
2. Netezza Objects
3. Netezza Security
4. Netezza Storage
5. Statistics and Query Performance
6. Netezza Query Plan Analysis
7. Netezza Transactions
8. Loading Data, Database Back-up and Restores
9. Netezza SQL
10. Stored Procedures
11. Workload Management
12. Best Practices
13. Version 7 Key Features
14. Further Reading
1. Netezza Architecture
Building a good foundation makes it possible to build better things on top of it. Similarly, understanding
the Netezza architecture, which is the foundation, helps in developing applications which use the
appliance efficiently. This section details the architecture at a level which satisfies that objective.
Netezza uses a proprietary architecture called Asymmetric Massively Parallel Processing (AMPP), which
combines the large-scale data processing efficiency of Massively Parallel Processing (MPP), where
nothing (CPU, memory, storage) is shared, with symmetric multiprocessing to coordinate the parallel
processing. The MPP is achieved through an array of S-Blades, which are servers in their own right,
running their own operating systems and connected to disks. While there may be other products which
follow a similar architecture, one unique hardware component used by Netezza is the Database
Accelerator card attached to the S-Blades. These accelerator cards can perform some of the query
processing stages while data is being read from the disk, instead of the processing being done in the
CPU. Moving large amounts of data from the disk to the CPU and performing all the stages of query
processing in the CPU is one of the major bottlenecks in many of the database management systems
used for data warehousing and analytics use cases.
The main hardware components of the Netezza appliance are a host, which is a Linux server, and an
array of S-Blades it communicates with, each of which has 8 processor cores and 16 GB of RAM and runs
the Linux operating system. Each processor in the S-Blade is connected to disks in a disk array through a
Database Accelerator card which uses FPGA technology. The host is also responsible for all client
interactions with the appliance, like handling database queries and sessions, along with managing the
metadata about the objects, like databases and tables, stored in the appliance. The S-Blades
communicate between themselves and with the host through a custom-built, IP-based high
performance network. The following diagram provides a high level logical schematic which will help
visualize the various components in the appliance.
The S-Blades are also referred to as the Snippet Processing Array, or SPA in short, and each CPU in the
S-Blades combined with the Database Accelerator card attached to it is referred to as a Snippet
Processor.
Let us use a simple concrete example to understand the architecture. Assume an example data warehouse
for a large retail firm where one of the tables stores the details of all of its 10 million customers. Also
assume that there are 25 columns in the table and the total length of each row is 250 bytes. In
Netezza the 10 million customer records will be stored fairly equally across all the disks available in the
disk arrays connected to the snippet processors in the S-Blades, in a compressed form. When a user
queries the appliance for, say, the customer id, name and state of customers who joined the organization
in a particular period, sorted by state and name, the following are the high level steps of how the
processing happens:
- The host receives the query, parses and verifies it, creates the code to be executed by the
snippet processors in the S-Blades and passes the code to the S-Blades.
- The snippet processors execute the code. As part of the execution, the data blocks which store
the data required to satisfy the query are read in compressed form from the disk attached to the
snippet processor into memory. The Database Accelerator card in the snippet processor
uncompresses the data, which includes all the columns in the table; it then removes the
unwanted columns from the data, which in this case is 22 columns, i.e. 220 bytes out of the 250
bytes, applies the where clause, which removes the unwanted rows, and passes the small
remaining amount of data to the CPU in the snippet processor. In traditional databases all these
steps are performed in the CPU.
- The CPU in the snippet processor performs tasks like aggregation, sum and sort on the data
from the database accelerator card and passes the result to the host through the network.
- The host consolidates the results from all the S-Blades and performs additional steps like sorting
or aggregation on the data before communicating the final result back to the client.
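In SQL terms, the walkthrough above corresponds to a query along the following lines; the table and
column names are illustrative and not part of any real schema.
select cust_id, name, state
from customer
where join_dt between '2013-01-01' and '2013-03-31'
order by state, name;
Per the steps above, the where clause is applied by the Database Accelerator cards, the per-data-slice
sorting happens in the snippet processor CPUs, and the host performs the final merge before returning
the result to the client.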
The key takeaways are
- Netezza has the ability to process large volumes of data in parallel, and the key is to make
sure that the data is distributed appropriately to leverage the massive parallel processing.
- Implement designs in a way that most of the processing happens in the snippet processors, with
minimal communication between snippet processors and minimal data communication to the
host.
This simple example illustrates the fundamental components of the appliance and how they work
together; we will build on this knowledge in the relevant sections to see how complex query scenarios
are handled.
Terms and Terminology
The following are some of the key terms and terminology used in the context of the Netezza appliance.
Host: A Linux server which is used by the client to interact with the appliance, either natively or through
remote clients using ODBC, JDBC, OLE-DB etc. The host also stores the catalog of all the databases
stored in the appliance along with the metadata of all the objects in the databases. It also parses and
verifies the queries from the clients, generates executable snippets, communicates the snippets to the S-
Blades, coordinates and consolidates the snippet execution results and communicates back to the client.
Snippet Processing Array: An SPA is an array of S-Blades, each with 8 processor cores and 16 GB of
memory running the Linux operating system. Each S-Blade is paired with a Database Accelerator card
which has 8 FPGA cores and is connected to disk storage.
Snippet Processor: The CPU and FPGA pair in a Snippet Processing Array is called a snippet processor;
it runs a snippet, which is the smallest code component generated by the host for query
execution.
Netezza Objects
The following are the major object groups in Netezza. We will see the details of these objects in the
following chapters.
- Users
- Groups
- Tables
- Views
- Materialized Views
- Synonyms
- Databases
- Procedures and User Defined Functions
For anyone who is familiar with other relational database management systems, it will be obvious
that there are no indexes, bufferpools or tablespaces to deal with in Netezza.
Netezza Failover
As an appliance, Netezza includes the necessary failover components to function seamlessly in the event
of any hardware issues, so that its availability is more than 99.99%. There are two hosts in a cluster in all
the Netezza appliances, so that if one fails the other can take over. Netezza uses Linux-HA (High
Availability) and Distributed Replicated Block Device (DRBD) for the host cluster management and the
mirroring of data between the hosts.
As far as data storage is concerned, one third of every disk in the disk array stores a primary copy of user
data, a third stores a mirror of the primary copy of data from another disk, and the remaining third is
used for temporary storage. In the event of a disk failure the mirror copy will be used, and the SPU to
which the failed disk was attached will be updated to point to the disk holding the mirror copy. In the
event of an error in a disk track, the track will be marked as invalid and valid data will be copied from the
mirror copy onto a new track.
If there are any issues with one of the S-Blades, other S-Blades will be assigned its workload. All
failures will be notified based on the event monitors defined and enabled. Similar to the dual hosts for
high availability, the appliance also has dual power systems, and all the connections between components,
like host to SPA and SPA to disk array, also have a secondary. Any issues with the hardware components
can be viewed through the NzAdmin GUI tool.
Netezza Tools
There are many tools available to perform various functions against Netezza. We will look at the tools
and utilities to connect to Netezza here; other tools will be detailed in the relevant sections.
For administrators, one of the primary tools to connect to Netezza is NzAdmin. It is a GUI
based tool which can be installed on a Windows desktop to connect to the Netezza appliance. The tool
has a system view which provides a visual snapshot of the state of the appliance, including issues with
any hardware components. The second view the tool provides is the database view, which lists all the
databases including the objects in them, the users and groups currently defined, active sessions, query
history and any backup history. The database view also provides options to perform database
administration tasks like the creation and management of databases and database objects, users and groups.
The following is the screen shot of the system view from the NzAdmin tool.
The following is the screen shot of the database view from the NzAdmin tool.
The second tool, which is often used by anyone who has access to the appliance host, is the "nzsql"
command. It is the primary tool used by administrators to create, schedule and execute scripts to
perform administration tasks against the appliance. The "nzsql" command invokes the SQL command
interpreter through which all Netezza supported SQL statements can be executed. The command also
has some built-in options which can be used to perform quick look-ups like the list of databases,
users, etc. The command also has an option to open an operating system shell through which the user
can perform OS tasks before exiting back into the "nzsql" session. As with all the Netezza commands,
the "nzsql" command requires a database name, user name and password to connect to a database.
For example,
nzsql -d testdb -u testuser -p password
will connect and create an "nzsql" session with the database "testdb" as the user "testuser", after which
the user can execute SQL statements against the database. Also, as with all the Netezza commands,
"nzsql" has a "-h" help option which displays details about the usage of the command. Once the user
is in the "nzsql" session, the following are some of the options which a user can invoke in addition to
executing Netezza SQL statements.
\c dbname user passwd    Connect to a new database
\d tablename             Describe a table, view, etc.
\d{t|v|i|s|e|x}          List tables/views/indexes/synonyms/temp tables/external tables
\h command               Help on a particular command
\i file                  Read and execute queries from file
\l                       List all databases
\!                       Escape to an OS shell
\q                       Quit the nzsql command session
\time                    Print the time taken by queries; switch it off by issuing \time again
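As a quick illustration, a short session might look like the following, where the user lists the tables,
describes the employee table, toggles query timing and quits; the prompt shown is illustrative.
nzsql -d testdb -u testuser -p password
testdb=> \dt
testdb=> \d employee
testdb=> \time
testdb=> \q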
One third party vendor tool which needs to be mentioned is the Aginity Workbench for Netezza
from Aginity LLC. It is a GUI based tool which runs on Windows and uses the Netezza ODBC driver
to connect to the databases in the appliance. It is a user friendly tool for development work and ad hoc
queries and also provides GUI options to perform database management tasks. It is highly
recommended for a user who doesn't have access to the appliance host (which will be most of the
users) but needs to perform development work.
2. Netezza Objects
The Netezza appliance comes out of the box loaded with some objects, which are referred to as system
objects; users can create objects to develop applications, which are referred to as user objects. In this
section we will look into the details of the basic Netezza objects which every user of the appliance
needs to be aware of.
System Objects:
Users
The appliance comes preconfigured with the following three user ids, which can't be modified or deleted
from the system. They are used to perform all the administration tasks and hence should be used by a
restricted number of users.
User id   Description
root      The super user for the host system on the appliance; has all the access of a super user in any Linux system.
nz        The Netezza system administrator Linux account which is used to run the host software on Linux.
admin     The default Netezza SQL database administrator user, which has access to perform all database related tasks against all the databases in the appliance.
Groups
By default Netezza comes with a database group called public. All database users created in the system
are automatically added as members of this group and cannot be removed from this group. The admin
database user owns the public group and it can’t be changed. Permissions can be set to the public group
so that all the users added to the system get those permissions by default.
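For example, granting a privilege to the public group makes it available to every database user; the
table name here is illustrative.
grant select on employee to public;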
Databases
Netezza comes with two databases, system and model, both owned by the admin user. The
system database consists of objects like tables, views, synonyms, functions and procedures. It is
primarily used to catalog all user database and user object details, which are used by the
host when parsing and validating queries and creating their execution code.
User Objects:
Database
Users with the required permission or an admin user can create databases using the create database SQL
statement. The following is a sample SQL statement to create a database called testdb; it can be executed in an
nzsql session or other query execution tool.
create database testdb;
Table
The database owner or a user with the create table privilege can create tables in a database. The following is a
sample table creation statement which can be executed in an nzsql session or other query execution tool.
create table employee (
emp_id integer not null,
first_name varchar(25) not null,
last_name varchar(25) not null,
sex char(1),
dept_id integer not null,
created_dt timestamp not null,
created_by char(8) not null,
updated_dt timestamp not null,
updated_by char(8) not null,
constraint pk_employee primary key(emp_id),
constraint fk_employee foreign key (dept_id) references department(dept_id)
on update restrict on delete restrict
) distribute on random;
For anyone who is familiar with other DBMS systems, the statement will look familiar except for the
"distribute on" clause, details of which we will see in a later section. Also, there are no storage related
details like the tablespace on which the table needs to be created or any bufferpool details; these are handled
by the Netezza appliance. The following is the list of the data types supported by the Netezza
appliance which can be used in the column definitions of tables.
Data Type                        Description/Value                                                Storage
byteint (int1)                   -128 to 127                                                      1 byte
smallint (int2)                  -32,768 to 32,767                                                2 bytes
integer (int or int4)            -2,147,483,648 to 2,147,483,647                                  4 bytes
bigint (int8)                    -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807         8 bytes
numeric(p,s)                     Precision p from 1 to 38 and scale from 0 to p                   p <= 9: 4 bytes; 10 <= p <= 18: 8 bytes; 19 <= p <= 38: 16 bytes
numeric                          Same as numeric(18,0)                                            4 to 16 bytes
decimal                          Alias for numeric                                                4 to 16 bytes
float(p)                         Floating point number with precision p between 1 and 15          p <= 6: 4 bytes; 7 <= p <= 15: 8 bytes
real                             Alias for float(6)                                               4 bytes
double precision                 Alias for float(15)                                              8 bytes
character(n)/char(n)             Fixed length, blank padded to length n. The default value of n   n bytes if n <= 16; if n > 16, disk usage is
                                 is 1. The maximum character string size is 64,000.               the same as varchar(n)
character varying(n)/varchar(n)  Variable length to a maximum length of n. No blank padding,      n+2 or fewer bytes depending on the actual data
                                 stored as entered. The maximum character string size is 64,000.
nchar(n)                         Fixed length unicode, blank padded to length n. Maximum length
                                 of 16,000 characters.
nvarchar(n)                      Variable length unicode to a maximum length of n. Maximum
                                 length of 16,000 characters.
boolean/bool                     Value true (t) or false (f)                                      1 byte
date                             January 1, 0001 to December 31, 9999                             4 bytes
timestamp                        Date part and a time part, with seconds stored to 6 decimal      8 bytes
                                 positions; January 1, 0001 00:00:00.000000 to
                                 December 31, 9999 23:59:59.999999
Refer to the reference documentation for additional data types and details.
Other than validating that the data is consistent with the column data type and any not null constraints,
Netezza doesn't enforce constraints like primary keys or foreign keys when inserting or
loading data into the tables, for performance reasons. It is up to the application to make sure that these
constraints are satisfied by the data being loaded into the tables. Even though the constraints are not
enforced by Netezza, defining them provides additional hints to the query optimizer to generate
efficient snippet execution code, which in turn helps performance, as the sketch below illustrates.
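A minimal sketch of this behavior, using the employee table defined above: both inserts below succeed
even though the second violates the primary key on emp_id, so deduplication is left to the application.
insert into employee values (1, 'John', 'Doe', 'M', 100, now(), 'loaduser', now(), 'loaduser');
insert into employee values (1, 'Jane', 'Roe', 'F', 100, now(), 'loaduser', now(), 'loaduser');
select count(*) from employee where emp_id = 1; -- returns 2; the duplicate key is not rejected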
Users can also create temporary tables in Netezza, which get dropped at the end of the transaction or
session in which the temp table is created. A temporary table can be created by adding the "temporary"
or "temp" clause to the create table statement, and the statement can include all the other clauses
applicable to the creation of a regular table. The following is an example of creating a temporary table.
create temporary table temp_emp(
id integer constraint pk_emp primary key,
first_name varchar(25)
) distribute on hash(id);
Another way a user can create a table is to model the new table on a query result. This can be
accomplished in Netezza using the create table as command, and such tables are referred to as CTAS
tables in short. The following are some sample CTAS statements.
create table ctas_emp as select * from emp where dept_id = 1;
create table ctas_emp_dept as select emp.name, dept.name from emp, dept where
emp.dept_id = dept.id;
Since the new table definition is based on the query result which is executed as part of the CTAS statement,
there are many possibilities for how users can use this, like redefining the columns of a current table
while populating the new table at the same time. If the user doesn't want to populate data but only create
the table structure, a "limit 0" can be included at the end of the query as in the following example.
create table ctas_emp as select * from emp limit 0;
Tables can be deleted from the database using the DROP SQL statement, which drops the table and its
contents. The following is an example.
drop table employee;
Once tables are created they can be modified using the alter statement. Renaming the table, changing the
owner of the table, adding or dropping a column, renaming a column, and adding or dropping a
constraint are some of the modifications which can be made through the alter statement. The following
are sample alter statements which can be executed through an nzsql session or a query execution tool.
alter table employee add column education_level int1;
alter table employee modify column (first_name varchar(30));
alter table employee drop column updated_by;
alter table employee owner to hruser;
The following are some of the points to be aware of when using the alter table statement.
- Modifying the column length is only applicable to columns defined as varchar.
- If a table gets renamed, the views attached to the table will stop working.
- Column data types can't be changed through an alter statement.
- If a table is referenced by a stored procedure, adding or dropping a column is not permitted. The
stored procedure needs to be dropped first before adding or dropping the column, and then the
stored procedure needs to be recreated.
View
The database owner or a user with the create view privilege can create views in a database. Netezza stores the
select SQL statement for the view and doesn't materialize and store the data on disk. The SQL
gets executed, pulling the data from the base table, whenever a user accesses the view, creating a virtual
table. The following is a sample view creation statement which can be executed in an nzsql session or
other query execution tool.
create or replace view it_employees as select * from employee where dept_id = 100;
Materialized View
Materialized views are created to project a subset of columns from a table, with the data sorted on some of
the projected columns. When the view is created, the system materializes and stores the sorted projection of
the base table's data in a new table on disk. Materialized views can be queried directly or can be used to
improve the performance of queries against the base table. In the latter case, the materialized view acts like
an index, since the system stores an additional column in the view with the details of where in the base
table the data originated.
A database owner or a user with the create materialized view privilege can create materialized views in a
database. The following is a sample materialized view creation statement which can be executed in an
nzsql session or other query execution tool.
create materialized view employee_mview as select emp_id, first_name, last_name,
dept_id from employee order by dept_id, first_name;
This materialized view will improve the performance of the following query against the base table, since
the view stores where in the base table the data satisfying the query resides and in turn acts like an
index in a traditional database.
select * from employee where dept_id = 100 and first_name = 'John';
The following are the restrictions on the creation of materialized views:
- Only one table can be specified in the FROM clause of the create statement
- There can be no where clause in the select clause of the create statement
- The columns in the projection list must be columns from the base table, not expressions
- The columns in the ORDER BY clause must be columns in the select list
- NULLS LAST or DESC cannot be used in the ORDER BY clause
- External, temporary, system or clustered base tables can't be used as the base table for materialized
views
As records are inserted into the base table, the system also adds data to the materialized view
table. But the newly inserted data will not be in sorted order, and hence there will be some data
in the materialized view which is not in sorted order. In order to get the materialized view back into
sorted order, the view needs to be refreshed periodically, or a threshold of unsorted data as a percentage
of total records can be set so that the system performs the refresh when the percentage of unsorted
records in the table exceeds the threshold, as sketched below.
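For example, assuming the refresh syntax and system-default threshold setting described in the IBM
Netezza documentation (the threshold value here is illustrative):
set system default materialize threshold to 20; -- refresh once more than 20% of the records are unsorted
alter view employee_mview materialize refresh;  -- refresh this view immediately, on demand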
Synonym
A synonym is an alternate way of referencing tables and views. It allows users to create easy to type
names for long table or view names. An admin user or any user with the create synonym privilege can
create synonyms in a database, and the following is a sample statement to create a synonym through an
nzsql session or other query execution tool.
create synonym techies for it_employees;
Synonyms can be created against a non-existent table or view, since the reference is not verified during
creation. But if the table or view referred to by a synonym does not exist at runtime, an error message will
be returned. Synonyms cannot be created for temporary tables, remote databases or other synonyms.
Sequence
Sequences are database objects through which users can generate unique numbers which can then be used
in an application, for example for the creation of unique keys. The following is a sample sequence creation
statement; the sequence can be used to populate the id column of the employee table.
create sequence seq_emp_id as integer start with 1 increment by 1 minvalue 1 no
maxvalue no cycle;
Since no max value is specified, the sequence will be able to go up to the largest value of the sequence type,
which in this case is 2,147,483,647 for the integer type. Also, once the sequence reaches its
maximum value it will not start reusing old values, since the no cycle option is used to prevent reuse.
Values generated by a sequence can be accessed using "next value for sequence"; the following are some
examples.
select next value for seq_emp_id;
select *, next value for seq_emp_id from emp;
A few points to note about using sequences:
- No two users of a sequence will ever get the same value from the sequence when requesting
the next value
- When a transaction using a sequence rolls back, the value read from the sequence will not be
rolled back, i.e. there can be gaps in the sequence numbers due to transaction rollbacks
- The Netezza appliance assigns sequence number ranges to all the SPUs available in the system, and
these get cached for performance reasons. That means the sequences generated by the various SPUs
will create gaps in the sequence numbers at any point in time, since the ranges will not be
overlapping.
- The system will be forced to flush cached values of sequences in situations like the stopping of the
system, system or SPU crashes, or during some alter sequence statements, which will also create
gaps in the numbers generated by a sequence.
Existing sequences can be modified using the "alter sequence" statement. Some of the possible modifications
are changing the owner, renaming the sequence, restarting the sequence with a new start value, changing the
increment value or max value, and whether the sequence should or should not reuse old values once
it has reached its max value. The following is a sample "alter sequence" statement.
alter sequence seq_emp_id increment by 2;
An existing sequence can also be dropped using a "drop sequence" statement, and the following is an
example.
drop sequence seq_emp_id;
Procedures
We will look into the details of stored procedures when we discuss application development.
Functions
We will look into the details of user defined functions when we discuss application
development.
3. Netezza Security
This section deals with security in the context of user access to the appliance. In Netezza there are two
levels of access control, one at the host OS level and a second at the Netezza database level. By setting
the required access restrictions at these two levels, the appliance can be secured effectively.
OS Level Security
The Netezza host uses an industry standard Linux operating system customized for the performance and
functionality required by the appliance. As in any Linux installation, user access restrictions need to be
put in place using user ids, user groups and passwords. Out of the box, the appliance is configured with a
"root" user id, which is the Linux super user, and the user id "nz", which is the Netezza system
administrator id used to run Netezza on the host. The "root" user id can be used to create other
user ids for users who need to access the appliance natively through the host command shell. Since the
host access is required only for very restricted tasks, primarily administration tasks, the number of user
ids created to access the appliance should be fairly small. Restrictions on what users can perform can be
set by creating Linux user groups with different access restrictions and attaching the relevant users to the
groups. Setting password selection rules (like a mix of letters, numbers and special characters, and a minimum
password length) along with password expiry for users is a good practice.
Database Level Security
Access to databases is controlled using user ids and passwords which are separate from the OS level user
ids and passwords. If a user needs to access a Netezza database natively through an "nzsql"
session on the host, the user needs to use an OS level user id and password to log in to the host and then
invoke the "nzsql" command using a database level user id and password which has access to
the particular database of interest. Access to databases, objects within a database and the type of
activities which can be performed on them are all controlled by the privileges granted to the user id
performing the task. Netezza also supports user groups, as with the Linux operating system, where privileges
can be assigned to groups and similar users can be attached to a group so that it is easier to manage
access to databases. When a user id is attached to more than one group, the user id gets the combination of
all the privileges assigned to the groups to which it is attached. The following is a sample
"create user" statement.
create user dbdev with password 'zksrfas92834';
The following is a sample "create group" statement, followed by a sketch of granting privileges to the group.
create group devgrp with user dbdev, scott;
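Privileges can then be granted once to the group and are inherited by all its members; the object names
below are illustrative.
grant create table to devgrp;
grant select, insert, update on employee to devgrp;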
4. Netezza Storage
As discussed earlier, each disk in the appliance is partitioned into primary, mirror and temp or swap
partitions. The primary partition in each disk is used to store user data like database tables, the mirror
stores a copy of the primary partition of another disk so that it can be used in the event of disk failures,
and the temp/swap partition is used to store data temporarily, like when the appliance does data
redistribution while processing queries. The logical representation of the data saved in the primary
partition of each disk is called a data slice. When users create database tables and load data into them,
the data gets distributed across the available data slices. The logical representation of data slices is called a
data partition. In TwinFin systems each S-Blade or SPU is connected to 8 data partitions, and some only to 6
data partitions (since some disks are reserved for failover). There are situations, like SPU failures, when a
SPU can have more than 8 partitions attached to it, since it gets assigned some of the data partitions from
the failed SPU. The following diagram illustrates this concept.
The SPU 1001 is connected to 8 data partitions numbered 0 to 7. Each data partition is connected to one
data slice stored on a different disk. For example, data partition 0 points to data slice 17 stored on the
disk with id 1063. Disk 1063 also stores the mirror of data slice 18, which is stored on disk 1064. The
following diagram illustrates what happens when disk 1070 fails.
Immediately after disk 1070 stops responding, disk 1069 will be used by the system to satisfy
queries which need data from data slices 23 and 24. Disk 1069 will serve the requests using the
data in both its primary and mirror partitions. This also creates a bottleneck, which in turn impacts
query performance. In the meantime, the contents of disk 1070 are regenerated on one of the spare
disks in the disk array, in this case disk 1100, using the data on disk 1069. Once the regen is
complete, SPU data partition 7 is updated to point to data slice 24 on disk 1100. The regen
process removes the bottleneck so that disk 1069 can again perform optimally.
In the situation where a SPU fails, the appliance assigns all its data partitions to other SPUs in the
system. Pairs of disks which contain the mirror copies of each other's data slices will be assigned together to
another SPU, which results in two additional data partitions to be managed by the target SPU. If, for
example, a SPU currently manages data partitions 0 to 7 and the appliance reassigns two data partitions
from a failed SPU, the SPU will have 10 data partitions to manage, numbered 0 to 9.
Data Organization
When users create tables in databases and store data in them, the data gets stored in disk extents, which are
the minimum storage allocated on disk for data storage. Netezza distributes the data in the extents across
all the available data slices based on the distribution key specified during table creation. A user can
specify up to four columns for data distribution, or can specify that the data be distributed randomly, or
specify nothing at all during the table creation process. If a user provides no distribution specification,
Netezza uses one of the columns to distribute the data, and the selection cannot be influenced. When
the user specifies a particular column for distribution, Netezza uses the column data to distribute the
records being inserted across the data slices, using hashing to determine the data slice into which
each record needs to be stored. When the user selects random as the option for data distribution,
the appliance uses a round robin algorithm to distribute the data uniformly across all the available
data slices. The following are some sample table create statements using the distribute clause.
create table employee (
emp_id integer not null,
first_name varchar(25) not null,
last_name varchar(25) not null,
sex char(1),
dept_id integer not null,
created_dt timestamp not null,
created_by char(8) not null,
updated_dt timestamp not null,
updated_by char(8) not null,
constraint pk_employee primary key(emp_id),
constraint fk_employee foreign key (dept_id) references department(dept_id)
on update restrict on delete restrict
) distribute on hash(emp_id);
create table nation_state (
ns_id integer not null constraint pk_nation_state primary key,
nation varchar(25),
state_code varchar(5),
state varchar(25)
) distribute on random;
The key is to make sure that the data for a table is uniformly distributed across all the data slices so that
there is no data skew. By distributing data across the data slices, all the SPUs in the system can be
utilized to process any query, which in turn improves performance. Also, the performance of any query
depends on the slowest SPU handling the query. So if the data is not uniformly distributed, some
of the SPUs will have more data in the data slices attached to them, called data skew, which will impact the
processing time and in turn the overall query performance. Selecting columns with high cardinality for
the distribution key is a good practice to follow, and the distribution can be verified as sketched below.
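One way to verify the distribution is to count the rows per data slice using the datasliceid virtual
column; roughly equal counts across slices indicate there is no data skew.
select datasliceid, count(*) as num_rows
from employee
group by datasliceid
order by num_rows desc;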
Even if a column with high cardinality, like a date column, is chosen to distribute the data, there is a
possibility of creating processing skew. For example, using the date column as the distribution key, the data
gets distributed fairly evenly across all the data slices. But if most of the queries look for data for a
particular month, which is fairly common in a data warehousing environment, then only a particular set of
data slices may need to be processed, which in turn will utilize only a subset of the SPUs,
causing the query to perform sub-optimally. This is called processing skew, and it needs to be prevented by
understanding the processing requirements and choosing the correct distribution keys.
When creating CTAS tables, the following pattern is followed in terms of how the data distribution will
happen in the new table when an explicit distribution criterion is not defined:
- If a CTAS table is created from a single source table and the source table is randomly distributed,
then the CTAS table will be distributed on the first column of the CTAS table.
- If a CTAS table is created from a single source table with defined distribution on certain columns,
the CTAS table will inherit the source table's distribution.
- If a CTAS table is created by joining two tables, the distribution key of the resulting CTAS table
will be the join key of the two tables.
- If a CTAS table is created by joining multiple tables with a group by clause, the distribution key
of the resulting CTAS table will be the keys in the group by clause.
Zone Maps
When table data gets stored in extents on disk, Netezza keeps track of the minimum value and the
maximum value of columns of certain data types in data structures called zone maps. The zone maps are
created and maintained automatically for all the columns in a table which are of the following data types:
- All integer types (int1, int2, int4, int8)
- Date
- Timestamp
By keeping track of the min and max values, Netezza is able to avoid reading disk extents, which
will be most of the disk extents in a large data warehouse environment. For example, if a fact table
stores one million records per month for the last 10 years, which is over a hundred million records,
and a query processes only one month's data, Netezza will be able to read only that one million records.
And if there are 96 snippet processors and the data is uniformly distributed across all the data slices,
then the number of records read by each snippet processor will be a little over 10,000, which is a
small amount to process. Enabling the appliance to utilize the zone map feature, by selecting the best data
types for columns in the tables, is another key point to take into consideration during design, as the
following example illustrates.
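For instance, assuming a sales fact table with a date column (the names are illustrative), a restriction
like the one below lets the appliance consult the zone maps and skip every extent whose min/max range
for sale_date falls outside the requested month.
select sum(sale_amt)
from sales_fact
where sale_date between '2013-06-01' and '2013-06-30';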
When two tables are joined together often, like a customer table and an order table, the distribution key
selection for the two tables can play an important role in the performance of the queries. If the
distribution key is the join column, for example the customer id column in both the customer and order
tables, the data distribution will result in records with the same customer id values ending up in the same
data slice for both tables. When a query joining the tables is processed, since the matching data
from both tables is in the same data slice, the snippet processor will be able to perform the join
locally and send the result without performing additional work, which in turn improves the
performance of the query. If the tables are not distributed on the columns often used for the join, matching
data from the two tables will end up in different data slices, which means the snippet processors need to
perform additional work to satisfy the join. The appliance will choose to temporarily redistribute one of
the tables on the join column if the other is already distributed on the join column, after which the snippet
processors can perform the join locally. If neither table is distributed on the join column, then
the appliance may redistribute both tables before the snippet processors can perform the join. If a
table stores a relatively small number of records, Netezza may decide to broadcast the whole table to
all the SPUs so that each one has its own copy for processing. This means the host
consolidates the table data from all the data slices and sends the complete table to all the SPUs.
Clustered Base Tables (CBT)
Along with the option to distribute data in the table definition, Netezza also provides an option to
organize the distributed data within a data slice. For example, we may have distributed the employee table on
employee id but want employee records from the same department to be stored close
together; the dept_id column can then be specified in the create table statement as in the following
example.
create table employee (
emp_id integer not null,
first_name varchar(25) not null,
last_name varchar(25) not null,
sex char(1),
dept_id integer not null,
created_dt timestamp not null,
created_by char(8) not null,
updated_dt timestamp not null,
updated_by char(8) not null,
constraint pk_employee primary key(emp_id),
constraint fk_employee foreign key (dept_id) references department(dept_id)
on update restrict on delete restrict
) distribute on hash(emp_id)
organize on(dept_id);
Netezza allows up to four columns to organize on. When data gets stored on the data slice, records with
the same organize on column values will get stored in the same or nearby extents. Organizing
improves queries on fact tables when they are defined with the frequently joined columns as the
organizing columns. All the columns specified in the organize on clause are zone mapped, and by knowing
the range of values of these columns stored in each physical extent, Netezza can eliminate reading unwanted
extents during a join query, which improves the query performance. Zone mapping of additional data types
is supported when columns are specified in the organize on clause; the following is the full list of data types,
including the data types which are zone mapped by default.
Default zone map data types:
- Integer - 1-byte, 2-byte, 4-byte, and 8-byte
- Date
- Timestamp
Additional data types which can be zone mapped due to the organize on clause:
- Char - all sizes, but only the first 8 bytes are used in the zone map
- Varchar - all sizes, but only the first 8 bytes are used in the zone map
- Nchar - all sizes, but only the first 8 bytes are used in the zone map
- Nvarchar - all sizes, but only the first 8 bytes are used in the zone map
- Numeric - all sizes up to and including numeric(18)
- Float
- Double
- Bool
- Time
- Time with timezone
- Interval
The organize on column definitions can be modified using an alter table statement, but any column
included in the organize on clause cannot be dropped from the table. If a table is altered to be a clustered
base table, any new records inserted into the table will be organized appropriately. In order to organize the
existing records in the table, "GROOM TABLE" needs to be executed so that queries can take advantage
of the data reorganization (see the sketch below). It is a good practice to have fact tables defined as
clustered base tables with data organized on often-joined columns to improve multi-dimensional lookups.
At the same time, care needs to be taken in choosing the data organization columns, by understanding the
often executed queries and also minimizing the number of columns on which the data is organized.
Compared to traditional indexes or materialized views used to improve performance, CBTs have a major
advantage in that they use no additional space, organizing the table data in place. One point to note is that
changing the data organization impacts the compression, which may result in an increase or
decrease in the size of the table storage after converting a table to a CBT.
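As a sketch, converting an existing table to a clustered base table and reorganizing the rows already
stored in it would look like the following; the groom command is covered in a later section.
alter table employee organize on (dept_id);
groom table employee records all;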
5. Statistics and Query Performance
The Netezza query optimizer relies on statistics from the catalog to come up with an optimal query execution
plan. So it is imperative that the statistics be kept current, without which the execution plan
generated by the optimizer may be sub-optimal, resulting in poor query performance. The following are
the statistics the optimizer relies on.
Object Type    Statistics
Table          Number of records in the tables used in the query
Table columns  Minimum value in the column
               Maximum value in the column
               Count of null values in the column
               Count of distinct values in the column (dispersion or cardinality)
Some examples of how the statistics can be used by the optimizer:
- If a column is nullable, additional code may be generated to check whether a value is null or
not
- Knowing the number of records in the table and the min and max values, along with the number of
distinct values, helps estimate the number of relevant records which will be returned for the
query, assuming there is uniform distribution
- Based on the min and max values, the optimizer can determine the type of math required to be
performed, like 64 or 128 bit computation
These statistics in the catalog are generated using the "GENERATE STATISTICS" command, which
collects them and updates the catalog. Admin users, table owners and users who have the GENSTATS
privilege can execute the generate statistics command on tables in databases. The following are some
examples.
generate statistics;                          -- generates statistics on all tables in a database
generate statistics on emp;                   -- generates statistics on table emp
generate statistics on emp(emp_id, dept_id);  -- generates statistics on the specified columns
Since generate statistics reads every record in the table to generate the statistics, the resulting data is of
very high accuracy.
Netezza also automatically maintains statistics, which may be of lesser accuracy compared to what is
generated by the generate statistics command. The following table lists the various commands and the
type of statistics automatically maintained by Netezza. Even though these statistics may not be very
accurate, they are better than not having any.
Command          Row count  Min/max vals  Null count  Distinct vals  Zone map
Create Table As  Y          Y             Y           Y              Y
Insert           Y          Y             N           N              Y
Update           Y          Y             N           N              Y
Delete           N          N             N           N              N
Groom            N          N             N           N              Y
Note that the groom command just removes the rows deleted logically; there is no change in statistics
except for the zone map values.
The following are some recommended scenarios when generate statistics should be executed for
performance:
- After significant changes to the data in the database
- On static tables intended to stay for a long time
- On objects in queries with three or more joins
- When query response slows after changes to tables
- On columns used in where, order by, group by and having clauses
- When a temporary table used in a join contains a large volume of data
Netezza also computes just in time (JIT) statistics using sampling under the following conditions. This is not
a replacement for running generate statistics and is also less accurate due to the sampling.
- The query has tables that store more than 5 million records
- The query has at least one column restriction, like a where clause
- The query involves only user tables
- The tables participate in a query join, or a table has an associated materialized view
Along with sampling, the JIT statistics collection process utilizes zone maps to collect several pieces of
information, like:
- The number of rows to be scanned for the target table
- The number of extents that need to be scanned for the target table
- The maximum number of extents that need to be scanned on the data slice with the greatest skew
- The number of rows to be scanned for all the tables in each join
- The number of unique values in target table columns involved in join or group by processing
Note that when CTAS tables are created, Netezza schedules generate statistics to collect all required
statistics for the new table. This automatic scheduling of generate statistics can be influenced by the
following two postgresql.conf file settings, sketched below.
- enable_small_ctas_autostats enables or disables auto stats generation on small CTAS tables
- ctas_auto_min_rows specifies the threshold number of rows above which the stats generation is
scheduled; the default value is 10000.
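In postgresql.conf these settings would appear as in the following sketch; the values shown are
illustrative (the second is the documented default).
enable_small_ctas_autostats = true
ctas_auto_min_rows = 10000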
Groom
When data is updated in Netezza, the current record is marked for deletion and a new record is created
with the updated values, instead of updating the current record in place. Also, when a record in a table is
deleted, it is not physically removed from storage; instead the record is marked as logically deleted and
will not be visible to future transactions. Similarly, when a table is altered to add or drop a column, a new
version of the table is created, and queries against the table are serviced by using the different versions.
Due to the logical deletes, which result in additional data in the tables, and the joins on versions of altered
tables, query performance gets impacted. In order to prevent this query performance degradation, users of
the Netezza appliance need to "GROOM" the table.
The groom command is used to maintain user tables; some of its key functions are to
- Reclaim physical space by removing logically deleted rows in tables
- Migrate records from previous versions of a table into the current version, leaving only
one version of the altered table
- Organize tables according to the organize on columns defined in an alter table command
By default groom synchronizes with the latest backup set, which means logically deleted data will not be
removed if that data has not been backed up in the latest backup set. It is therefore best to schedule
grooms after database backups so that the database objects are kept primed. The following are good
practices with groom:
- Groom tables that often receive large updates and deletes
- Schedule "generate statistics" after grooming tables so that the stats are up to date and accurate
- If there is a need to remove all the records in a table, use truncate instead of delete; there
will then be no need for a groom, since truncate reclaims the space
The following are some example groom commands.
groom table emp;                -- to delete logically deleted rows and reclaim space
groom table emp versions;       -- to remove older versions of an altered table and copy rows to the new version
groom table emp records ready;  -- to organize only the records which are ready to be organized
Note that when groom is executing, it doesn't lock the tables; users will still be able to access the tables
and perform data manipulation on the table data.
Explain
In order to understand the plan the Netezza appliance will use to execute a query, users can use the
explain SQL statement. Understanding the plan also gives users an opportunity to change the query,
or the physical data structures of the tables involved, to make the query perform
better. The following are some sample "explain" statements.
explain select * from emp;
explain distribution select emp.name, dept.name from emp, dept where emp.dept_id =
dept.id;
explain plangraph select * from emp;
The output from the explain statement can be generated in verbose format or in a graphical format which
can be opened in a web browser window. The output details all the snippets which will be executed in
sequence on the SPUs, and also the code which will be executed on the host. The details include the type
of activity (like table scan, aggregation etc.), the estimated number of rows, the width of the rows, the cost
involved to execute the snippet (since the Netezza query optimizer is a cost based optimizer), the objects
involved, the confidence in the estimation, etc. The following is a sample explain output.
QUERY VERBOSE PLAN:
Node 1.
[SPU Sequential Scan table "OWNER" as "B" {}]
-- Estimated Rows = 16, Width = 11, Cost = 0.0.. 0.0, Conf = 100.0
Projections:
1:B.OWNER_NAME 2:B.OWNER_ID
[SPU Broadcast]
[HashIt for Join]
Node 2.
[SPU Sequential Scan table "ORDER" as "A" {}]
-- Estimated Rows = 454718, Width = 8, Cost = 0.0 .. 8.1, Conf = 80.0
Restrictions:
(A.OWNER_ID NOTNULL)
Projections:
1:A.ORDER_ID 2:A.OWNER_ID
Node 3.
[SPU Hash Join Stream "Node 2" with Temp "Node 1" {}]
-- Estimated Rows = 519678, Width = 23, Cost = 0.0 .. 19.7, Conf = 64.0
Restrictions:
(A.OWNER_ID = B.OWNER_ID)
Projections:
1:A.APPLICATION_ID 2:B.OWNER_NAME
Cardinality:
B.ORDER_ID 9 (Adjusted)
[SPU Return]
[Host Return]
QUERY PLANTEXT:
Hash Join (cost=0.0..19.7 rows=519678 width=23 conf=64) {}
(spu_broadcast, locus=spu subject=rightnode-Hash)
(spu_join, locus=spu subject=self)
(spu_send, locus=host subject=self)
(host_return, locus=host subject=self)
l: Sequential Scan table "A" (cost=0.0..8.1 rows=454718 width=8 conf=80) {}
(xpath_none, locus=spu subject=self)
r: Hash (cost=0.0..0.0 rows=16 width=11 conf=0) {}
(xpath_none, locus=spu subject=self)
l: Sequential Scan table "B" (cost=0.0..0.0 rows=16 width=11 conf=100) {}
(xpath_none, locus=spu subject=self)
6. Netezza Query Plan Analysis
As with most database management systems, Netezza generates a query plan for all queries executed in
the system. The query plan details the optimal execution path, as determined by Netezza, to satisfy each
query. The Netezza component which generates and determines the optimal query path from the
available alternatives is called the query optimizer, and it relies on the data available about the
database objects involved in the executed query. The Netezza query optimizer is a cost based optimizer,
i.e. the optimizer determines a cost for the various execution alternatives and selects the path with
the least cost as the optimal execution path for a particular query.
Optimizer
The optimizer relies on a number of statistics about the objects in the executed query:
- Number of rows in the tables
- Minimum and maximum values stored in the columns involved in the query
- Column data dispersion, like unique values, distinct values and null values
- Number of extents in each table and the total number of extents on the data slice with the
largest skew
Given that the optimizer relies heavily on these statistics to determine the best execution plan, it is
important to keep the database statistics up to date using the "GENERATE STATISTICS" command.
Along with the statistics, the optimizer also takes into account the FPGA capabilities of Netezza when
determining the optimal plan.
When coming up with the plan for execution, the optimizer looks for:
- The best method for data scan operations, i.e. to read data from the tables
- The best method to join the tables in the query, like hash join, merge join or nested loop join
- The best method to distribute data between SPUs, like redistribute or broadcast
- The order in which tables can be joined in a query join involving multiple tables
- Opportunities to rewrite queries to improve performance, like
o Pull-up of sub-queries
o Push-down of tables to sub-queries
o De-correlation of sub-queries
o Expression rewrite
Query Plan
Netezza breaks query execution into units of work called snippets, which can be executed on the host or
on the snippet processing units (SPUs) in parallel. Queries can contain one or many snippets, which will
be executed in sequence on the host or the SPUs depending on what is done by the snippet code.
Scanning and retrieving data from a table, joining data retrieved from tables, performing data
aggregation, sorting of data, grouping of data, and dynamic distribution of data to help query performance
are some of the processes which can be accomplished by the snippet code generated for query execution.
The following is a sample query plan which shows the various snippets executed in sequence, what they
accomplish, in what order the snippets are executed and where they get executed when the query is run.
QUERY VERBOSE PLAN:
Node 1.
[SPU Sequential Scan table "ORGANIZATION_DIM" as "B" {}]
-- Estimated Rows = 22193, Width = 4, Cost = 0.0 .. 843.4, Conf = 100.0
Restrictions:
((B.ORG_ID NOTNULL) AND (B.CURRENT_IND = 'Y'::BPCHAR))
Projections:
1:B.ORG_ID
Cardinality:
B.ORG_ID 22.2K (Adjusted)
[HashIt for Join]
Node 2.
[SPU Sequential Scan table "ORGANIZATION_FACT" as "A" {}]
-- Estimated Rows = 11568147, Width = 4, Cost = 0.0 .. 24.3, Conf = 90.0 [BT:
MaxPages=19 TotalPages=1748] (JIT-Stats)
Projections:
1:A.ORGANIZATION_ID
Cardinality:
A.ORGANIZATION_ID 22.2K (Adjusted)
Node 3.
[SPU Hash Join(right exists) Stream "Node 2" with Temp "Node 1" {}]
-- Estimated Rows = 11568147, Width = 0, Cost = 843.4 .. 867.7, Conf = 57.6
Restrictions:
(A.ORGANIZATION_ID = B.ORG_ID) AND (A.ORGANIZATION_ID = B.ORG_ID)
Projections:
Node 4.
[SPU Aggregate]
-- Estimated Rows = 1, Width = 8, Cost = 930.6 .. 930.6, Conf = 0.0
Projections:
1:COUNT(1)
[SPU Return]
[HOST Merge Aggs]
[Host Return]
In the sample plan above, snippet 1 (Node 1) reads data from table "B" and runs on all the SPUs;
snippet 2 (Node 2) reads data from table "A" and also runs on all the SPUs; snippet 3 (Node 3) performs
a hash join on the data returned by snippets 1 and 2 and runs on all SPUs; and snippet 4 (Node 4) counts
the rows produced by snippet 3, again on all SPUs. Each SPU returns its count to the host, which
aggregates the numbers and returns the total count to the user.
The Netezza snippet scheduler determines which snippets to add to the current work load based on the
session priority, estimated memory consumption versus available memory, and estimated SPU disk time.
Plan Generation
When a query is executed, Netezza dynamically generates the execution plan and the C code to execute
each snippet of the plan. The recent execution plans are stored in the data.<ver>/plan directory under
the /nz/data directory. The snippet C code is stored under the /nz/data/cache directory, and this code is
compared against the code generated for new plans so that compilation can be skipped if the snippet
code is already available.
Apart from Netezza generating the execution plan dynamically during query execution, users can also
generate the execution plan (without the C snippet code) using the “EXPLAIN” command. This helps
users identify potential performance issues by reviewing the plan and making sure that the path chosen
by the optimizer is in line with or better than what is expected. During the plan generation process, the
optimizer may generate statistics dynamically to prevent issues due to out-of-date statistics, particularly
when the tables involved store large volumes of data. The dynamic statistics generation process uses
sampling, which is not as accurate as generating statistics using the “GENERATE STATISTICS”
command, which scans the tables. The following are some variations of the “EXPLAIN”
command
EXPLAIN [VERBOSE] <SQL QUERY>; //Detailed query plan including cost estimates
EXPLAIN PLANTEXT <SQL QUERY>; //Natural language description of query plan
EXPLAIN PLANGRAPH <SQL QUERY>; //Graphical tree representation of query plan
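For example, to review the plan of a hypothetical join query before running it:
explain verbose select o.o_totalprice, c.c_name from orders o, customer c
where o.o_custkey = c.c_custkey;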
Data Points in Query Plan
Following are some of the key data points in the query plan which will help identify any performance
issues in queries
…
Node 2.
[SPU Sequential Scan table "ORGANIZATION_FACT" as "A" {}]
-- Estimated Rows = 10007, Width = 4, Cost = 0.0 .. 24.3, Conf = 54.0 [BT:
MaxPages=19 TotalPages=1748] (JIT-Stats)
Projections:
1:A.ORGANIZATION_ID
Cardinality:
A.ORGANIZATION_ID 22.2K (Adjusted)
…
If the estimated number of rows is less than what is expected, it is an indication that the statistics on the
table are not current. It is good practice to schedule “GENERATE STATISTICS” against the table to
refresh them. If the estimated cost is high, the query is a good candidate for a closer look to make sure
that the higher cost is not due to query coding inefficiency or database object definitions.
QUERY VERBOSE PLAN:
Node 1.
[SPU Sequential Scan table "ORDERS" as "O" {(O.O_ORDERKEY)}]
-- Estimated Rows = 500000, Width = 12, Cost = 0.0 .. 578.6, Conf = 64.0
Restrictions:
((O.O_ORDERPRIORITY = ‘HIGH’::BPCHAR) AND (DATE_PART('YEAR'::"VARCHAR",
O.O_ORDERDATE) = 2012))
Projections:
1:O.O_TOTALPRICE 2:O.O_CUSTKEY
Cardinality:
O.O_CUSTKEY 500.0K (Adjusted)
[SPU Distribute on {(O.O_CUSTKEY)}]
[HashIt for Join]
…
Each SPU executing the snippet retrieves only the relevant columns from the tables and applies the
restrictions using the FPGA, reducing the data to be processed by the CPU in the SPU, which in turn
helps with performance. The columns retrieved and the restrictions applied are detailed in the Projections
and Restrictions sections of the execution plan, as summarized below. The optimizer may decide to
redistribute data retrieved in a snippet if the tables joined in the query are not distributed on the same key
value, as in this snippet. If a large volume of data is being distributed by a snippet, like the redistribution
of a fact table, it is a good candidate to check whether the table is physically distributed on optimal keys.
Be aware that if the snippet is projecting a small set of columns from a table with many columns, the
amount of data redistributed will be small – the number of rows times the width of the columns
projected. If the amount of data is small, redistributing it dynamically during snippet execution may not
be a major overhead since it is done using the high performing, high bandwidth IP based network
between the SPUs without the involvement of the NZ host.
In the plan above:
 Estimated Rows is the number of rows estimated to be returned by the snippet
 Cost is the estimated cost of the query snippet, and Conf is the confidence, in percent, in the estimate
 Restrictions are the where clause predicates applied to the data read from the table in the snippet
 Projections are the relevant table columns retrieved in the snippet; the other columns are discarded
 [SPU Distribute on {(O.O_CUSTKEY)}] shows the resulting data being redistributed to all SPUs
with CUSTKEY as the distribution key
 [HashIt for Join] shows the CUSTKEY value being hashed so that it can be used to perform a hash join
…
Node 2.
[SPU Sequential Scan table "STATE" as "S" {(S.S_STATEID)}]
-- Estimated Rows = 50, Width = 100, Cost = 0.0 .. 0.0, Conf = 100.0
Projections:
1:S.S_NAME 2:S.S_STATEID
[SPU Broadcast]
[HashIt for Join]
…
Data from a snippet can also be broadcast to all the SPUs, shown as [SPU Broadcast] in the plan above.
What that means is that all the SPUs executing the snippet return their data to the host, the host
consolidates the data, and the consolidated data is then sent to every SPU in the server. Broadcast is
efficient when a small table is joined with a large table and the two cannot be distributed physically on
the same key. As in the example above, the state code is used in the join to generate the name of the
state in the query result. Since the fact and STATE tables are not distributed on the same key, the
optimizer broadcasts the STATE table to all the SPUs so that the joins can be done locally on each SPU.
Broadcasting the STATE table is much more efficient since the data stored in the table is minuscule
compared to what NZ can handle.
…
Node 5.
[SPU Hash Join Stream "Node 4" with Temp "Node 1" {(C.C_CUSTKEY,O.O_CUSTKEY)}]
-- Estimated Rows = 5625000, Width = 33, Cost = 578.6 .. 1458.1, Conf = 51.2
Restrictions:
(C.C_CUSTKEY = O.O_CUSTKEY)
Projections:
1:O.O_TOTALPRICE 2:N.N_NAME
Cardinality:
C.C_NATIONKEY 25 (Adjusted)
…
NZ supports hash, merge and nested loop joins, of which hash joins are the most efficient. The key data
points to check in a snippet performing a join, like Node 5 above, are whether the correct join type was
selected by the optimizer and whether the number of estimated rows is reasonable. For example, if a
column participating in a join is defined as floating point instead of an integer type, the optimizer cannot
use a hash join even when one was expected; the problem can be fixed by defining the table column with
an integer type. If the estimated number of rows is very high or the estimated cost is high, it may point to
incorrect or missing join conditions, which would also result in an incorrect join result set.
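As a hypothetical illustration (all names are made up), a table with a floating point join key can be recreated with an integer key so that the optimizer can use a hash join:
-- recreate the fact table with a bigint join key instead of a floating point one --
create table order_fact_fixed as
select cast(owner_id as bigint) as owner_id, order_id, total_price from order_fact;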
Common reasons for performance issues
The following are some of the common reasons for performance issues
 Not having correct distribution keys, resulting in certain data slices storing more data than others,
i.e. data skew. The performance of a query depends on the performance of the data slice storing
the most data for the tables involved in the query.
 Even if the data is distributed uniformly, not taking processing patterns into account when
choosing the distribution results in process skew. For example, if data is distributed by month and
a process looks for a single month of data, the performance of the query will be degraded since
the processing is handled by a subset of the SPUs and not all of the SPUs in the system.
 Performance gets impacted if a large volume of data (e.g. a fact table) gets re-distributed or
broadcast during query execution.
 Not having zone maps on columns because their data types are ones for which NZ cannot
generate zone maps, like numeric(x, y), char, varchar etc.
 Not having the table data organized optimally for multi-table joins, as in the case of multi-
dimensional joins performed in a data warehouse environment.
Steps to do query performance analysis
The following are high level steps to do query performance analysis
 Identify long running queries
o Identify whether queries are being queued and which long running queries are causing the
queuing, through the NZAdmin tool or the nzsession command.
o Long running queries can also be identified from query history, if it is being gathered
using an appropriate history configuration
 For long running queries, generate query plans using the “EXPLAIN” command. Recent query
execution plans are also stored in *.pln files in the data.<ver>/plans directory under the
/nz/data directory.
 Look for the data points and common reasons for performance issues detailed in the previous
sections.
 Take necessary actions like generating statistics, changing distributions, grooming tables, creating
materialized views, modifying or using organization keys, changing column data types or
rewriting the query.
 Verify whether the modifications helped with the query plan by regenerating the plan using the
explain utility.
 If the analysis is performed in a non-production environment, it is key to make sure that the
statistics reflect the values expected in production.
7. Netezza Transactions
All database systems which allow concurrent processing should support ACID transactions, i.e. the
transactions should be atomic, consistent, isolated and durable. All Netezza transactions are ACID in
nature, and in this section we will see how ACIDity is maintained by the appliance.
By default Netezza SQL statements are executed in auto-commit mode, i.e. the changes made by a SQL
statement take effect immediately upon completion of the statement, as if the transaction were complete.
If there are multiple related SQL statements which must all succeed or fail together, users can use the
BEGIN, COMMIT and ROLLBACK transaction control statements to define a transaction involving
multiple statements. All SQL statements between a BEGIN statement and a COMMIT or ROLLBACK
statement are treated as part of a single transaction, i.e. if any one of the SQL statements fails, all the
changes made by the prior statements are reverted, leaving the database in the state it was in before
execution of the BEGIN statement. Users need to be aware that some SQL statements, like BEGIN
itself, are restricted from appearing inside a BEGIN ... COMMIT/ROLLBACK block.
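The following is a minimal sketch of a multi-statement transaction; the tables orders and order_hist are hypothetical, and if any statement fails the whole unit is rolled back:
begin;
insert into order_hist select * from orders where o_orderdate < '2012-01-01';
delete from orders where o_orderdate < '2012-01-01';
commit;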
Traditionally, database systems use logs to manage transaction commits and rollbacks: all changes which
are part of a transaction are maintained in a log and, based on whether the user commits or rolls back
the transaction, the changes are made durable, i.e. written to storage. Netezza doesn’t use logs; all
changes are made directly on the storage where user data is stored, which also helps with performance.
To accomplish this for table data changes, Netezza maintains three additional hidden columns (createxid,
deletexid and rowid) per table row, which store the id of the transaction which created the row, the id of
the transaction which deleted the row, and a unique row id assigned to the data row by the system.
Transaction ids are unique ids assigned to each transaction executed in the system, in increasing order.
When a row is created its deletexid is set to “0”, and when it gets deleted the deletexid is set to the id of
the transaction which executed the delete request. When an update is made to a table row, the row data is
not updated in place; instead a new row is created with the updated values and the old row’s deletexid is
updated with the id of the transaction which issued the update command. If a transaction rolls back a
delete operation, the deletexid of the deleted rows is reset to “0”; when rolling back an update operation,
the deletexid of the new row is set to “1” and the deletexid of the old row is reset to “0”. One key point
to note is that rows are deleted logically, not physically, during delete operations, and additional rows are
created during update operations. What that means is that groom needs to be executed regularly on
tables which undergo large volumes of deletes and updates, to reclaim the space and improve
performance.
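As an illustration, the hidden columns can be selected like any other column to observe this behavior, and the GROOM TABLE command reclaims the logically deleted rows; emp is a hypothetical table:
-- inspect the transaction ids maintained for each row --
select createxid, deletexid, rowid, salary from emp limit 5;
-- reclaim the space taken by logically deleted rows --
groom table emp records all;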
The transaction isolation is controlled by the transaction isolation levels supported by any database
system and the one used by the individual transaction. By ANSI standards there are four isolation levels
 Uncommitted Read
 Committed Read
 Repeatable Read
 Serializable
The following table details the data consistency which can be expected with each isolation levels
Isolation Level Uncommitted Data Non Repeatable Data Phantom Data
Uncommitted Y Y Y
Committed N Y Y
Repeatable read N N Y
Serializable N N N
Uncommitted Data: A transaction can see data from another transaction which is not yet committed.
Non Repeatable Data: When a transaction tries to read the same data again, the data has been changed by
another transaction which committed after the previous read.
Phantom Data: When a transaction tries to read the same data again, the data previously read is
unchanged but new rows have been added which satisfy the previous query criteria.
Netezza allows only the “serializable” transaction isolation level, and if a transaction is found to be not
serializable it will not get scheduled. This works fine since the appliance is primarily meant for data
warehousing environments where there are very minimal or no updates once data is loaded. With data
versioning providing a consistent view of data for transactions running concurrently, combined with
supporting only serializable transactions, Netezza doesn’t need locking, the synchronization mechanism
traditionally used in database systems to maintain concurrency. For some DDL statements, like dropping
a table, the appliance requires exclusive access to the object, which means the DDL statement execution
will have to wait until any other operation on the object, like a SELECT, is complete.
8. Loading Data, Database Back-up and Restores
Loading and unloading data, creating back-ups and the process to restore are vital capabilities for any
database system, and in this section we will look at how they are done in Netezza.
External Tables
Users can define a file as a table and access it like any other table in the database using SQL statements.
Such tables are called external tables. The key difference is that even though data from external tables
can be accessed through SQL queries, the data is stored in files and not on the disks attached to the
SPUs. Netezza allows selects and inserts on external tables but not updates, deletes or truncation. That
means users can insert data into an external table from a normal table in the database, which stores the
data in the external file. Also, data can be selected from an external table and inserted into a database
table; these two actions are the equivalents of data unload and load. Any user with the LIST privilege
and the CREATE EXTERNAL TABLE privilege can create external tables. The following are some
sample statements showing how external tables can be created and used
-- creates an external table with the same definition as an existing customer table --
create external table customer_ext same as customer using (dataobject
('/nfs/customer.data') delimiter '|');
-- create statement uses the column definitions provided to create the external table --
create external table product_ext(product_id bigint, product_name varchar(25),
vendor_id bigint) using (dataobject('/tmp/product.out') delimiter ',');
-- create statement uses the definition of the existing account table to create the external table and stores the data in the
account.data file in a Netezza internal format --
create external table '/tmp/account.data' using (format 'internal' compress true) as
select * from account;
-- read data from the external table and insert into an existing database table --
insert into temp_account select * from external '/tmp/account.data' using (format
'internal' compress true);
-- storing data into an external table, which in turn writes it to the file; this file will be readable and usable with other DBMSs --
insert into product_ext select * from product;
Once external tables are created they can be altered or dropped. The statements used to drop and alter a
normal table can be used to perform these actions on external tables. Note that when an external table is
dropped, the file associated with the table is not removed and needs to be removed through OS
commands if that is what is intended. Also, queries can’t have more than one external table, and unions
can’t be performed on two external tables. When a user performs an insert operation on an external
table, the data in the file associated with the external table (if existing) is deleted before new data is
written, i.e. a backup of the file may need to be made first if it is required.
External tables are one of the options to back up and restore data in a Netezza database, even though
they work at the table level. Since external tables can be used to create data files from table data in a
readable format with user specified delimiters and line escape characters, they can be used to move data
between Netezza and other DBMS systems. Note that the host system is not meant to store data, and
hence alternate arrangements, like using NFS mounts to store the files created from the external tables,
need to be made if the user intends to move large volumes of data.
NZLoad CLI
NZLoad is a Netezza command line interface which can be used to load data into database tables.
Behind the scenes NZLoad creates an external table for the data file to be loaded, selects the data from
the external table and inserts it into the target table, and then drops the external table once the insert of
all the records is complete. All of this is done by executing a sequence of SQL statements, and the
complete load process is considered part of a single transaction, i.e. if there is an error in any of the steps
the whole process is rolled back. The utility can be executed on a Netezza host locally or from a remote
client where the Netezza ODBC client is installed. Even though the utility uses the ODBC driver, it
doesn’t require a Data Source Name on the machine where it is executed, since it uses the ODBC driver
directly, bypassing the ODBC driver manager. The following is a sample NZLoad statement
nzload -u nz -pw nzpass -db hr -t employee -df /tmp/employee.dat -delimiter ',' -lf
log.txt -bf bad.txt -outputDir /tmp -fileBufSize 2 -allowReplay 2
A control file can also be used to pass parameters to the nzload utility, using the -cf option, instead of
passing them through the command line. This also allows more than one data file to be loaded
concurrently into the database with different options, with each load considered a different transaction.
The control file option -cf and the data file option -df are mutually exclusive, i.e. they can’t be used at
the same time. The following is a sample nzload command, assuming the load parameters are specified
in the load.cf file.
nzload -u nz -pw nzpass -host nztest.company.com -cf load.cf
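The following is a minimal sketch of what the load.cf control file could contain; the option names mirror the command line options, but they should be verified against the nzload documentation for the release in use:
DATAFILE /tmp/employee.dat
{
Database hr
TableName employee
Delimiter ','
Logfile /tmp/log.txt
Badfile /tmp/bad.txt
}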
NZLoad exits with a code of 0 for successful execution; 1 for failed execution, i.e. no records were
inserted; and 2 for successful loads where the number of rows in error during the load did not exceed
the maximum specified in the NZLoad command request.
When NZLoad is executing against a table, other transactions can still use the table, since the load does
not commit until the end and hence the data being loaded is not visible to other transactions executed
concurrently. NZLoad sends chunks of data to be loaded, along with the transaction id, to all the SPUs,
which in turn store the data immediately on storage, thus taking up space. If for any reason the
transaction performing the load is cancelled or rolled back at any point, the data will still remain in
storage even though it will not be visible to future transactions. In order to reclaim the space taken by
such records, users need to schedule a groom against the table. Note that the NZReclaim utility can also
be used on earlier versions of Netezza, but it is deprecated from version 6. The user id used to run the
NZLoad utility should have the privilege to create external tables along with select, insert and list access
on the database table.
Database Backup and Restore
Unlike traditional database systems, Netezza has a host component, where all the database catalogs,
system tables, configuration files etc. are stored, and the SPU component, where user data is stored.
Both these components need to be backed up in sync for a successful restore, even though in restore
situations like a database or table restore the host component may not be required. Netezza provides the
nzhostbackup and nzhostrestore utilities to make host backups and restores, along with the nzbackup
and nzrestore utilities to manage user data backup and restoration. This is in addition to external tables,
which can be used to back up and restore individual tables and hence also to create a backup of all the
tables in a database, i.e. a database backup.
Host Backup and Restore
The Netezza host can be backed up using the nzhostbackup command utility. It backs up the Netezza
data directory, which stores the system tables, database catalogs, configuration files, query plans, cached
executable code for the SPUs etc., which are required for the correct functioning of, and access to, the
user databases. When the command is executed, it pauses the system to make a checkpoint before taking
a backup of the /nz/data directory. Once the backup is complete the system is resumed to its original
state, and due to this requirement to pause the system, it is good practice to schedule the host backup
when there is little or no activity on the system.
In the event of any issues with the Netezza host, the host can be restored using the nzhostrestore utility.
The restore brings back only the host components, like the database catalog entries, system tables etc., to
the state when the host backup was made. What that means is that if there were actions like dropping,
truncating or grooming of user tables after the host backup, the host restore will not be able to revert
these actions. This may leave user objects in an inconsistent state, and corrective action needs to be taken
to bring them in sync with the state when the host backup was made. Also, in the case of new tables
created after the host backup, the host restore utility will create SQL scripts to drop these tables, which
are orphaned since their entries are not in the catalog. To avoid such inconsistencies, it is recommended
to take a fresh host backup whenever there is a major change to the databases. The following are sample
nzhostbackup and nzhostrestore commands.
nzhostbackup /tmp/backup/nz-devhost-bk-01212011.tar.gz
nzhostrestore /tmp/backup/nz-uathost-bk-12212011.tar.gz
During the host restore process the utility checks the version of the catalog on the host against the one
in the backup. If they are not the same, or if for some reason the utility is not able to identify the
versions, the restore process fails. This verification step can be skipped using the -catverok option of the
nzhostrestore utility, but that is not recommended. Note that the Netezza host backup doesn’t back up
the operating system components or configurations; they need to be backed up separately.
User Database Backup and Restore
User databases and data, separate from the host components, can be backed up and restored using the
nzbackup and nzrestore utilities.
The nzbackup utility allows the user to make a full, cumulative or differential backup of a database. A
full backup is a backup of the complete database, while a differential backup captures the changes made
after the previous backup, whether full or cumulative. A cumulative backup captures all the changes
made after the previous full backup. Users executing the nzbackup command need to have the backup
privilege at the specific database level or at the global level. The following is an example where a full
backup is scheduled every month, say on day 1, a differential backup is scheduled every day, and a
cumulative backup is scheduled every week.
[Figure: backup timeline showing a monthly full backup on day 1, daily differential backups 1 through 7 on the following days, and a weekly cumulative backup on day 8]
When a differential backup runs, it captures all the data changed after the previous backup. If the
database needs to be restored as of day 4 after the monthly backup, the monthly backup needs to be
used along with differential backups 1, 2 & 3. If the database needs to be restored as of day 8, then the
monthly backup, the cumulative backup and differential backup 6 need to be used. There is no
point-in-time recovery in Netezza as with other traditional databases, where a database can be recovered
to a point in time between backups using logs, which are not available in Netezza. The following are
some example Netezza backup commands.
-- creates a full backup of the hrdb database --
nzbackup -dir /tmp/backups -u nzbackup -p nzpass -db hrdb
-- creates a differential or incremental backup of the findb database --
nzbackup -dir /nfs/backups -db findb -differential
-- creates a cumulative backup of the hrdb database --
nzbackup -dir /nfs/backups -db hrdb -cumulative
-- creates a backup of the schema from the hrdb database and not the data --
nzbackup -dir /nfs/backups -db hrdb -schema-only
-- creates the backup of the users and permissions at user database and system level --
nzbackup -dir /nfs/backups -users -u nzbackup -p nzpass
As you may have noticed, there are two other uses for the nzbackup utility besides database data backup.
One is to generate the schema of a database, which can then be used to create another database with
only the entities of the source database and not the data. The second is to back up all the users, groups
and privileges in the system, so that they can be used to restore the user details or to create the same set
of users and privileges on another system. The history of database backups can be viewed using the
-history option of the nzbackup command.
nzbackup -history
nzbackup -history -db hrdb
A full backup, along with all the differential and cumulative backups after it until the next full backup, is
considered part of a backup set. Each backup set is identified by an id, with the backups made within the
backup set identified by a sequence number. All these details are displayed when a user requests the
backup history for a database.
Databases can be restored using the nzrestore command line utility, and the user needs to have the
restore privilege. Unlike in traditional databases, a full database restore cannot be made against an
existing database.
Instead, the existing database needs to be dropped or a new database needs to be specified to perform a
full database restore. As with the nzbackup utility, the -schema-only option can be used to create only
the objects in the new database without loading the data from the backup. Also, just the users and
privileges can be restored using the -users option of the nzrestore utility. The following are some
example restore commands
-- performs a restore of the hrdb database from a full backup --
nzrestore -db hrdb -u nzbackup -p nzpass -dir /tmp/backups -v
-- creates all the objects of findb in the new database testfindb but doesn't load any data --
nzrestore -db testfindb -sourcedb findb -schema-only -u nzbackup -p nzpass -dir
/nfs/backups
-- creates all the users and applies all the privileges as in the backup but doesn't drop users or alter existing privileges --
nzrestore -users -u nzbackup -p nzpass -dir /nfs/backups
-- restores the database findb up to and including backup sequence 2 within the backupset 200137274 --
nzrestore -db findb -u nzbackup -p nzpass -dir /tmp/backups -backupset 200137274
-increment 2
Database backups can be used to restore specific table(s) to a point in time using the nzrestore -tables
option. One or more tables can be restored by listing them with the -tables option. If a table being
restored exists in the database it will be overwritten, and if it doesn’t exist, the table will be created.
Table level restore is applicable only to regular tables.
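For example, a hypothetical restore of two tables from the latest hrdb backup:
-- restore only the emp and dept tables from the hrdb backup --
nzrestore -db hrdb -u nzbackup -p nzpass -dir /nfs/backups -tables emp dept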
9. Netezza SQL
As with other popular DBMS systems currently available in the market, SQL in Netezza supports the
SQL-92 standard. SQL can be executed through the nzsql command line utility or through any JDBC,
ODBC or OLE DB connectivity.
SQL Statements
The following are the SQL statements available for the various object types in Netezza.
Database
SQL Statement Description
Create Database To create a new Netezza database
Drop Database To drop an existing Netezza database
Alter Database To rename or change owner of an existing database. Renaming a
database will require recompilation of any views in the database.
Also all materialized views will be converted to normal views and
will need to be replaced.
The following are some limits at the database level
Limit Type Limit
Name Length Maximum number of characters: 128 bytes
Connections Maximum number of connections to the server: 2000, default: 500
When connected to one database, objects in another database can be accessed by qualifying the object
name with “database name.schema name”; e.g. “hrdb.hradmin.emp” can be used to access the emp table
in the hrdb database under the hradmin schema. The schema name/object owner name is optional, and
the object can be accessed as “database name..object name”, e.g. “hrdb..emp”, since object names are
unique within a database. While objects in a second database can be queried or used in query joins, no
data updates are allowed across databases.
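For example, a hypothetical cross-database join using the shorthand qualifier:
-- join a table in the current database with the emp table in the hrdb database --
select e.emp_name, d.dept_name from hrdb..emp e, dept d where e.dept_id = d.dept_id;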
Tables
SQL Statement Description
CREATE TABLE To create a new Netezza table
DROP TABLE To drop an existing Netezza table
ALTER TABLE To rename or change owner or modify (add, drop) columns or
modify constraints in an existing table
SELECT col_name(s) FROM tbl To retrieve data from a database table
INSERT INTO tbl To insert data into a database table
UPDATE tbl SET col_nme = val To update existing data in a table
DELETE FROM tbl To delete data from a table
TRUNCATE TABLE tbl To remove all records from the table. Truncate removes the data
and recovers the storage instead of logically deleting the rows as in
delete statement
SQL Statement Description
UNION, UNION ALL To combine data from more than one table or row set; UNION
can be used in conjunction with the SELECT statement
INTERSECT, INTERSECT ALL To retrieve the common set of rows in more than one table or row
set; INTERSECT can be used in conjunction with the SELECT
statement
EXCEPT [DISTINCT], EXCEPT ALL,
MINUS [DISTINCT], MINUS ALL
To retrieve the set of rows from one table which are not in another
row set/table; EXCEPT or MINUS can be used in conjunction
with the SELECT statement
ALTER VIEWS ON tbl
MATERIALIZE REFRESH
To re-materialize all the materialized views which are based on a
particular table
The following are some limits at the table level
Limit Type Limit
Name/Column Name Length Maximum number of characters: 128 bytes
Max Column Count Maximum number of columns per table: 1600
Distribution Keys Maximum number of distribution columns: 4
Char/Varchar Field length Maximum number of characters in char/varchar type: 64000
Row Size Max row size in a table: 65,535
Synonyms
Synonyms can be used to refer to tables and views in the same database by a different name, or to refer
to objects in a remote database without the “database.schema” qualifier. The following are the supported
SQL statements for synonyms
SQL Statement Description
CREATE SYNONYM name
FOR table/view
To create a synonym. Creation doesn’t verify whether the table or
view for which the synonym is created exists; if the underlying
object is non-existent, a runtime error is thrown when a user tries
to access the synonym.
DROP SYNONYM name To drop an existing synonym
ALTER SYNONYM name To rename or change owner of an existing synonym
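For example, using hypothetical object names (the rename syntax below follows the usual ALTER ... RENAME TO pattern and should be verified against the SQL reference):
-- shorter local name for an object in another database --
create synonym emp for hrdb.hradmin.employee;
-- rename and then drop the synonym --
alter synonym emp rename to employee_syn;
drop synonym employee_syn;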
Views
Views are created to give a different perception of data, like a restricted set of table columns or a join of
two sets of data, and/or to control access to the data in the database. The following are the supported
SQL statements for views
SQL Statement Description
CREATE OR REPLACE VIEW To create a new Netezza view
DROP VIEW To drop an existing Netezza view
ALTER VIEW To rename or change owner of an existing view
Materialized Views
The following are the supported SQL statements on Materialized Views
SQL Statement Description
CREATE OR REPLACE
MATERIALIZED VIEW
To create a new materialized view, or replace an existing one
when the base table changes in a way that impacts the view
DROP VIEW To drop an existing materialized view (same as normal views)
ALTER VIEW vw MATERIALIZE
REFRESH|SUSPEND
Use the “suspend” option to suspend a view so that activities like
base table refresh or groom can run. Use the “refresh” option to
activate a suspended materialized view. The “refresh” option is
also used to re-materialize the view so that the rows are in the
correct order.
A system default can be set to specify the percentage of rows in a materialized view that can be out of
order; the default value is 20%. Once set, the command “ALTER VIEWS ON MATERIALIZE
REFRESH” re-materializes all the views in the database which exceed the set percentage of rows that
are not in the correct order.
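For example, with hypothetical names, using the statements from the table above:
-- thin, ordered projection of the base table --
create materialized view emp_by_dept_mv as select emp_id, dept_id from emp order by dept_id;
-- suspend before reloading or grooming the base table, then refresh --
alter view emp_by_dept_mv materialize suspend;
alter view emp_by_dept_mv materialize refresh;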
Functions
Netezza comes with many built-in functions, and the following are some of them, grouped under various
categories.
Category Functions
Aggregates rollup, cube, grouping sets, count, sum, max, min, avg
Casting to_char, to_date, to_number, to_timestamp, cast(value as type),::
Date Time extract(field from datetime value) where field = epoch, year, month, day,dow,doy
Checking null value nvl(), nvl2()
String Functions Trim, position, substring, lower, upper, like, not like, character_length
Fuzzy String Search le_dst, dle_dst
Value Functions current_date, current_time, current_timestamp, current_user, current_db
Math Functions acos, asin, atan, atan2, cos, cotx, degrees, pi, radiants, sin, tan, random, ceil, abs
This is just a sample list and other built in functions along with the details can be obtained from the
Netezza database user guide.
10. Stored Procedures
Stored procedures are programs written in the NZPLSQL language to perform database operations.
They are stored in Netezza databases and run on the Netezza host when executed. NZPLSQL is an
interpreted language based on Postgres’ PL/pgSQL. The key advantages of using stored procedures are
 They help avoid unnecessary network traffic between the appliance and the application
requesting the data, which is a major advantage when dealing with the large volumes of data in a
warehousing environment.
 By controlling access restrictions, users can be allowed to perform database operations through
stored procedures without being given access to the underlying objects.
 By encapsulating business logic in stored procedures there is only one place to make
modifications, i.e. it helps maintenance.
NZPLSQL, along with regular SQL statements, provides a complete programming language with
capabilities like conditional statements, looping, declaring variables and manipulating them through
expressions, creating and calling subroutines etc. All regular SQL statements, except the ones which are
not allowed within a BEGIN/COMMIT block, can be used in a stored procedure. NZ SQL procedures
can accept multiple input arguments and return one output, either a scalar value or a result set. Stored
procedures can be created, altered, executed and dropped through the nzsql command utility or through
an ODBC, JDBC, or OLE-DB connection to a database. The following is a sample set of SQL
statements with respect to stored procedures
/*
Stored procedure to update employee salary
*/
create or replace procedure update_emp_sal(int4) returns int4 execute as owner
language nzplsql as
begin_proc
declare
ret_code int4;
-- Variable to store the first value passed as input to the stored procedure --
Inc_Percent alias for $1;
begin
-- increase the salary by the given percentage --
update hr..emp set salary = salary + (salary * inc_percent / 100);
raise notice 'Updated employee salary successfully';
-- Section to handle any unhandled exception --
exception when others then
raise notice 'Error in updating employee salary';
end;
end_proc;
-- To display the details about the stored procedure --
show procedure update_emp_sal(int4);
-- To execute the stored procedure --
exec procedure update_emp_sal(12);
-- Another way to execute the stored procedure --
call procedure update_emp_sal(-2);
-- To drop the stored procedure --
drop procedure update_emp_sal(int4);
Stored procedures are not case sensitive, i.e. identifiers get converted into capital letters irrespective of
which case is used. So variables which are spelled the same but use different case letters are all
considered duplicates. The CREATE OR REPLACE PROCEDURE statement replaces a procedure if
one with the same name already exists in the database, or creates a new one if none exists.
The admin user and the database owner by default have all the permissions to perform actions on stored
procedures. Users with the grant permission option can also grant permissions to other users or groups
to alter, create or replace, drop, or execute stored procedures. The following are some sample statements
-- grant permission to create stored procedure to the group devgrp --
grant create procedure to group devgrp;
-- grant all permissions regarding stored procedures to user mike --
grant all on procedure to mike;
-- revoke permission to create procedure from the group qagrp --
revoke create procedure from group qagrp;
-- revoke permission to alter on the procedure update_emp_sal(int4) from user tom --
revoke alter on update_emp_sal(int4) from tom;
-- provide execute permission on the procedure update_emp_sal(int4) to user sally --
grant execute on update_emp_sal(int4) to sally;
-- grant drop access on the procedure update_emp_sal(int4) to user ted --
grant drop on update_emp_sal(int4) to ted;
Note that a procedure is uniquely identified by its name along with its input parameters. Procedures with
the same name but with different input parameter lists are considered different stored procedures.
When a procedure is created, it can be defined whether the procedure should execute with the authority
of the user who created it or with the authority of the user who is executing it. The “EXECUTE AS
OWNER” or “EXECUTE AS CALLER” parameter in the CREATE or ALTER procedure statement
determines the authority used during execution. When a procedure is defined to execute as owner, even
if the user executing the procedure doesn’t have the necessary authority on the objects accessed by the
procedure, the procedure executes successfully as long as the owner who created the procedure has the
necessary privileges on the objects. For example, if the emp table data can be updated only by users in
the hrgroup and the stored procedure update_emp_sal(int4) is created by someone from the hrgroup,
the user sally will be able to execute the procedure successfully even if she is not part of the hrgroup.
The reason is that the procedure was created with the “execute as owner” option and the owner has the
privilege to update the table. If the standard is that only users with the necessary permissions on the
underlying objects should be able to manipulate data through a procedure, then procedures can be
created using the “execute as caller” option.
-- alter procedure update_emp_sal(int4) to execute as caller --
alter procedure update_emp_sal(int4) execute as caller;
After the execution of this alter statement, if the user sally executes the procedure it will fail, since the
user is not part of the hrgroup and doesn’t have the update privilege. When a stored procedure is
executed, all the objects referred to in the procedure need to be available in the database where it is
executed, unless the object is fully qualified. If an object from a remote database is used in the
procedure, data can only be queried from the object in the remote database and can’t be inserted,
updated or deleted.
When database backups are restored, the stored procedures all get restored and can be used without
errors unless the version of Netezza where the restore is done doesn’t support procedures or some of the
Serving queries at low latency using HBase
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-final
 
Cursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixCursor Implementation in Apache Phoenix
Cursor Implementation in Apache Phoenix
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Chef patterns
Chef patternsChef patterns
Chef patterns
 

Kürzlich hochgeladen

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Kürzlich hochgeladen (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Netezza fundamentals for developers

  • 1. Netezza Fundamentals Introduction to Netezza for Application Developers Biju Nair 04/29/2014 Version: Draft 1.5 Document provided for information purpose only.
  • 2. © asquareb llc 1 Preface As with any subject, one may ask why do we need to write a new document on the subject when there is so much information available online. This question is more pronounced especially in a case like this where the product vendor publishes detailed documentation. I agree and for that matter this document in no way replaces the documentation and knowledge already available on Netezza appliance. The primary objective of this document is  To be a starting guide for anyone who is looking to understand the appliance so that they can be productive in a short duration  To be a transition guide for professionals who are familiar with other database management systems and would like to or need to start using the appliance  Be a quick reference on the fundamentals for professionals who have some experience with the appliance With the simple objective in focus, the book covers the Netezza appliance broadly so that the reader can be productive in using the appliance quickly. References to other documents have been provided for readers interested in gaining more thorough knowledge. Also joining the Netezza developer community on the web which is very active is highly recommended. If you find any errors or need to provide feedback, please notify to bnair at asquareb dot com with the subject “Netezza” and thank you in advance for your feedback.
  • 3. © asquareb llc 2 Table of Contents 1. Netezza Architecture ............................................................................................................................3 2. Netezza Objects ....................................................................................................................................9 3. Netezza Security..................................................................................................................................16 4. Netezza Storage ..................................................................................................................................17 5. Statistics and Query Performance ......................................................................................................22 6. Netezza Query Plan Analysis...............................................................................................................26 7. Netezza Transactions..........................................................................................................................31 8. Loading Data, Database Back-up and Restores ..................................................................................33 9. Netezza SQL ........................................................................................................................................38 10. Stored Procedures...........................................................................................................................41 11. Workload Management..................................................................................................................52 12. Best Practices..................................................................................................................................56 13. Version 7 Key Features....................................................................................................................58 14. Further Reading ..............................................................................................................................59
1. Netezza Architecture

Building a good foundation helps in developing better and more elegant things. Similarly, understanding the Netezza architecture, which is that foundation, helps in developing applications which use the appliance efficiently. This section details the architecture at a level that satisfies that objective.

Netezza uses a proprietary architecture called Asymmetric Massively Parallel Processing (AMPP), which combines the large data processing efficiency of Massively Parallel Processing (MPP), where nothing (CPU, memory, storage) is shared, with symmetric multiprocessing to coordinate the parallel processing. The MPP is achieved through an array of S-Blades, which are servers in their own right, running their own operating systems and connected to disks. While there may be other products which follow a similar architecture, one unique hardware component used by Netezza is the Database Accelerator card attached to the S-Blades. These accelerator cards can perform some of the query processing stages while data is being read from the disk, instead of the processing being done in the CPU. Moving large amounts of data from the disk to the CPU and performing all the stages of query processing in the CPU is one of the major bottlenecks in many of the database management systems used for data warehousing and analytics use cases.

The main hardware components of the Netezza appliance are a host, which is a Linux server, communicating with an array of S-Blades, each of which has 8 processor cores and 16 GB of RAM and runs a Linux operating system. Each processor in the S-Blade is connected to disks in a disk array through a Database Accelerator card which uses FPGA technology. The host is also responsible for all the client interactions with the appliance, like handling database queries and sessions, along with managing the metadata about the objects like databases and tables stored in the appliance. The S-Blades communicate between themselves and with the host through a custom-built, IP-based, high-performance network. The following diagram provides a high-level logical schematic which will help visualize the various components in the appliance.
The S-Blades are also referred to as the Snippet Processing Array, or SPA in short, and each CPU in the S-Blades combined with the Database Accelerator card attached to it is referred to as a Snippet Processor.

Let us use a simple, concrete example to understand the architecture. Assume an example data warehouse for a large retail firm where one of the tables stores the details of all of its 10 million customers. Also assume that there are 25 columns in the table and that the total length of each table row is 250 bytes. In Netezza the 10 million customer records will be stored fairly equally, in compressed form, across all the disks available in the disk arrays connected to the snippet processors in the S-Blades. When a user queries the appliance for, say, the customer id, name and state of the customers who joined the organization in a particular period, sorted by state and name, the following are the high-level steps of how the processing happens:

 The host receives the query, parses and verifies it, creates the code to be executed by the snippet processors in the S-Blades and passes the code to the S-Blades.
 The snippet processors execute the code. As part of the execution, the data blocks which store the data required to satisfy the query are read in compressed form from the disk attached to each snippet processor into memory. The Database Accelerator card in the snippet processor uncompresses the data, which includes all the columns in the table; it then removes the unwanted columns, which in this case will be 22 columns, i.e. 220 bytes out of the 250 bytes, applies the where clause to remove the unwanted rows, and passes the small amount of remaining data to the CPU in the snippet processor. In traditional databases all these steps are performed in the CPU.
 The CPU in the snippet processor performs tasks like aggregation, sum, sort etc. on the data from the database accelerator card and passes the result to the host through the network.
 The host consolidates the results from all the S-Blades and performs additional steps like sorting or aggregation on the data before communicating the final result back to the client.

The key takeaways are:

 Netezza has the ability to process large volumes of data in parallel, and the key is to make sure that the data is distributed appropriately to leverage the massive parallel processing.
 Implement designs in a way that most of the processing happens in the snippet processors; minimize communication between snippet processors and keep data communication to the host minimal.
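To make the example concrete, the query described above might look like the following; the table and column names are illustrative only and not from any particular schema:

select cust_id, cust_name, state
from customer
where join_date between '2013-01-01' and '2013-03-31'
order by state, cust_name;

Each stage described above applies to this statement: every snippet processor scans only its own data slice, the accelerator card discards the unused columns and the rows outside the date range, and the host merges the sorted partial results.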
Based on this simple example, which helps in understanding the fundamental components of the appliance and how they work together, we will build on this knowledge and see how complex query scenarios are handled in the relevant sections.

Terms and Terminology

The following are some of the key terms and terminologies used in the context of the Netezza appliance.

Host: A Linux server which is used by the client to interact with the appliance, either natively or through remote clients using ODBC, JDBC, OLE-DB etc. The host also stores the catalog of all the databases stored in the appliance along with the metadata of all the objects in those databases. It parses and verifies the queries from the clients, generates the executable snippets, communicates the snippets to the S-Blades, coordinates and consolidates the snippet execution results and communicates them back to the client.

Snippet Processing Array: An SPA is an array of S-Blades, each with 8 processor cores and 16 GB of memory, running a Linux operating system. Each S-Blade is paired with a Database Accelerator card which has 8 FPGA cores and is connected to disk storage.

Snippet Processor: Each CPU and FPGA pair in a Snippet Processing Array is called a snippet processor; it runs a snippet, which is the smallest code component generated by the host for query execution.

Netezza Objects

The following are the major object groups in Netezza. We will see the details of these objects in the following chapters.

 Users
 Groups
 Tables
 Views
 Materialized Views
 Synonyms
 Databases
 Procedures and User Defined Functions

For anyone who is familiar with other relational database management systems, it will be obvious that there are no indexes, bufferpools or tablespaces to deal with in Netezza.

Netezza Failover

As an appliance, Netezza includes the necessary failover components to function seamlessly in the event of any hardware issues, so that its availability is more than 99.99%. There are two hosts in a cluster in all the Netezza appliances, so that if one fails the other takes over. Netezza uses Linux-HA (High Availability) and Distributed Replicated Block Device for the host cluster management and the mirroring of data between the hosts.

As far as data storage is concerned, one third of every disk in the disk array stores a primary copy of user data, a third stores a mirror of the primary copy of data from another disk, and the remaining third is used for temporary storage. In the event of a disk failure the mirror copy will be used, and the SPU to which the failed disk was attached will be updated to point to the disk holding the mirror copy. In the event of an error in a disk track, the track will be marked as invalid and valid data will be copied from the mirror copy onto a new track. If there are any issues with one of the S-Blades, the other S-Blades will be assigned its workload. All failures will be notified based on the event monitors defined and enabled. Similar to the dual hosts for high availability, the appliance also has dual power systems, and all the connections between the components, like host to SPA and SPA to disk array, also have secondaries. Any issues with the hardware components can be viewed through the NzAdmin GUI tool.

Netezza Tools

There are many tools available to perform various functions against Netezza. We will look at the tools and utilities to connect to Netezza here; other tools will be detailed in the relevant sections. For administrators, one of the primary tools to connect to Netezza is NzAdmin. It is a GUI-based tool which can be installed on a Windows desktop and connects to the Netezza appliance. The tool has a system view which provides a visual snapshot of the state of the appliance, including issues with any hardware components. The second view the tool provides is the database view, which lists all the databases including the objects in them, the users and groups currently defined, active sessions, query history and any backup history. The database view also provides options to perform database administration tasks like the creation and management of databases and database objects, users and groups. The following is a screen shot of the system view from the NzAdmin tool.
The following is a screen shot of the database view from the NzAdmin tool.

The second tool, often used by anyone who has access to the appliance host, is the "nzsql" command. It is the primary tool used by administrators to create, schedule and execute scripts that perform administration tasks against the appliance. The "nzsql" command invokes the SQL command interpreter, through which all Netezza-supported SQL statements can be executed. The command also has some built-in options which can be used to perform quick look-ups, like listing databases, users, etc. The command additionally has an option to open up an operating system shell, through which the user can perform OS tasks before exiting back into the "nzsql" session. As with all the Netezza commands, the "nzsql" command requires a database name, user name and password to connect to a database. For example:

nzsql -d testdb -u testuser -p password

will connect and create an "nzsql" session with the database "testdb" as the user "testuser", after which the user can execute SQL statements against the database. As with all the Netezza commands, "nzsql" has the "-h" help option, which displays details about the usage of the command. Once the user is in an "nzsql" session, the following are some of the options which can be invoked in addition to executing Netezza SQL statements (the leading backslash is part of each option):

\c dbname user passwd   Connect to a new database
\d tablename            Describe a table, view etc.
\d{t|v|i|s|e|x}         List tables/views/indexes/synonyms/temp tables/external tables
\h command              Help on a particular command
\i file                 Read and execute queries from file
\l                      List all databases
\!                      Escape to an OS shell
\q                      Quit the nzsql command session
\time                   Print the time taken by queries; it can be switched off by issuing \time again
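As an illustration, a short "nzsql" session exercising a few of these options might look like the following; the prompt format and the table name are indicative only:

$ nzsql -d testdb -u testuser -p password
testdb(testuser)=> \dt                              -- list the tables in testdb
testdb(testuser)=> \time                            -- switch query timing on
testdb(testuser)=> select count(*) from employee;
testdb(testuser)=> \q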
One of the third-party vendor tools which needs to be mentioned is the Aginity Workbench for Netezza from Aginity LLC. It is a GUI-based tool which runs on Windows and uses the Netezza ODBC driver to connect to the databases in the appliance. It is a user-friendly tool for development work and ad hoc queries, and it also provides GUI options to perform database management tasks. It is highly recommended for a user who does not have access to the appliance host (which will be most users) but needs to perform development work.

2. Netezza Objects

The Netezza appliance comes out of the box loaded with some objects, which are referred to as system objects; users can also create objects to develop applications, and these are referred to as user objects. In this section we will look into the details of the basic Netezza objects which every user of the appliance needs to be aware of.

System Objects:

Users

The appliance comes preconfigured with the following 3 user ids, which cannot be modified or deleted from the system. They are used to perform all the administration tasks and hence should be used by a restricted number of users.

 root: The super user for the host system on the appliance; it has all the access of a super user in any Linux system.
 nz: The Netezza system administrator Linux account, which is used to run the host software on Linux.
 admin: The default Netezza SQL database administrator user, which has access to perform all database-related tasks against all the databases in the appliance.

Groups

By default Netezza comes with a database group called public. All database users created in the system are automatically added as members of this group and cannot be removed from it. The admin database user owns the public group, and this cannot be changed. Permissions can be set on the public group so that all the users added to the system get those permissions by default.

Databases

Netezza comes with two databases, system and model, both owned by the admin user. The system database consists of objects like tables, views, synonyms, functions and procedures. It is primarily used to catalog all user database and user object details, which are used by the host when parsing, validating and creating execution code for queries from the users.

User Objects:

Database

Users with the required permission or an admin user can create databases using the create database SQL statement. The following is a sample SQL statement to create a database called testdb; it can be executed in an nzsql session or another query execution tool.

create database testdb;
Table

The database owner or a user with the create table privilege can create tables in a database. The following is a sample table creation statement which can be executed in an nzsql session or another query execution tool.

create table employee (
  emp_id integer not null,
  first_name varchar(25) not null,
  last_name varchar(25) not null,
  sex char(1),
  dept_id integer not null,
  created_dt timestamp not null,
  created_by char(8) not null,
  updated_dt timestamp not null,
  updated_by char(8) not null,
  constraint pk_employee primary key(emp_id),
  constraint fk_employee foreign key (dept_id) references department(dept_id)
    on update restrict on delete restrict
) distribute on random;

For anyone who is familiar with other DBMS systems, the statement will look familiar except for the "distribute on" clause, the details of which we will see in a later section. Also, there are no storage-related details, like the tablespace on which the table needs to be created or any bufferpool details; these are handled by the Netezza appliance. The following is the list of data types supported by the Netezza appliance which can be used in the column definitions of tables; each entry gives the range or description followed by the storage requirement.

 byteint (int1): -128 to 127; 1 byte
 smallint (int2): -32,768 to 32,767; 2 bytes
 integer (int or int4): -2,147,483,648 to 2,147,483,647; 4 bytes
 bigint (int8): -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807; 8 bytes
 numeric(p,s): precision p can range from 1 to 38 and scale from 0 to p; 4 bytes for p up to 9, 8 bytes for p from 10 to 18, 16 bytes for p from 19 to 38
 numeric: same as numeric(18,0); 4 to 16 bytes
 decimal: alias for numeric; 4 to 16 bytes
 float(p): floating point number with precision p between 1 and 15; 4 bytes for p up to 6, 8 bytes for p from 7 to 15
 real: alias for float(6); 4 bytes
 double precision: alias for float(15); 8 bytes
 character(n) / char(n): fixed length, blank padded to length n; the default value of n is 1 and the maximum character string size is 64,000; if n is 16 or less the storage is n bytes, and if n is greater than 16 the disk usage is the same as varchar(n)
 character varying(n) / varchar(n): variable length to a maximum of n, stored as entered with no blank padding; the maximum character string size is 64,000; storage is n+2 or fewer bytes depending on the actual data
 nchar(n): fixed-length Unicode, blank padded to length n; maximum length of 16,000 characters
 nvarchar(n): variable-length Unicode to a maximum length of n; maximum length of 16,000 characters
 boolean / bool: with value true (t) or false (f); 1 byte
 date: ranging from January 1, 0001 to December 31, 9999; 4 bytes
 timestamp: a date part and a time part, with seconds stored to 6 decimal positions, ranging from January 1, 0001 00:00:00.000000 to December 31, 9999 23:59:59.999999; 8 bytes

Refer to the reference documentation for additional data types and details. (Note that the original range printed for the integer type is corrected above to the standard 4-byte range.)

Other than validating that the data is consistent with the column data type and the not null constraint, Netezza does not enforce constraints like the primary key or foreign key when inserting or loading data into the tables, for performance reasons. It is up to the application to make sure that these constraints are satisfied by the data being loaded. Even though the constraints are not enforced by Netezza, defining them provides additional hints to the query optimizer to generate efficient snippet execution code, which in turn helps performance.

Users can also create temporary tables in Netezza, which get dropped at the end of the transaction or session in which they are created. A temporary table can be created by adding the "temporary" or "temp" clause as part of the create table statement, and the statement can include all the other clauses applicable to the creation of a regular table. The following is an example of creating a temporary table:

create temporary table temp_emp(
  id integer constraint pk_emp primary key,
  first_name varchar(25)
) distribute on hash(id);

Another way a user can create a table is to model the new table on a query result. This can be accomplished in Netezza using the create table as command; such tables are referred to as CTAS tables in short. The following are some sample CTAS statements:

create table ctas_emp as select * from emp where dept_id = 1;
create table ctas_emp_dept as select emp.name, dept.name from emp, dept where emp.dept_id = dept.id;

Since the new table definition is based on the result of the query executed as part of the CTAS statement, there are many possibilities for how users can use this, like redefining the columns of a current table while populating the new table at the same time, as shown in the sketch after the next example. If the user doesn't want to populate data but only wants to create the table structure, a "limit 0" can be included at the end of the query, as in the following example:

create table ctas_emp as select * from emp limit 0;
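As an illustration of redefining columns with CTAS, the following sketch widens a varchar column while copying the data; the new table name is hypothetical:

create table emp_resized as
select emp_id, cast(first_name as varchar(50)) as first_name, dept_id
from emp;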
Tables can be deleted from the database using the drop SQL statement, which drops the table and its contents. The following is an example:

drop table employee;

Once tables are created they can be modified using the alter statement. Renaming the table, changing the owner of the table, adding or dropping a column, renaming a column, and adding or dropping a constraint are some of the modifications which can be made through the alter statement. The following are sample alter statements which can be executed through an nzsql session or a query execution tool:

alter table employee add column education_level int1;
alter table employee modify column (first_name varchar(30));
alter table employee drop column updated_by;
alter table employee owner to hruser;

The following are some of the points to be aware of when using the alter table statement:

 Modifying the column length is only applicable to columns defined as varchar.
 If a table gets renamed, the views attached to the table will stop working.
 Column data types cannot be changed through an alter statement.
 If a table is referenced by a stored procedure, adding or dropping a column is not permitted. The stored procedure needs to be dropped first, the column added or dropped, and the stored procedure then recreated.

View

The database owner or a user with the create view privilege can create views in a database. Netezza stores the select SQL statement for the view and does not materialize and store the data on disk. The SQL gets executed whenever a user accesses the view, pulling the data from the base table and creating a virtual table. The following is a sample view creation statement which can be executed in an nzsql session or another query execution tool:

create or replace view it_employees as select * from employee where dept_id = 100;

Materialized View

Materialized views are created to project a subset of columns from a table, with the data sorted on some of the projected columns. When the view is created, the system materializes and stores the sorted projection of the base table's data in a new table on disk. Materialized views can be queried directly or can be used to improve the performance of queries against the base table. In the latter case the materialized view acts like an index, since the system stores an additional column in the view with the details about where in the base table the data originated. A database owner or a user with the create materialized view privilege can create materialized views in a database. The following is a sample materialized view creation statement which can be executed in an nzsql session or other query execution tools.
create materialized view employee_mview as
select emp_id, first_name, last_name, dept_id from employee order by dept_id, first_name;

This materialized view will improve the performance of the following query against the base table, since the view records where in the base table the data satisfying the query is stored and in turn acts like an index in a traditional database:

select * from employee where dept_id = 100 and first_name = 'John';

The following are the restrictions on the creation of materialized views:

 Only one table can be specified in the FROM clause of the create statement.
 There can be no where clause in the select clause of the create statement.
 The columns in the projection list must be columns from the base table; expressions are not allowed.
 The columns in the ORDER BY clause must be among the columns in the select statement.
 NULLS LAST or DESC cannot be used in the ORDER BY clause.
 External, temporary, system or clustered base tables cannot be used as the base table for materialized views.

As and when records are inserted into the base table, the system also adds data to the materialized view table. But the newly inserted data will not be in the sorted order, and hence there will be some data in the materialized view which is not sorted. In order to get the materialized view back into sorted order, the view needs to be refreshed periodically, or a threshold of unsorted data as a percentage of total records can be set so that the system performs the refresh when the percentage of unsorted records in the view exceeds the threshold.
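One way such a refresh might be issued is sketched below; this assumes the alter view ... materialize refresh form, so verify the exact syntax against the Netezza SQL reference for your release:

alter view employee_mview materialize refresh;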
Synonym

A synonym is an alternate way of referencing tables and views. It allows users to create easy-to-type names for long table or view names. An admin user or any user with the create synonym privilege can create synonyms in a database, and the following is a sample statement to create a synonym through an nzsql session or another query execution tool:

create synonym techies for it_employees;

Synonyms can be created against a non-existent table or view, since the reference is not verified during creation. But if the table or view referred to by a synonym does not exist at run time, an error message will be returned. Synonyms cannot be created for temporary tables, remote databases or other synonyms.

Sequence

Sequences are database objects through which users can generate unique numbers, which can then be used in an application, for example for the creation of unique keys. The following is a sample sequence creation statement which can be used to populate the id column in the employee table:

create sequence seq_emp_id as integer start with 1 increment by 1 minvalue 1 no maxvalue no cycle;

Since no max value is specified, the sequence will be able to count up to the largest value of the sequence type, which in this case is 2,147,483,647 for the integer type (correcting the figure printed in the original). Also, once the sequence reaches its maximum value it does not start reusing old values, since the no cycle option is used to prevent reuse. Values generated by sequences can be accessed using "next value for sequence", and the following are some examples:

select next value for seq_emp_id;
select *, next value for seq_emp_id from emp;

A few points to note about using sequences:

 No two users of a sequence will ever get the same value from the sequence while requesting the next value.
 When a transaction using a sequence rolls back, the value read from the sequence will not be rolled back; i.e. there can be gaps in the sequence numbers due to transaction rollbacks.
 The Netezza appliance assigns sequence number ranges to all the SPUs available in the system, and these ranges get cached for performance reasons. That means the sequences generated by the various SPUs will create gaps in the sequence numbers at any point in time, since the ranges will not be overlapping.
 The system will be forced to flush the cached values of sequences in situations like the stopping of the system, system or SPU crashes, or during some alter sequence statements, which will also create gaps in the numbers generated by a sequence.
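Tying this together, a sequence is typically consumed in an insert statement. The following is a minimal sketch using the employee table and sequence from the earlier examples; the literal values are illustrative:

insert into employee
  (emp_id, first_name, last_name, dept_id, created_dt, created_by, updated_dt, updated_by)
values
  (next value for seq_emp_id, 'John', 'Doe', 100,
   current_timestamp, 'hruser', current_timestamp, 'hruser');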
Existing sequences can be modified using the "alter sequence" statement. Some of the possible modifications are changing the owner, renaming the sequence, restarting the sequence with a new start value, changing the increment value or the max value, and changing whether the sequence should or should not reuse old values once it has reached its max value. The following is a sample "alter sequence" statement:

alter sequence seq_emp_id increment by 2;

An existing sequence can also be dropped using a "drop sequence" statement, and the following is an example:

drop sequence seq_emp_id;

Procedures

We will look into the details of stored procedures when we discuss application development.

Functions

We will look into the details of user defined functions when we discuss application development.
3. Netezza Security

This section deals with security in the context of user access to the appliance. In Netezza there are two levels of access control: one at the host OS level and a second at the Netezza database level. By setting the required access restrictions at these two levels, the appliance can be secured effectively.

OS Level Security

The Netezza host uses an industry-standard Linux operating system, customized for the performance and functionality required by the appliance. As in any Linux installation, user access restrictions need to be put in place using user ids, user groups and passwords. Out of the box, the appliance is configured with a "root" user id, which is the Linux super user, and the user id "nz", which is the Netezza system administrator id used to run Netezza on the host. The "root" user id can be used to create other user ids for users who need to access the appliance natively through the host command shell. Since host access is required only for a restricted set of tasks, primarily administration tasks, the number of user ids created to access the appliance should be fairly small. Restrictions on what users can perform can be set by creating Linux user groups with different access restrictions and attaching the relevant users to the groups. Setting password selection rules, like a required mix of letters, numbers and special characters and a minimum password length, along with password expiry for users, is a good practice.

Database Level Security

Access to databases is controlled using user ids and passwords which are separate from the OS level user ids and passwords. If a user needs to access a Netezza database natively through an "nzsql" session on the host, the user needs to log in to the host using an OS level user id and password and then invoke the "nzsql" command using a database level user id and password which has access to the particular database of interest. Access to databases, to objects within a database and to the types of activities which can be performed on them is all controlled by the privileges granted to the user id. Netezza also supports user groups, as in the Linux operating system, where privileges can be assigned to groups and similar users can be attached to a group, making it easier to manage access to databases. When a user id is attached to more than one group, the user id gets the combination of all the privileges assigned to those groups. The following is a sample "create user" statement:

create user dbdev with password 'zksrfas92834';

The following is a sample "create group" statement:

create group devgrp with user dbdev, scott;
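Privileges would then typically be granted to the group rather than to individual users. The following is a minimal sketch using the objects from the earlier examples; the exact privilege names available should be verified against the Netezza SQL reference:

grant select, insert, update, delete on employee to devgrp;
grant create table to devgrp;

Both dbdev and scott then inherit these privileges through their membership in devgrp.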
4. Netezza Storage

As discussed earlier, each disk in the appliance is partitioned into primary, mirror and temp or swap partitions. The primary partition in each disk is used to store user data like database tables, the mirror stores a copy of the primary partition of another disk so that it can be used in the event of disk failures, and the temp/swap partition is used to store data temporarily, for example when the appliance does data redistribution while processing queries. The logical representation of the data saved in the primary partition of each disk is called a data slice. When users create database tables and load data into them, the data gets distributed across the available data slices. The logical representation of a group of data slices is called a data partition. In TwinFin systems each S-Blade or SPU is connected to 8 data partitions, and some only to 6 data partitions (since some disks are reserved for failovers). There are situations, like SPU failures, when a SPU can have more than 8 partitions attached to it, since it gets assigned some of the data partitions from the failed SPU. The following diagram illustrates this concept.

The SPU 1001 is connected to 8 data partitions numbered 0 to 7. Each data partition is connected to one data slice stored on a different disk. For example, data partition 0 points to data slice 17 stored on the disk with id 1063. Disk 1063 also stores the mirror of data slice 18, whose primary copy is stored on disk 1064. The following diagram illustrates what happens when disk 1070 fails.

Immediately after disk 1070 stops responding, disk 1069 will be used by the system to satisfy queries which need data from data slices 23 and 24. Disk 1069 will serve the requests using the data in both its primary and mirror partitions. This creates a bottleneck, which in turn impacts query performance. In the meantime, the contents of disk 1070 are regenerated on one of the spare disks in the disk array, which in this case is disk 1100, using the data in disk 1069. Once the regen is complete, SPU data partition 7 is updated to point to data slice 24 on disk 1100. The regen process removes the bottleneck so that disk 1069 can again perform optimally.

In the situation where a SPU fails, the appliance assigns all of its data partitions to the other SPUs in the system. Pairs of disks which contain the mirror copies of each other's data slices will be assigned together to another SPU, resulting in two additional data partitions being managed by the target SPU. If, for example, a SPU currently manages data partitions 0 to 7 and the appliance reassigns two data partitions from a failed SPU to it, the SPU will have 10 data partitions to manage, numbered 0 to 9.

Data Organization

When users create tables in databases and store data in them, the data gets stored in disk extents, which are the minimum units of storage allocated on disk. Netezza distributes the data extents across all the available data slices based on the distribution key specified during table creation. A user can specify up to four columns for data distribution, specify that the data be distributed randomly, or specify no distribution at all during the table creation process. If a user provides no distribution specification, Netezza uses one of the columns to distribute the data, and this selection cannot be influenced. When the user specifies particular columns for distribution, Netezza uses the column data to distribute the records being inserted across the data slices; hashing is used to determine the data slice into which each record is stored. When the user selects random as the option for data distribution, the appliance uses a round-robin algorithm to distribute the data uniformly across all the available data slices. The following are some sample table create statements using the distribute clause:

create table employee (
  emp_id integer not null,
  first_name varchar(25) not null,
  last_name varchar(25) not null,
  sex char(1),
  dept_id integer not null,
  created_dt timestamp not null,
  created_by char(8) not null,
  updated_dt timestamp not null,
  updated_by char(8) not null,
  constraint pk_employee primary key(emp_id),
  constraint fk_employee foreign key (dept_id) references department(dept_id)
    on update restrict on delete restrict
) distribute on hash(emp_id);

create table nation_state (
  ns_id integer not null constraint pk_nation_state primary key,
  nation varchar(25),
  state_code varchar(5),
  state varchar(25)
) distribute on random;

The key is to make sure that the data for a table is uniformly distributed across all the data slices so that there is no data skew. By distributing data across the data slices, all the SPUs in the system can be utilized to process any query, which in turn improves performance. Also, the performance of any query depends on the slowest SPU handling the query. So if the data is not uniformly distributed, some of the SPUs will have more data in the data slices attached to them, called data skew, which will impact their processing time and in turn the overall query performance. Selecting columns with high cardinality for the distribution is a good practice to follow.
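A quick way to check for data skew: every Netezza table exposes a virtual datasliceid column, so the row count per data slice can be inspected as sketched below, using the employee table from the example above:

select datasliceid, count(*) as record_count
from employee
group by datasliceid
order by record_count desc;

A uniform distribution shows roughly equal counts across the slices; a few slices with much larger counts indicate skew.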
Even if a column with high cardinality, like a date column, is chosen to distribute the data, there is still a possibility of creating processing skew. For example, using the date column as the distribution key, the data gets distributed fairly evenly across all the data slices. But if most of the queries look for the data of a particular month, which is fairly common in a data warehousing environment, then only a particular set of data slices may need to be processed for a query, which in turn utilizes only a subset of the SPUs, causing the query to perform sub-optimally. This is called processing skew, and it needs to be prevented by understanding the processing requirements and choosing the correct distribution keys.

When creating CTAS tables, the following pattern determines the data distribution of the new table when an explicit distribution clause is not defined:

 If a CTAS table is created from a single source table and the source table is randomly distributed, the CTAS table will be distributed on the first column in the CTAS table.
 If a CTAS table is created from a single source table with a defined distribution on certain columns, the CTAS table will inherit the source table's distribution.
 If a CTAS table is created by joining two tables, the distribution key of the resulting CTAS table will be the join key of the two tables.
 If a CTAS table is created by joining multiple tables with a group by clause, the distribution key of the resulting CTAS table will be the keys in the group by clause.

Zone Maps

When table data gets stored in extents on disk, Netezza keeps track of the minimum and maximum values of columns of certain data types in data structures called zone maps. Zone maps are created and maintained automatically for all the columns in a table of the following data types:

 All integer types (int1, int2, int4, int8)
 Date
 Timestamp

By keeping track of the min and max values, Netezza is able to avoid reading the disk extents that cannot contain relevant data, which will be most of the disk extents in a large data warehouse environment. For example, if a fact table stores one million records per month for the last 10 years, that is 120 million records (correcting the figure in the original); if the queries process only a month's data, Netezza will be able to read only the one million relevant records. And if there are 96 snippet processors and the data is uniformly distributed across all the data slices, each snippet processor reads only a little over 10,000 records, which is a small amount to process. Enabling the appliance to utilize the zone map feature, by selecting the best data types for the columns in the tables, is another key point to take into consideration during design.
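To illustrate the kind of query that benefits, consider the sketch below against a hypothetical fact table with a zone-mappable date column; the restriction lets the appliance skip every extent whose min/max range for sale_date falls outside the month being queried:

select sum(sale_amount)
from sales_fact
where sale_date between '2014-01-01' and '2014-01-31';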
When two tables are joined together often, like a customer table and an order table, the selection of the distribution keys of the two tables can play an important role in the performance of the queries. If the distribution key is the join column, e.g. the customer id column in both the customer and order tables, the data distribution will result in records with the same customer id values ending up in the same data slice for both tables. When a query joining the tables is processed, since the matching data from both tables is in the same data slice, the snippet processor will be able to perform the join locally and send the result without performing additional work, which in turn improves the performance of the query. If the tables are not distributed on the columns often used for the join, matching data from the two tables will end up in different data slices, which means the snippet processors need to perform additional work to satisfy the join. The appliance will choose to temporarily redistribute one of the tables on the join column if the other is already distributed on the join column, so that the snippet processors can then perform the join locally. If neither table is distributed on the join column, the appliance may redistribute both tables before the snippet processors can perform the join. If a table stores a relatively small number of records, Netezza can instead decide to broadcast the whole table to all the SPUs so that each one has its own copy for processing. What this means is that the host needs to consolidate the table data from all the data slices and send the complete table data across to all the SPUs.

Clustered Base Tables (CBT)

Along with the option to distribute data during table definition, Netezza also provides an option to organize the distributed data within a data slice. For example, we may have distributed the employee table on employee id but want employee records from the same department to be stored close together; the dept id column can then be specified in the create table statement, as in the following example:

create table employee (
  emp_id integer not null,
  first_name varchar(25) not null,
  last_name varchar(25) not null,
  sex char(1),
  dept_id integer not null,
  created_dt timestamp not null,
  created_by char(8) not null,
  updated_dt timestamp not null,
  updated_by char(8) not null,
  constraint pk_employee primary key(emp_id),
  constraint fk_employee foreign key (dept_id) references department(dept_id)
    on update restrict on delete restrict
) distribute on hash(emp_id)
  organize on(dept_id);

Netezza allows up to four columns to organize on. When data gets stored on the data slice, records with the same organize-on column values will be stored in the same or nearby extents. Organize on improves queries on fact tables when the frequently joined columns are used as the organizing columns. All the columns specified in the organize on clause are zone mapped, and by knowing the range of values of these columns stored in each physical extent, Netezza can eliminate reading unwanted extents during a join query, which improves query performance. Zone mapping of additional data types is supported when columns are specified in the organize on clause, and the following is the list of those data types, including the data types which are zone mapped by default.
Default zone map data types:

 Integer - 1-byte, 2-byte, 4-byte, and 8-byte
 Date
 Timestamp

Additional data types which can be zone mapped due to the organize on clause:

 Char - all sizes, but only the first 8 bytes are used in the zone map
 Varchar - all sizes, but only the first 8 bytes are used in the zone map
 Nchar - all sizes, but only the first 8 bytes are used in the zone map
 Nvarchar - all sizes, but only the first 8 bytes are used in the zone map
 Numeric - all sizes up to and including numeric(18)
 Float
 Double
 Bool
 Time
 Time with timezone
 Interval

The organize-on column definitions can be modified using an alter table statement, but any column included in the organize on clause cannot be dropped from the table. If a table is altered to be a clustered base table, any new records inserted into the table will be organized appropriately. In order to organize the existing records in the table, GROOM TABLE needs to be executed so that queries can take advantage of the data reorganization. It is a good practice to define fact tables as clustered base tables with the data organized on often-joined columns, to improve multi-dimensional lookups. At the same time, care needs to be taken in choosing the organizing columns, by understanding the frequently executed queries and also minimizing the number of columns on which the data is organized. Compared to traditional indexes or using materialized views to improve performance, CBTs have a major advantage in that they use no additional space but organize the table data in place. One point to note is that changing the data organization will affect the compression achieved, which may result in an increase or decrease in the size of the table storage after converting a table to a CBT.
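A sketch of that workflow, converting an existing table into a CBT and then reorganizing its existing records, is shown below; the table and column names are hypothetical, and the alter table ... organize on and groom table forms should be verified against the Netezza SQL reference for your release:

alter table sales_fact organize on (store_id, sale_date);  -- make the table a CBT
groom table sales_fact;                                     -- reorganize existing records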
• 23. © asquareb llc 22 5. Statistics and Query Performance
The Netezza query optimizer relies on statistics from the catalog to come up with an optimal query execution plan. So it is imperative that the statistics be kept current, without which the execution plan generated by the optimizer may be sub-optimal, resulting in poor query performance. The following are the statistics the optimizer relies on
Object Type | Statistics
Table | Number of records in the tables used in the query
Table columns | Minimum value in the column; Maximum value in the column; Count of null values in the column; Count of distinct values in the column (dispersion or cardinality)
Some examples of how the statistics can be used by the optimizer
 If a column is nullable, additional code may be generated to check whether a value is null or not
 Knowing the number of records in the table along with the min, max and distinct value counts helps estimate the number of relevant records which will be returned for the query, assuming uniform distribution
 Based on the min and max values, the optimizer can determine the type of math required, like 64-bit or 128-bit computation
These statistics in the catalog are generated using the "GENERATE STATISTICS" command, which collects them and updates the catalog. Admin users, table owners and users who have the GENSTATS privilege can execute the generate statistics command on tables in databases. The following are some examples
generate statistics; -- Generates statistics on all tables in a database --
generate statistics on emp; -- Generates statistics on table emp --
generate statistics on emp(emp_id, dept_id); -- Generates statistics on specified columns --
Since generate statistics reads every record in the table, the statistics it produces are of very high accuracy. Netezza also maintains statistics automatically, which may be of lesser accuracy compared to what is generated by the generate statistics command. The following table lists the various commands and the type of statistics automatically maintained by Netezza. Even though these statistics may not be very accurate, they are better than not having any
Command | Row Count | Min/Max vals | Null count | Distinct vals | Zone Map
Create Table As | Y | Y | Y | Y | Y
Insert | Y | Y | N | N | Y
Update | Y | Y | N | N | Y
Delete | N | N | N | N | N
Groom | N | N | N | N | Y
• 24. © asquareb llc 23 Note that the groom command just removes the rows deleted logically; there is no change in statistics except for the zone map values. The following are some recommended scenarios when generate statistics needs to be executed for performance
 After significant changes to the data in the database
 On static tables intended to remain unchanged for a long time
 On objects in queries with three or more joins
 When query response slows after changes to tables
 On columns used in where, order by, group by and having clauses
 When a temporary table used in a join contains large volumes of data
Netezza also computes just-in-time (JIT) statistics using sampling under the following conditions. This is not a replacement for running generate statistics and is also less accurate due to the sampling.
 The query has tables that store more than 5 million records
 The query has at least one column restriction, like a where clause
 The query involves only user tables
 Tables participate in a query join, or a table has an associated materialized view
Along with sampling, the JIT statistics collection process utilizes zone maps to collect several pieces of information like
 Number of rows to be scanned for the target table
 Number of extents that need to be scanned for the target table
 Max number of extents that need to be scanned on the data slice with the greatest skew
 Number of rows to be scanned for all the tables in each join
 Number of unique values in target table columns involved in join or group by processing
Note that when CTAS tables are created, Netezza schedules generate statistics to collect all required statistics for the new table (see the sketch at the end of this section). This automatic scheduling of generate statistics can be influenced by the following two postgresql.conf file settings.
 enable_small_ctas_autostats enables or disables auto stats generation on small CTAS tables
 ctas_auto_min_rows specifies the threshold number of rows above which the stats generation is scheduled; the default value is 10000.

Groom

When data is updated in Netezza, the current record is marked for deletion and a new record is created with the updated values, instead of updating the current record in place. Also, when a record in a table is deleted, it is not physically removed from storage; instead the record is marked as logically deleted and will not be visible to future transactions. Similarly, when a table is altered to add or drop a column, a new version of
• 25. © asquareb llc 24 table is created and queries against the table are serviced by combining the different versions. Due to the logical deletes, which leave additional data in the tables, and the joins across versions of altered tables, query performance gets impacted. In order to prevent this query performance degradation, users of the Netezza appliance need to "GROOM" the table. The groom command is used to maintain user tables; some of its key functions are
 Reclaiming physical space by removing logically deleted rows in tables
 Migrating records from previous versions of a table into the current version, leaving only one version of the altered table
 Organizing tables according to the organize on columns defined in an alter table command
By default groom synchronizes with the latest backup set, which means logically deleted data will not be removed if it has not been backed up as of the latest backup set. It is therefore best to schedule grooms after database backups so that the database objects are kept primed. The following are good practices with groom
 Groom tables that often receive large updates and deletes
 Schedule "Generate Statistics" after groom on tables so that the stats are up to date and accurate
 If there is a need to remove all the records in a table, use truncate instead of delete; truncate reclaims the space, so there will be no need for a groom
The following are some example groom commands
groom table emp; -- to delete logically deleted rows and reclaim space --
groom table emp versions; -- to remove older versions of an altered table and copy rows to the new version --
groom table emp records ready; -- to organize only the records which are ready to be organized --
Note that while groom is executing it doesn't lock the table; users can continue to perform data manipulation on the table data.

Explain

In order to understand the plan the Netezza appliance will use to execute a query, users can use the explain SQL statement. Understanding the plan also gives users an opportunity to change the query, or the physical data structure of the tables involved, to make the query perform better. The following are some sample "explain" statements.
explain select * from emp;
explain distribution select emp.name, dept.name from emp, dept where emp.dept_id = dept.id;
explain plangraph select * from emp;
The output from the explain statement can be generated in verbose format or a graphical format which can be opened in a web browser window. The output details all the snippets which will be executed in
• 26. © asquareb llc 25 sequence on the SPUs, as well as the code which will be executed on the host. For each snippet the details include the type of activity (table scan, aggregation etc.), the estimated number of rows, the width of the rows, the estimated cost to execute that snippet (the Netezza query optimizer is a cost based optimizer), the objects involved, and the confidence in the estimates. The following is a sample explain output

QUERY VERBOSE PLAN:
Node 1. [SPU Sequential Scan table "OWNER" as "B" {}]
  -- Estimated Rows = 16, Width = 11, Cost = 0.0 .. 0.0, Conf = 100.0
  Projections:
    1:B.OWNER_NAME 2:B.OWNER_ID
  [SPU Broadcast]
  [HashIt for Join]
Node 2. [SPU Sequential Scan table "ORDER" as "A" {}]
  -- Estimated Rows = 454718, Width = 8, Cost = 0.0 .. 8.1, Conf = 80.0
  Restrictions:
    (A.OWNER_ID NOTNULL)
  Projections:
    1:A.ORDER_ID 2:A.OWNER_ID
Node 3. [SPU Hash Join Stream "Node 2" with Temp "Node 1" {}]
  -- Estimated Rows = 519678, Width = 23, Cost = 0.0 .. 19.7, Conf = 64.0
  Restrictions:
    (A.OWNER_ID = B.OWNER_ID)
  Projections:
    1:A.APPLICATION_ID 2:B.OWNER_NAME
  Cardinality: B.ORDER_ID 9 (Adjusted)
  [SPU Return]
  [Host Return]

QUERY PLANTEXT:
Hash Join (cost=0.0..19.7 rows=519678 width=23 conf=64) {}
  (spu_broadcast, locus=spu subject=rightnode-Hash)
  (spu_join, locus=spu subject=self)
  (spu_send, locus=host subject=self)
  (host_return, locus=host subject=self)
  l: Sequential Scan table "A" (cost=0.0..8.1 rows=454718 width=8 conf=80) {}
    (xpath_none, locus=spu subject=self)
  r: Hash (cost=0.0..0.0 rows=16 width=11 conf=0) {}
    (xpath_none, locus=spu subject=self)
    l: Sequential Scan table "B" (cost=0.0..0.0 rows=16 width=11 conf=100) {}
      (xpath_none, locus=spu subject=self)
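Rounding out this section's discussion of statistics, the following is a minimal sketch (the table names are hypothetical) of the CTAS behavior described earlier: statistics generation is scheduled automatically for the new table once it crosses the ctas_auto_min_rows threshold, while ordinary bulk loads still call for an explicit refresh.

-- CTAS: generate statistics is scheduled automatically for the new table --
create table emp_2013 as select * from emp where created_dt >= '2013-01-01';
-- after bulk inserts into an existing table, refresh statistics explicitly --
insert into emp select * from emp_stage;
generate statistics on emp;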
• 27. © asquareb llc 26 6. Netezza Query Plan Analysis
As with most database management systems, Netezza generates a query plan for every query executed in the system. The query plan details the optimal execution path, as determined by Netezza, to satisfy each query. The Netezza component which generates the plan and determines the optimal query path from the available alternatives is called the query optimizer, and it relies on the data available about the database objects involved in the query. The Netezza query optimizer is a cost based optimizer, i.e. the optimizer computes a cost for the various execution alternatives and selects the path with the least cost as the optimal execution path for a particular query.

Optimizer

The optimizer relies on a number of statistics about the objects in the query executed.
 Number of rows in the tables
 Minimum and maximum values stored in columns involved in the query
 Column data dispersion, like unique values, distinct values, null values
 Number of extents in each table and the total number of extents on the data slice with the largest skew
Given that the optimizer relies heavily on the statistics to determine the best execution plan, it is important to keep the database statistics up to date using the "GENERATE STATISTICS" command. Along with the statistics, the optimizer also takes into account the FPGA capabilities of Netezza when determining the optimal plan. When coming up with the plan for execution, the optimizer looks for
 The best method for data scan operations, i.e. to read data from the tables
 The best method to join the tables in the query, like hash join, merge join, nested loop join
 The best method to distribute data between SPUs, like redistribute or broadcast
 The order in which tables can be joined in a query join involving multiple tables
 Opportunities to rewrite queries to improve performance, like
o Pull up of sub-queries
o Push down of tables to sub-queries
o De-correlation of sub-queries
o Expression rewrite

Query Plan

Netezza breaks query execution into units of work called snippets, which can be executed on the host or on the snippet processing units (SPU) in parallel. Queries can contain one or many snippets, which will be executed in sequence on the host or the SPUs depending on what the snippet code does. Scanning and retrieving data from a table, joining data retrieved from tables, performing data aggregation, sorting of data, grouping of data, and dynamic distribution of data to help query performance are some of the processes which can be accomplished by the snippet code generated for query execution.
• 28. © asquareb llc 27 The following is a sample query plan which shows the various snippets, what they accomplish, the order in which they are executed, and where they get executed when the query is run.

QUERY VERBOSE PLAN:
Node 1. [SPU Sequential Scan table "ORGANIZATION_DIM" as "B" {}]
  -- Estimated Rows = 22193, Width = 4, Cost = 0.0 .. 843.4, Conf = 100.0
  Restrictions:
    ((B.ORG_ID NOTNULL) AND (B.CURRENT_IND = 'Y'::BPCHAR))
  Projections:
    1:B.ORG_ID
  Cardinality: B.ORG_ID 22.2K (Adjusted)
  [HashIt for Join]
Node 2. [SPU Sequential Scan table "ORGANIZATION_FACT" as "A" {}]
  -- Estimated Rows = 11568147, Width = 4, Cost = 0.0 .. 24.3, Conf = 90.0
  [BT: MaxPages=19 TotalPages=1748] (JIT-Stats)
  Projections:
    1:A.ORGANIZATION_ID
  Cardinality: A.ORGANIZATION_ID 22.2K (Adjusted)
Node 3. [SPU Hash Join(right exists) Stream "Node 2" with Temp "Node 1" {}]
  -- Estimated Rows = 11568147, Width = 0, Cost = 843.4 .. 867.7, Conf = 57.6
  Restrictions:
    (A.ORGANIZATION_ID = B.ORG_ID) AND (A.ORGANIZATION_ID = B.ORG_ID)
  Projections:
Node 4. [SPU Aggregate]
  -- Estimated Rows = 1, Width = 8, Cost = 930.6 .. 930.6, Conf = 0.0
  Projections:
    1:COUNT(1)
  [SPU Return]
  [HOST Merge Aggs]
  [Host Return]

Reading the plan: snippet 1 reads data from table "B" and runs on all the SPUs; snippet 2 reads data from table "A" (the table from which data is retrieved in that snippet) and also runs on all the SPUs; snippet 3 performs a hash join on the data returned from snippets 1 and 2, running on all SPUs; snippet 4 counts the rows produced by snippet 3 on all SPUs, and each SPU returns its count to the host, which aggregates the numbers and returns the total count to the user.

The Netezza snippet scheduler determines which snippets to add to the current workload based on the session priority, estimated memory consumption versus available memory, and estimated SPU disk time.

Plan Generation

When a query is executed, Netezza dynamically generates the execution plan and the C code to execute each snippet of the plan. The recent execution plans are stored in the data.<ver>/plan directory under the /nz/data directory. The snippet C code is stored under the /nz/data/cache directory, and this code is compared against the code for new plans so that compilation can be skipped if the snippet code is already available. Apart from Netezza generating the execution plan dynamically during query execution, users can also generate the execution plan (without the C snippet code) using the "EXPLAIN" command. This helps users identify potential performance issues by reviewing the plan and making sure that the path chosen by the optimizer is in line with, or better than, what is expected. During the plan generation process, the optimizer may generate statistics dynamically to prevent issues due to out-of-date statistics, particularly when the tables involved store large volumes of data. The dynamic statistics generation process uses sampling, which is not as accurate as generating statistics using the "GENERATE
• 29. © asquareb llc 28 STATISTICS" command, which scans the tables. The following are some variations of the "EXPLAIN" command
EXPLAIN [VERBOSE] <SQL QUERY>; //Detailed query plan including cost estimates
EXPLAIN PLANTEXT <SQL QUERY>; //Natural language description of query plan
EXPLAIN PLANGRAPH <SQL QUERY>; //Graphical tree representation of query plan

Data Points in Query Plan

Following are some of the key data points in the query plan which help identify performance issues in queries
…
Node 2. [SPU Sequential Scan table "ORGANIZATION_FACT" as "A" {}]
  -- Estimated Rows = 10007, Width = 4, Cost = 0.0 .. 24.3, Conf = 54.0
  [BT: MaxPages=19 TotalPages=1748] (JIT-Stats)
  Projections:
    1:A.ORGANIZATION_ID
  Cardinality: A.ORGANIZATION_ID 22.2K (Adjusted)
…
In this fragment, "Estimated Rows" is the number of rows the snippet is estimated to return, the second figure after "Cost" is the estimated cost of the snippet, and "Conf" is the confidence, in percent, in that estimate. If the estimated number of rows is less than what is expected, it is an indication that the statistics on the table are not current, and it is good to schedule "generate statistics" against the table. If the estimated cost is high, the snippet is a good candidate for a closer look, to make sure the higher cost is not due to query coding inefficiency or database object definitions.

QUERY VERBOSE PLAN:
Node 1. [SPU Sequential Scan table "ORDERS" as "O" {(O.O_ORDERKEY)}]
  -- Estimated Rows = 500000, Width = 12, Cost = 0.0 .. 578.6, Conf = 64.0
  Restrictions:
    ((O.O_ORDERPRIORITY = 'HIGH'::BPCHAR) AND (DATE_PART('YEAR'::"VARCHAR", O.O_ORDERDATE) = 2012))
  Projections:
    1:O.O_TOTALPRICE 2:O.O_CUSTKEY
  Cardinality: O.O_CUSTKEY 500.0K (Adjusted)
  [SPU Distribute on {(O.O_CUSTKEY)}]
  [HashIt for Join]
…
Here the "Restrictions" section shows the where clause applied to the data read from the table, and the "Projections" section lists the relevant table columns retrieved (the other columns are discarded); "[SPU Distribute on]" shows the resulting data being redistributed to all SPUs with CUSTKEY as the distribution key, and "[HashIt for Join]" shows the CUSTKEY value being hashed so that it can be used to perform a hash join. Each SPU executing the snippet retrieves only the relevant columns from the tables and applies the restrictions using the FPGA, reducing the data to be processed by the CPU in the SPU, which in turn helps with performance. The optimizer may decide to redistribute data retrieved in a snippet if the tables joined in the query are not distributed on the same key value, as in this snippet. If a large volume of data is being redistributed by a snippet, like the redistribution of a fact table, it is a good candidate for checking whether the table is physically distributed on optimal keys. Be aware that if the snippet projects a small set of columns from a table with many columns, the amount of data it redistributes will be small (number of rows * the projected columns). If the amount of data is small, dynamic redistribution by a snippet may not be a major overhead, since it is done using the high-performing, high-bandwidth IP based network between the SPUs, without the involvement of the NZ host.
• 30. © asquareb llc 29 …
Node 2. [SPU Sequential Scan table "STATE" as "S" {(S.S_STATEID)}]
  -- Estimated Rows = 50, Width = 100, Cost = 0.0 .. 0.0, Conf = 100.0
  Projections:
    1:S.S_NAME 2:S.S_STATEID
  [SPU Broadcast]
  [HashIt for Join]
…
In this fragment, "[SPU Broadcast]" shows the data retrieved by the snippet being broadcast to all the SPUs. Data from a snippet can be broadcast as well as redistributed: the data retrieved by all the SPUs executing the snippet is returned to the host, and the host consolidates it and sends the consolidated data to every SPU in the server. Broadcast is efficient in the case of a small table joined with a large table when the two cannot be distributed physically on the same key. In the example above, the state code is used in the join to generate the name of the state in the query result. Since the fact table and the STATE table are not distributed on the same key, the optimizer broadcasts the STATE table to all the SPUs so that the joins can be done locally on each SPU. Broadcasting the STATE table is much more efficient, since the data stored in the table is minuscule compared to what NZ can handle.
…
Node 5. [SPU Hash Join Stream "Node 4" with Temp "Node 1" {(C.C_CUSTKEY,O.O_CUSTKEY)}]
  -- Estimated Rows = 5625000, Width = 33, Cost = 578.6 .. 1458.1, Conf = 51.2
  Restrictions:
    (C.C_CUSTKEY = O.O_CUSTKEY)
  Projections:
    1:O.O_TOTALPRICE 2:N.N_NAME
  Cardinality: C.C_NATIONKEY 25 (Adjusted)
…
This snippet performs a join of the data from the previous snippets. NZ supports hash, merge and nested loop joins, and hash joins are the most efficient. The key data points to check in a snippet performing a join are whether the correct join type was selected by the optimizer and whether the number of estimated rows is reasonable. For example, if a column participating in a join is defined as floating point instead of an integer type, the optimizer cannot use a hash join when the expectation was to use one. The problem can be fixed by defining the table column with an integer type (a sketch of this fix appears at the end of this section). If the estimated number of rows or the estimated cost is very high, it may point to not having the correct join conditions, or to a missing join condition, resulting in an incorrect join result set.

Common reasons for performance issues

The following are some of the common reasons for performance issues
 Not having correct distribution keys, resulting in certain data slices storing more data, i.e. data skew. The performance of a query depends on the performance of the slice storing the most data for the tables involved in the query.
 Even if the data is distributed uniformly, not taking processing patterns into account in the distribution results in process skew. For example, if data is distributed by month but a process looks at one month of data, the performance of the query will be degraded, since the processing is handled by a subset of the SPUs and not all of the SPUs in the system.
• 31. © asquareb llc 30  Performance gets impacted if a large volume of data (e.g. a fact table) gets redistributed or broadcast during query execution.
 Not having zone maps on columns because their data types are not among the ones for which NZ can generate zone maps, like numeric(x, y), char, varchar etc.
 Not having the table data organized optimally for multi-table joins, as in the case of the multi-dimensional joins performed in a data warehouse environment.

Steps to do query performance analysis

The following are the high level steps of query performance analysis
 Identify long running queries
o Identify whether queries are being queued, and which long running queries are causing the queuing, through the NZAdmin tool or the nzsession command.
o Long running queries can also be identified from query history, if history is being gathered with the appropriate configuration
 For the long running queries, generate query plans using the "EXPLAIN" command. Recent query execution plans are also stored in *.pln files in the data.<ver>/plans directory under the /nz/data directory.
 Look for the data points and reasons for performance issues detailed in the previous sections.
 Take the necessary actions, like generating statistics, changing distributions, grooming tables, creating materialized views, modifying or adding organization keys, changing column data types or rewriting the query.
 Verify whether the modifications helped by regenerating the plan using the explain utility.
 If the analysis is performed in a non-production environment, it is key to make sure that the statistics reflect the values expected in production.
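One of the corrective actions listed above is changing column data types; the following is a minimal sketch (table and column names are hypothetical) of the join-key fix described in the previous section, rebuilding a table whose join key was mistakenly typed as floating point so that hash joins become possible:

-- rebuild the table with the join key cast to an integer type --
create table orders_fixed as
  select order_id, cast(cust_key as bigint) as cust_key, total_price
  from orders;
-- after validating the new table, swap it in and refresh statistics --
alter table orders rename to orders_old;
alter table orders_fixed rename to orders;
generate statistics on orders;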
• 32. © asquareb llc 31 7. Netezza Transactions
All database systems which allow concurrent processing should support ACID transactions, i.e. transactions should be atomic, consistent, isolated and durable. All Netezza transactions are ACID in nature, and in this section we will see how ACIDity is maintained by the appliance.
By default, Netezza SQL statements are executed in auto-commit mode, i.e. the changes made by a SQL statement take effect immediately on completion of the statement, as if the transaction were complete. If there are multiple related SQL statements which all need to fail if any one of them fails, the user can use the BEGIN, COMMIT and ROLLBACK transaction control statements to form a transaction involving multiple statements. All SQL statements between a BEGIN statement and a COMMIT or ROLLBACK statement are treated as part of a single transaction, i.e. if any one of the SQL statements fails, all the changes made by the prior statements are reverted, leaving the database in the state it was in before execution of the BEGIN statement. Users need to be aware that some SQL statements, like BEGIN itself, are restricted from appearing inside a BEGIN ... COMMIT/ROLLBACK block.
Traditionally, database systems use logs to manage transaction commits and rollbacks. All changes which are part of a transaction are maintained in a log, and based on whether the user commits or rolls back the transaction, the changes are made durable, i.e. written to the storage. Netezza doesn't use logs; all changes are made directly on the storage where user data is stored, which also helps with performance. To accomplish this for table data changes, Netezza maintains three additional hidden columns (createxid, deletexid and rowid) per table row, which store the id of the transaction which created the row, the id of the transaction which deleted the row, and a unique row id assigned to the data row by the system. Transaction ids are unique ids assigned to each transaction executed in the system, in increasing order. When a row is created its deletexid is set to "0", and when it gets deleted the deletexid is set to the id of the transaction which executed the delete request. When an update is made to a table row, the row data is not updated in place; rather, a new row is created with the updated values and the old row's deletexid is set to the id of the transaction which issued the update command. If a transaction rolls back a delete operation, the deletexid of the deleted rows is reset to "0"; when an update operation is rolled back, the deletexid of the new row is set to "1" and the deletexid of the old row is reset to "0".
One key point to note is that rows are deleted logically, not physically, during delete operations, and additional rows get created during update operations. This means groom needs to be executed regularly on tables which undergo large volumes of deletes and updates, to reclaim the space and maintain performance.
Transaction isolation is controlled by the isolation levels supported by a database system and the one used by the individual transaction. By the ANSI standard there are four isolation levels
 Uncommitted Read
 Committed Read
 Repeatable Read
 Serializable
The following table details the data consistency which can be expected with each isolation level
• 33. © asquareb llc 32
Isolation Level | Uncommitted Data | Non Repeatable Data | Phantom Data
Uncommitted | Y | Y | Y
Committed | N | Y | Y
Repeatable read | N | N | Y
Serializable | N | N | N
Uncommitted Data: a transaction can see data from another transaction which is not yet committed.
Non Repeatable Data: when a transaction re-reads the same data, the data has been changed by another transaction which committed after the previous read.
Phantom Data: when a transaction re-reads the same data, the data previously read is unchanged, but new rows have been added which satisfy the previous query criteria.
Netezza allows only the "serializable" transaction isolation level, and if a transaction is found to be not serializable it will not get scheduled. This works fine, since the appliance is primarily meant for data warehousing environments, where there are very few or no updates once data is loaded. With data versioning providing a consistent view of data for transactions running concurrently, combined with support for only serializable transactions, Netezza doesn't need locks, the synchronization mechanism traditionally used in database systems to maintain concurrency. For some DDL statements, like dropping a table, the appliance requires exclusive access to the objects, which means the DDL statement execution will have to wait until any other operation, like a SELECT on the object, is complete.
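A minimal sketch of the behavior described in this section (the table names are hypothetical): a multi-statement transaction, followed by a look at the hidden transaction columns, which can be selected explicitly.

-- group related changes into a single transaction --
begin;
update emp set salary = salary * 1.05 where dept_id = 10;
delete from emp_history where updated_dt < '2013-01-01';
commit;
-- the hidden columns can be queried to observe row versioning --
select rowid, createxid, deletexid, emp_id from emp limit 5;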
• 34. © asquareb llc 33 8. Loading Data, Database Back-up and Restores
Loading and unloading data, creating back-ups and the process to restore them are vital components of any database system, and in this section we will look into how they are done in Netezza.

External Tables

Users can define a file as a table and access it like any other table in the database using SQL statements. Such tables are called external tables. The key difference is that even though data in an external table can be accessed through SQL queries, the data is stored in a file and not on the disks attached to the SPUs. Netezza allows selects and inserts on external tables, but not updates, deletes or truncates. That means users can insert data into an external table from a normal table in the database, and that data will get stored in the external file. Data can also be selected from an external table and inserted into a database table; these two actions are the equivalent of a data unload and load. Any user with the LIST privilege and the CREATE EXTERNAL TABLE privilege can create external tables. The following are some sample statements showing how external tables are created and used
-- creates an external table with the same definition as an existing customer table --
create external table customer_ext same as customer using (dataobject('/nfs/customer.data') delimiter '|');
-- creates an external table from the column definitions provided --
create external table product_ext(product_id bigint, product_name varchar(25), vendor_id bigint) using (dataobject('/tmp/product.out') delimiter ',');
-- uses the definition of the existing account table to create the external table, storing the data in the account.data file in Netezza internal format --
create external table '/tmp/account.data' using (format 'internal' compress true) as select * from account;
-- reads data from the external table and inserts it into an existing database table --
insert into temp_account select * from external '/tmp/account.data' using (format 'internal' compress true);
-- stores data into an external table, which in turn writes to the file; this file is readable and can be used with other DBMS --
insert into product_ext select * from product;
Once external tables are created they can be altered or dropped; the statements used to drop and alter a normal table can be used to perform these actions on external tables. Note that when an external table is dropped, the file associated with the table is not removed, and it needs to be removed through OS commands if that is what is intended. Also, queries can't include more than one external table, and unions can't be performed on two external tables. When a user performs an insert operation on an external table, any existing data in the file associated with the external table is deleted before new data is written, i.e. a backup of the file may need to be made first if it is required. External tables are one option to back up and restore data in a Netezza database, even though they work at the table level. Since external tables can be used to create data files from table data in a readable format, with user specified delimiters and line escape characters, they can be used to move data between Netezza and other DBMS systems. Note that the host system is not meant to store data, so alternate arrangements, like nfs mounts, need to be made to store the files created from external tables if the user intends to move large volumes of data.
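A small sketch of the cross-DBMS interchange just mentioned (the file path is hypothetical, and combining the delimiter option with the CREATE ... AS SELECT form is assumed to work the same way as in the examples above):

-- unload to a delimited file that another DBMS can consume --
create external table '/nfs/unload/customer.csv' using (delimiter ',') as select * from customer;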
• 35. © asquareb llc 34 NZLoad CLI
NZLoad is a Netezza command line interface which can be used to load data into database tables. Behind the scenes, NZLoad creates an external table for the data file to be loaded, selects the data from the external table and inserts it into the target table, and then drops the external table once the insert of all the records is complete. All of this is done by executing a sequence of SQL statements, and the complete load process is treated as a single transaction, i.e. if there is an error in any of the steps the whole process is rolled back. The utility can be executed on a Netezza host locally or from a remote client where the Netezza ODBC client is installed. Even though the utility uses the ODBC driver, it doesn't require a Data Source Name on the machine where it is executed, since it uses the ODBC driver directly, bypassing the ODBC driver manager. The following is a sample NZLoad statement
nzload -u nz -pw nzpass -db hr -t employee -df /tmp/employee.dat -delimiter ',' -lf log.txt -bf bad.txt -outputDir /tmp -fileBufSize 2 -allowReplay 2
A control file can also be used to pass parameters to the nzload utility, using the -cf option, instead of passing them through the command line. This also allows more than one data file to be loaded concurrently into the database with different options, with each load treated as a separate transaction. The control file option -cf and the data file option -df are mutually exclusive, i.e. they can't be used at the same time. The following is a sample nzload command, assuming the load parameters are specified in the load.cf file.
nzload -u nz -pw nzpass -host nztest.company.com -cf load.cf
NZLoad exits with an error code of 0 for successful execution; 1 for failed execution, i.e. no records were inserted; and 2 for successful loads in which the number of rows in error did not exceed the maximum specified in the NZLoad command request. When NZLoad is executing against a table, other transactions can still use the table, since the load does not commit until the end, and hence the data being loaded is not visible to other transactions executed concurrently. The NZLoad utility sends chunks of data, along with the transaction id, to all the SPUs, which store the data immediately to storage, taking up space. If for any reason the transaction performing the load is cancelled or rolled back, the data will remain in storage even though it will not be visible to future transactions. In order to reclaim the space taken by such records, users need to schedule a groom against the table. Note that the NZReclaim utility can also be used on earlier versions of Netezza, but it is deprecated from version 6. The user id used to run the NZLoad utility should have the privilege to create external tables, along with select, insert and list access on the database table.

Database Backup and Restore

Unlike traditional database systems, Netezza has a host component, where all the database catalogs, system tables, configuration files etc. are stored, and the SPU component, where user data is stored. Both these components need to be backed up in sync for a successful restore, even though in restore situations
• 36. © asquareb llc 35 like a database or table restore the host component may not be required. Netezza provides the nzhostbackup and nzhostrestore utilities to make host backups and restores, along with the nzbackup and nzrestore utilities to manage user data backup and restoration. This is in addition to "external tables", which can be used to back up and restore individual tables, and which can also be used to back up all the tables in a database, hence creating a database backup.

Host Backup and Restore

The Netezza host can be backed up using the nzhostbackup command utility. It backs up the Netezza data directory, which stores the system tables, database catalogs, configuration files, query plans, cached executable code for the SPUs etc., which are required for the correct functioning of, and access to, the user databases. When the command is executed, it pauses the system to make a checkpoint before taking a backup of the /nz/data directory. Once the backup is complete, the system is resumed to its original state; due to this requirement to pause the system, it is good to schedule the host backup when there is little or no activity on the system. In the event of any issue with the Netezza host, the host can be restored using the nzhostrestore utility. The restore will restore only the host components, like the database catalog entries, system tables etc., to the state when the host backup was made. What that means is that if there were actions like dropping of user tables, or truncation or grooming of tables, after the host backup, the host restore will not be able to revert these actions. This may result in an inconsistent state of the user objects, and corrective action needs to be taken to bring them in sync with the state when the host backup was made. Also, in the case of new tables created after the host backup, the host restore utility will create SQL scripts to drop these tables, which are orphaned since their entries are not in the catalog. To avoid such inconsistencies, it is recommended not to schedule the host backup around the time of any major change to the databases. The following are sample nzhostbackup and nzhostrestore commands.
nzhostbackup /tmp/backup/nz-devhost-bk-01212011.tar.gz
nzhostrestore /tmp/backup/nz-uathost-bk-12212011.tar.gz
During the host restore process, the utility checks the version of the catalog on the host against what is in the backup. If they are not the same, or if for some reason the utility is not able to identify the versions, the restore process will fail. This verification step can be skipped using the -catverok option of the nzhostrestore utility, but that is not recommended. Note that the nz host backup doesn't back up the operating system components or configurations; they need to be backed up separately.

User Database Backup and Restore

User databases and data, separate from the host components, can be backed up and restored using the nzbackup and nzrestore utilities.
The nzbackup utility allows the user to make a full, cumulative or differential backup of a database. The full backup makes a backup of the complete database, while the differential backup captures the changes made after the previous backup, either full or cumulative. The cumulative backup creates a backup of all the changes made after the previous full backup. Users executing the
• 37. © asquareb llc 36 nzbackup command need to have the backup privilege at the specific database level or at the global level. The following is an example where a full backup is scheduled every month, say on day 1, a differential backup is scheduled every day, and a cumulative backup is scheduled every week. Each differential backup captures all the data changed after the previous backup. If the database needs to be restored as of day 4 after the monthly backup, the monthly backup needs to be used along with differential backups 1, 2 and 3. If the database needs to be restored as of day 8, then the monthly backup, the weekly cumulative backup and differential backup 6 need to be used.
[Figure: a timeline of days 1 through 9 showing the monthly full backup on day 1, daily differential backups 1-7, and the weekly cumulative backup during the week.]
There is no point-in-time recovery in Netezza, unlike other traditional databases where a database can be recovered to a point in time between backups using logs, which are not available in Netezza. The following are some example Netezza backup commands.
-- creates a full backup of the hrdb database --
nzbackup -dir /tmp/backups -u nzbackup -pw nzpass -db hrdb
-- creates a differential or incremental backup of the findb database --
nzbackup -dir /nfs/backups -db findb -differential
-- creates a cumulative backup of the hrdb database --
nzbackup -dir /nfs/backups -db hrdb -cumulative
-- creates a backup of the schema of the hrdb database and not the data --
nzbackup -dir /nfs/backups -db hrdb -schema-only
-- creates a backup of the users and permissions at the user database and system level --
nzbackup -dir /nfs/backups -users -u nzbackup -pw nzpass
As you may have noticed, there are two uses for the nzbackup utility other than backing up database data. One is to generate the schema of a database, which can then be used to create another database with only the entities of the source database and not the data. The second is to back up all the users, groups and privileges in the system, so that they can be used to restore the user details or to create the same set of users and privileges on another system. The history of database backups can be viewed using the -history option of the nzbackup command.
nzbackup -history
nzbackup -history -db hrdb
A full backup, along with all the differential and cumulative backups after it until the next full backup, is considered a backup set. Each backup set is identified by an id, and the backups made within the backup set are identified by a sequence number. All these details are displayed when a user requests the backup history for a database.
Databases can be restored using the nzrestore command line utility; the user needs to have the restore privilege. A full database restore cannot be made against an existing database, which can be done
• 38. © asquareb llc 37 in traditional databases. Instead, the existing database needs to be dropped, or a new database needs to be specified, to perform a full database restore. As with the nzbackup utility, the -schema-only option can be used to create only the objects in the new database without loading the data from the backup. Also, just the users and privileges can be restored using the -users option of the nzrestore utility. The following are some example restore commands
-- performs a restore of the hrdb database from a full backup --
nzrestore -db hrdb -u nzbackup -pw nzpass -dir /tmp/backups -v
-- creates all the objects of findb in the new database testfindb but doesn't load any data --
nzrestore -db testfindb -sourcedb findb -schema-only -u nzbackup -pw nzpass -dir /nfs/backups
-- creates all the users and applies all the privileges as in the backup but doesn't drop users or alter existing privileges --
nzrestore -users -u nzbackup -pw nzpass -dir /nfs/backups
-- restores the database findb up to and including backup sequence 2 within the backup set 200137274 --
nzrestore -db findb -u nzbackup -pw nzpass -dir /tmp/backups -backupset 200137274 -increment 2
Database backups can also be used to restore specific table(s) to a point in time using the nzrestore -tables option. One or more tables can be restored by listing them with the -tables option. If a table being restored exists in the database it will be overwritten, and if it doesn't exist the table will be created. Table level restore is applicable only for regular tables.
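The text above describes the -tables option without an example, so here is a sketch (the database and table names are hypothetical, and the flags are assumed to follow the same pattern as the earlier examples):

-- restores only the emp and dept tables of the hrdb database --
nzrestore -db hrdb -u nzbackup -pw nzpass -dir /nfs/backups -tables emp dept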
• 39. © asquareb llc 38 9. Netezza SQL
As with other popular DBMS systems currently available in the market, SQL in Netezza supports the SQL-92 standard. SQL can be executed through the nzsql command line utility or through any JDBC, ODBC or OLE DB connection.

SQL Statements

The following are the SQL statements available for the various object types in Netezza.

Database
SQL Statement | Description
Create Database | To create a new Netezza database
Drop Database | To drop an existing Netezza database
Alter Database | To rename or change the owner of an existing database. Renaming a database will require recompilation of any views in the database. Also, all materialized views will be converted to normal views and will need to be replaced.
The following are some limits at the database level
Limit Type | Limit
Name Length | Maximum number of characters: 128 bytes
Connections | Maximum number of connections to the server: 2000 (default: 500)
When connected to one database, objects in another database can be accessed by qualifying the object name with "database name.schema name", e.g. "hrdb.hradmin.emp" can be used to access the emp table in the hrdb database under the hradmin schema. The schema name/object owner name is optional, and the object can be accessed as "database name..object name", e.g. "hrdb..emp", since object names are unique within a database. While objects in a second database can be queried or used in query joins, no data updates are allowed across databases.

Tables
SQL Statement | Description
CREATE TABLE | To create a new Netezza table
DROP TABLE | To drop an existing Netezza table
ALTER TABLE | To rename or change the owner, modify (add, drop) columns or modify constraints in an existing table
SELECT col_name(s) FROM tbl | To retrieve data from a database table
INSERT INTO tbl | To insert data into a database table
UPDATE tbl SET col_nme = val | To update existing data in a table
DELETE FROM tbl | To delete data from a table
TRUNCATE TABLE tbl | To remove all records from the table. Truncate removes the data and recovers the storage, instead of logically deleting the rows as the delete statement does
• 40. © asquareb llc 39
SQL Statement | Description
UNION, UNION ALL | To combine data from more than one table or row set; UNION can be used in conjunction with the SELECT statement
INTERSECT, INTERSECT ALL | To retrieve the common set of rows in more than one table or row set; INTERSECT can be used in conjunction with the SELECT statement
EXCEPT [DISTINCT], EXCEPT ALL, MINUS [DISTINCT], MINUS ALL | To retrieve the set of rows from one table which are not in another row set/table; EXCEPT or MINUS can be used in conjunction with the SELECT statement
ALTER VIEWS ON tbl MATERIALIZE REFRESH | To re-materialize all the materialized views which are based on a particular table
The following are some limits at the table level
Limit Type | Limit
Name/Column Name Length | Maximum number of characters: 128 bytes
Max Column Count | Maximum number of columns per table: 1600
Distribution Keys | Maximum number of distribution columns: 4
Char/Varchar Field Length | Maximum number of characters in char/varchar type: 64000
Row Size | Maximum row size in a table: 65,535 bytes

Synonyms

Synonyms can be used to refer to tables and views in the same database by a different name, or to refer to objects in a remote database without the "database.schema" qualifier. The following are the supported SQL statements for synonyms
SQL Statement | Description
CREATE SYNONYM name FOR table/view | To create a synonym. The statement doesn't verify whether the table or view for which the synonym is created exists; if the underlying object is non-existent, a runtime error is thrown when a user tries to access the synonym.
DROP SYNONYM name | To drop an existing synonym
ALTER SYNONYM name | To rename or change the owner of an existing synonym

Views

Views are created to give a different perception of data, like a restricted set of table columns or a join of two row sets, and/or to control access to the data in the database. The following are the supported SQL statements for views
SQL Statement | Description
CREATE OR REPLACE VIEW | To create a new Netezza view
DROP VIEW | To drop an existing Netezza view
ALTER VIEW | To rename or change the owner of an existing view
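A small sketch of the two object types above (the names are hypothetical); the synonym hides the cross-database qualifier discussed earlier, and the view restricts the columns exposed:

-- refer to a table in another database without the qualifier --
create synonym emp for hrdb.hradmin.emp;
-- expose only non-sensitive columns through a view --
create or replace view v_emp_public as select emp_id, first_name, last_name from emp;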
• 41. © asquareb llc 40 Materialized Views
The following are the supported SQL statements on materialized views
SQL Statement | Description
CREATE OR REPLACE MATERIALIZED VIEW | To create a new materialized view, or replace an existing one when the base table changes in a way that impacts the materialized view
DROP VIEW | To drop an existing materialized view (same as normal views)
ALTER VIEW vw MATERIALIZE REFRESH|SUSPEND | Use the "suspend" option to suspend a view so that activities like base table refresh or groom can run. Use the "refresh" option to activate a suspended materialized view. The "refresh" option is also used to re-materialize the view so that the rows are in the correct order.
A system default can be set to specify the percentage of rows in a materialized view that can be out of order; the default value is 20%. Once set, the command "ALTER VIEWS ON MATERIALIZE REFRESH" will re-materialize all the views in the database which have reached the set percentage of rows out of order.

Functions

Netezza comes with many built-in functions, and the following are some of them under various categories.
Category | Functions
Aggregates | rollup, cube, grouping sets, count, sum, max, min, avg
Casting | to_char, to_date, to_number, to_timestamp, cast(value as type), ::
Date Time | extract(field from datetime value) where field = epoch, year, month, day, dow, doy
Checking null value | nvl(), nvl2()
String Functions | trim, position, substring, lower, upper, like, not like, character_length
Fuzzy String Search | le_dst, dle_dst
Value Functions | current_date, current_time, current_timestamp, current_user, current_db
Math Functions | acos, asin, atan, atan2, cos, cot, degrees, pi, radians, sin, tan, random, ceil, abs
This is just a sample list; the other built-in functions, along with their details, can be found in the Netezza database user guide.
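A minimal sketch of the materialized view lifecycle described above (the table and view names are hypothetical); Netezza materialized views are thin, optionally sorted projections of a single base table:

-- a sorted projection over the emp table --
create materialized view mv_emp as
  select emp_id, dept_id from emp order by dept_id;
-- suspend before maintenance on the base table, then refresh --
alter view mv_emp materialize suspend;
alter view mv_emp materialize refresh;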
• 42. © asquareb llc 41 10. Stored Procedures
Stored procedures are programs written in the NZPLSQL language to perform database operations. They are stored in Netezza databases and run on the Netezza host when executed. NZPLSQL is an interpreted language based on Postgres' PL/pgSQL. The key advantages of using stored procedures are
 They help avoid unnecessary network traffic between the appliance and the application requesting data, which is a major advantage when dealing with the large volumes of data in a warehousing environment.
 By controlling access restrictions, users can be allowed to perform database operations through stored procedures without being given access to the underlying objects.
 By encapsulating business logic in stored procedures, there is only one place to make modifications, which helps maintenance.
NZPLSQL, along with regular SQL statements, provides a complete programming language with capabilities like conditional statements, looping functionality, declaring variables and manipulating them through expressions, creating and calling subroutines etc. All regular SQL statements, except the ones which are not allowed within a BEGIN/COMMIT block, can be used in a stored procedure. NZ SQL procedures can accept multiple input arguments and return one output, either a scalar value or a result set. Stored procedures can be created, altered, executed and dropped through the NZSQL command utility or through an ODBC, JDBC or OLE-DB connection to a database. The following is a sample set of SQL statements relating to stored procedures
/* Stored procedure to update employee salary */
create or replace procedure update_emp_sal(int4)
returns int4
execute as owner
language nzplsql
as
begin_proc
declare
  ret_code int4;
  -- Alias for the first value passed as input to the stored procedure --
  inc_percent alias for $1;
begin
  -- apply the percentage increase (divide by 100 to treat the input as a percent) --
  update hr..emp set salary = salary + (salary * inc_percent / 100);
  raise notice 'Updated employee salary successfully';
-- Section to handle any unhandled exception --
exception when others then
  raise notice 'Error in updating employee salary';
end;
end_proc;
-- To display the details about the stored procedure --
show procedure update_emp_sal(int4);
-- To execute the stored procedure --
exec update_emp_sal(12);
-- Another way to execute the stored procedure --
call update_emp_sal(-2);
-- To drop the stored procedure --
drop procedure update_emp_sal(int4);
Stored procedure code is not case sensitive, i.e. identifiers are converted into capital letters irrespective of the case used. So if there are variables which are spelled the same but use different case letters, they
• 43. © asquareb llc 42 all are considered duplicates. The create or replace procedure statement replaces a procedure if one with the same name already exists in the database, or creates a new one if none exists. The admin user and the database owner by default have all the permissions to perform actions on stored procedures. Users with the grant permission option can also grant permissions to other users or groups to alter, create or replace, drop and execute stored procedures. The following are some sample statements
-- grant permission to create stored procedures to the group devgrp --
grant create procedure to group devgrp;
-- grant all permissions regarding stored procedures to user mike --
grant all on procedure to mike;
-- revoke permission to create procedures from the group qagrp --
revoke create procedure from group qagrp;
-- revoke permission to alter the procedure update_emp_sal(int4) from user tom --
revoke alter on update_emp_sal(int4) from tom;
-- provide execute permission on the procedure update_emp_sal(int4) to user sally --
grant execute on update_emp_sal(int4) to sally;
-- grant drop access on the procedure update_emp_sal(int4) to user ted --
grant drop on update_emp_sal(int4) to ted;
Note that a procedure is uniquely identified by its name along with its input parameters. Procedures with the same name but different input parameter lists are considered different stored procedures. When a procedure is created, it can be defined whether the procedure executes with the authority of the user who created it or with the authority of the user who is executing it. The "EXECUTE AS OWNER" or "EXECUTE AS CALLER" parameter in the CREATE or ALTER procedure statement determines the authority used during execution. By specifying that the procedure is to be executed as owner, even if the user executing the procedure doesn't have the necessary authority on the objects accessed by the procedure, the execution will succeed as long as the owner who created the procedure has the necessary privileges on those objects. For example, if the emp table data can be updated only by users in the hrgroup, and the stored procedure update_emp_sal(int4) is created by someone from the hrgroup, the user sally will be able to execute the procedure successfully even though she is not part of the hrgroup. The reason is that the procedure was created with the "execute as owner" option and the owner has the privilege to update the table. If the standard is that only users with the necessary permissions on the underlying objects should be able to manipulate data through a procedure, then procedures can be created using the "execute as caller" option.
-- alter procedure update_emp_sal(int4) to execute as caller --
alter procedure update_emp_sal(int4) execute as caller;
After the execution of this alter statement, if the user sally executes the procedure it will fail, since the user is not part of the hrgroup and doesn't have the update privilege. When a stored procedure is executed, all the objects referred to in the procedure need to be available in the database where it is executed, unless the object is fully qualified. If an object from a remote database is used in the procedure, data can only be queried from the remote object; it can't be inserted, updated or deleted. When database backups are restored, the stored procedures all get restored and can be without errors unless the version of Netezza where the restore is done doesn't support procedures or some of the