3. How Does Teradata Store Rows?
• Teradata uses a hashing algorithm to distribute rows randomly and evenly across all AMPs.
• The rows of every table are distributed among all AMPs and, ideally, evenly.
• Each AMP is responsible for a subset of the rows of each table.
• Evenly distributed tables result in evenly distributed workloads.
• The data is not placed in any particular order.
The benefits of unordered data include:
• No maintenance is needed to preserve order.
• Data placement is independent of any query being submitted.
The benefits of automatic data placement include:
• Distribution is the same regardless of data volume.
• Distribution is based on row content, not data demographics.
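The effect of hash-based placement can be illustrated with a small Python sketch. It is not Teradata's hashing algorithm: MD5 stands in for it, and the four-AMP system, `amp_for`, and `rows_per_amp` are names assumed for this example only. The point is that each AMP's share stays roughly equal whether 1,000 or 100,000 distinct values are hashed.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # assumed AMP count for the illustration


def amp_for(value, num_amps=NUM_AMPS):
    """Map a value to an AMP number using a stand-in hash (MD5, not Teradata's)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(values, num_amps=NUM_AMPS):
    """Count how many values land on each AMP."""
    counts = Counter(amp_for(v, num_amps) for v in values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


small = rows_per_amp(range(1_000))
large = rows_per_amp(range(100_000))
print("1,000 rows per AMP:  ", small)
print("100,000 rows per AMP:", large)
```

Because the AMP is computed from the value alone, no ordering has to be maintained as rows arrive, and the same function works at any data volume.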
4. Primary Indexes
• The mechanism used to assign a row to an AMP.
• A table must have a Primary Index.
• The Primary Index cannot be changed.
UPI
• If the chosen index column(s) are unique, the index is a UPI (Unique Primary Index).
• A UPI results in even distribution of the table's rows across all AMPs.
NUPI
• If the chosen index column(s) are not unique, the index is a NUPI (Non-Unique Primary Index).
• A NUPI distributes the table's rows evenly only in proportion to the degree of uniqueness of the index.
UPIs guarantee even data distribution and eliminate duplicate row checking.
Why would you choose an index that is different from the Primary Key?
• Join performance
• Known access paths
5. Data Storage Based on Primary Index
• The value of the Primary Index for a specific row determines its AMP assignment.
• This is done using the hashing algorithm.
[Diagram: the PE passes the PI value through the hashing algorithm, which selects one of the AMPs; the same path is used for row assignment and row access.]
Accessing a row by its Primary Index value is:
• Always a one-AMP operation
• The most efficient way to access a row
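The one-AMP property follows from using the same hash for storing and for finding a row. The sketch below models that idea in Python under stated assumptions: MD5 stands in for Teradata's hashing algorithm, the system has four AMPs, and the 16-bit bucket width and round-robin `HASH_MAP` are illustrative choices (Teradata is commonly described as mapping a 32-bit row hash to a hash bucket and then to an AMP through a hash map, but the details here are not taken from the slides). `ToySystem`, `row_hash`, and `amp_for` are hypothetical names.

```python
import hashlib

NUM_AMPS = 4      # assumed AMP count
BUCKET_BITS = 16  # assumed bucket width for the illustration

# Hypothetical hash map: bucket number -> AMP number, filled round-robin.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(2 ** BUCKET_BITS)]


def row_hash(pi_value):
    """Return a 32-bit row hash for a PI value (MD5 is only a stand-in)."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(pi_value):
    """Row hash -> hash bucket (high-order bits) -> owning AMP."""
    bucket = row_hash(pi_value) >> (32 - BUCKET_BITS)
    return HASH_MAP[bucket]


class ToySystem:
    """Each AMP is modelled as a dict from PI value to the rows it stores."""

    def __init__(self):
        self.amps = [dict() for _ in range(NUM_AMPS)]

    def insert(self, pi_value, row):
        # Row assignment: hash the PI value and store on that AMP only.
        self.amps[amp_for(pi_value)].setdefault(pi_value, []).append(row)

    def select_by_pi(self, pi_value):
        # Row access: the same hash names the single AMP to ask.
        amp = amp_for(pi_value)
        return amp, self.amps[amp].get(pi_value, [])


system = ToySystem()
system.insert(7325, {"order_number": 7325, "customer_number": 2})
amp, rows = system.select_by_pi(7325)
print("AMP", amp, "returned", rows)
```

Only one dictionary is consulted in `select_by_pi`, which is the toy equivalent of a one-AMP operation.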
6. Row Distribution Using a UPI
ORDER table (Order Number is the PK and the UPI):
Order Number   Customer Number   Order Date   Order Status
7325           2                 4/13         O
7324           3                 4/13         O
7415           1                 4/13         C
7103           1                 4/10         O
7225           2                 4/15         C
7384           1                 4/12         C
7402           3                 4/16         C
7188           1                 4/13         C
7202           2                 4/09         C
The PK column(s) will often be used as a UPI.
PI values for Order_Number are known to be unique (it's a PK).
Teradata will distribute different index values evenly across AMPs.
The resulting row distribution among AMPs is uniform.
Rows of the ORDER table as placed across AMP 1, AMP 2, AMP 3, and AMP 4:
7202   2   4/09   C
7402   3   4/16   C
7325   2   4/13   O
7225   2   4/15   C
7188   1   4/13   C
7384   1   4/12   C
7324   3   4/13   O
7103   1   4/10   O
7415   1   4/13   C
7. Row Distribution Using a NUPI
ORDER table (Order Number is the PK; Customer Number is the NUPI):
Order Number   Customer Number   Order Date   Order Status
7325           2                 4/13         O
7324           3                 4/13         O
7415           1                 4/13         C
7103           1                 4/10         O
7225           2                 4/15         C
7384           1                 4/12         C
7402           3                 4/16         C
7188           1                 4/13         C
7202           2                 4/09         C
Customer_Number may be the preferred access column for the ORDER table, and thus a good index candidate.
Values for Customer_Number are non-unique, so the index is a NUPI.
Rows with the same PI value distribute to the same AMP, causing row distribution to be less uniform, or skewed.
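Resulting distribution (rows with the same Customer_Number are stored on the same AMP):
Customer_Number 1:
7415   1   4/13   C
7384   1   4/12   C
7103   1   4/10   O
7188   1   4/13   C
Customer_Number 2:
7225   2   4/15   C
7325   2   4/13   O
7202   2   4/09   C
Customer_Number 3:
7324   3   4/13   O
7402   3   4/16   C
With only three distinct Customer_Number values and four AMPs (AMP 1 to AMP 4), at least one AMP holds no rows of this table.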
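The difference between the two index choices can be reproduced with the nine ORDER rows from the slides. The following Python sketch is an illustration only: MD5 stands in for Teradata's hashing algorithm, the four-AMP system is assumed, and `amp_for` and `distribute` are hypothetical helpers. The AMP numbers it prints will therefore not match the slides, but the grouping behaviour is the same.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 4  # assumed AMP count

# (order_number, customer_number, order_date, order_status) from the slides.
ORDERS = [
    (7325, 2, "4/13", "O"), (7324, 3, "4/13", "O"), (7415, 1, "4/13", "C"),
    (7103, 1, "4/10", "O"), (7225, 2, "4/15", "C"), (7384, 1, "4/12", "C"),
    (7402, 3, "4/16", "C"), (7188, 1, "4/13", "C"), (7202, 2, "4/09", "C"),
]


def amp_for(pi_value):
    """Stand-in hash (MD5), not Teradata's algorithm."""
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


def distribute(rows, pi_column):
    """Group rows by the AMP chosen from the given primary index column."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for(row[pi_column])].append(row)
    return amps


by_order = distribute(ORDERS, 0)     # UPI on Order Number
by_customer = distribute(ORDERS, 1)  # NUPI on Customer Number

for amp in range(NUM_AMPS):
    upi_orders = [r[0] for r in by_order.get(amp, [])]
    nupi_orders = [r[0] for r in by_customer.get(amp, [])]
    print(f"AMP {amp + 1}: UPI -> {upi_orders}  NUPI -> {nupi_orders}")
```

With the NUPI, every row sharing a Customer_Number lands together, so no more than three of the four AMPs can receive rows of this table.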
8. Secondary Indexes
Three general ways to access a table:
• Primary index access (one-AMP access)
• Secondary index access (two-AMP or all-AMP access)
• Full table scan (all-AMP access)
A secondary index is an alternate path to the rows of a table.
A table can have from 0 to 32 secondary indexes.
Secondary indexes:
• Do not affect table distribution.
• Add overhead, in both disk space and maintenance.
• May be added or dropped dynamically as needed.
• Are chosen to improve table performance.
9. Unique Secondary Index (USI) Access
Create USI:
    CREATE UNIQUE INDEX (cust) ON customer;
Access via USI:
    SELECT *
    FROM customer
    WHERE cust = 56;
[Diagram: USI access on a four-AMP system (AMP 1 to AMP 4). Each AMP holds a USI subtable (RowID, Cust, base-table RowID) and part of the base table (RowID, Cust, Name, Phone).]
The PE hashes the USI value 56 for the customer table (table ID 100), giving Table ID 100, Row Hash 602, USI Value 56. The request travels over the BYNET to AMP 2, whose USI subtable row 602,1 holds Cust 56 and the base-table RowID 778,7. That RowID (Table ID 100, Row Hash 778, Unique Val 7) is then sent over the BYNET to the AMP that stores base row 778,7, which returns the row for customer 56.
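The two-AMP pattern of a USI lookup can be sketched in Python. This is a toy model under stated assumptions, not Teradata's implementation: MD5 stands in for the hashing algorithm, four AMPs are assumed, the base table is placed by a NUPI on the name column (the slides do not say which column carries the NUPI), and `ToyTable`, `row_hash`, and `amp_of` are hypothetical names. The three sample customers are a small subset used for illustration.

```python
import hashlib

NUM_AMPS = 4  # assumed AMP count


def row_hash(value):
    """Stand-in 32-bit hash (MD5-based); Teradata's real algorithm differs."""
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_of(hash_value):
    """Pick the AMP that owns a given row hash."""
    return hash_value % NUM_AMPS


class ToyTable:
    """Base rows placed by the PI value, plus a USI subtable placed by the USI value."""

    def __init__(self):
        self.base = [dict() for _ in range(NUM_AMPS)]     # row_id -> row
        self.usi_sub = [dict() for _ in range(NUM_AMPS)]  # USI value -> base row_id

    def insert(self, pi_value, usi_value, row):
        h = row_hash(pi_value)
        base_amp = self.base[amp_of(h)]
        # RowID = (row hash, uniqueness value) so equal PI values stay distinct.
        uniq = 1 + sum(1 for (rh, _) in base_amp if rh == h)
        row_id = (h, uniq)
        base_amp[row_id] = row
        # The subtable row lives on the AMP chosen by hashing the USI value.
        self.usi_sub[amp_of(row_hash(usi_value))][usi_value] = row_id

    def select_by_usi(self, usi_value):
        sub_amp = amp_of(row_hash(usi_value))          # step 1: subtable AMP
        row_id = self.usi_sub[sub_amp].get(usi_value)
        if row_id is None:
            return None, [sub_amp]
        base_amp = amp_of(row_id[0])                   # step 2: base-row AMP
        return self.base[base_amp][row_id], [sub_amp, base_amp]


table = ToyTable()
for cust, name in [(56, "Smith"), (31, "Adams"), (40, "Smith")]:
    table.insert(pi_value=name, usi_value=cust, row={"cust": cust, "name": name})

row, amps_touched = table.select_by_usi(56)
print(row, "via AMPs", amps_touched)
```

However many AMPs the system has, the lookup consults one subtable AMP and one base-table AMP, which is why USI access is described as a two-AMP operation.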
10. Non-Unique Secondary Index (NUSI) Access
Create NUSI:
    CREATE INDEX (name) ON customer;
Access via NUSI:
    SELECT *
    FROM customer
    WHERE name = 'Adams';
[Diagram: NUSI access on a four-AMP system (AMP 1 to AMP 4). Each AMP holds a NUSI subtable (RowID, Name, base-table RowID) that indexes only the base-table rows (RowID, Cust, Name, Phone) stored on that same AMP.]
The PE hashes the NUSI value 'Adams' for the customer table (table ID 100), giving Table ID 100, Row Hash 567, NUSI Value 'Adams', and sends the request over the BYNET to all AMPs. Each AMP searches its own NUSI subtable for 'Adams' and returns any matching local base rows. In the slide's data, 'Adams' appears in the subtables of two AMPs: one entry points to base rows 471,1 and 717,2, the other to base row 638,1.
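The all-AMP pattern of a NUSI lookup can be sketched the same way. The Python below is a toy model under stated assumptions: MD5 stands in for Teradata's hashing algorithm, four AMPs are assumed, rows are placed by hashing the customer number (an assumption, since the slides do not say which column is the primary index), and `ToyTable` and `amp_for` are hypothetical names. The sample rows are a small subset used for illustration.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 4  # assumed AMP count


def amp_for(value):
    """Stand-in hash (MD5), not Teradata's algorithm."""
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


class ToyTable:
    """Each AMP keeps its base rows and a NUSI subtable covering only those rows."""

    def __init__(self):
        self.base = [[] for _ in range(NUM_AMPS)]
        # Per AMP: indexed value -> positions of matching rows on that AMP.
        self.nusi_sub = [defaultdict(list) for _ in range(NUM_AMPS)]

    def insert(self, pi_value, row):
        amp = amp_for(pi_value)
        self.base[amp].append(row)
        self.nusi_sub[amp][row["name"]].append(len(self.base[amp]) - 1)

    def select_by_name(self, name):
        """Ask every AMP to probe its local subtable; collect the matches."""
        matches, amps_searched = [], 0
        for amp in range(NUM_AMPS):
            amps_searched += 1
            for pos in self.nusi_sub[amp].get(name, []):
                matches.append(self.base[amp][pos])
        return matches, amps_searched


table = ToyTable()
for cust, name in [(31, "Adams"), (45, "Adams"), (72, "Adams"), (56, "Smith")]:
    table.insert(pi_value=cust, row={"cust": cust, "name": name})

rows, amps_searched = table.select_by_name("Adams")
print(len(rows), "rows found after searching", amps_searched, "AMPs")
```

Because the subtable is local to each AMP, the index value alone does not say where matching rows live, so every AMP has to be asked.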
11. Comparison of Primary and Secondary Indexes
Index Feature                 Primary   Secondary
Required?                     Yes       No
Number per Table              1         0-32
Max Number of Columns         16        16
Unique or Non-Unique?         Both      Both
Affects Row Distribution      Yes       No
Created/Dropped Dynamically   No        Yes
Improves Access               Yes       Yes
Multiple Data Types           Yes       Yes
Separate Physical Structure   None      Sub-table
Extra Processing Overhead     No        Yes
12. Primary Keys and Primary Indexes
Primary Key                                           Primary Index
Logical concept of data modeling                      Physical mechanism for access and storage
Teradata doesn't need to recognize                    Each table must have exactly one
No limit on number of columns                         16-column limit
Documented in data model (optional in CREATE TABLE)   Defined in CREATE TABLE statement
Must be unique                                        May be unique or non-unique
Uniquely identifies each row                          Used to place and locate each row on an AMP
Values should not change                              Values may be changed (Del + Ins)
May not be NULL; requires a value                     May be NULL
Does not imply an access path                         Defines most efficient access path
Chosen for logical correctness                        Chosen for physical performance
Indexes are conceptually different from keys:
A PK is a relational modeling convention that uniquely identifies each row.
A PI is a Teradata convention that determines how the rows are stored and accessed.
A significant percentage of tables may use the same columns for both the PK and the PI.
A well-designed database will use a PI that is different from the PK for some tables.