The Internals (part 1)
Table of Contents
Background
Heap Table
    First Level Bitmap
    Second Level Bitmap
    Extent Information
    Data Blocks
    Data Types
        Varchar and Char
        Number
        Date and Timestamp
B-Tree Index
    Unique Index
    Non-Unique Index
    Composite Index
    Function-Based Index
    Local and Global Index
Bitmap Index
IOT
Row Migration
    The Symptom
    How Things are Working?
    The Impact
Row Chaining
References
What’s Next?
Background
“When you love someone - you'll do anything
you'll do all the crazy things that you can't explain” 

Yes, those are a few lines from Bryan Adams’s “When You Love Someone”. By analogy, when I like something, I want to understand how it works and how it is stored internally. So, while reading a few articles and Jonathan Lewis’s book about Oracle's internal structures, I decided to do some exercises on those structures: table, index, undo, redo, etc. I ran these exercises against Oracle 10.2.0.5 on a Windows box with ASSM (the most common setting in current production environments). The objectives of this exercise are:
1. Understand the structure of tables, B-tree and bitmap indexes, undo segments and redo records
2. How Oracle stores data for several data types
3. How Oracle builds the result of a query involving undo information
4. DML operations
5. Impact of the move / shrink space commands
6. Other symptoms: row migration, deadlock, snapshot too old

Heap Table
This is the most popular table type in Oracle, and in other RDBMSs as well. It stores data in an unordered way using a first-fit algorithm. Using a simple “ALTER SYSTEM DUMP” command, we can dump the structure of any segment in a datafile. The same method works for undo segments as well.
ALTER SYSTEM DUMP DATAFILE 4 BLOCK MIN 123 BLOCK MAX 130
To observe the structure of a heap table, I created 2 tables: EMPTY and ONE_ROW. The code that creates those 2 tables is attached below. Along with the script, I have also attached the trace file for this exercise.

exercise_01.sql

The first dump file comes from the empty table (0 records), while the second dump file comes from the table with a few records in it; since PCTFREE is 99, those few records are enough to force the number of extents to 2. In general, the structure of a heap table is as follows.
In ASSM, Oracle does not use freelists and freelist groups, but introduces a new mechanism called BMB (bitmap block) to manage free space in the blocks. The BMBs have a structure like a B-tree index (root, branch and leaf blocks), where the free space information of each block is kept in the leaf structure (called the First Level Bitmap, L1), while L2 and L3 contain the addresses of the level below. For example, L2 contains the addresses of L1 blocks and L3 contains the addresses of L2 blocks. During this entire exercise I did not see any segment in my test cases with a Third Level Bitmap, L3 (maybe due to the size of the tables). Besides the data block address, Oracle also keeps 4 bits of information that indicate the available space in the block, as follows:
BINARY CODE   DECIMAL CODE   DESCRIPTION
0000          0              Unformatted
0001          1              FULL
0010          2              0-25% Free
0011          3              25-50% Free
0100          4              50-75% Free
0101          5              75-100% Free
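
As a minimal sketch (in Python, not anything Oracle ships), the state codes in the table can be turned into a lookup and a per-state summary. The function names `describe` and `summarize` are illustrative assumptions; the counter names nf1 to nf4 follow the labels shown in the First Level Bitmap dump.

```python
from collections import Counter

# Freeness state codes, exactly as listed in the table above.
FREENESS = {
    0: "Unformatted",
    1: "FULL",
    2: "0-25% Free",
    3: "25-50% Free",
    4: "50-75% Free",
    5: "75-100% Free",
}


def describe(code: int) -> str:
    """Return the description for one 4-bit state code."""
    return FREENESS[code & 0b1111]


def summarize(codes):
    """Count blocks per state, using the nf1..nf4 names from the L1 dump."""
    counts = Counter(codes)
    return {
        "unformatted": counts[0],
        "full": counts[1],
        "nf1": counts[2],  # 0-25% free
        "nf2": counts[3],  # 25-50% free
        "nf3": counts[4],  # 50-75% free
        "nf4": counts[5],  # 75-100% free
    }


if __name__ == "__main__":
    # Hypothetical block map: 5 FULL blocks and 8 blocks that are 75-100% free.
    block_states = [1] * 5 + [5] * 8
    print(describe(0b0101))
    print(summarize(block_states))
```

The block map in the usage part is made up for illustration; a real one would be read from the free block map at the bottom of an L1 dump.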

First Level Bitmap
In this section we can see information such as:
1. How many unformatted / formatted blocks are in the segment
2. HWM and Second Level Bitmap
3. Free block statistics and DBA (data block address) range(s)
4. Transaction ID of the locker (if available)

[Figure: First Level Bitmap block dump. Callouts: number of unformatted blocks; Second Level Bitmap address; summary of free blocks (nf1: 0-25% free, nf2: 25-50% free, nf3: 50-75% free, nf4: 75-100% free); Object ID and locker XID; HWM address; number of available blocks for storing data; extent information; free block map]
From the block header information above, we can easily see that this table has only 1 extent and that the extent size is 8 blocks, running from 0x010055b9 to 0x010055c0 (these numbers are DBAs, data block addresses, in hexadecimal format, hence the 0x prefix). We can convert those numbers into a file number and a block number using the DBMS_UTILITY package. 0x010055b9 is 16799161 in decimal format, while 0x010055c0 is 16799168.
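
The same conversion can be sketched outside the database. The Python below assumes the usual 32-bit DBA layout, with the relative file number in the top 10 bits and the block number in the lower 22 bits; the function name `decode_dba` is illustrative. Inside Oracle, DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK perform the same split.

```python
BLOCK_BITS = 22                      # lower 22 bits: block number
BLOCK_MASK = (1 << BLOCK_BITS) - 1   # 0x3FFFFF


def decode_dba(dba: int):
    """Split a 32-bit data block address into (relative file number, block number)."""
    rel_file = dba >> BLOCK_BITS     # top 10 bits
    block = dba & BLOCK_MASK
    return rel_file, block


if __name__ == "__main__":
    for dba in (0x010055B9, 0x010055C0):
        rel_file, block = decode_dba(dba)
        print(f"0x{dba:08x} = {dba} -> file {rel_file}, block {block}")
```

Applied to the two addresses above, this gives file 4 with blocks 21945 and 21952, which is consistent with an extent of 8 blocks.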

The first 3 blocks are used for metadata: the First Level Bitmap Block, the Second Level Bitmap Block and the Extent Header. Since the table is empty, Oracle formats only the first 3 blocks (for storing metadata) and leaves the other 5 blocks unformatted. The HWM is 0x010055bc, which is the block at offset 3 in the extent (right after the 3 metadata blocks), and since there are no rows in the table, “#blocks below” the HWM shows 0.

[Figure: First Level Bitmap dump of the ONE_ROW table. Callouts: 8 blocks are 75-100% free and 5 blocks are FULL; 2 extents]
Above is the dump output of the ONE_ROW table, which has 2 extents. It is interesting to know that when we insert a row (even only one row), Oracle formats all the blocks in the extent (unformatted: 0). The freeness status also shows nf4 = 8, which matches the free block map at the bottom of this dump section. This part reminds me of the output of the DBMS_SPACE.SPACE_USAGE procedure (the output below is taken from oCheck), and I understand now why it is quite fast regardless of the size of the table: it gets the free space information from the block headers shown here instead of scanning the whole table.

Second Level Bitmap
Below is the output for the EMPTY and ONE_ROW tables. It contains the address of the L1 BMB along with an indication of the available space. For these 2 tables, the available space indicator shows that most of the blocks are 75-100% free (Free: 5).

[Figure: Second Level Bitmap dumps of EMPTY and ONE_ROW. Callouts: Object ID; DBA of L1; DBA of the next section (extent information); available space indicator]

For comparison, I created another table with more than 1 L1 entry (it has 3). Please find the script and dump output below for the details. The first 2 L1 entries of this table have an available space indicator equal to 1 (meaning most of the blocks are FULL) and both have 2 extents, while the last entry shows 5 (75-100% free) with only 1 extent. Let’s see the complete representation of the Second Level Bitmap along with the First Level Bitmaps of this table.

exercise_02.sql

[Figure: Second Level Bitmap with its First Level Bitmaps. Callouts: L1 with full space (?); L1 with free space]

Extent Information
Again, I will use the output from BIG_ONE (the output for ONE_ROW can be seen in the dump file attached above). I will just read the dump output to check the result. There are 5 extents with 8 blocks in each extent, so the number of blocks is 40 (shown by the green circle). Oracle keeps the details of the BMB information in this segment. In ASSM, we have 2 types of HWM, High HWM and Low HWM. All blocks below the Low HWM are usable (have been formatted), and this corresponds to the original HWM in MSSM. All blocks above the High HWM have not been formatted.

[Figure: extent control header dump of BIG_ONE. Callouts: HWM information; where Oracle keeps detailed information about the BMBs used to track and manage free space; Extent Map; how to interpret the Auxiliary Map information; DBA of L2]
Data Blocks
The next structure in the table segment is the data block. Let’s go to the ONE_ROW table to see the structure. In general we can divide the data block structure into 3 parts: header information (the ITL entries are in this part), the row directory and the table’s rows. The dump shows the Object ID in hexadecimal format; I am not sure what the reason is.

[Figure: data block dump of the ONE_ROW table. Callouts: Object ID; DBA of L1; ITL entry; row directory; table’s rows]

Header fields:
tsiz: total data area size
hsiz: header size
ntab: number of tables
nrow: number of rows
fsbo: free space begin offset
fseo: free space end offset
avsp/tosp: available / total free space (the free gap between the two offsets is fseo - fsbo)

ITL entry fields:
Xid: transaction ID
Uba: undo block address
Flag: DML flag
Lck: number of locks
Scn: SCN information

The ITL entry can be used to track down which transaction is locking a row and the undo block address (UBA) where its undo is kept. We will see all those relations later in the next part. From the output above we can see that all three rows are locked by the first transaction in the ITL (shown by the purple circle).
Data Types
In this section we are going to see how Oracle stores data in the block. I am going to cover only a few data types: Varchar, Char, Number and Date. The exercise below shows how Oracle stores the data in the data block. If the only purpose is to see how Oracle stores the data, we can use the DUMP function instead of dumping the block with the “ALTER SYSTEM DUMP” command; it is faster and easier.

most_type.LST

Varchar and Char
Oracle uses ASCII codes to store both the Varchar and Char data types (this holds for ASCII-compatible database character sets). 65 is the ASCII code for A, 66 is B, etc. Since Char is a fixed-width data type, Oracle right-pads the data with spaces (ASCII code 32). The block dump shows the values in hexadecimal format, so before we apply the CHR function we need to convert them to decimal first.
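
A minimal sketch of that conversion in Python, assuming plain ASCII data. The helper names `decode_text` and `char_pad` and the sample values are illustrative, not taken from the trace file.

```python
def decode_text(hex_dump: str) -> str:
    """Turn the hex bytes shown in a block dump (e.g. '41 42 43') back into text."""
    return bytes.fromhex(hex_dump).decode("ascii")


def char_pad(value: str, width: int) -> bytes:
    """Mimic CHAR(width) storage: right-pad the value with spaces (0x20)."""
    return value.ljust(width).encode("ascii")


if __name__ == "__main__":
    print(decode_text("41 42 43"))                    # hex 41, 42, 43 are decimal 65, 66, 67
    print([hex(b) for b in char_pad("AB", 5)])        # 'AB' followed by three 0x20 bytes
    print(repr(decode_text("41 42 20 20 20")))        # padding is part of the stored value
```

The padding bytes are real stored data, which is why a CHAR column occupies its full width even for short values.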
Number
Oracle uses different ways to store positive, negative and zero numbers. I will cover only the Number data type (not Float, Double, etc.). For the Number data type, in general Oracle follows these rules:
1. The first byte holds the exponent information (10^x)
2. The second byte is the integer part
3. The last byte is a negative sign if its value is 0x66 (102)
4. The remaining bytes are the decimal part
5. All bytes are shown in hexadecimal format (0x), and Oracle breaks the number into groups of 2 digits from the beginning, one byte per group
6. The real value of the integer and decimal parts is the stored value minus 1
7. For a positive number, the exponent is [value] - [0xC1 (193)]
8. For a negative number:
   a. the exponent is [0x3E (62)] - [value]
   b. each data byte is [0x66 (102)] - [value], followed by the same subtraction of 1
9. The final number for the exponent byte has to be multiplied by 2
10. For zero, Oracle stores 0x80 (128) without exponent and negative sign bytes

Example 1, bytes c1 02:
(0xc1 - 0xc1) * 2 = 0
0x02 - 1 = 0x01 = 1
Final = 10^0 * 1 = 1

Example 2, bytes c1 0b 18:
(0xc1 - 0xc1) * 2 = 0
0x0b - 1 = 0x0a = 10
0x18 - 1 = 0x17 = 23
Final = 10^0 * 10.23 = 10.23

Example 3, bytes 3e 5b 4e 66:
Last byte is 0x66, negative number
(0x3e - 0x3e) * 2 = 0
0x66 - 0x5b - 1 = 0x0a = 10
0x66 - 0x4e - 1 = 0x17 = 23
Final = -1 * 10^0 * 10.23 = -10.23

Example 4, bytes c1 02 17 1f:
(0xc1 - 0xc1) * 2 = 0
0x02 - 1 = 0x01 = 1
0x17 - 1 = 0x16 = 22
0x1f - 1 = 0x1e = 30
Final = 10^0 * 1.2230 = 1.223
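
The rules and worked examples can be checked with a small decoder. This is a sketch in Python, not Oracle code: it treats every data byte as one base-100 digit, which is equivalent to the "multiply the exponent by 2" rule, and the function name `decode_oracle_number` is an illustrative assumption.

```python
from decimal import Decimal


def decode_oracle_number(raw: bytes) -> Decimal:
    """Decode the stored bytes of a NUMBER value, following the rules above."""
    if raw == b"\x80":                      # zero is the single byte 0x80
        return Decimal(0)
    first = raw[0]
    if first > 0x80:                        # positive number
        exponent = first - 0xC1             # power of 100 of the first digit
        digits = [b - 1 for b in raw[1:]]
        sign = 1
    else:                                   # negative number
        exponent = 0x3E - first
        body = raw[1:]
        if body and body[-1] == 0x66:       # drop the trailing negative-sign byte
            body = body[:-1]
        digits = [0x66 - b - 1 for b in body]
        sign = -1
    value = Decimal(0)
    for position, digit in enumerate(digits):
        value += Decimal(digit) * Decimal(100) ** (exponent - position)
    return sign * value


if __name__ == "__main__":
    for dump in ("c1 02", "c1 0b 18", "3e 5b 4e 66", "c1 02 17 1f", "80"):
        print(dump, "->", decode_oracle_number(bytes.fromhex(dump)))
```

Using `Decimal` rather than floating point keeps the two-digit groups exact, so the decoded values compare equal to 1, 10.23, -10.23 and 1.223.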
Now that we understand how Oracle stores the Number data type, aren’t you curious why Oracle uses 0xC1 and 0x3E as the special exponent numbers for positive and negative numbers respectively? Why does Oracle use 0x66 as the negative sign? Why didn’t Oracle use any other number?
These are my best guesses so far:
0xC1 = 193
The maximum value of a byte is 0xFF (255), so the maximum exponent value for a positive number is 255 - 193 = 62. This number has to be multiplied by 2 according to the rules above, so 62 * 2 = 124. Since the integer part can have 2 digits (the maximum value is decimal 99), we have 1 more digit to add, so the total is 124 + 1 = 125.
It means the maximum positive value is 9.9999 * 10^125
0x3E = 62
0x3E is 62 in decimal, a very nice coincidence, right? According to the rules, it has to be multiplied by 2, so 62 * 2 = 124, plus 1 more from the integer part of the number. For a negative value, the minimum value is 10, so the final will be 124 + 1 = 125. It means the minimum negative value is -1 * 10^125
0x66 = 102
0x66 is 102 in decimal. According to the rules above, the real number is X (negative sign) - stored value - 1. Since the maximum possible stored value is 100, X will be 100 + 1 + 1 (this is to avoid a result of 0).
So Oracle uses the numbers above to define the range of the Number data type itself (-1 * 10^125 to 9.9 * 10^125). The other number data types (Float, Double, etc.) will have different “special numbers”, I guess, but this is enough for me at this stage. As for why Oracle uses 0x80 to store 0, I was not curious enough to find out the reason.
Date and Timestamp
The difference between the Timestamp and Date formats lies in the time part: a Timestamp additionally stores fractional seconds, while a Date stores time only down to the second (zero, i.e. midnight, when no time is supplied). These are the rules used for the 2 data types:
1. All bytes are shown in decimal format
2. For the century and year parts, Oracle adds 100 to the real value before storing it. The reason is to support BC and AD dates (please read Thomas Kyte’s book: Expert Oracle Database Architecture)
3. The month and day parts are stored as is
4. For hour, minute and second, we need to subtract 1 from the stored value to get the real value
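
The four rules can be sketched as a decoder for the 7 bytes of a Date value. This is illustrative Python, not Oracle code; the name `decode_oracle_date` and the sample bytes are assumptions made for the example, and BC dates are left out because Python's `datetime` cannot represent them.

```python
from datetime import datetime


def decode_oracle_date(raw: bytes) -> datetime:
    """Decode the 7 stored bytes of a DATE: century, year, month, day, hour, minute, second."""
    if len(raw) != 7:
        raise ValueError("a DATE value is expected to be 7 bytes long")
    century, year, month, day, hour, minute, second = raw
    full_year = (century - 100) * 100 + (year - 100)   # rule 2: both carry an offset of 100
    if full_year < 1:
        raise ValueError("BC dates are not handled by this sketch")
    # rule 3: month and day as is; rule 4: time parts are stored plus 1
    return datetime(full_year, month, day, hour - 1, minute - 1, second - 1)


if __name__ == "__main__":
    # Hypothetical dump values in decimal, as the DUMP function would show them.
    print(decode_oracle_date(bytes([120, 113, 12, 31, 24, 60, 60])))
    print(decode_oracle_date(bytes([120, 113, 1, 1, 1, 1, 1])))    # midnight: time bytes are all 1
```

Note that a Date with no time component shows 1, 1, 1 in the last three bytes, because of the plus-1 offset on hour, minute and second.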

B-Tree Index
Moving on to indexes, the first thing to look at is of course the B-tree index, since it is the most popular index type in the database world. First we are going to see how Oracle stores unique and non-unique indexes, and after that how it works for function-based indexes.
Oracle also uses BMBs to track and manage free space in the index segment. So I will not repeat the explanation of the First Level Bitmap and Second Level Bitmap, but will go directly to the Segment Header (Extent Control Header). In an index segment there are 2 types of structures in the data part: branch blocks and leaf blocks.

Unique Index
I created a unique index on the ONE_ROW (ID) column with PCTFREE 98 to inflate the size of the index.

index_unique.LST

[Figure: branch block dump of the unique index. Callouts: Object ID; DBA of L1; ITL entry (Xid: transaction ID; Uba: undo block address; Flag: DML flag; Lck: number of locks; Scn: SCN information); index key; address of leaf block]

kdxcolev: index level (0 = leaf block; greater than 0 = branch block)
kdxcolok: denotes whether a structural block transaction is occurring
kdxcoopc: internal operation code
kdxconco: index column count
kdxcosdc: count of index structural changes involving the block
kdxconro: number of index entries (does not include the kdxbrlmc pointer)
kdxcofbo: free space begin offset
kdxcofeo: free space end offset
kdxcoavs: available free space (calculated as kdxcofeo - kdxcofbo)
kdxbrlmc: pointer to the leftmost child block (here, a leaf block)
kdxbrsno: last index entry to be modified
kdxbrbksz: size of usable block space

Above is the output of the index dump for a branch block. There is only 1 column in every row of the branch block, which holds the index key. The address of the leaf block is kept in the row header. Now let’s take a look at the leaf block.
[Figure: leaf block dump of the unique index. Callouts: Object ID; DBA of L1; index value; ROWID]

kdxlespl: bytes of uncommitted data at the time of a block split that have been cleaned out
kdxlende: number of deleted entries
kdxlenxt: pointer to the next leaf block in the index structure
kdxleprv: pointer to the previous leaf block in the index structure
kdxlebksz: usable block space
kdxledsz: size of data in the row header

It is clear now that Oracle keeps linked-list information in the leaf block (kdxlenxt and kdxleprv), and this information is used to move from one leaf block to another.
In a unique index, Oracle stores the ROWID information in the row header. The ROWID contains the relative file number, block number and row number, and it is used to point to the respective row in the table. It is shown in hexadecimal format. There are 6 bytes in this ROWID: the first 4 bytes are the data block address (the top 10 bits are the relative file number and the remaining 22 bits are the block number), and the last 2 bytes are the row number. To break down the ROWID, convert the values to decimal and follow the rules below (e.g. 01 00 55 c4 00 00):
0x 01 00 = 256; the file number sits in the top 10 of these 16 bits, so divide by 64: 256 / 64 = 4 -> relative file number
0x 55 c4 = 21956 -> block number (reading 2 bytes is enough as long as the block number is below 65536)
0x 00 00 = 0 -> row number
So the results match the query below.
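
The breakdown can also be done with bit operations instead of the divide-by-64 shortcut. This Python sketch assumes the layout of a 4-byte DBA (10 bits of relative file number, 22 bits of block number) followed by a 2-byte row number; the name `decode_index_rowid` is illustrative.

```python
def decode_index_rowid(hex_dump: str):
    """Decode a 6-byte ROWID from an index dump into (rel_file, block, row)."""
    raw = bytes.fromhex(hex_dump)
    if len(raw) != 6:
        raise ValueError("expected 6 bytes: 4 for the DBA and 2 for the row number")
    dba = int.from_bytes(raw[:4], "big")
    row = int.from_bytes(raw[4:], "big")
    rel_file = dba >> 22             # top 10 bits of the DBA
    block = dba & 0x3FFFFF           # lower 22 bits of the DBA
    return rel_file, block, row


if __name__ == "__main__":
    print(decode_index_rowid("01 00 55 c4 00 00"))
```

Masking the full 22 bits keeps the result correct for block numbers of 65536 and above, where reading only the third and fourth bytes would lose the high-order bits.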
Now that we understand how Oracle stores the index key and ROWID in the branch and leaf blocks, let’s try to draw the index structure in a different way. The trace files below are used as the source (this index has a root, branch and leaf structure). Since the index key comes from a numeric column, we can use the same rules as for reading the Number data type in a data block.

And finally we can draw the index structure in a “tree” form as below (this is why it is called a B-tree).
Non-Unique Index
I am going to use the same table, ONE_ROW, add one column (ID2) and insert a few duplicate values for ID2. The purpose is to see how Oracle handles duplicate data in the branch and leaf blocks of a non-unique index.

index_unique_add_duplicate_value.LST

[Figure: branch and leaf block dumps of the non-unique index. Callouts: index value with multiple rows; index value with 1 row; index value; ROWID]

First let’s take a look at the branch block. For a non-unique index, Oracle adds 1 extra column to keep the entries unique. There are 2 kinds of information in that column. If there is only 1 row for the respective index key, Oracle stores “TERM” in the new column. If there is more than 1 row for an index key (see index key c1 03, which stores ID2 = 2), Oracle stores ROWID information and perhaps uses “TERM” as well. It looks like Oracle uses “TERM” for the row with the lowest ROWID, but I guess that doesn’t mean anything. We can see that for row#3 in the branch block Oracle doesn’t store the complete ROWID (there is no row number), maybe as part of the internal algorithm to keep the additional column (the ROWID column) as short as possible.

[Figure callout on row#3: it should be 01 00 55 d1 00 00, right?]

Moving to the leaf block, we can see that Oracle uses a different approach to store the ROWID. In a unique index, Oracle keeps the ROWID information in the row header, while in a non-unique index Oracle adds one new column (shown as col 1;) to store the ROWID. The purpose of this approach is to keep the index entries unique (exactly the same reason as for the branch block).

Composite Index
A composite index is an index with more than 1 column in the index key. In this section we are going to observe a composite unique index. Let’s create the table and index as follows and capture the branch and leaf block dumps.

[Figure: branch block and leaf block dumps of the composite index]

As expected, this index has 2 columns (ID and ID2), and since it is a unique index, the ROWID information is kept in the row header.
We are aware that in a single-column index, a NULL value is not indexed. How about a composite index: are NULL values indexed there? Let’s observe the behavior by creating a small table with a single-column index and a composite index and populating it with a few rows.
The output shows that for a composite index, an index entry is skipped only if all the columns that are part of the index are NULL.

Output of TINY_1IDX: there are only 2 index entries:
C1 02 for X = 1
C1 03 for X = 2

Output of TINY_2IDX: there are only 3 index entries. The entry for the 4th row is not created (X = NULL and Y = NULL).
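
The NULL rule observed above fits in one line of logic. The sketch below is Python, with `None` standing in for NULL; the function name and the Y values of the four rows are hypothetical, chosen only so that the counts match the two outputs (2 entries for the single-column index, 3 for the composite one).

```python
def has_index_entry(key_values) -> bool:
    """A B-tree entry is created unless every indexed column is NULL (None)."""
    return any(value is not None for value in key_values)


if __name__ == "__main__":
    # Hypothetical (X, Y) rows: two with X set, one with only Y set, one all NULL.
    rows = [(1, 10), (2, 20), (None, 30), (None, None)]

    single_column = [row for row in rows if has_index_entry(row[:1])]   # index on (X)
    composite = [row for row in rows if has_index_entry(row)]           # index on (X, Y)

    print(len(single_column), "entries in the single-column index")
    print(len(composite), "entries in the composite index")
```

The third row shows the practical consequence: a row with X = NULL is invisible to the index on X alone, but reachable through the composite index because Y is not NULL.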

Function-Based Index
How about a function-based index? Does Oracle store the original value or the result of the function? In this section we are going to create a function-based index using the LOWER function as below:

[Figure: leaf block dump of the function-based index. Callouts: 0x61 = 97, 0x62 = 98, 0x63 = 99, the ASCII codes of lowercase a, b and c]

A function-based index is stored using the B-tree structure, and Oracle stores the result of the function instead of the column’s value.

Local and Global Index
It is also interesting to see how Oracle stores local and global indexes on a partitioned table. For this purpose, I created a small partitioned table, PART, from the ONE_ROW table and then created a non-unique local index on the ID column and a global index on the ID2 column.

part.sql

For the local index, Oracle stores the index key in the same way it handles an ordinary index (on a non-partitioned table). There is nothing special or different in the structure. This is the capture of partition P10 of the PART table.
The global index shows an interesting result. Instead of storing only the ROWID information, Oracle also stores the object_id (or maybe the data_object_id) along with the index key.

[Figure: global index leaf block dump. Callouts: Object ID of partition; ROWID]
Let’s take 2 examples from the orange parts above and break down the information.
00 00 d6 de 01 00 56 a4 00 07
  o 0x 00 07 = 7 -> row number
  o 0x 56 a4 = 22180 -> block number
  o 0x 01 00 = 256; 256 / 64 = 4 -> relative file number
  o 0x d6 de (or 0x 00 00 d6 de) = 55006 -> object_id / data_object_id of partition P20
00 00 d6 df 01 00 56 e4 00 00
  o 0x 00 00 = 0 -> row number
  o 0x 56 e4 = 22244 -> block number
  o 0x 01 00 = 256; 256 / 64 = 4 -> relative file number
  o 0x d6 df (or 0x 00 00 d6 df) = 55007 -> object_id / data_object_id of partition PX
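
Both breakdowns can be reproduced with a short decoder. This Python sketch assumes the 10 bytes are laid out as a 4-byte object number, a 4-byte DBA (10 bits of relative file number, 22 bits of block number) and a 2-byte row number; the function name is illustrative.

```python
def decode_global_index_rowid(hex_dump: str):
    """Decode a 10-byte ROWID from a global index into (object_id, rel_file, block, row)."""
    raw = bytes.fromhex(hex_dump)
    if len(raw) != 10:
        raise ValueError("expected 10 bytes: object number, DBA and row number")
    object_id = int.from_bytes(raw[:4], "big")
    dba = int.from_bytes(raw[4:8], "big")
    row = int.from_bytes(raw[8:], "big")
    return object_id, dba >> 22, dba & 0x3FFFFF, row


if __name__ == "__main__":
    for dump in ("00 00 d6 de 01 00 56 a4 00 07", "00 00 d6 df 01 00 56 e4 00 00"):
        print(dump, "->", decode_global_index_rowid(dump))
```

The extra 4 bytes are what lets a single global index entry identify the partition segment that holds the row, something a local index does not need because each index partition maps to exactly one table partition.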
Bitmap Index
A bitmap index is another option for indexing a table. Usually (not always, but in most cases), a bitmap index is used on a column with low cardinality (few distinct values). The most famous example is a bitmap index on a SEX column, where we have only 2 values: female and male.
While NULL values are not indexed in a single-column B-tree index, they are indexed in a bitmap index. For every index key, Oracle creates a bitmap to track where the data is. The more rows there are, the bigger the bitmap. Let’s create a small table with a huge PCTFREE so that the table has more than 1 extent.

bitmap.sql

[Figure: bitmap index leaf block dump. Callouts: Begin ROWID; End ROWID; bitmap information; NULL is indexed]

Oracle doesn’t store the exact ROWID of each table row, but 2 ROWIDs per entry: a Begin ROWID and an End ROWID (a kind of ROWID range). To see what changes in the bitmap column when we have more rows, let’s add another 3 rows to the table.

But how does Oracle convert the bitmap into ROWIDs, and vice versa? I don’t know yet; I am still trying to find out how this works.

IOT
IOT (Index Organized Table) is a special table type in Oracle which is created and maintained using a B-tree structure (there are root, branch and leaf blocks). Data is stored in order (based on the primary key columns). Oracle generates a system name for the segment_name, something like “SYS_IOT_TOP_<object_id>”. Following is an example of the branch and leaf blocks of an IOT without an overflow segment.

iot.sql

[Figure: IOT branch block dump. The branch block structure is exactly the same as in an ordinary B-tree index.]

[Figure: IOT leaf block dump. There is no ROWID information, since the table data is stored together with the index within the leaf block. The information inside the green rectangle is the index part, while the information inside the purple rectangle is the table part.]

We have an option to store the overflow columns (columns which are not part of the primary key) in another tablespace. In this case, in addition to the “SYS_IOT_TOP_<object_id>” segment, Oracle creates one more segment named with the pattern “SYS_IOT_OVER_<object_id>”. When we create an overflow segment, Oracle stores this information using the heap table structure and also creates a pointer in the B-tree structure that points to the table structure (the relation between an IOT and its overflow segment is very similar to the relation between a heap table and a B-tree index). The pointer is stored using “DBA.ROWNO” notation, the same notation we will see when we have row migration or row chaining in a table segment.

[Figure: IOT part. In the table part (purple rectangle), Oracle stores the pointer to the overflow segment using “DBA.ROWNO” notation (red rectangle).]

[Figure: overflow segment, stored using the heap table structure]

Row Migration
The Symptom
When a database block does not have enough free space to hold a row (for example, when a user updates a row with bigger data), Oracle moves that row to another block and leaves a pointer in the original location that links the two. This symptom is called “Row Migration”. Row migration hurts performance because it requires additional consistent gets to fetch the data (regardless of the access path, whether full table scan or index scan). If the block still has enough room to hold the data, Oracle just moves the row to another part of the same block (the offset of the row changes). Row migration has no impact on the structure of an index (if the table has one): the index is independent of it, and only the table’s data blocks are affected.
Below is an illustration of row migration.
How Things are Working?
I will demonstrate how this symptom happens and how Oracle creates the pointer for the migrated row. Below are the complete steps to reproduce the symptom.

row_migration.sql

row_migration_trace.zip

First I created a small table with only 2 columns. Then I populated it with only 20 rows, but please take a look at the second column: instead of inserting 1,000 characters (the maximum length of the Y column), I put in only a single character (a behavior we see frequently in applications). This packs all those rows into a single block.

Before demonstrating row migration, let’s update a single row with a bigger value for the Y column using the update statement below. Since there is enough room in the current block, Oracle only moves the row to another offset within the block, not to another block. You can see in the following figure that Oracle moves the offset of row 0 from 0x1ef8 to 0x1ed8.

[Figure: block dump of row 0 before and after the update]
Now let’s update the whole table with a bigger value (update the Y column to 1,000 characters) and check the result.

[Figure: block dumps of the original and new row locations, with these callouts:]
Row 0 is moved from 0x0100570d.0 to 0x0100570e.0.
In the original row (0x0100570d.0), Oracle uses nrid to point to the row’s new location (0x0100570e.0), and in the new location Oracle uses hrid to point back to the original location.
In this case Oracle doesn’t use the ROWID format for the pointer, but “DBA.ROWNO” notation. In the example above, 0x0100570d is the DBA part while 0 is the row number.
Only the pointer information (nrid: ) is left in the original location (purple rectangle); all the data has been moved to the new location.
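
A pointer in “DBA.ROWNO” notation can be split into its parts with a few lines of code. This is a Python sketch; the name `parse_row_pointer` is illustrative, and it assumes that the row number after the dot is printed in hexadecimal, like the DBA, and that the DBA uses 10 bits for the relative file number and 22 bits for the block number.

```python
def parse_row_pointer(pointer: str):
    """Parse an nrid/hrid value such as '0x0100570e.0' into (rel_file, block, row)."""
    dba_text, row_text = pointer.split(".")
    dba = int(dba_text, 16)          # int() accepts the 0x prefix when the base is 16
    row = int(row_text, 16)          # assumption: the row number is shown in hex
    return dba >> 22, dba & 0x3FFFFF, row


if __name__ == "__main__":
    old_location = parse_row_pointer("0x0100570d.0")
    new_location = parse_row_pointer("0x0100570e.0")
    print("original:", old_location)
    print("migrated:", new_location)
```

For the two addresses above the decoded block numbers differ by one, so the row was migrated to the block right after its original one in the same file.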

In the picture above, Oracle keeps only pointer information in the original location (there is no data), which means that Oracle moved all the data from the old location to the new one. We can identify this symptom by monitoring the “table fetch continued row” session statistic. From the figure below we can see there are 15 row migrations when we select from the table. Reorganizing the table (Alter Table Move, Shrink Space, CTAS or Export/Import) is the sensible fix for this problem, but it is only a temporary one, because row migration is rooted in application and table design.
The Impact
As mentioned previously, row migration increases the number of consistent gets (and probably physical reads as well) during an index scan or a full table scan. We can identify this behavior by simply enabling autotrace (SQL*Plus) to get the statistics, or by turning on event 10200 to dump the consistent gets. Below are the results from event 10200 for both the index scan and the full table scan.
For both the index scan and the full table scan, row migration makes the consistent gets higher compared to the table without row migration. In the index scan example, Oracle requires 2 consistent gets for the table with row migration (only 1 for the table without row migration). And in the full table scan example, Oracle requires 20 consistent gets for the table with row migration (the table without row migration requires only 3).

[Figures: event 10200 trace output for four cases: index scan without row migration; full table scan without row migration; index scan with row migration; full table scan with row migration]

Row Chaining
Row chaining happens when a single block is not big enough to hold 1 row, because there are too many columns or the columns are too wide. Consider the example below: if we update all 3 columns (B, C and D) with the maximum 4,000 characters, the total row size will be more than 12,000 bytes. With the default block size (8k), it will require at least 2 blocks to hold the row.
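
The lower bound in that paragraph is simple arithmetic, sketched here in Python. The function name is illustrative, and the calculation deliberately ignores block and row overhead as well as the fact that Oracle may place columns in separate blocks, so the real number of blocks can be higher than this minimum.

```python
import math


def min_blocks_for_row(column_bytes, block_size=8192):
    """Lower bound on the number of blocks needed for one row (overhead ignored)."""
    return math.ceil(sum(column_bytes) / block_size)


if __name__ == "__main__":
    # Columns B, C and D filled with 4,000 single-byte characters each.
    print(min_blocks_for_row([4000, 4000, 4000]))
```

A row whose columns sum to less than one block still fits in a single block by this estimate, which is the case where only row migration, not chaining, can occur.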

row_chained.sql

[Figure: block dumps of the original row and of the row after the update, with this callout:]
After the update, the row is split across 3 blocks. We don’t see hrid information in the new locations, and Oracle didn’t split any single column across several blocks. The columns are stored in different blocks to avoid splitting a column’s data.

From the picture above we can see that Oracle requires 3 blocks to hold the row, because Oracle does not store a split column here. Columns A and B are stored in block 0x01005755, column C is stored in 0x01005757 and column D is stored in 0x01005756. To identify row chaining, we can use the same session statistic, “table fetch continued row”.

References
http://www.dbafan.com/book/oracle_index_internals.pdf
http://www.jlcomp.demon.co.uk/03_bitmap_1.doc
http://crd-legacy.lbl.gov/~kewu/ps/LBNL-62756.pdf
http://www.orafaq.com/node/2810
http://arup.blogspot.com/2011/01/how-oracle-locking-works.html
Jonathan Lewis, “Oracle Core: Essential Internals for DBAs and Developers”
Thomas Kyte, “Expert Oracle Database Architecture”

What’s Next?
In part 2, I will try to cover the following items, so that we can see the complete picture of how the internals work:
Undo and Redo
Transaction
Consistent Read
A few other things: deadlocks, snapshot too old, etc.

-heri-

The internals

  • 1. The Internals (part 1) Table of Contents Background ............................................................................................................................................................. 2 Heap Table .............................................................................................................................................................. 2 First Level Bitmap ................................................................................................................................................ 3 Second Level Bitmap ........................................................................................................................................... 5 Extent Information .............................................................................................................................................. 6 Data Blocks .......................................................................................................................................................... 8 Data Types ........................................................................................................................................................... 9 Varchar and Char............................................................................................................................................. 9 Number ......................................................................................................................................................... 10 Date and Timestamp ..................................................................................................................................... 11 B-Tree Index .......................................................................................................................................................... 
11 Unique Index ..................................................................................................................................................... 12 Non-Unique Index ............................................................................................................................................. 15 Composite Index ............................................................................................................................................... 16 Function-Based Index ........................................................................................................................................ 17 Local and Global Index ...................................................................................................................................... 18 Bitmap Index ......................................................................................................................................................... 20 IOT ......................................................................................................................................................................... 21 Row Migration ....................................................................................................................................................... 23 The Symptom .................................................................................................................................................... 23 How Things are Working? ................................................................................................................................. 24 The Impact......................................................................................................................................................... 27 Row Chaining......................................................................................................................................................... 
28 References............................................................................................................................................................. 30 What’s Next? ......................................................................................................................................................... 30
  • 2. Background “When you love someone - you'll do anything you'll do all the crazy things that you can't explain”  Yeah…, that is a few lyric from Bryan Adam’s “When You Love Someone”. Analogy to that, when I like something, I want to understand how the things are working or stored, internally. Thus, while reading few articles and also Jonathan Lewis’s book about Oracle internal structure, I decide to do an exercise regarding Oracle internal structure: table, index, undo, redo, etc. I run these exercises against Oracle 10.2.0.5 in Windows box with ASSM (the most used setting in current production environment). The objectives of this exercise are: 1. 2. 3. 4. 5. 6. Understand the structure of table, b-tree and bitmap index, undo segment and redo record How Oracle store data for several data types How Oracle build the result of query involving undo information DML operation Impact of move/ shrink space command Other symptoms: row migration, deadlock, snapshot too old Heap Table This is the most popular table type in Oracle or even in other RDBMS system. It will store data in unordered way using first fit algorithm. Using simple “ALTER SYSTEM DUMP” command, we can dump the structure of any segment in the datafile. The same method will work for undo segment as well. ALTER SYSTEM DUMP DATAFILE 4 BLOCK MIN 123 BLOCK MAX 130 To observe the structure of heap table, I have created 2 tables: EMPTY and ONE_ROW. The codes of those 2 tables creation are as attached below. Along with the script, I have attached as well the trace file for this exercise. exercise_01.sql The first dump file is coming from empty table (0 record), while the second dump file is coming from table with few records in it, but since the PCTFREE is 99, it is enough to force the number of extent to be 2. In the general, the structure of heap table is like this:
  • 3. In ASSM, Oracle does not use Freelist – Freelist Group, but introduces new mechanism which is called BMB (Bitmap Managed Block) to manage free space in the block. BMB has a structure like B-tree index (root, branch and leaf block) where the free space information of block is being kept inside leaf structure (it is called as First Level Bitmap, L1), while L2 and L3 contains address to another level. For example, L2 contains address of L1 and L3 contains address of L3. But during this entire exercise I cannot see any segment in my test case is having Third Level Bitmap, L3 (maybe due to size of the table). Beside data block address, Oracle also keeps 4 bits information that indicate the available space in the block, as the following: BINARY DECIMAL DESCRIPTION CODE CODE 0000 0 Unformatted 0001 1 FULL 0010 2 0-25% Free 0011 3 25-50% Free 0100 4 50-75% Free 0101 5 75-100% Free First Level Bitmap In this section we can see information, like: # Unformatted blocks 1. 2. 3. 4. How many unformatted/ formatted blocks in the segment HWM and Second Level Bitmap Free block statistics and DBA (data block address) range(s) Transaction ID of locker (if available) Second level bitmap address Summary of free block: nf1: 0-25% free nf2: 25-50% free nf3: 50-75% free nf4: 75-100% free Object ID and locker XID HWM address # Available blocks for storing data Extent information Free block map
  • 4. From above block header information, we can easily see this table has 1 extent only and the extent size is 8 blocks, it is started from 0x010055b9 to 0x010055c0 (these numbers are a DBA, data block address, in hexadecimal format – started with 0x). We can convert those numbers into block number and file number using DBMS_UTILITY package.0x010055b9 is 16799161 in decimal format while0x010055c0 is 16799168. The first 3 blocks are being used for metadata information, such as: First Level Bitmap Block, Second Level Bitmap Block and Extent Header Information. Since the table is empty, Oracle only format the first 3 blocks (for storing metadata information) and leave the other 5 blocks unformatted. The HWM is 0x010055bc, which is block number 3 and since no rows in the table, “#blocks below” HWM is showing 0. 8 blocks are 75-100% free and 5 blocks are FULL 2 extents
  • 5. Above is the dump output of ONE_ROW table with 2 extents in the table. It is interesting to know that when we insert a row (even only one row), Oracle will format all blocks in the extent (unformatted: 0).Freeness status also shows nf4 = 8 which is matched with free blocks map in the bottom of this dump section. This part reminds me to the output of DBMS_SPACE.SPACE_USAGE function (below output is taken from oCheck), and I understand now why this function is quite fast, regardless the size of table, because this function get the information of free space from block header, not scanning the whole table. Second Level Bitmap Below are the output of EMPTY and ONE_ROW table, it contains the address of BMBL1 along with indication of available space. For below 2 tables, we can see in the available space indicator, most of the blocks are 75-100% free (Free: 5) Object ID DBA of L1 DBA to the next section, extent information Available space indicator For comparison, I have created another table with more than 1 entry for L1 (it has 3 entry for L1). Please find below the script and dump output for the details. First 2 L1 entry for this table has available space indicator equal to 1 (means most of the blocks are FULL) and both table has 2 extents, while the last entry is showing 5 (75-100% free) with only 1 extent. Let’s see the complete representation of Second Level Bitmap along with First Level Bitmap of this table.
  • 6. exercise_02.sql #L1 with full space?? #L1 with free space Extent Information Again, I will use the output from BIG_ONE (the output of ONE_ROW can be seen in above attached dump file). I will just read the output of dump to check the result. #extents are 5 with 8 blocks in each extent,
  • 7. so #blocks will be 40 (shown by green circle). Oracle keeps the details of BMB information in this segment. In ASSM, we have 2 types of HWM, High HWM and Low HWM. All blocks below Low HWM are usable (have been formatted) and this is the original HWM in MSSM. All blocks above High HWM have not been unformatted. HWM information This is where Oracle keep details information about BMB to track and manage free space Extent Map This is how to interpret the Auxiliary Map information DBA of L2
  • 8. Data Blocks The next structure in table segment is data block. Let’s go to ONE_ROW table to see the structure. In general we can divide data block structure into 3 parts: header information (there is ITL entry in this part), row directory and table’s row. Oracle uses hexadecimal format for Object ID, not sure what is the reason. Object ID DBA of L1 tsiz: table size hsiz: header size?? ntab: number of table nrow: number of row fsbo: free space begin offset fseo: free space end offset avsp/tosp: average/ total free space (calculated as fsbo – fseo) ITL entry: Xid, transaction ID Uba, undo block address Flag of DML Number of lock (Lck) Scn information Row directory Table’s row The ITL entry can be used to track down which transaction is locking the row and where is the Undo Block Address (UBA) is being kept. We will see all those relations later in the next part. From above output we can see all three rows are being locked by first transaction in the ITL entry (shown by purple circle)
  • 9. Data Types In this section we are going to see how Oracle stores the data in the block. I am going to cover only few data types, such as: Varchar, Char, Number and Date. Below exercise is being used to show how Oracle stored the data in the data block. If the purpose is to see how Oracle stores the data only, we can use DUMP function instead of dumping block using “ALTER SYSTEM DUMP” command, it is faster and easier. most_type.LST Varchar and Char Oracle uses ASCII code to stores both Varchar and Char data type. 65 is ASCII code for A, 66 is B, etc. Since Char data type is fixed width data type, Oracle will use white space (ASCII code 32) for right padding the data. Oracle stores the value in hexadecimal format, so before we apply CHR function, we need to convert the value into decimal first.
  • 10. Number Oracle uses different way to store possitive, negative and zero number. I will cover only Number data type (not Float, Double, etc).For Number data type, in general Oracle follows theserules: 1. 2. 3. 4. 5. 6. 7. 8. 9. First byte is exponential information (10x) Second byte is the Integer part Last byte is negative sign if the value is 0x66 (102) The rest bytes are Decimal part All bytes are stored in hexadecimal format (0x) and Oracle break into 2 bytes each from the beginning The real value for Integer and Decimal part (point 3 and 5) has to be substracted by 1 For possitive number, exponential bit is [value] – [0xC1 (193)] For negative number: a. exponential bit is [0x3E (62)] – [value] b. data bytes is [0x66(102)]– [value] 10. The final number for exponential byte has to be multiplied by 2 11. For zero, Oracle stores 0x80 (128) without exponential and negative sign bytes 0xc1 – 0xc1 = 0 * 2 = 0 0x02 – 1 = 0x01 = 1 Final = 100 * 1 = 1 0xc1 – 0xc1 = 0 * 2 = 0 0x0b – 1 = 0x0a = 10 0x18 – 1 = 0x17 = 23 Final = 100 * 10.23 = 10.23 Last byte is 0x66, negative number 0x3e – 0x3e = 0 * 2 = 0 0x66 – 0x5b – 1 = 0x0a = 10 0x66 – 0x4e – 1 = 0x17 = 23 Final = -1 * 100 * 10.23 = -10.23 0xc1 – 0xc1 = 0 * 2 = 0 0x02 – 1 = 0x01 = 1 0x17 – 1 = 0x16 = 22 0x1f – 1 = 0x1e = 30 Final = 100 * 1.2230 = 1.223
  • 11. After we understand how Oracle stores the Number data type, don’t you curious why Oracle uses 0xC1 and 0x3E as a special number for exponential for possitive and negative number respectively? Why Oracle uses 0x66 as negative sign? Why Oracle didn’t use any other number? These are my best guesses so far: 0XC1 = 193 Maximum value for each byte is 0xFF (255), so the maximum exponential value for possitive number will be 255 – 193 = 62. This number has to be multiplied by 2 according to above rules, so 62 * 2 = 124. Since the Integer part can be 2 digit (maximum value isdecimal 99), we have 1 more digit to be added, so total 124 + 1 = 125. It means the maximum value for possitive value is 9.9999 * 10125 0X3E = 62 0x3E is 62 in decimal value, very nice coincident, right?  so according to the rules, it has to be multiplied by 2, so 62 * 2 = 124 plus 1 more from Integer part of the number. For negative value, the minimum value is 10, so the final will be 124 + 1 = 125. It means the minimum value for negative is -1 * 10125 0X66 = 102 0x66 is 102 in decimal format. According to above rules, the real number is X (negative sign) – stored value – 1. Since the maximum posible value for stored value is 100, so X will be 100 + 1 +1 (this is to avoid the result is 0). So, Oracle uses above number as range for Number data type itself (-1 * 10125 – 9.9 * 10125). The other number data type (Float, double, etc) will be having different “special number” I guess, but it’s enough for me at this stage  And why Oracle uses 0x80 for storing 0 doesn’t make me interested to find out the reason. Date and Timestamp The different between Timestamp and Date format is that Oracle stores zero for Time part in Date data type. There are the rules which are being used for these 2 data types: 1. All data is stored in decimal format 2. For Century and Year part, Oracle add 100 to the stored value. 
The reason is to support BC and AD dates (please read Thomas Kyte book: Expert Oracle Database Architechture) 3. For Month and Date part, Oracle stores as is 4. For Hour, Minute and Second, we need to substract by 1 to get the real value B-Tree Index Moving to the index part, the first thing to be observed of course B-tree index since this is the most popular index type in the database world. Firstly we are going to see how Oracle stores unique and non-unique index, and after that we will see also how it is working for function-based index.
  • 12. Oracle also uses BMB to track and manage free space in the index segment. So I will not repeat again the explanation for First Level Bitmap and Second Level Bitmap, but I will directly go to the Segment Header (Extent Control Header). For index segment, there are 2 type of structures in the data segment: branch block and leaf block. Unique Index I have created unique index on ONE_ROW (ID) column with pctfree 98 to expand the size of the index. index_unique.LST Object ID DBA of L1 ITL entry: Xid, transaction ID Uba, undo block address Flag of DML Number of lock (Lck) Scn information kdxcolev: index level (0 = leaf block; 1 = branch block) kdxcolok: denotes whether structural block transaction is occurring kdxcoopc: internal operation code kdxconco: index column count kdxcosdc: count of index structural changes involving block kdxconro: number of index entries (does not include kdxbrlmc pointer) kdxcofbo: free space begin offset kdxcofeo: free space end offset kdxcoavs: average free space (calculated as kdxcofbo – kdxcofeo) kdxbrlmc: entry to the leaf block kdxbrsno: last index entry to be modified kdxbrbksz: size of usable block space Index key Address to Leaf Block
  • 13. Above is the index dump output for a branch block. Every row in the branch block has only 1 column, which holds the index key. The address of the leaf block is kept in the row header. Now let's take a look at the leaf block.

Annotations on the leaf block dump:
Object ID
DBA of L1
kdxlespl: bytes of uncommitted data at the time of a block split that have been cleaned out
kdxlende: number of deleted entries
kdxlenxt: pointer to the next leaf block in the index structure
kdxleprv: pointer to the previous leaf block in the index
kdxlebksz: usable block space
kdxledsz: size of data in the row header
Index value
ROWID

It is clear now that Oracle keeps linked-list information in the leaf block (kdxlenxt and kdxleprv), and this information is used to move from one leaf block to another.

In a unique index, Oracle stores the ROWID in the row header. The ROWID contains the relative file number, block number and row number, and is used to point to the respective row in the table. It is shown in hexadecimal format. There are 6 bytes in the ROWID: roughly speaking, the first 2 bytes hold the relative file number, the next 2 bytes the block number and the last 2 bytes the row number. Strictly, the file number occupies only the top 10 bits of the first 4 bytes, which is why the first 2 bytes have to be divided by 64. To break down the ROWID, first convert the value into decimal format and follow the rules below (ex. 01 00 55 c4 00 00):
0x 01 00 = 256; to get the relative file number we divide by 64, so 256 / 64 = 4
0x 55 c4 = 21956 → block number
0x 00 00 = 0 → row number
So the results match the query below.
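The ROWID breakdown above can be sketched in Python. This is only an illustration: the function name `decode_rowid6` is made up for this example, and it assumes the first 4 bytes form a data block address whose top 10 bits are the relative file number and whose low 22 bits are the block number (for the example value this gives the same answer as dividing the first 2 bytes by 64).

```python
def decode_rowid6(hex_bytes):
    """Split a 6-byte ROWID, given as a hex string, into (file, block, row).

    Assumption: bytes 0-3 are a data block address (10-bit relative file
    number + 22-bit block number) and bytes 4-5 are the row number.
    """
    raw = bytes.fromhex(hex_bytes)          # whitespace between bytes is ignored
    if len(raw) != 6:
        raise ValueError("expected exactly 6 bytes")
    dba = int.from_bytes(raw[:4], "big")    # data block address
    rel_file = dba >> 22                    # top 10 bits
    block = dba & 0x3FFFFF                  # low 22 bits
    row = int.from_bytes(raw[4:], "big")    # last 2 bytes
    return rel_file, block, row


print(decode_rowid6("01 00 55 c4 00 00"))
```

For the example from the leaf block dump this returns relative file number 4, block number 21956 and row number 0, the same values as the manual breakdown.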
  • 14. Now that we understand how Oracle stores the index key and ROWID in the branch and leaf blocks, let's try to draw the index structure in a different way. The trace files below are used as the source (this index has a root - branch - leaf structure). Since the index key comes from a numeric column, we can use the same rules as for reading Number values in a data block. And finally we can draw the index's structure in a "tree" form as below (this is why it is called a B-tree :) )
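Since the index keys are read with the Number rules, here is a minimal Python sketch of those rules as I understand them: 0xC1 and 0x3E as the exponent bases, each following byte holding a base-100 digit (stored as digit + 1 for positive numbers, 101 - digit for negative numbers), 0x66 as the trailing negative marker and 0x80 for zero. The helper name `decode_number` is hypothetical, and the sketch does no validation of malformed input.

```python
from decimal import Decimal


def decode_number(hex_bytes):
    """Decode the bytes of an Oracle Number (hex string) into a Decimal."""
    raw = bytes.fromhex(hex_bytes)
    if raw == b"\x80":                      # special encoding for zero
        return Decimal(0)
    head, body = raw[0], raw[1:]
    if head > 0x80:                         # positive number
        exponent = head - 0xC1              # base-100 exponent
        digits = [b - 1 for b in body]
        sign = 1
    else:                                   # negative number
        exponent = 0x3E - head
        if body.endswith(b"\x66"):          # drop the trailing negative marker
            body = body[:-1]
        digits = [101 - b for b in body]
        sign = -1
    total = Decimal(0)
    for position, digit in enumerate(digits):
        # each base-100 digit is worth 100 ** (exponent - position)
        total += Decimal(digit).scaleb(2 * (exponent - position))
    return sign * total


for key in ("c1 02", "c1 03", "c2 02", "3e 64 66", "80"):
    print(key, "=", decode_number(key))
```

With this sketch the index key c1 03 decodes to 2, which is the value quoted for ID2 in the non-unique index example.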
  • 15. Non-Unique Index
I am going to use the same table ONE_ROW, add one column (ID2) and insert a few duplicate values for ID2. The purpose is to see how Oracle handles duplicate data in the branch and leaf blocks of a non-unique index.

index_unique_add_duplicate_value.LST

Annotations on the dump: Index value with multiple rows; Index value with 1 row; Index value; ROWID

First let's take a look at the branch block. For a non-unique index, Oracle adds 1 extra column to keep each entry unique. That column holds one of 2 kinds of information. If there is only 1 row for the respective index key, Oracle stores "TERM" in the new column.
  • 16. If there is more than 1 row for an index key (see index key c1 03, which stores ID2 = 2), Oracle stores ROWID information and perhaps uses "TERM" as well. It looks like Oracle uses "TERM" for the row with the lowest ROWID, but I guess that doesn't mean anything. We can see that for row#3 in the branch block Oracle doesn't store the complete ROWID (there is no row number); maybe this is part of an internal algorithm to keep the additional column (the ROWID column) as short as possible. It should be 01 00 55 d1 00 00, right?

Moving to the leaf block, we can see that Oracle uses a different approach to store the ROWID. In a unique index, Oracle keeps the ROWID in the row header, while in a non-unique index Oracle adds one new column (shown as col 1;) to store the ROWID. The purpose of this approach is to keep the index entry unique (exactly the same reason as for the branch block).

Composite Index
A composite index is an index with more than 1 column in the index key. In this section we are going to look at a composite unique index. Let's create a table and index as follows and capture the branch and leaf block dumps.

[Figures: Branch block; Leaf block]
  • 17. As expected, this index has 2 columns (ID and ID2), and since this is a unique index, the ROWID is kept in the row header.

We know that in a single-column index a NULL value is not indexed. How about a composite index, is NULL indexed there? Let's observe the behavior by creating a small table with a single-column index and a composite index and populating it with a few rows.

The output shows that for a composite index, an index entry is skipped only if the values of all columns that are part of the index are NULL.
Output of TINY_1IDX, there are only 2 index entries:
C1 02 for X = 1
C1 03 for X = 2
Output of TINY_2IDX, there are only 3 index entries. The entry for the 4th row is not created (X = NULL and Y = NULL).

Function-Based Index
How about a function-based index? Does Oracle store the original value or the result of the function? In this section we are going to create a function-based index using the LOWER function as below:
  • 18. Leaf block:
0x 61 = 97
0x 62 = 98
0x 63 = 99
These are the ASCII codes of the lowercase letters a, b and c. A function-based index is stored using the B-tree structure, and Oracle stores the result of the function instead of the column's value.

Local and Global Index
It is interesting as well to see how Oracle stores local and global indexes on a partitioned table. For this purpose, I have created a small partitioned table, PART, from the ONE_ROW table, and then created a non-unique local index on the ID column and a global index on the ID2 column.

part.sql

For a local index, Oracle stores the index key in the same way it handles an ordinary index (on a non-partitioned table). There is nothing special or different in the structure. This is the capture of partition P10 of the PART table.
  • 19. The global index shows an interesting result. Instead of storing only the ROWID, Oracle also stores the object_id (or maybe the data_object_id) along with the index key.

Annotations on the dump: Object ID of partition; ROWID

Let's take 2 examples from the orange parts above and break down the information.

00 00 d6 de 01 00 56 a4 00 07
o 0x 00 07 = 7 → row number
o 0x 56 a4 = 22180 → block number
o 0x 01 00 = 256 / 64 = 4 → relative file number
o 0x d6 de or 0x 00 00 d6 de = 55006 → object_id / data_object_id of partition P20

00 00 d6 df 01 00 56 e4 00 00
o 0x 00 00 = 0 → row number
o 0x 56 e4 = 22244 → block number
o 0x 01 00 = 256 / 64 = 4 → relative file number
o 0x d6 df or 0x 00 00 d6 df = 55007 → object_id / data_object_id of partition PX
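The two breakdowns above can be reproduced with a short Python sketch. It is only an illustration under my own assumptions: the name `decode_global_rowid` is invented, the 10 bytes are taken to be a 4-byte object id, a 4-byte data block address (10-bit relative file number plus 22-bit block number) and a 2-byte row number, and whether the first field is the object_id or the data_object_id is left open, as in the text.

```python
import struct


def decode_global_rowid(hex_bytes):
    """Split the 10-byte ROWID of a global index entry into its parts."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    # big-endian: 4-byte object id, 4-byte block address, 2-byte row number
    object_id, dba, row = struct.unpack(">IIH", raw)
    return {
        "object_id": object_id,
        "file": dba >> 22,          # top 10 bits of the block address
        "block": dba & 0x3FFFFF,    # low 22 bits of the block address
        "row": row,
    }


print(decode_global_rowid("00 00 d6 de 01 00 56 a4 00 07"))
print(decode_global_rowid("00 00 d6 df 01 00 56 e4 00 00"))
```

The first call yields object id 55006, file 4, block 22180, row 7, and the second yields object id 55007, file 4, block 22244, row 0.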
  • 20. Bitmap Index
A bitmap index is another option we can use to index our table. Usually (not always, but in most cases), a bitmap index is used on a column with low cardinality (few distinct values in the column). The most famous example is a bitmap index on a SEX column, where we have only 2 values, female and male (not sure whether someone will require another entry :) ).

While a NULL value is not indexed in a B-tree index, it is indexed in a bitmap index. For every index key, Oracle creates a bitmap to track where the data is. The bigger the data (number of rows), the bigger the bitmap. Let's create a small table with a huge PCTFREE so that the table gets more than 1 extent.

bitmap.sql

Annotations on the dump: Begin ROWID; End ROWID; Bitmap information; NULL is indexed
  • 21. Oracle doesn't store the exact ROWID to identify the table's data, but 2 ROWIDs: one is the Begin ROWID and the other is the End ROWID (a kind of ROWID range). To see what changes in the bitmap column when we have more rows, let's add another 3 rows to the table.

But how does Oracle convert the bitmap into ROWIDs, and vice versa? Ggrrrr, I don't know yet. I am still trying to find out how this works.

IOT
An IOT (Index Organized Table) is a special table in Oracle which is created and maintained using a B-tree structure (there will be root - branch - leaf blocks). Data is stored in an ordered form (based on the primary key columns). Oracle creates a system-generated name for the segment_name, something like "SYS_IOT_TOP_<object_id>". Following is an example of the branch and leaf blocks of an IOT without an overflow segment.

iot.sql
  • 22. [Figure: branch block structure, exactly the same as an ordinary B-tree index]
[Figure: leaf block structure. There is no ROWID information, since the table data is stored together with the index in the leaf block. The information inside the green rectangle is the index part, while the information inside the purple rectangle is the table part]

We have the option of storing the overflow columns (columns which are not part of the primary key) in another tablespace. In this case, in addition to the "SYS_IOT_TOP_<object_id>" segment, Oracle creates one more segment named with the pattern "SYS_IOT_OVER_<object_id>".

When we create an overflow segment, Oracle stores this information using the heap table structure and also creates a pointer in the B-tree structure that points to the table structure (the relation between an IOT and its overflow segment is very similar to the relation between a heap table and a B-tree index). The pointer is stored in "DBA.ROWNO" notation, the same notation we will see when there is row migration or row chaining in a table segment.
  • 23. [Figure: IOT part. In the table part (purple rectangle), Oracle stores the pointer to the overflow segment in "DBA.ROWNO" notation (red rectangle)]
[Figure: overflow segment, stored using the heap table structure]

Row Migration

The Symptom
When a database block doesn't have enough room to hold a row (for example, a user updates a row with bigger data), Oracle moves that row into another block and creates a pointer to link the 2 locations. This symptom is called "Row Migration". Row migration has a negative impact on performance because it requires additional consistent gets to fetch the data (regardless of the access path, whether it is a full table scan or an index scan). If the block still has enough room to hold the data, Oracle only moves the row to another part of the same block (the offset of that row changes). Row migration has no impact on the structure of an index (if the table has one), so it is independent of the index; the only thing affected is the structure of the table's data blocks. Below is an illustration of row migration.
  • 24. How Do Things Work?
I will demonstrate how this symptom happens and how Oracle creates the pointer for the migrated row. Below are the complete steps to reproduce the symptom.

row_migration.sql
row_migration_trace.zip

First I created a small table with only 2 columns. Then I populated it with only 20 rows, but please take a look at the second column: instead of inserting 1,000 characters (the maximum length of the Y column), I put in only a single character (we see this behavior frequently in applications). This makes all those rows pack into a single block.

Before demonstrating row migration, let's try to update a single row with a bigger value for the Y column using the update statement below. Since there is enough room in the current block, Oracle only moves the row to another offset within the block, not to another block. You can see in the following figure that Oracle moves row 0 from offset 0x1ef8 to 0x1ed8.
  • 25. [Figures: block dump before the update; block dump after the update]
Now let's update the whole table with a bigger value (update the Y column to 1,000 characters) and check the result.
  • 26. [Figure: row 0 is moved from 0x0100570d.0 to 0x0100570e.0]
In the original row (0x0100570d.0), Oracle uses nrid to point to the row's new location (0x0100570e.0), and in the new location Oracle uses hrid to point back to the original location. In this case Oracle doesn't use the ROWID format for the pointer but the "DBA.ROWNO" notation. From the example above, 0x0100570d is the DBA part while 0 is the row number.

[Figure: only the pointer information (nrid:) is left in the original location (purple rectangle); all data has been moved to the new location]

In the picture above, Oracle puts only pointer information in the original location (there is no data), which means that Oracle has moved all the data from the old location to the new location.

We can identify this symptom by monitoring the "table fetch continued row" session statistic. From the figure below we can see there are 15 row migrations when we select from the table.

Defragmenting the table (Alter Table Move, Shrink Space, CTAS or Export - Import of the data) is the sensible fix for this problem, but it is only a temporary solution, because row migration is related to application and table design.
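As a small aid for reading the nrid and hrid pointers described above, the "DBA.ROWNO" notation can be split with a few lines of Python. This is a sketch under assumptions: the function name `parse_dba_rowno` is hypothetical, the DBA is assumed to hold a 10-bit relative file number and a 22-bit block number, and the row number after the dot is assumed to be printed in hexadecimal like the DBA (for the single-digit values in this example it makes no difference).

```python
def parse_dba_rowno(pointer):
    """Turn a pointer such as '0x0100570e.0' into (file, block, row)."""
    dba_text, row_text = pointer.split(".")
    dba = int(dba_text, 16)          # int() accepts the 0x prefix with base 16
    rel_file = dba >> 22             # top 10 bits
    block = dba & 0x3FFFFF           # low 22 bits
    row = int(row_text, 16)          # assumption: row number shown in hex
    return rel_file, block, row


print("original location:", parse_dba_rowno("0x0100570d.0"))
print("new location:     ", parse_dba_rowno("0x0100570e.0"))
```

For the example, the original location is file 4, block 22285, row 0, and the new location is the next block, 22286, in the same file.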
  • 27. The Impact
As mentioned previously, row migration increases the number of consistent gets (and probably physical reads as well) during an index scan or full table scan. We can identify this behavior by simply enabling autotrace (sqlplus) to get the statistics, or by turning on event 10200 to dump the consistent gets. Below are the results from event 10200 for both an index scan and a full table scan.

For both the index scan and the full table scan, row migration makes the number of consistent gets bigger compared to the table without row migration. In the index scan example, Oracle requires 2 consistent gets for the table with row migration (only 1 for the table without row migration). In the full table scan example, Oracle requires 20 consistent gets for the table with row migration (the table without row migration requires only 3 consistent gets).

[Figures: Index Scan Without Row Migration; Full Table Scan Without Row Migration; Index Scan With Row Migration]
  • 28. [Figure: Full Table Scan With Row Migration]

Row Chaining
Row chaining happens when a single block is not big enough to hold 1 row, because the table has too many columns or the columns are too wide. Consider the example below: if we update all 3 columns (B, C and D) to the maximum 4,000 characters, the total row size will be more than 12,000 bytes. With the default block size (8k), it requires at least 2 blocks to hold the row.

row_chained.sql
  • 29. [Figure: original row]
After the update, the row is split across 3 blocks. We don't see hrid information in the new locations, and Oracle didn't split any column across several blocks. Each column is stored whole in one block so that a column's data is never split.
  • 30. From the picture above we can see that Oracle requires 3 blocks to hold the row, because Oracle does not store a split column. Columns A and B are stored in block 0x01005755, column C is stored in 0x01005757 and column D is stored in 0x01005756. To identify row chaining, we can use the same session statistic, "table fetch continued row".

References
http://www.dbafan.com/book/oracle_index_internals.pdf
http://www.jlcomp.demon.co.uk/03_bitmap_1.doc
http://crd-legacy.lbl.gov/~kewu/ps/LBNL-62756.pdf
http://www.orafaq.com/node/2810
http://arup.blogspot.com/2011/01/how-oracle-locking-works.html
Jonathan Lewis's "Oracle Core Essential Internals for DBAs and Developers"
Thomas Kyte's "Expert Oracle Database Architecture"

What's Next?
In part 2, I will try to cover the following items, so that we can see the complete picture of how the internals work:
Undo and Redo
Transaction
Consistent Read
A few other things: deadlocks, snapshot too old, etc.

-heri-