Recipes of Data Warehouse and Business Intelligence

The NULL values management in the ETL process
The NULL management
• In the Data Warehouse community, the presence or absence of NULL values has always been the subject of conflicting opinions.
• It is remarkable how seemingly insignificant details can affect the loading process and/or the result of the information extracted, whether manually or with Business Intelligence tools.
• Topics such as NULL management tend to look like technical details, stuff for programmers. We think we can neglect them because of the many other complexities involved in developing a Data Warehouse project.
• Unfortunately, in a Data Warehouse there is nothing, absolutely nothing, that can be overlooked. Each of its components is linked to the others and always has consequences on the final result.
• This means being aware of the problems that may arise in the future and addressing them now, before it's too late. Do not forget that in a Data Warehouse, going back because of a wrong choice, or even worse an ignored one, can be very painful.
• The management of NULL, to put it in technical language, or the management of the absence of information, to put it in logical language, is just one of these topics.
The meaning of NULL
• In a relational database, and therefore in the majority of the databases underlying Data Warehouse & Business Intelligence solutions, a NULL value in a field of a table means the lack of information: it is not a value, but the absence of a value.
• This does not mean that it is a mistake, although it may be the result of a problem in the system that feeds the data. Often it is simply not possible to associate a value.
• Suppose we consider a loan agreement. Among its various attributes there is the closing day of the contract. It is obvious that this field remains NULL, as it is information that will only be known in the future, at the time of closing. For the moment it will be NULL.
• Even in the domain of numerical values, the presence of NULL has a precise meaning, which is different from the value 0 (zero). Think of the list of commissions that a customer pays to a bank. A value of 0 means that the customer, perhaps because of a special agreement, pays 0 on a given commission, but that commission is part of the contract. A NULL value may mean that the commission does not apply at all, because the customer does not have that contract.
• So the presence of a NULL value can have many meanings.
The NULL problem
• Beyond the intrinsic meaning of NULL values, what are the consequences for the Data Warehouse? The problems occur at data extraction time. Let's see two examples.
• Example 1
Suppose you have a list of contracts, each with its own expiration date. For simplicity, we use the SQL "WITH" clause to simulate a table with three rows on the fly.
The first row represents a contract that expired two days ago, the second row is a contract that you already know will expire in 5 days, the third row represents a contract that has no expiration (NULL). The request (or report) is to extract all contracts that do not expire in the next 10 days.
The SQL is conceptually very simple: just select all contracts whose expiration date is greater than today +10. There should be only one. Unfortunately the NULL will produce an incorrect result: 0 rows.
SQL> with tab as (
  2  Select 'C1' contr, sysdate-2 data_scad from dual
  3  union all
  4  Select 'C2' contr, sysdate+5 data_scad from dual
  5  union all
  6  Select 'C3' contr, null data_scad from dual)
  7  select * from tab
  8  where data_scad > sysdate+10;

no rows selected
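A minimal sketch of the inline fix, anticipating the 99991231-style convention described later: replacing the missing expiration date with a far-future default makes the non-expiring contract come back in the result.

SQL> with tab as (
       Select 'C1' contr, sysdate-2 data_scad from dual
       union all
       Select 'C2' contr, sysdate+5 data_scad from dual
       union all
       Select 'C3' contr, null data_scad from dual)
     select * from tab
     -- NVL turns the absent date into the maximum possible day,
     -- so the comparison no longer discards the row: C3 is now returned
     where nvl(data_scad, date '9999-12-31') > sysdate+10;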
The NULL problem
Example 2
• Suppose you have a table that contains, in one row, the customer and the commission amounts of all the possible contracts subscribed. Among them, the third commission does not make sense for that customer, so it has a NULL value. The request is to get the total amount of fees paid by the customer.
• Even in this case, the SQL solution is very simple: just sum all the commission fields. Unfortunately, as in the previous example, the presence of NULL values will produce an incorrect result, because it nullifies the whole sum.
• These two very simple examples show the pitfalls inherent in the presence of NULL values in the Data Warehouse. Of course you can force, within each SQL statement, the default values that handle the NULL (see the sketch after the output below), but this always has to be remembered, at the risk of forgetting it. I suggest the following rule.
SQL> with tab as (Select 'C1' cliente, 10 com1, 40 com2, null com3, 18 com4 from dual)
  2  select cliente, com1+com2+com3+com4 tot
  3  from tab;

CL        TOT
-- ----------
C1
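A minimal sketch of forcing the defaults inline, as mentioned above: wrapping each commission in NVL restores the correct total.

SQL> with tab as (Select 'C1' cliente, 10 com1, 40 com2, null com3, 18 com4 from dual)
     select cliente,
            -- each missing commission is treated as 0, the neutral element of addition
            nvl(com1,0) + nvl(com2,0) + nvl(com3,0) + nvl(com4,0) tot
     from tab;

CL        TOT
-- ----------
C1         68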
Rule 1
Do not allow the lack of information into the Data Warehouse. Each field must have a default value that replaces the NULL value. This must be done immediately, in the Staging Area, which will be the basis for the next loading steps. You must not have NULL values.
The default values
• As a consequence of the previous rule, we must decide which default values must be used to replace NULL values. In order to make this decision, it is necessary to introduce a new rule:
Rule 2
Simplify the data types used in the Data Warehouse. Use, if possible, only 2 types: text values (for Oracle, VARCHAR2) and numerical values (for Oracle, NUMBER). The "day" fields must all be expressed as the concatenation of year, month and day, i.e. the numerical format YYYYMMDD.

• Obviously, if you have values of CLOB or BLOB type, use those types as well; we do not associate default values with them. The use of the DATE format, but only for technical fields, may be allowed.
• For textual values, try to occupy as little space as possible, so not 'Undefined', but something simple. Personally, I use '?'.
• With regard to numerical values, the default value can be zero. In doing so we lose the meaning of the absence of information, but it does not produce wrong results (do not forget that in mathematics, 0 is the neutral element of addition and subtraction). If the numerical value represents a day, then the default should not be zero, but 99991231, which is the maximum day. Using these two defaults, the two examples we saw previously would produce a correct result.
• For technical fields of DATE type, it may be helpful to set the system date.
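A minimal sketch of Rule 2's YYYYMMDD convention: converting a source DATE into the numeric day format, with 99991231 applied when the day is missing.

SQL> with src as (
       select date '2014-02-07' close_dat from dual
       union all
       select null from dual)
     -- the concatenation year-month-day becomes a plain NUMBER; NULL falls back to the maximum day
     select nvl(to_number(to_char(close_dat,'YYYYMMDD')), 99991231) close_ymd
     from src;

 CLOSE_YMD
----------
  20140207
  99991231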
The exceptions
• There are no rules without exceptions. The exceptions are those cases, indeed quite limited, in which the use of the default value should be avoided because of the business logic of the field. Let's take two examples:
• Sometimes a field that defines a day is not valued in the feeding system, meaning that the day is valid from the beginning of time. In these cases the default value should not be the maximum day, but the minimum possible day, for example 1-jan-1111 (11110101).
• In the customer data table, the full name of the company can be very long. It is often broken into multiple fields because of the limited length of the fields of the feeding system, so to get the full name we must concatenate multiple fields. In this case it would be wrong to use a default value such as '?', because the concatenation would produce a name full of '?'.
We can then state a new rule.
• Rule 3
The choice of the global default values and of the exception values (and keeping the NULL value is one of the options) must be decided on the basis of the business requirements. It is the analysis phase that determines this choice.
The recipe
• We will create a Staging Area table. This table will have, for each field, the definition of the default value, which will be set according to the general rule and will take account of the exceptions.
• The SQL statement that replaces the NULL values with the default values will act as post-processing. I call it the enrichment phase of the Staging Area.
• This implementation will use a configuration table that makes it easy to create dynamic SQL statements that can be used for all Staging Area tables. This provides maximum scalability to the solution.
• To do this, we need a naming convention. I have written several times about the importance of naming conventions inside a Data Warehouse project. In this implementation, we have the following conventions:
  – EDW = project code
  – COM_MEF = Common Area (COM), subarea Micro ETL Foundation (MEF)
  – CUST = data source code
  – STA_SS1 = Staging Area (STA), subarea Source System 1 (SS1)
Global Configuration of the default values

• Let's start by creating a configuration table for the entire Data Warehouse.
• In it we will set the default values for the data types used.
• With the SQL statement on the right, we create the table and initialize it with the default values that we decided:
  – a question mark for text values
  – zero for numeric values
  – 99991231 for "day" fields in numeric format
  – the system date for the DATE type of the technical fields.
If we query the contents of the table, we get the values shown in the sketch after the INSERT.

SQL> CREATE TABLE EDW_COM_MEF_CFT (
2 DEF_V VARCHAR2(30)
3 ,DEF_N NUMBER
4 ,DEF_YMD NUMBER
5 ,DEF_D VARCHAR2(30)
6 );
Table created.
SQL> INSERT INTO EDW_COM_MEF_CFT
2 VALUES (''''||'?'||'''',0,99991231,'SYSDATE');
1 row created.
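A sketch of the check the slide refers to; the row simply reflects the INSERT above (DEF_V stores the '?' literal already wrapped in quotes, so it can be injected as-is into dynamic SQL).

SQL> SELECT * FROM EDW_COM_MEF_CFT;

DEF_V           DEF_N    DEF_YMD DEF_D
---------- ---------- ---------- ----------
'?'                 0   99991231 SYSDATE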
Data Source Configuration
• At this point we create the configuration table of the data source file, with the following structure:
  – the unique code of the data source
  – the name of the table that configures the fields of the data source
  – the name of the object with the data to be loaded into the staging table
  – the name of the staging table.
• This configuration table is very important because it will allow us to generalize the loading process using dynamic SQL statements.

SQL> CREATE TABLE EDW_COM_MEF_IO_CFT (
2 IO_COD VARCHAR2(10)
3 ,CXT_COD VARCHAR2(30)
4 ,FXV_COD VARCHAR2(30)
5 ,STT_COD VARCHAR2(30)
6 );
Table created.
SQL>
SQL> INSERT INTO EDW_COM_MEF_IO_CFT
2 VALUES ('CUST'
3 ,'EDW_STA_SS1_CUST_CXT'
4 ,'EDW_STA_SS1_CUST_FXV'
5 ,'EDW_STA_SS1_CUST_STT'
6 );
1 row created.
Creating and configuring the detail table of the data source
• After configuring the data source, you must configure its columns (which will be the same as those of the Staging table), their type, and, what we need here, the default value if you want to make an exception to the global default value for that data type.
• With this configuration, we want to keep the global default values for the fields KEY_ID, F1_COD, F2_NUM and F4_DAT.
• We want to force a different default value for the specific fields F3_YMD and F5_COD.

SQL> CREATE TABLE EDW_STA_SS1_CUST_CXT (
  2  COLUMN_COD VARCHAR2(30)
  3  ,DATA_TYPE VARCHAR2(30)
  4  ,DEF_TXT VARCHAR2(30)
  5  );
Table created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('KEY_ID','NUMBER',NULL);
1 row created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('F1_COD','VARCHAR2',NULL);
1 row created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('F2_NUM','NUMBER',NULL);
1 row created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('F3_YMD','NUMBER',11110101);
1 row created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('F4_DAT','DATE',NULL);
1 row created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('F5_COD','VARCHAR2','NULL');
1 row created.
Simulation of source data
• We simulate a data source with two rows, one with all NULL values and one with real values.
• In a real case, the source data could be a regular table, an external table pointing to the physical source file, or something else.
• We create a view that simulates the two rows (two SELECTs from DUAL in UNION ALL). This is done solely for convenience of exposition.
• The content of the view is shown in the sketch after the CREATE VIEW statement.

SQL> CREATE OR REPLACE VIEW EDW_STA_SS1_CUST_FXV AS
2 SELECT
3 CAST(1 AS NUMBER) KEY_ID
4 ,CAST(NULL AS VARCHAR2(30)) F1_COD
5 ,CAST(NULL AS NUMBER) F2_NUM
6 ,CAST(NULL AS NUMBER) F3_YMD
7 ,CAST(NULL AS DATE) F4_DAT
8 ,CAST(NULL AS VARCHAR2(30)) F5_COD
9 FROM DUAL
10 UNION ALL
11 SELECT 2 KEY_ID
12 ,'CODE1' F1_COD
13 ,250 F2_NUM
14 ,20140207 F3_YMD
15 ,sysdate-10 F4_DAT
16 ,'CODE2'
17 FROM DUAL;
View created
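A sketch of the view contents mentioned above (the F4_DAT value depends on the day the view is queried, so it is shown only indicatively):

SQL> SELECT * FROM EDW_STA_SS1_CUST_FXV;

KEY_ID F1_COD   F2_NUM   F3_YMD F4_DAT    F5_COD
------ ------ -------- -------- --------- ------
     1
     2 CODE1       250 20140207 27-JAN-14 CODE2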
Creating the Staging Area Table
• We create the Staging Area table that will be loaded from the data source shown in the previous slide.

SQL> CREATE TABLE EDW_STA_SS1_CUST_STT (
2 KEY_ID NUMBER
3 ,F1_COD VARCHAR2(30)
4 ,F2_NUM NUMBER
5 ,F3_YMD NUMBER
6 ,F4_DAT DATE
7 ,F5_COD VARCHAR2(30)
8 );
Table created.
Setting the default values for the Staging Area table
• Using the settings above, we can create a dynamic procedure which, receiving the source code as input, sets the default values on the Staging Area table.
• It will apply the global defaults whenever no exception is present in the configuration table.
• The names of the columns involved are extracted directly from Oracle's data dictionary (COLS, i.e. USER_TAB_COLUMNS).
• After creating the procedure, we can run it:
SQL> exec p_default ('CUST');
• We can verify the outcome of the procedure by looking at the table structure in the data dictionary: the default values are set in the column USER_TAB_COLUMNS.DATA_DEFAULT (see the sketch after the procedure).

create or replace procedure p_default(p_io varchar2) as
v_sql varchar2(4000);
v_io edw_com_mef_io_cft%rowtype;
v_cft edw_com_mef_cft%rowtype;
v_def varchar2(60);
type t_rc is ref cursor;
v_cur t_rc;
v_column_name varchar2(30);
v_data_type varchar2(30);
v_def_txt varchar2(30);
begin
select * into v_cft from edw_com_mef_cft;
select * into v_io from edw_com_mef_io_cft where io_cod = p_io;
v_sql := 'select a.column_name,a.data_type,b.def_txt'||
' from cols a'||' left outer join '||v_io.cxt_cod||' b'||
' on (a.column_name = b.column_cod)'||
' where a.table_name = '||''''||v_io.stt_cod||'''';
open v_cur for v_sql;
loop
fetch v_cur into v_column_name,v_data_type,v_def_txt;
exit when v_cur%notfound;
if (v_data_type = 'NUMBER') then
if (v_column_name like '%_YMD') then v_def := nvl(v_def_txt,v_cft.def_ymd);
else v_def := nvl(v_def_txt,v_cft.def_n);
end if;
elsif (v_data_type = 'DATE') then v_def := nvl(v_def_txt,v_cft.def_d);
else v_def := nvl(v_def_txt,v_cft.def_v);
end if;
v_sql := 'ALTER TABLE '||v_io.stt_cod||
' MODIFY('||v_column_name||' DEFAULT '||v_def||')';
execute immediate v_sql;
end loop;
close v_cur;
end;
/
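A sketch of the verification described above; the DATA_DEFAULT values follow from the global configuration plus the F3_YMD and F5_COD exceptions.

SQL> SELECT column_name, data_default
     FROM user_tab_columns
     WHERE table_name = 'EDW_STA_SS1_CUST_STT'
     ORDER BY column_id;

COLUMN_NAME  DATA_DEFAULT
------------ ------------
KEY_ID       0
F1_COD       '?'
F2_NUM       0
F3_YMD       11110101
F4_DAT       SYSDATE
F5_COD       NULL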
Loading Staging Area table
• In order to load the data, we can use the following procedure, which is dynamic and usable for any Staging Area table.
• After creating the procedure, we can run it:
SQL> exec p_ins_stt ('CUST');
• We can verify the outcome of the procedure by querying the rows of the Staging table (see the sketch after the procedure).
• I wish to emphasize that the load must not change the source data. Obviously the forcing of the default values could be performed at the time of loading the Staging table. The reason why it is convenient to do it as post-processing is the presence of consistency checks that we may want to implement on the input data. In order to perform these checks, the data must not be modified or changed; it must be identical to the source. Only after a positive outcome of the checks can we enrich the data with the default values.

create or replace procedure p_ins_stt(p_io varchar2) as
v_io edw_com_mef_io_cft%rowtype;
v_sql varchar2(32000);
v_list varchar2(4000);
begin
select * into v_io
from edw_com_mef_io_cft
where io_cod = p_io;
v_sql :=
'select listagg(f.column_name,'||''''||','||''''||') '||
'within group (order by f.column_id) '||
'from cols f '||
'inner join cols t on ( f.column_name = t.column_name '||
'and t.table_name = upper('||''''||v_io.stt_cod||''''||')) '||
'where f.table_name = upper('||''''||v_io.fxv_cod||''''||')';
execute immediate v_sql into v_list;
v_sql := 'insert into '||v_io.stt_cod||'('||v_list||')'||
' select distinct '||v_list||' from '||v_io.fxv_cod;
execute immediate v_sql;
commit;
end;
/
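A sketch of the check after the load: at this point the NULL values are still there, because a column DEFAULT only applies to columns omitted from an INSERT and p_ins_stt lists them all. This is exactly why the enrichment step that follows is needed.

SQL> SELECT key_id, f1_cod, f2_num, f3_ymd FROM EDW_STA_SS1_CUST_STT ORDER BY key_id;

KEY_ID F1_COD   F2_NUM   F3_YMD
------ ------ -------- --------
     1
     2 CODE1       250 20140207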
Creating the Function to extract the default values
• This function is useful for getting the default value from the Oracle data dictionary in a readable format, since DATA_DEFAULT is of type LONG.
• It will be used by the next procedure.

create or replace function f_dd(
p_tab varchar2
, p_col varchar2
) return varchar2 as
v_out varchar2(4000);
begin
select data_default
into v_out
from cols
where table_name = p_tab
and column_name = p_col;
return nvl(v_out,'null');
end;
/
sho errors
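A quick usage sketch, assuming p_default has already been run on the CUST staging table:

SQL> SELECT f_dd('EDW_STA_SS1_CUST_STT','F1_COD') def FROM dual;

DEF
------
'?'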
Updating the Staging Area table
• With the help of the previous function, we can now create a procedure that replaces all NULL values with the corresponding default value.
• We can launch it with:
SQL> exec p_upd_stt ('CUST');
• Now we can verify the result (see the sketch after the procedure).

create or replace procedure p_upd_stt(p_io varchar2) as
v_sql clob;
v_io edw_com_mef_io_cft%rowtype;
begin
select * into v_io
from edw_com_mef_io_cft
where io_cod = p_io;
for r in (select ','||column_name||' = '||
'nvl('||column_name||
','||f_dd(table_name,column_name)||')' stm
from cols
where table_name = v_io.stt_cod) loop
v_sql := v_sql ||r.stm;
end loop;
v_sql := 'UPDATE '||v_io.stt_cod||' SET '||substr(v_sql,2);
execute immediate v_sql;
commit;
end;
/
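A sketch of the final check, based on the defaults configured earlier. Note that F5_COD keeps its NULL, because its configured exception was precisely to preserve it, while F4_DAT receives the system date at update time (not shown).

SQL> SELECT key_id, f1_cod, f2_num, f3_ymd, f5_cod
     FROM EDW_STA_SS1_CUST_STT ORDER BY key_id;

KEY_ID F1_COD   F2_NUM   F3_YMD F5_COD
------ ------ -------- -------- ------
     1 ?             0 11110101
     2 CODE1       250 20140207 CODE2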
Flow of the NULL management in a Data Warehouse
(The original slide shows the flow as a diagram linking the configuration tables, the source view, the Staging Area table, the three procedures and the Oracle data dictionary (COLS). Legend: <prj> = project code, <sio> = source subsystem, <io> = source code.)
1. Configure the data source in <prj>_COM_MEF_IO_CFT
2. Configure the global default values in <prj>_COM_MEF_CFT
3. Configure the exception values for every field of the data source in <prj>_STA_<sio>_<io>_CXT
4. Load (simulate) the data source <prj>_STA_<sio>_<io>_FXV
5. Set the default values for the Staging Area table with p_default(<io>)
6. Load the Staging Area table <prj>_STA_<sio>_<io>_STT with p_ins_stt(<io>)
7. Update the Staging Area table with the default values with p_upd_stt(<io>)
Recipe 5 of Data Warehouse and Business Intelligence - The null values management in the etl process

  • 1. Recipes of Data Warehouse and Business Intelligence The NULL values management in the ETL process
  • 2. The NULL management • In the Data Warehouse community , the presence or absence of NULL values ​has always been the subject of conflicting opinions. • It is of great interest to see how seemingly insignificant details, may instead affect the loading and/or the result of the extracted information, manually, or with the Business Intelligence tools . • Topics such as NULL management, have the particular ability to take the form of technical details, stuff for programmers. We think we can neglect it because of the presence of many other complexities involved in the development of a Data Warehouse project . • Unfortunately, in a Data Warehouse, there is nothing , absolutely nothing that can be overlooked. Each of its components, is linked to each other and always has consequences on the final result. • This means being aware of the problems that may arise in the future to address them now, before it's too late . Do not forget that in the Data Warehouse the "go back" because of a wrong choice, or even worse, ignored, it can be very painful. • The management of NULL , to put it in technical language , or the management of the absence of information, to put it in a logical language, it is just one of these topics .
  • 3. The meaning of NULL • In a relational database , then in the majority of the databases that are the basis of the Data Warehouse & Business Intelligence solutions , a NULL value in a field of a table, means the lack of information, so it is not a value, but the absence of value . • This does not mean that it is a mistake, although it is possible that it is the result of a problem in the system that provides the data feeding . Often it is not really possible to associate a value. • Suppose we consider a loan agreement . Among its various information , there is the closing day of the contract. It ' obvious that this field remains NULL , as it is an information that you can only see in the future , at the time of closing . For the moment it will be NULL . • Even in the domain of numerical values ​, the presence of NULL has a precise meaning , which is different from the value 0 (zero). Think of a list of values ​that a customer pays as commissions to a bank . A value of 0 means that the customer , perhaps because it is connected to a special agreement , pay a 0 value on a given committee, but that committee is part of the contract. The value NULL may mean that the commission is not covered because the customer has not that contract. • So, the presence of a NULL value , can have many meanings.
  • 4. The NULL problem • Beyond the intrinsic meaning of NULL values​​, what are the consequences to the Data Warehouse? The problems occur at the data extraction time. Let's see two examples. • Example 1 Suppose you have a list of contracts with its own expiration date. For simplicity, we simulate 3 contracts using the SQL clause “WITH" on the fly to simulate a table with three rows. The first row represents a contract that expired two days ago, the second line is a contract that you already know that will expire in 5 days, the third row represents a contract has not expired (NULL). The request (or report) is to extract all contracts that do not expire in the next 10 days. The SQL is conceptually very simple: just select all contracts whose expiration date is greater than today +10. It should be only one. Unfortunately the NULL will produce an incorrect result: 0 rows SQL> 2 3 4 5 6 7 8 with tab as ( Select 'C1' contr, sysdate-2 data_scad from dual union all Select 'C1' contr, sysdate+5 data_scad from dual union all Select 'C1' contr, null data_scad from dual) select * from tab where data_scad > sysdate+10; no rows selected
  • 5. The NULL problem Example 2 • Suppose you have a table that contains in a line, the customer and the amount of commissions of all possible contracts subscribed. Among them, the third column is a commission for that customer does not make sense, so it has a NULL value. The request is to have the total amount of fees paid by the customer. • Even in this case, the SQL solution is very simple: just do the sum of all the commission fields . Unfortunately, as in the previous example, the presence of NULL values will produce an incorrect result because it nullifies the sum. • These two examples, very simple, show the pitfalls inherent in the presence of NULL values in the Data Warehouse. Of course you can force, within the current SQL, the default values ​that manage the NULL, but this should always be done at the risk to forget. I suggest the following rule. SQL> with tab as (Select 'C1' cliente, 10 com1, 40 com2,null com3,18 com4 from dual ) 1 select cliente,com1+com2+com3+com4 tot 2 from tab; CL TOT -- ---------C1 Rule 1 Do not allow into the Data Warehouse the lack of information. Each field must have a default value that goes to replace the NULL value. This must be done immediately, in the Staging Area, which will be the basis for the next loading. You must not have NULL values.
  • 6. The default values • As a consequence of the previous rule , we must decide which default values ​must be used to replace NULL values. In order to make this decision, it is necessary to suggest a new rule: Rule 2 Simplify the data types to use in the Data Warehouse. Use, if possible, only 2 types: text values ​(for Oracle VARCHAR2 ) and numerical values ​( for Oracle NUMBER ) . The “day” fields must all be expressed as the concatenation of year, month and day, ie numerical format YYYYMMDD . • • • • • Obviously , if you have values of CLOB or BLOB type, use these types as well, we donot associate default values​​. The use of the DATE format , but only for technical fields, may be allowed. For textual values would try to occupy less space as possible, so not ' Undefined ', but something simple. Personally , I use ' ? ' . With regard to the numerical values, the default value can be Zero. While doing so , we lose the meaning of the absence of information, however, does not produce wrong results (do not forget that in mathematics, 0 is the neutral element of addition and subtraction ) . If the numerical value representing a day, then the default should not be zero, but , basically the 99991231 , which is the maximum day. Using these two default , the two examples we have seen previously would produce a correct result. For technical fields of DATE type , it may be helpful to set the system date .
  • 7. The exceptions • There are no rules without exceptions. The exceptions are those cases , indeed quite limited , in which the use of the default value should be avoided because the business logic of the field. Let's take two examples : • Sometimes a field that define a day is not valued in the feeding system. It means that it is a day that start from the beginning of time. In these cases, the default value should not be the maximum day, but the minimum possible day, as, for example, 1-jan-1111 (11110101). • In the customer data table, the full name of the company can be very long. It is often broken into multiple fields because the limited length of the fields of the feeding system . It means that to get the full name we must concatenate multiple fields. In this case it would be wrong to use the default value, for example ‘?’, because the concatenation would produce a name full of '? '. We can then state a new rule. • Rule 3 The choice of the global default values ​and the values ​of the exceptions (and the maintenance of the NULL value is one of the options) must be decided on the basis of business requirement. It will be the analysis phase to determine this choice .
  • 8. The recipe • We will create a Staging Area table. This table will have, for each field, the definition of the default value, which will be set according to the general rule and will take account of the exceptions. • The SQL statement that will replace the NULL values with the default value, will act as post-processing. I call it the enrichment phase of Staging Area • This implementation will use a configuration table that make easy the creation of dynamic SQL statements that could be used for all Staging Area tables. This will provide maximum scalability to the solution. • To do this, we need a naming convention. I have written several times about the importance of naming convention inside a Data Warehouse project. In this implementation, we have the following conventions:     EDW = project code COM_MEF = Common Area (COM), subarea Micro ETL Foundation (MEF) CUST = data source code STA_SS1 = Staging Area (STA), subarea Source System 1 (SS1)
• 9. Global Configuration of the default values
• Let's start by creating a configuration table for the entire Data Warehouse.
• In it we will set the default values for the data types used.
• In the SQL statements below, we create the table and initialize it with the default values we decided on:
  – a question mark for the text values
  – zero for the numeric values
  – 99991231 for the "day" fields in numeric YYYYMMDD format
  – the system date for the DATE type of the technical fields.

SQL> CREATE TABLE EDW_COM_MEF_CFT (
      DEF_V    VARCHAR2(30)
     ,DEF_N    NUMBER
     ,DEF_YMD  NUMBER
     ,DEF_D    VARCHAR2(30)
     );
Table created.

SQL> INSERT INTO EDW_COM_MEF_CFT
     VALUES (''''||'?'||'''',0,99991231,'SYSDATE');
1 row created.

• If we look at the contents of the table, we will see the single row just inserted (a check is sketched below).
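• The query below is a simple way to display those contents (output indicative; note that DEF_V stores the question mark together with its quotes, and DEF_D stores the literal text SYSDATE, which is resolved only when it is used as a column default):

SQL> SELECT * FROM EDW_COM_MEF_CFT;

DEF_V           DEF_N    DEF_YMD DEF_D
---------- ---------- ---------- ----------
'?'                 0   99991231 SYSDATE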
• 10. Data Source Configuration
• At this point we create the configuration table of the data sources, with the following structure:
  – the unique code of the data source (IO_COD)
  – the name of the table that configures the fields of the data source (CXT_COD)
  – the name of the object with the data to be loaded into the staging table (FXV_COD)
  – the name of the staging table (STT_COD).
• This configuration table is very important because it will allow us to generalize the loading process using dynamic SQL statements.

SQL> CREATE TABLE EDW_COM_MEF_IO_CFT (
      IO_COD   VARCHAR2(10)
     ,CXT_COD  VARCHAR2(30)
     ,FXV_COD  VARCHAR2(30)
     ,STT_COD  VARCHAR2(30)
     );
Table created.

SQL> INSERT INTO EDW_COM_MEF_IO_CFT
     VALUES ('CUST'
            ,'EDW_STA_SS1_CUST_CXT'
            ,'EDW_STA_SS1_CUST_FXV'
            ,'EDW_STA_SS1_CUST_STT'
            );
1 row created.
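• As a hypothetical illustration of this generality (the PROD names below are invented), registering a second data source would only require one more configuration row, with no changes to the loading procedures:

SQL> INSERT INTO EDW_COM_MEF_IO_CFT
     VALUES ('PROD'
            ,'EDW_STA_SS1_PROD_CXT'
            ,'EDW_STA_SS1_PROD_FXV'
            ,'EDW_STA_SS1_PROD_STT'
            );
1 row created.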
• 11. Creating and configuring the detail table of the data source
• After configuring the data source, you must configure its columns (which will be the same as those of the Staging table), their data type and, what we need here, the default value to use when you want to make an exception to the global default value for that data type.
• With this configuration, we keep the global default values for the fields KEY_ID, F1_COD, F2_NUM and F4_DAT (DEF_TXT is NULL), and we force a different default value for the fields F3_YMD (the minimum day 11110101) and F5_COD (the literal string 'NULL', i.e. the NULL value is kept).

SQL> CREATE TABLE EDW_STA_SS1_CUST_CXT (
      COLUMN_COD  VARCHAR2(30)
     ,DATA_TYPE   VARCHAR2(30)
     ,DEF_TXT     VARCHAR2(30)
     );
Table created.

SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('KEY_ID','NUMBER',NULL);
1 row created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('F1_COD','VARCHAR2',NULL);
1 row created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('F2_NUM','NUMBER',NULL);
1 row created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('F3_YMD','NUMBER',11110101);
1 row created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('F4_DAT','DATE',NULL);
1 row created.
SQL> INSERT INTO EDW_STA_SS1_CUST_CXT VALUES ('F5_COD','VARCHAR2','NULL');
1 row created.
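• A quick look at the configuration just loaded (output indicative):

SQL> SELECT * FROM EDW_STA_SS1_CUST_CXT;

COLUMN_COD  DATA_TYPE  DEF_TXT
----------- ---------- ----------
KEY_ID      NUMBER
F1_COD      VARCHAR2
F2_NUM      NUMBER
F3_YMD      NUMBER     11110101
F4_DAT      DATE
F5_COD      VARCHAR2   NULL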
• 12. Simulation of source data
• We simulate a data source with two rows: one with all NULL values (except the key) and one with real values.
• In a real case, the source data could be a regular table, an external table pointing to the physical source, or something else.
• Here we create a view over DUAL that simulates the two rows. This is done solely for convenience of exposition.

SQL> CREATE OR REPLACE VIEW EDW_STA_SS1_CUST_FXV AS
     SELECT
      CAST(1 AS NUMBER)           KEY_ID
     ,CAST(NULL AS VARCHAR2(30))  F1_COD
     ,CAST(NULL AS NUMBER)        F2_NUM
     ,CAST(NULL AS NUMBER)        F3_YMD
     ,CAST(NULL AS DATE)          F4_DAT
     ,CAST(NULL AS VARCHAR2(30))  F5_COD
     FROM DUAL
     UNION ALL
     SELECT
      2           KEY_ID
     ,'CODE1'     F1_COD
     ,250         F2_NUM
     ,20140207    F3_YMD
     ,SYSDATE-10  F4_DAT
     ,'CODE2'     F5_COD
     FROM DUAL;
View created.

• Querying the view shows the two rows just described (a check is sketched below).
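• A minimal check of the simulated source (output indicative; F4_DAT depends on when the view is queried, since it is defined as SYSDATE-10):

SQL> SELECT * FROM EDW_STA_SS1_CUST_FXV;

    KEY_ID F1_COD      F2_NUM     F3_YMD F4_DAT      F5_COD
---------- ------- ---------- ---------- ----------- -------
         1
         2 CODE1          250   20140207 27-JAN-2014 CODE2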
• 13. Creating the Staging Area Table
• We create the Staging Area table that will be loaded from the data source shown in the previous slide.

SQL> CREATE TABLE EDW_STA_SS1_CUST_STT (
      KEY_ID  NUMBER
     ,F1_COD  VARCHAR2(30)
     ,F2_NUM  NUMBER
     ,F3_YMD  NUMBER
     ,F4_DAT  DATE
     ,F5_COD  VARCHAR2(30)
     );
Table created.
• 14. Setting the default values for the Staging Area table
• Using the above settings, we can create a dynamic procedure which, given the data source code as input, sets the default values on the Staging Area table.
• It applies the global default values when no exception is present in the configuration table. The names of the columns involved are read directly from Oracle's data dictionary (COLS, i.e. USER_TAB_COLUMNS).

create or replace procedure p_default(p_io varchar2) as
  v_sql          varchar2(4000);
  v_io           edw_com_mef_io_cft%rowtype;
  v_cft          edw_com_mef_cft%rowtype;
  v_def          varchar2(60);
  type t_rc is ref cursor;
  v_cur          t_rc;
  v_column_name  varchar2(30);
  v_data_type    varchar2(30);
  v_def_txt      varchar2(30);
begin
  -- load the global defaults and the data source configuration
  select * into v_cft from edw_com_mef_cft;
  select * into v_io  from edw_com_mef_io_cft where io_cod = p_io;
  -- join the staging table columns with the exception configuration
  v_sql := 'select a.column_name,a.data_type,b.def_txt'||
           ' from cols a'||
           ' left outer join '||v_io.cxt_cod||' b'||
           ' on (a.column_name = b.column_cod)'||
           ' where a.table_name = '||''''||v_io.stt_cod||'''';
  open v_cur for v_sql;
  loop
    fetch v_cur into v_column_name,v_data_type,v_def_txt;
    exit when v_cur%notfound;
    -- use the exception value if configured, otherwise the global default
    if (v_data_type = 'NUMBER') then
      if (v_column_name like '%_YMD') then
        v_def := nvl(v_def_txt,v_cft.def_ymd);
      else
        v_def := nvl(v_def_txt,v_cft.def_n);
      end if;
    elsif (v_data_type = 'DATE') then
      v_def := nvl(v_def_txt,v_cft.def_d);
    else
      v_def := nvl(v_def_txt,v_cft.def_v);
    end if;
    v_sql := 'ALTER TABLE '||v_io.stt_cod||
             ' MODIFY('||v_column_name||' DEFAULT '||v_def||')';
    execute immediate v_sql;
  end loop;
  close v_cur;
end;
/

• After creating the procedure, we can run it:

SQL> exec p_default ('CUST');

• We can verify the outcome of the procedure by looking at the table structure in the data dictionary: the default values are now set in the field USER_TAB_COLUMNS.DATA_DEFAULT.
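• One way to see them (output indicative, assuming the configuration above):

SQL> SELECT column_name, data_default
     FROM   cols
     WHERE  table_name = 'EDW_STA_SS1_CUST_STT';

COLUMN_NAME  DATA_DEFAULT
------------ -------------
KEY_ID       0
F1_COD       '?'
F2_NUM       0
F3_YMD       11110101
F4_DAT       SYSDATE
F5_COD       NULL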
• 15. Loading the Staging Area table
• In order to load the data, we can use the following procedure, which is dynamic and usable for any Staging Area table.

create or replace procedure p_ins_stt(p_io varchar2) as
  v_io    edw_com_mef_io_cft%rowtype;
  v_sql   varchar2(32000);
  v_list  varchar2(4000);
begin
  select * into v_io from edw_com_mef_io_cft where io_cod = p_io;
  -- build the list of the columns common to the source object and the staging table
  v_sql := 'select listagg(f.column_name,'||''''||','||''''||') '||
           'within group (order by f.column_id) '||
           'from cols f '||
           'inner join cols t on ( f.column_name = t.column_name '||
           'and t.table_name = upper('||''''||v_io.stt_cod||''''||')) '||
           'where f.table_name = upper('||''''||v_io.fxv_cod||''''||')';
  execute immediate v_sql into v_list;
  -- insert the source data, unchanged, into the staging table
  v_sql := 'insert into '||v_io.stt_cod||'('||v_list||')'||
           ' select distinct '||v_list||' from '||v_io.fxv_cod;
  execute immediate v_sql;
  commit;
end;
/

• After creating the procedure, we can run it:

SQL> exec p_ins_stt ('CUST');

• We can verify the outcome of the procedure by querying the rows of the Staging table.
• I wish to emphasize that the load does not change the data source. Obviously, the forcing of the default values could be performed at the time the Staging table is loaded. The reason why it is convenient to do it as post-processing is the presence of consistency checks that we may want to implement on the input data. In order to perform these checks, the data must not be modified; it must be identical to the source. Only after the positive outcome of the checks can we enrich the data with the default values.
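• At this point the Staging table contains the source rows unchanged, NULL values included (output indicative). The column defaults set by p_default are not applied here, because the INSERT provides every value explicitly; this is exactly why the update step that follows is needed:

SQL> SELECT KEY_ID, F1_COD, F2_NUM, F3_YMD, F5_COD FROM EDW_STA_SS1_CUST_STT;

    KEY_ID F1_COD      F2_NUM     F3_YMD F5_COD
---------- ------- ---------- ---------- -------
         1
         2 CODE1          250   20140207 CODE2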
• 16. Creating the Function to extract the default values
• This function is useful to get the default value from the Oracle data dictionary in a readable format, since the DATA_DEFAULT column is of type LONG.
• We will use this function in the next procedure.

create or replace function f_dd(
  p_tab varchar2
 ,p_col varchar2
) return varchar2 as
  v_out varchar2(4000);
begin
  -- read the default value of the given table column from the data dictionary
  select data_default into v_out
  from   cols
  where  table_name  = p_tab
  and    column_name = p_col;
  return nvl(v_out,'null');
end;
/
sho errors
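• A quick standalone test of the function (output indicative; note that the returned text includes the quotes stored in DATA_DEFAULT):

SQL> SELECT f_dd('EDW_STA_SS1_CUST_STT','F1_COD') AS def FROM DUAL;

DEF
------
'?'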
• 17. Updating the Staging Area table
• With the help of the previous function, we can now create a procedure that replaces all NULL values with the default value of each column.

create or replace procedure p_upd_stt(p_io varchar2) as
  v_sql  clob;
  v_io   edw_com_mef_io_cft%rowtype;
begin
  select * into v_io from edw_com_mef_io_cft where io_cod = p_io;
  -- build one "column = nvl(column, default)" assignment per column
  for r in (select ','||column_name||' = '||
                   'nvl('||column_name||
                   ','||f_dd(table_name,column_name)||')' stm
            from cols
            where table_name = v_io.stt_cod) loop
    v_sql := v_sql ||r.stm;
  end loop;
  v_sql := 'UPDATE '||v_io.stt_cod||' SET '||substr(v_sql,2);
  execute immediate v_sql;
  commit;
end;
/

• We can launch it with:

SQL> exec p_upd_stt ('CUST');

• Now let's verify the result.
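• The final content of the Staging table (output indicative; F4_DAT of the first row receives the system date of the update run, and F5_COD keeps NULL as configured):

SQL> SELECT * FROM EDW_STA_SS1_CUST_STT;

    KEY_ID F1_COD      F2_NUM     F3_YMD F4_DAT               F5_COD
---------- ------- ---------- ---------- -------------------- -------
         1 ?                0   11110101 23/02/2014 11.21.30
         2 CODE1          250   20140207 27/01/2014 9.34.35   CODE2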
• 18. Flow of the NULL management in a Data Warehouse
• [Flow diagram: the configuration tables <prj>_COM_MEF_IO_CFT, <prj>_COM_MEF_CFT and <prj>_STA_<sio>_<io>_CXT drive the procedures p_ins_stt(<io>), p_default(<io>) and p_upd_stt(<io>), which move the data from <prj>_STA_<sio>_<io>_FXV into <prj>_STA_<sio>_<io>_STT using the Data Dictionary (cols).]
• Legend: <prj> = project code, <sio> = source subsystem, <io> = data source code.
• The numbered steps of the flow are:
  1. Configure the data source
  2. Configure the global default values
  3. Configure the exception values for every field of the data source
  4. Load (simulate) the data source
  5. Set the default values for the Staging Area table
  6. Load the Staging Area table
  7. Update the Staging Area table with the default values