MICRO ETL FOUNDATION

Ideas and solutions for Data Warehouse and Business Intelligence projects in an Oracle environment

Techniques to control the processing units in the ETL process

When loading a Data Warehouse, it is important to have full control of the processing units that compose the load. Each processing unit must be carefully monitored, both to detect any errors that occur and to analyze its execution times.

MASSIMO CENCI
Introduction
When loading a Data Warehouse, it is important to have full control of the processing units that compose the load. Each processing unit must be carefully monitored, both to detect any errors that occur and to analyze its execution times. In this article, which builds on the messaging techniques described on SlideShare
[http://www.slideshare.net/jackbim/recipe-7-of-data-warehouse-a-messaging-system-for-oracle-dwh-1], I enrich the Micro ETL Foundation with a control system for the PL/SQL elaboration units.
With the MEF messaging system, we saw how to emit messages that are then stored in a log table. In the demonstration tests, a suitable sequence of messages and a correct setting of the package variables gave an idea of the processing flow and of the delay between one message and the next.
One could call the setup of those tests an "excess of zeal". In fact, the purpose of the messaging system was much less ambitious: it was meant only to provide generic information at any moment of the ETL process, simply by calling a procedure.
It is now time to take the next step and implement an agile control system for the processing units. The goal is always the same: it must be simple and non-invasive. This means that it can be plugged into existing DWH systems (and not only those) gradually, without changing the elaboration flow. There is no doubt, however, that if you have already set up your ETL process in a modular way, applying the techniques described here will be simpler and more natural.
I will use concepts and definitions already given in the messaging system. Now let us concentrate on the concept of modularity, which is fundamental to the control system.
Modularity and sequencing
The concept of modularity is the basis of the techniques that will be presented. As we know, a complex system, and the ETL process of a Data Warehouse can without doubt be defined as very complex, can be managed and understood only if we break its overall complexity into less complex components. In other words: you cannot have one big main program that contains thousands of lines of code. This is the first point.
The second point is sequentiality. We must try to think, and we almost always can, of each component of the process as connected to the next, so that their sequential execution leads to the final loading of the Data Warehouse. Please note, I am not saying that parallelism is impossible, but identifying which components are completely independent of each other (so that they can run in parallel) is not an easy task; and that is without considering all the problems of their synchronization.
Moreover, parallelism also requires a specific hardware structure and specific Oracle settings. And the performance improvement, I speak from my own experience, is not so certain. I suggest, therefore, trying to apply parallelism to the objects rather than to the processes. Usually, the dimension tables may be loaded in parallel (if there are no logical connections between them), but why complicate our lives if we can reason in a simple sequential way?
Recall that simplicity is a pillar of the Micro ETL Foundation. So the advice is: modularity and sequencing. Do not forget that the ETL process is physiologically sequential in its basic pillars. You cannot load a level-2 Data Mart before loading the level-1 Data Mart. And the level-1 Data Mart cannot be loaded until you first load the dimensions, which in turn cannot be loaded if you have not first loaded the Staging Area tables, and so on.
The figure below shows the concepts of modularity and sequencing applied to a hypothetical schedule S1. On the left we have the "logical" components of the ETL process, i.e. not code, but names, configured in a table. On the right we have the "physical" components, i.e. the real programming code.
[Figure: modularity and sequencing for schedule S1. Left, the "logical" components (schedule, job and unit names configured in a table); right, the "physical" components (the real PL/SQL code, e.g. "procedure p_xxx is begin ... end;").]
Requirements
The main requirement, as already mentioned, is to have control of each processing unit (unit) that constitutes the ETL process. Having control means that, for each unit, at any time, I need to know:
• when it started and when it ended;
• how long its execution took;
• whether it succeeded or had some problem (exception);
• if it raised an exception, which error occurred;
• what the consequences are in case of error.
In order to meet those requirements, it is not necessary to use very complex structures. In accordance with the MEF philosophy, I will use just a configuration table (MEF_UNIT_CFT), which allows me to enter the main characteristics of the units that make up the loading job, and a logging table (MEF_UNIT_LOT), which allows me to see the log of the executions.
The most important information in the configuration table, along with some context information, is the continuity flag, which allows me to decide whether, in the event of an error, the error is critical (i.e. it must abort the job that called the unit) or not critical (i.e. it must allow the next unit to run).
To be able to log the execution, you will have to "surround" the unit call with a procedure call that logs the beginning and a procedure call that logs the end. In addition, error situations should be treated in a uniform manner by a single procedure, which logs the error and applies the consequences dictated by the continuity flag.
To control the behavior of a unit means, first of all, to understand its life cycle in the global context of the job to which it belongs. To that end, we will use the theory of the finite-state machine or, if we want to be a bit more modern, a simplified version of the state diagrams of the Unified Modelling Language.
The state diagram
As stated previously, the MEF control system, in order to perform its function, must surround the execution of every elaboration unit with a procedure call (p_start) that registers its beginning and a procedure call (p_end) that registers its end.
In practice, the program code (we will see it clearly in the test case) must contain calls like:
p_start
<unit call x>
p_end
p_start
<unit call y>
p_end
...
This seems simple but, to complicate the situation, there may be exceptions during execution. These exceptions should be handled by a procedure call (p_exc) that is always present inside the unit.
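In sketch form, every unit therefore embeds its own exception handler. A minimal skeleton, assuming the p_exc signature used later in the test case (module name plus SQLERRM); the unit name is illustrative:

  procedure module_x is
    v_module_cod varchar2(61) := 'unit_test.module_x';  -- illustrative name
  begin
    null;  -- the business logic of the unit goes here
  exception
    -- every error, whatever its origin, funnels into the MEF control system
    when others then mef_unit.p_exc(v_module_cod, sqlerrm);
  end module_x;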
A state diagram is the most useful tool for understanding the units' life cycle.
Elaboration units life cycle (states and transitions):

(0) No unit running: STATUS_COD = ?, RETURN_COD = ?
(1) Unit running: STATUS_COD = running, RETURN_COD = ?
(2) Unit ended OK: STATUS_COD = Done, RETURN_COD = OK
(3) Unit ended with warning: STATUS_COD = Done, RETURN_COD = OK (Warning)
(4) Unit aborted: STATUS_COD = Aborted, RETURN_COD = NOT OK

Transitions: p_start leads from state 0, 2 or 3 to state 1; p_end leads from state 1 to state 2; p_exc with continue=1 leads from state 1 to state 3; p_exc with continue=0 leads from state 1 to state 4.
The figure shows the various states and the possible state changes of the units inside a block of executions. Everything shown graphically has been translated into PL/SQL.
Because each unit must be preceded by a procedure call that logs the beginning of the execution, the p_start procedure places the unit in the "running" state and sets the return code to an obviously unknown ("?") value. It is important to note that after p_start only the call of the unit may be present, and no other procedure.
The unit can conclude its run in two different ways: it finishes without any problems, i.e. with no Oracle error, or it fails. In the first case the unit calls the p_end procedure, which leads us into the "Done" terminal state with return code = "OK". In the second case, depending on the setting of the continuity flag in the configuration table, it can behave in two different ways.
Either the unit ends in a definitive way, thus preventing the execution of any other unit: it switches to the "Aborted" state and sets the return code = "NOT OK". Or the unit ends with a warning: it switches to the "Done" state, but with "OK (Warning)" as return code. This does not prevent the execution of the next unit; in fact, from this state it is again possible to call the p_start procedure, which will switch the next unit to the "running" state. The p_end procedure, however, will do nothing in this state, leaving the unit in its final state.
Each state change that is not present in the diagram will always produce an error message. This ensures that you have not, through distraction, failed to follow the correct sequence of calls (e.g. you forgot the p_end procedure, or called it more than once, and so on).
The exception management
Since the most complex change of state is related to error situations, let us briefly explore exception management (of which we have already seen some examples in the messaging techniques).
In the PL/SQL language, an error situation that occurs while the program is running is called an "exception". The error condition can be generated by the program itself (e.g. a division by zero), or forced by the logic of the program (e.g. an amount that must not exceed a certain threshold). In the latter case, the exception is raised explicitly using the RAISE_APPLICATION_ERROR statement.
Regardless of the cause of the error, officially identified as an internal exception or a user-defined exception, the Oracle PL/SQL engine transfers control of the program to the exception handler of the running module (or PL/SQL block); in practice, the code after the EXCEPTION keyword.
Obviously, if the EXCEPTION keyword is not present, the program ends immediately, because no indication for error handling was found.
Let us now analyze error propagation. If there are several procedures nested inside each other, an error unhandled by the innermost procedure propagates to the caller procedure, and so forth up to the main program.
If the error is handled, the procedure continues its regular work (unless the exception handler contains the RAISE keyword, or there is a software error inside the exception handler itself).
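As a generic illustration of this propagation rule (standard PL/SQL, not MEF code): the error raised in the inner procedure propagates to the enclosing block, whose handler catches it.

  -- run in SQL*Plus with: set serveroutput on
  declare
    procedure inner_proc is
    begin
      raise_application_error(-20001, 'forced error in inner_proc');
    end;
  begin
    inner_proc;  -- the unhandled error propagates up to this block
  exception
    when others then
      -- without this handler, the block itself would end in error
      dbms_output.put_line('caught: ' || sqlerrm);
  end;
  /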
To clarify the positioning of the exception call, let us anticipate the structure of the program code, using the next figure.
[Figure: positioning of the exception call. The caller surrounds the unit:

  mef_unit.p_start('s1', 'job1', 'unit_test.module2');
  unit_test.module2;
  mef_unit.p_end;

while inside the unit_test package the unit routes every error to mef_unit.p_exc from its exception section:

  procedure module2 is
  begin
    ...
  exception
    -- mef_unit.p_exc is called here
  end;]
The design
The design of the unit control system is composed of two tables and a sequence. The MEF_UNIT_CFT table is the configuration table of the processing units. The MEF_UNIT_LOT table is the one that keeps the log of the executions of the units. The MEF_UNIT_LOT_SEQ sequence serves to give a sequence number to each log line. Let us now look at these objects in detail. Download the code from MEF_01:
https://drive.google.com/folderview?id=0B2dQ0EtjqAOTN3I1MU9JQmpOUEE&usp=sharing
The MEF_UNIT_LOT_SEQ sequence
It is more functional than the time stamp for sorting the table: because the units sometimes begin and end within fractions of a second of each other, the time stamp might not be sufficiently discriminating.
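As a sketch, its definition can be as simple as the following (the authoritative DDL is in the MEF_01 download):

  create sequence mef_unit_lot_seq;  -- one ascending number per log line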
The MEF_UNIT_CFT table
This table contains the configuration information of the processing units. In it you configure schedules, jobs and units. For the purposes of the unit control system, jobs and schedules will be used in a static way, as parameters to the procedure calls. Their management will be explained, in the future, in the control system of the jobs. Likewise, the SORT_CNT, UNIT_ACTIVE_FLG and JOB_ACTIVE_FLG fields do not have an immediate use, and we do not have to set them. A DDL sketch follows the field list below.
• SCHED_COD: identifier of the schedule to which the job belongs. It is a logical entity, in the sense that we have to think of it as the identifier of a job list.
• JOB_COD: identifier of the job. It is a logical entity, in the sense that we have to think of it as the identifier of a list of elaboration units.
• UNIT_COD: identifier of the processing unit within the job. We can think of it as an Oracle packaged procedure.
• SORT_CNT: counter of the unit inside the job.
• CONTINUE_FLG: continuity flag. If set = 1, in the event of an error the next unit can continue; if set = 0, the job must abort because the error is blocking.
• UNIT_ACTIVE_FLG: flag that indicates whether the unit is active.
• JOB_ACTIVE_FLG: flag that indicates whether the job is active.
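A minimal DDL sketch consistent with the fields above; the column types are assumptions, the authoritative DDL ships with MEF_01:

  create table mef_unit_cft (
    sched_cod       varchar2(30),  -- schedule identifier (a logical job list)
    job_cod         varchar2(30),  -- job identifier (a logical unit list)
    unit_cod        varchar2(61),  -- unit identifier, e.g. package.procedure
    sort_cnt        number,        -- position of the unit inside the job (not used yet)
    continue_flg    number(1),     -- 1 = continue on error, 0 = abort the job
    unit_active_flg number(1),     -- unit active flag (not used yet)
    job_active_flg  number(1)      -- job active flag (not used yet)
  );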
The MEF_UNIT_LOT table
This table stores all the information relating to the executions of all processing units. Its structure is very similar to that of the messaging system as regards the timing information, but in addition it also retains the unit status and the final outcome of its execution. A DDL sketch follows the field list below.
• SEQ_NUM: sequential number of the line, obtained from the Oracle sequence.
• DAY_COD: day of the execution in the YYYYMMDD (year, month, day) format.
• SCHED_COD: identifier of the schedule to which the job belongs.
• JOB_COD: identifier of the job.
• UNIT_COD: identifier of the processing unit within the job.
• EXEC_CNT: identifier of the job execution. Every job execution should be tagged by a number, in turn extracted from an Oracle sequence.
• STATUS_COD: state of the unit.
• RETURN_COD: return code of the execution of the unit.
• SS_NUM: number of seconds consumed by the processing unit. This information, together with the two following, is a summable statistical number.
• MI_NUM: number of minutes consumed by the processing unit.
• HH_NUM: number of hours consumed by the processing unit.
• ELAPSED_TXT: execution time in the HH24MISS format.
• ERRMSG_TXT: error message.
• STAMP_DTS: time stamp.
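The corresponding sketch for the log table, under the same assumptions:

  create table mef_unit_lot (
    seq_num     number,          -- from mef_unit_lot_seq
    day_cod     number(8),       -- execution day in YYYYMMDD format
    sched_cod   varchar2(30),    -- schedule identifier
    job_cod     varchar2(30),    -- job identifier
    unit_cod    varchar2(61),    -- unit identifier
    exec_cnt    number,          -- job execution identifier
    status_cod  varchar2(10),    -- running / Done / Aborted
    return_cod  varchar2(20),    -- OK / OK (Warning) / NOT OK / ?
    ss_num      number,          -- seconds consumed (summable)
    mi_num      number,          -- minutes consumed (summable)
    hh_num      number,          -- hours consumed (summable)
    elapsed_txt varchar2(6),     -- elapsed time in HH24MISS format
    errmsg_txt  varchar2(4000),  -- error message, if any
    stamp_dts   timestamp        -- row time stamp
  );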
The MEF_UNIT package
This package is the core of the control system. I will give a brief description of the main procedures.
p_init_unit
This procedure has the sole task of initializing all global variables, extracting the information from the MEF_UNIT_CFT table based on the parameters received in input.
p_start
The task of the p_start procedure is to record the start of the processing unit. In practice, the code implements the logic present in the diagram of the status changes.
The initial test is related to the recognition of the current state, to see whether the p_start procedure is permitted at this time.
If we are neither in state 0 (i.e. at the first elaboration unit of the job) nor in state 2/3 (that is, after the correct end, or the end with warning, of the previous unit), it generates an abort that prevents the process from continuing.
If we are in a correct state, the initialization procedure of the unit is called, and the status variable and the return code variable are set. All this information is then stored in the MEF_UNIT_LOT table. Finally, the current status of the unit is changed.
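A minimal sketch of the state check just described, with assumed names for the package-level state variable and the error code (the real code ships with MEF_01):

  -- inside the mef_unit package body (g_status_num is an assumed global)
  procedure p_start(p_sched_cod in varchar2, p_job_cod in varchar2, p_unit_cod in varchar2) is
  begin
    -- legal only from state 0 (no unit running) or states 2/3 (previous unit done)
    if g_status_num not in (0, 2, 3) then
      raise_application_error(-20001,
        'p_start(' || p_unit_cod || ') called in invalid state ' || g_status_num);
    end if;
    p_init_unit(p_sched_cod, p_job_cod, p_unit_cod);  -- load configuration into globals
    g_status_num := 1;                                -- the unit is now "running"
    -- here the real package inserts the start row into MEF_UNIT_LOT
  end p_start;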
p_exc
The exception-management procedure begins immediately by recording the anomalous situation in the MEF_MSG_LOT table.
At this point there is the test on the state which, obviously, can only be that of a unit in running (i.e. state 1). Two variables, whose use we will see in the future, are then updated to preserve the history of the units in error: fail_unit_cnt is simply a counter of the units that have had problems, while the fail_list_txt variable links the names of these units using a carriage return.
The next test, based on the continuity flag, invokes the corresponding management procedure. Let us see them.
p_exc_continue
This procedure has the task of recording the error situation, but it must not block the processing flow. It therefore sets the state and the return code of the unit as specified by the state diagram: the unit passes into the final state 3, i.e. ended with warning, and the MEF_UNIT_LOT table is updated.
p_exc_abort
This procedure has the task of terminating the processing flow. Like the previous procedure, it sets the ending state, sets the return code and updates the MEF_UNIT_LOT table, but it then ends the running job in a definitive way with RAISE_APPLICATION_ERROR.
The UNIT_TEST package
We are now able to start our tests. To this end, we build the UNIT_TEST package, which contains a small number of processing units. Its code is for demonstration only.
Note that each processing unit handles exceptions according to the standard described above, i.e. using the statement:
when others then mef_unit.p_exc (v_module_cod, sqlerrm);
Let us see the functionality of these modules (a sketch of two of them follows the descriptions):
MODULE1, MODULE2, MODULE3: procedures that finish without problems. They set, into a local variable, the number of rows of an Oracle system table. This select was chosen because it takes a little time, which allows us to check the correctness of the timing information stored in the MEF_UNIT_LOT table.
MODULE_W, MODULE_W2: procedures containing some instructions that force an error which, through the continuity flag, is non-blocking.
MODULE_A: this procedure ends in failure, as the number of rows in the table is surely greater than one digit, which is the constraint associated with the v_num local variable. As the continuity flag says, the error is blocking.
Remember, in your own tests, to exit and re-enter SQL*Plus between one test and the next, to reset the package variables. In addition, pay close attention to the names of the units: if you execute the p_start of unit X and then run unit Y, the elaboration log will not be reliable.
Installation
Recall that all the code of the unit control system can be downloaded from MEF_01:
https://drive.google.com/folderview?id=0B2dQ0EtjqAOTN3I1MU9JQmpOUEE&usp=sharing
Before you use it, you must install the base of the Micro ETL Foundation, that is, the messaging system. This is the link to MEF_00:
https://drive.google.com/folderview?id=0B2dQ0EtjqAOTaU5WNmc5MkVnVFE&usp=sharing
Its installation is explained on SlideShare:
http://www.slideshare.net/jackbim/recipe-7-of-data-warehouse-a-messaging-system-for-oracle-dwh-2
As for the unit control system, proceed as follows. Connect with SQL*Plus as the user you created/configured for the messaging system, then run:
SQL> @mef_unit_install.sql
You do not need to do anything else. We are ready for the test phase.
Test1 (everything goes well)
We enter SQL*Plus and launch the unit_test_run1.sql script. The script is very simple. First of all, it configures the units of the testing job, inserting one row per unit into the MEF_UNIT_CFT table.
It then runs the three units that will surely end with a positive outcome (it only takes about 30 seconds). As you can see, each unit is delimited by the start/end pair; a sketch of such a script, together with a verification query, closes this section.
Now we can verify the result by looking at the contents of the MEF_UNIT_LOT table. You can enter SQL*Plus and run a select on the table but, for ease of viewing, I show the (reduced) result in graphical format.
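The following sketch shows what unit_test_run1.sql plausibly does, under the table structures and package signatures assumed so far; the real script ships with MEF_01:

  -- configure the three units of the test job
  insert into mef_unit_cft (sched_cod, job_cod, unit_cod, continue_flg)
    values ('s1', 'job1', 'unit_test.module1', 0);
  insert into mef_unit_cft (sched_cod, job_cod, unit_cod, continue_flg)
    values ('s1', 'job1', 'unit_test.module2', 0);
  insert into mef_unit_cft (sched_cod, job_cod, unit_cod, continue_flg)
    values ('s1', 'job1', 'unit_test.module3', 0);
  commit;

  -- run the units, each delimited by the start/end pair
  begin
    mef_unit.p_start('s1', 'job1', 'unit_test.module1');
    unit_test.module1;
    mef_unit.p_end;
    mef_unit.p_start('s1', 'job1', 'unit_test.module2');
    unit_test.module2;
    mef_unit.p_end;
    mef_unit.p_start('s1', 'job1', 'unit_test.module3');
    unit_test.module3;
    mef_unit.p_end;
  end;
  /

  -- verify the outcome
  select unit_cod, status_cod, return_cod, elapsed_txt
    from mef_unit_lot
   order by seq_num;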
Test2 (non-blocking errors)
In this second test we show the behavior of the control system in case of non-blocking errors. We go into SQL*Plus and launch the unit_test_run2.sql script.
The script is similar to the previous one; only the calls and the setting of the continuity flag of the units change.
The final result clearly shows the errors encountered at run time by job2 and the fact that, since they were not blocking, module_w2 and module3 carried their executions through to a successful end.
Test3 (fatal error)
In this third test we show the behavior of the control system in case of a fatal error. We go into SQL*Plus and launch the unit_test_run3.sql script.
The final result clearly shows how module_a of job3, having been configured with continue_flg = 0, prevents the execution of the subsequent module2 and module3 units.
Obviously, these exception situations were also inserted automatically into the log messages table.
Conclusions
Like the messaging system, the control system of the processing units is simple and very useful for answering all those questions that we are inevitably asked when there are problems with the loading process of the Data Warehouse. Its simplicity rests on the fact that only three steps are needed in already existing code:
1. Configure the unit.
2. Insert the p_start and p_end procedure calls.
3. Replace the exception handler with the p_exc call.
Soon, we will also see some scheduling techniques. That is, based on the configuration table described here, we will launch a loading job without having to insert the p_start and p_end calls.
It will all be automatic and dynamic. We will not have a separate main program for each job. And we will get what I theorized in the past in an article of mine that appeared, years ago, in Data Mart Review [The Infrastructural Data Warehouse] (unfortunately the site no longer exists; it has been incorporated into the Information Management site).
We will have implemented a method that, in my opinion, is essential for an ETL process: a clear separation between the infrastructure code and the business code of a Data Warehouse.

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (17)

Recipe 14 of Data Warehouse and Business Intelligence - Build a Staging Area ...
Recipe 14 of Data Warehouse and Business Intelligence - Build a Staging Area ...Recipe 14 of Data Warehouse and Business Intelligence - Build a Staging Area ...
Recipe 14 of Data Warehouse and Business Intelligence - Build a Staging Area ...
 
Data Warehouse and Business Intelligence - Recipe 3
Data Warehouse and Business Intelligence - Recipe 3Data Warehouse and Business Intelligence - Recipe 3
Data Warehouse and Business Intelligence - Recipe 3
 
Data Warehouse and Business Intelligence - Recipe 1
Data Warehouse and Business Intelligence - Recipe 1Data Warehouse and Business Intelligence - Recipe 1
Data Warehouse and Business Intelligence - Recipe 1
 
Oracle DBA interview_questions
Oracle DBA interview_questionsOracle DBA interview_questions
Oracle DBA interview_questions
 
All Oracle-dba-interview-questions
All Oracle-dba-interview-questionsAll Oracle-dba-interview-questions
All Oracle-dba-interview-questions
 
Apps1
Apps1Apps1
Apps1
 
SQL2SPARQL
SQL2SPARQLSQL2SPARQL
SQL2SPARQL
 
Database consistency in NonStop SQL/MX
Database consistency in NonStop SQL/MXDatabase consistency in NonStop SQL/MX
Database consistency in NonStop SQL/MX
 
Introduction to embedded sql for NonStop SQL
Introduction to embedded sql for NonStop SQLIntroduction to embedded sql for NonStop SQL
Introduction to embedded sql for NonStop SQL
 
Concepts of NonStop SQL/MX: Part 3 - Introduction to Metadata
Concepts of NonStop SQL/MX: Part 3 - Introduction to MetadataConcepts of NonStop SQL/MX: Part 3 - Introduction to Metadata
Concepts of NonStop SQL/MX: Part 3 - Introduction to Metadata
 
Native tables in NonStop SQL database
Native tables in NonStop SQL databaseNative tables in NonStop SQL database
Native tables in NonStop SQL database
 
Recipes 10 of Data Warehouse and Business Intelligence - The descriptions man...
Recipes 10 of Data Warehouse and Business Intelligence - The descriptions man...Recipes 10 of Data Warehouse and Business Intelligence - The descriptions man...
Recipes 10 of Data Warehouse and Business Intelligence - The descriptions man...
 
MFC Whitepaper
MFC WhitepaperMFC Whitepaper
MFC Whitepaper
 
R12 d49656 gc10-apps dba 10
R12 d49656 gc10-apps dba 10R12 d49656 gc10-apps dba 10
R12 d49656 gc10-apps dba 10
 
Dbm 438 Enthusiastic Study / snaptutorial.com
Dbm 438 Enthusiastic Study / snaptutorial.comDbm 438 Enthusiastic Study / snaptutorial.com
Dbm 438 Enthusiastic Study / snaptutorial.com
 
Concepts of NonStop SQL/MX: Part 4 - Storage.
Concepts of NonStop SQL/MX: Part 4 - Storage.Concepts of NonStop SQL/MX: Part 4 - Storage.
Concepts of NonStop SQL/MX: Part 4 - Storage.
 
Concepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objects
Concepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objectsConcepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objects
Concepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objects
 

Andere mochten auch

Data Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrongData Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrong
Massimo Cenci
 
Education Data Warehouse System
Education Data Warehouse SystemEducation Data Warehouse System
Education Data Warehouse System
daniyalqureshi712
 
Capstone Project - Roofing Shingles Shrinkage
Capstone Project - Roofing Shingles ShrinkageCapstone Project - Roofing Shingles Shrinkage
Capstone Project - Roofing Shingles Shrinkage
Polina Ryabko
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
Rohit Kumar
 
Warehousing proposal
Warehousing proposalWarehousing proposal
Warehousing proposal
zulfimac
 

Andere mochten auch (20)

Data Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrongData Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrong
 
Introduction to ETL and Data Integration
Introduction to ETL and Data IntegrationIntroduction to ETL and Data Integration
Introduction to ETL and Data Integration
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722
 
Education Data Warehouse System
Education Data Warehouse SystemEducation Data Warehouse System
Education Data Warehouse System
 
Hadoop and Cascading At AJUG July 2009
Hadoop and Cascading At AJUG July 2009Hadoop and Cascading At AJUG July 2009
Hadoop and Cascading At AJUG July 2009
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Capstone Project - Roofing Shingles Shrinkage
Capstone Project - Roofing Shingles ShrinkageCapstone Project - Roofing Shingles Shrinkage
Capstone Project - Roofing Shingles Shrinkage
 
Factory &amp; Warehouse Security
Factory &amp; Warehouse SecurityFactory &amp; Warehouse Security
Factory &amp; Warehouse Security
 
John Frias Morales, Dr.BA, MS Resume
John Frias Morales, Dr.BA, MS ResumeJohn Frias Morales, Dr.BA, MS Resume
John Frias Morales, Dr.BA, MS Resume
 
Warehouse management system
Warehouse management systemWarehouse management system
Warehouse management system
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
 
Introduction to Text Classification with RapidMiner Studio 7
Introduction to Text Classification with RapidMiner Studio 7Introduction to Text Classification with RapidMiner Studio 7
Introduction to Text Classification with RapidMiner Studio 7
 
342ch09
342ch09342ch09
342ch09
 
Warehousing proposal
Warehousing proposalWarehousing proposal
Warehousing proposal
 
Warehouse Management System
Warehouse Management SystemWarehouse Management System
Warehouse Management System
 
Warehouse Management System – WMCentral
Warehouse Management System – WMCentralWarehouse Management System – WMCentral
Warehouse Management System – WMCentral
 
Etl process in data warehouse
Etl process in data warehouseEtl process in data warehouse
Etl process in data warehouse
 
Realize the potential of sap material ledger
Realize the potential of sap material ledgerRealize the potential of sap material ledger
Realize the potential of sap material ledger
 
Natural Language Processing in R (rNLP)
Natural Language Processing in R (rNLP)Natural Language Processing in R (rNLP)
Natural Language Processing in R (rNLP)
 

Ähnlich wie Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control the processing units in the ETL process

Project report of ustos
Project report of ustosProject report of ustos
Project report of ustos
Murali Mc
 
Building Cultural Awareness through EmotionPresented By Team .docx
Building Cultural Awareness through EmotionPresented By Team .docxBuilding Cultural Awareness through EmotionPresented By Team .docx
Building Cultural Awareness through EmotionPresented By Team .docx
hartrobert670
 
ELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_Jeff
Jeff McQuigg
 
Processscheduling 161001112521
Processscheduling 161001112521Processscheduling 161001112521
Processscheduling 161001112521
marangburu42
 
Etl interview questions
Etl interview questionsEtl interview questions
Etl interview questions
ashokvirtual
 
Programmable logic-control
Programmable logic-controlProgrammable logic-control
Programmable logic-control
akashganesan
 

Ähnlich wie Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control the processing units in the ETL process (20)

Project report of ustos
Project report of ustosProject report of ustos
Project report of ustos
 
Dbms
DbmsDbms
Dbms
 
Top answers to etl interview questions
Top answers to etl interview questionsTop answers to etl interview questions
Top answers to etl interview questions
 
Building Cultural Awareness through EmotionPresented By Team .docx
Building Cultural Awareness through EmotionPresented By Team .docxBuilding Cultural Awareness through EmotionPresented By Team .docx
Building Cultural Awareness through EmotionPresented By Team .docx
 
Modul PLC Programming.pdf
Modul PLC Programming.pdfModul PLC Programming.pdf
Modul PLC Programming.pdf
 
Batch processing
Batch processingBatch processing
Batch processing
 
ELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_Jeff
 
Transaction handling in com, ejb and .net
Transaction handling in com, ejb and .netTransaction handling in com, ejb and .net
Transaction handling in com, ejb and .net
 
Introduction to Embedded C for 8051 and Implementation of Timer and Interrupt...
Introduction to Embedded C for 8051 and Implementation of Timer and Interrupt...Introduction to Embedded C for 8051 and Implementation of Timer and Interrupt...
Introduction to Embedded C for 8051 and Implementation of Timer and Interrupt...
 
Operating Systems - Process Synchronization and Deadlocks
Operating Systems - Process Synchronization and DeadlocksOperating Systems - Process Synchronization and Deadlocks
Operating Systems - Process Synchronization and Deadlocks
 
Fuzzy logic
Fuzzy logicFuzzy logic
Fuzzy logic
 
Processscheduling 161001112521
Processscheduling 161001112521Processscheduling 161001112521
Processscheduling 161001112521
 
What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.
 
PL/SQL Interview Questions
PL/SQL Interview QuestionsPL/SQL Interview Questions
PL/SQL Interview Questions
 
Introduction to transaction processing concepts and theory
Introduction to transaction processing concepts and theoryIntroduction to transaction processing concepts and theory
Introduction to transaction processing concepts and theory
 
Etl interview questions
Etl interview questionsEtl interview questions
Etl interview questions
 
CS304PC:Computer Organization and Architecture Session 15 program control.pptx
CS304PC:Computer Organization and Architecture Session 15 program control.pptxCS304PC:Computer Organization and Architecture Session 15 program control.pptx
CS304PC:Computer Organization and Architecture Session 15 program control.pptx
 
Batch processing
Batch processingBatch processing
Batch processing
 
Batch processing
Batch processingBatch processing
Batch processing
 
Programmable logic-control
Programmable logic-controlProgrammable logic-control
Programmable logic-control
 

Mehr von Massimo Cenci

Mehr von Massimo Cenci (15)

Il controllo temporale dei data file in staging area
Il controllo temporale dei data file in staging areaIl controllo temporale dei data file in staging area
Il controllo temporale dei data file in staging area
 
Tecniche di progettazione della staging area in un processo etl
Tecniche di progettazione della staging area in un processo etlTecniche di progettazione della staging area in un processo etl
Tecniche di progettazione della staging area in un processo etl
 
Note di Data Warehouse e Business Intelligence - Il giorno di riferimento dei...
Note di Data Warehouse e Business Intelligence - Il giorno di riferimento dei...Note di Data Warehouse e Business Intelligence - Il giorno di riferimento dei...
Note di Data Warehouse e Business Intelligence - Il giorno di riferimento dei...
 
Recipe 12 of Data Warehouse and Business Intelligence - How to identify and c...
Recipe 12 of Data Warehouse and Business Intelligence - How to identify and c...Recipe 12 of Data Warehouse and Business Intelligence - How to identify and c...
Recipe 12 of Data Warehouse and Business Intelligence - How to identify and c...
 
Note di Data Warehouse e Business Intelligence - Pensare "Agile"
Note di Data Warehouse e Business Intelligence - Pensare "Agile"Note di Data Warehouse e Business Intelligence - Pensare "Agile"
Note di Data Warehouse e Business Intelligence - Pensare "Agile"
 
Note di Data Warehouse e Business Intelligence - La gestione delle descrizioni
Note di Data Warehouse e Business Intelligence - La gestione delle descrizioniNote di Data Warehouse e Business Intelligence - La gestione delle descrizioni
Note di Data Warehouse e Business Intelligence - La gestione delle descrizioni
 
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
 
Letter to a programmer
Letter to a programmerLetter to a programmer
Letter to a programmer
 
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
Note di Data Warehouse e Business Intelligence - Tecniche di Naming Conventio...
 
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
 
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
 
Oracle All-in-One - how to send mail with attach using oracle pl/sql
Oracle All-in-One - how to send mail with attach using oracle pl/sqlOracle All-in-One - how to send mail with attach using oracle pl/sql
Oracle All-in-One - how to send mail with attach using oracle pl/sql
 
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
 
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisiNote di Data Warehouse e Business Intelligence - Le Dimensioni di analisi
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi
 
Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...
Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...
Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control the processing units in the ETL process

  • 1. MICRO ETL FOUNDATION Ideas and solutions for the Data Warehouse and Business Intelligence projects in Oracle environment Techniques to control the processing units in the ETL process In the loading of a Data Warehouse is important to have full control of the processing units that compose it. Each processing unit must be carefully monitored both in the detection of errors that may occur, both in the analysis of the execution times. MASSIMO CENCI
  • 2. MICRO ETL FOUNDATION Introduction In the loading of a Data Warehouse is important to have full control of the processing units that compose it. Each processing unit must be carefully monitored both in the detection of errors that may occur, both in the analysis of the execution times. In this article, which uses the messaging techniques described in the slideshare [http://www.slideshare.net/jackbim/recipe-7-of-data-warehouse-a-messaging-system-for-oracle-dwh-1],I enrich the Micro ETL Foundation, by building a control system of the PL/SQL elaboration units. By using the MEF messaging system, we have seen how to activate messages that are then stored in a log table. In the demonstration tests, it is seen as a suitable sequence of messages and a correct settings of the package variables , has provided an idea of the processing flow and the delay between a message and the other. It could call the setting of those tests as an "excess of zeal". In fact the purpose of the messaging system was much less ambitious. It wanted provide only generic information at any moment of the ETL process, simply calling a procedure. It is now time to take the next step, to implement an agile control system for the processing units. The goal is always the same: it must be simple and non-invasive. This means that it can plug into existing DWH systems (and not only) gradually, without changing the elaborative flow. There is no doubt, however, that if you have already set up your ETL process in a modular way, the application of the techniques that will be described, it will be simpler and more natural. I will use concepts and definitions already given in the messaging system. Now let us concentrate on the concept of modularity, which is fundamental to the control system.
  • 3. MICRO ETL FOUNDATION Modularity and sequencing The concept of modularity is the basis of the techniques that will be exhibited. As we know, a complex system, and the ETL process of a Data Warehouse can be defined without doubt very complex, it can be managed and understood only if we can break its overall complexity, in less complex components. In other words: you can not have one big main program that contains thousands code lines. This is the first point. The second point is the sequentiality. We must try to think, and you can almost always do, that each component of the process is connected to the next, and that their sequential execution leads to the final loading of the Data Warehouse. Please note, I am not saying it is not possible the parallelism, but to identify which components are completely independent of each other (so they can run in parallel), it is not an easy task; without forgetting all the problems of their synchronization. Moreover, the parallelism also requires a specific hardware structure and specific Oracle settings. And the performance improvement, I speak from my experience, it is not so sure. I suggest, therefore, to try to apply parallelism on the objects rather than on the processes. Usually, the dimension tables may be loaded in parallel (if there are not logical connections between them), but why complicate our lives if we can reason in a simple sequential way ? Recall that simplicity is a pillar of the Micro ETL Foundation. So the advice is: modularity and sequencing. Do not forget that the ETL process, is physiologically sequential in its basic pillars. You can not load a Data Mart of level 2 before loading the Data Mart of level 1. And the Data Mart of level 1 cannot be loaded until you first load the dimensions, which in turn, cannot be loaded if you have not first loaded the Staging Area tables, and so on. The figure below shows the concepts of modularity and sequencing applied to a hypothetical schedule S1. On the left we have the "logical" components of the ETL process, ie not code, but names, configured in a table. On the right we have the "physical" components, ie the real programming code.
  • 4. MICRO ETL FOUNDATION Modularity and sequencing …. …. …. procedure p_xxx is begin end; Logical (configuration) Physical (code)
  • 5. MICRO ETL FOUNDATION Requirements The main requirement, as already mentioned, is to have the control of each processing unit (unit) which constitutes the ETL process . Having control means that, for each unit, at any time, I need to know: • when it started and when it ended. • how long was his execution. • if it is successful or has had some problems (exception). • if you received an exception, what is the error that occurred • what are the consequences in case of error. In order to meet those requirements, it is not necessary to use very complex structures. In accordance with the MEF philosophy, I will use just a configuration table (MEF_UNIT_CFT), that allows me to enter the main characteristics of the units that make up the loading job, and a logging table (MEF_UNIT_LOT) that allows me to see the log of the executions. The most important information present in the configuration table, along with some context information, is the continuity flag, which allows me to decide whether, in the event of an error, the error is critical (ie it must abort the job that has called the unit) or not critical (ie it must allows the running of the next unit ). To be able logging the execution, you will have to "surround" the unit call by a procedure call that logs the beginning, and from a procedure call that logs the end. In addition, the error situations should be treated in a uniform manner by a unique procedure, which logs the error and implements the consequences using the continuity flag. To control the behavior of a unit means, first of all, to understand its life cycle in the global context of the job to which it belongs. To that end, we will use the the theory of the finite-state machine, or, if we want to be a bit more modern, we will use a simplified version of the state-diagrams of the Unified Modelling Language.
  • 6. MICRO ETL FOUNDATION The state diagram As stated previously, the MEF control system, in order to perform its function, it must surround the execution of every elaborative unit, with a procedure call (p_start) that registers its beginning, and with a procedure call (p_end) that registers the end. In practice, the program code (we will see him clearly in the test case) must have calls like: p_start <unit call x> p_end p_start <unit call y> p_end ... This seems so simple, but to complicate the situation, there may be exceptions in the execution. These exceptions should be handled by a procedure call (p_exc) always present inside the unit. A state diagram is the most useful tool to understand the units life cycle.
  • 7. MICRO ETL FOUNDATION Requirements (1) – Unit running --------------------------------- Status_cod = running Return_cod = ? (2) – Unit ended OK -------------------------------- Status_cod = Done Return_cod = OK (0) - No unit running ------------------------------------ Status_cod = ? Return_cod = ? (4) – Unit aborted --------------------------------- Status_cod = Aborted Return_cod = NOT OK (3) - Unit ended with warning ----------------------------------------- Status_cod = Done Return_cod = OK (Warning) p_start p_endp_start p_exc (continue=0) p_exc (continue=1) p_start Elaboration units life-cycle p_end
  • 8. MICRO ETL FOUNDATION The state diagram The Figure shows the various states and the possible state changes of the units inside a block of executions. Everything shown graphically it was translated into PL/SQL language. Because each unit must be preceded by a procedure call that logs the beginning of the execution, the procedure p_start places the unit in the "running" state , and set the return code to a obviously unknown ("?") value. It is important to note that after the p_start must be present only the call of the unit and no other procedure. The unit can conclude its run in two different ways. It finishes without any problems, ie no Oracle error,or it fails. In the first case the unit will call the p_end procedure, which leads us into a "Done" terminal state with return code = "OK". In the second case, depending on the setting of the continuity flag in the configuration table, it can behave in two different ways. The unit ends in a definitive way and thus prevents the execution of any other unit, it switches in the "Aborted" state and sets the return code = "NOT OK". The unit ends with a warning, switches in the "Done" state but with "OK (Warning)" as return code. This doesn't prevent the execution of the next unit; in fact, in this state is again possible call a p_start procedure that will switch the next unit in the "Running" state. The p_end procedure however, will do nothing, leaving the unit in the final state. Each state change that is not present in the diagram, will always give an error message. This ensures that, for distraction, you have not followed the correct sequence of calls (eg. you have forgotten the p_end procedure or it is called more than one time, or other).
  • 9. MICRO ETL FOUNDATION The exception management Since the more complex change of state is related to the error situations, we explore briefly the Exception management, (of which, however, we have already seen some examples inside the messaging techniques). In the PL/SQL language, an error situation that occurs while the program is running, it is called "exception". The error condition can be generated by the program itself (eg. Division by zero), or forced by the logic of the program (eg,. An amount that does not exceed a certain threshold). In the latter case, the exception is explicitly reached using the RAISE_APPLICATION_ERROR statement. Regardless of the cause of the error, officially identified as internal exception or user-defined exception, the Oracle PL/SQL engine transfers the control of the program toward the exception handler of the running module (or PL/SQL block). In practice the code after the EXCEPTION keyword . Obviously, if the EXCEPTION keyword is not present, the program will end immediately because did not find any indication for error handling. Let us now analyze the error propagation. If there are several nested procedures between them, an unhandled error by the most internal procedure , propagates into the caller procedure, and so forth up to the main program. If the error is handled, the procedure will continue its regular work. (unless inside the exception handler there is the RAISE keyword or there is a software error into the exception handler). To clarify the positioning of the exception call, we try to anticipate what will be the structure of the program code, using the next figure.
  • 10. MICRO ETL FOUNDATION The Exception management mef_unit.p_start(‘s1’,’job1’,’ unit_test.module2’) mef_unit.p_end unit_test.module2 procedure module2 is begin exception end; mef_unit.p_exc PKG unit_test
  • 11. MICRO ETL FOUNDATION The design The design of the unit control system, is composed of two tables, and a sequence. The MEF_CFT table is the configuration table of the processing units. The MEF_UNIT_LOT table is the one that keeps the log of the executions of the units. The MEF_UNIT_LOT_SEQ sequence serves to give a sequence number to each log line. Let us now see in detail these objects. Download the code from MEF_01: https://drive.google.com/folderview?id=0B2dQ0EtjqAOTN3I1MU9JQmpOUEE&usp=sharing The DDW_COM_ETL_UNIT_LOT_SEQ sequence It is the most functional of the time stamp to sort the table. Because sometimes the units begin and end in fractions of a second of each other, the time stamp might not be sufficiently discriminating. The DDW_COM_ETL_UNIT_CFT table This table contains the configuration information of the processing units In it you configure schedules, jobs and units. For the purpose of the control system of the elaboration units, jobs and schedules will be used in a static way, as parameters to the procedure calls. Their management will be explained, in the future, in the control system of the jobs. As well the SORT_CNT, UNIT_ACTIVE_FLG and JOB_ACTIVE_FLG fields not have an immediate use, and we not have to set them. • SCHED_COD: Identifies the schedule to which the job belongs. It is a logical entity, in the sense that we have to think of it as the identifier of a job list. • JOB_COD: identifier of the job. It is a logical entity, in the sense that we have to think of it as the identifier of a list of elaboration units. • UNIT_COD: Identifier of the processing unit within the job. We can think of it as a Oracle packaged procedure. • SORT_CNT: Counter of the unit inside the job. • CONTINUE_FLG: Continuity flag. If set = 1 means that in the event of an error, the next unit can continue, if set = 0 means that the job should have an abort because the error is blocker.
  • 12. MICRO ETL FOUNDATION The design
• UNIT_ACTIVE_FLG: flag that indicates whether the unit is active.
• JOB_ACTIVE_FLG: flag that indicates whether the job is active.
The MEF_UNIT_LOT table This table stores all the information relating to the executions of all the processing units. Its structure is very similar to that of the messaging system as regards the timing information, but in addition it also retains the unit status and the final outcome of its execution.
• SEQ_NUM: sequential number of the line, obtained from the Oracle sequence.
• DAY_COD: day of the execution in the YYYYMMDD (year, month, day) format.
• SCHED_COD: identifier of the schedule to which the job belongs.
• JOB_COD: identifier of the job.
• UNIT_COD: identifier of the processing unit within the job.
• EXEC_CNT: identifier of the job execution. Every job execution should be tagged by a number, in turn extracted from an Oracle sequence.
• STATUS_COD: state of the unit.
• RETURN_COD: return code of the execution of the unit.
• SS_NUM: number of seconds consumed by the processing unit. This information, together with the two following ones, is a summable statistical number.
• MI_NUM: number of minutes consumed by the processing unit.
• HH_NUM: number of hours consumed by the processing unit.
• ELAPSED_TXT: execution time in the HH24MISS format.
• ERRMSG_TXT: error message.
• STAMP_DTS: time stamp.
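To make the design concrete, here is a minimal DDL sketch of the three objects. The column names come from the field lists above; the data types and lengths are my assumptions, not the official ones from the downloadable code.

  create sequence mef_unit_lot_seq;

  create table mef_unit_cft (
    sched_cod       varchar2(30),
    job_cod         varchar2(30),
    unit_cod        varchar2(60),
    sort_cnt        number,
    continue_flg    number(1),    -- 1 = continue on error, 0 = abort
    unit_active_flg number(1),
    job_active_flg  number(1)
  );

  create table mef_unit_lot (
    seq_num     number,           -- from mef_unit_lot_seq
    day_cod     varchar2(8),      -- YYYYMMDD
    sched_cod   varchar2(30),
    job_cod     varchar2(30),
    unit_cod    varchar2(60),
    exec_cnt    number,
    status_cod  varchar2(10),
    return_cod  varchar2(10),
    ss_num      number,
    mi_num      number,
    hh_num      number,
    elapsed_txt varchar2(10),     -- HH24MISS
    errmsg_txt  varchar2(4000),
    stamp_dts   timestamp
  );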
  • 13. MICRO ETL FOUNDATION The MEF_UNIT package This package is the core of the control system. I will give a brief description of the main procedures.
p_init_unit This procedure has the only task of initializing all the global variables, extracting the information from the MEF_UNIT_CFT table based on the parameters received in input.
p_start The task of the p_start procedure is to record the start of the processing unit. In practice, the code implements the logic present in the diagram of the status changes. The initial test is related to the recognition of the current state, to see if the p_start procedure is permitted at this time. If we are not in state 0 (i.e. the first elaboration unit of the job) or in state 2/3 (that is, after the correct end, or the end with warning, of the previous unit), it generates an abort that prevents the process from continuing. If we are in a correct state, the initialization procedure of the unit is called, and the status variable and the return code variable are set. All this information is then stored in the MEF_UNIT_LOT table. Finally, there is the change of the unit's current status.
p_exc The procedure of the exception management begins immediately by recording this anomalous situation in the MEF_MSG_LOT table. At this point there is the test on the state, which, obviously, can only be that of a unit in running (i.e. state 1). Two variables are then updated, whose use we will see in the future, which preserve the history of the units in error: fail_unit_cnt is simply a counter of the units that have had problems; the fail_list_txt variable links, using a carriage return, the names of these units. The next test, based on the continuity flag, invokes the corresponding management procedure. Let's see them.
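As a guide to reading the real package, this is a simplified sketch of the state test that p_start performs according to the description above (variable names and state codes are illustrative assumptions, not the actual source):

  procedure p_start(p_sched_cod varchar2, p_job_cod varchar2, p_unit_cod varchar2) is
  begin
    -- the unit may start only at the beginning of the job (0)
    -- or after the previous unit ended correctly (2) or with warning (3)
    if g_status_cod not in (0, 2, 3) then
      raise_application_error(-20002, 'p_start not allowed in state ' || g_status_cod);
    end if;
    p_init_unit(p_sched_cod, p_job_cod, p_unit_cod);  -- load the unit configuration
    g_status_cod := 1;                                -- the unit is now running
    -- ... insert the start record into MEF_UNIT_LOT ...
  end p_start;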
  • 14. MICRO ETL FOUNDATION The MEF_UNIT package p_exc_continue This procedure has the task of recording the error situation, but it must not block the working process. It therefore sets the state and the return code of the unit as specified by the state diagram: the unit passes into the final state 3, i.e. ended with warning, and the MEF_UNIT_LOT table is updated. p_exc_abort This procedure has the task of terminating the processing flow. Like the previous procedure, it sets the ending state, sets the return code and updates the MEF_UNIT_LOT table, but it also ends the running job, with RAISE_APPLICATION_ERROR, in a definitive way.
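Again as an illustrative sketch (not the actual source; the message-logging call is an assumed name), the branch between the two behaviors could look like this:

  procedure p_exc(p_unit_cod varchar2, p_errmsg_txt varchar2) is
  begin
    mef_msg.p_msg(p_errmsg_txt);                  -- record the anomaly in the message log (assumed call)
    g_fail_unit_cnt := g_fail_unit_cnt + 1;       -- how many units failed
    g_fail_list_txt := g_fail_list_txt || chr(13) || p_unit_cod;  -- their names
    if g_continue_flg = 1 then
      p_exc_continue;   -- final state 3: ended with warning, the job goes on
    else
      p_exc_abort;      -- blocking: RAISE_APPLICATION_ERROR ends the job
    end if;
  end p_exc;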
  • 15. MICRO ETL FOUNDATION The UNIT_TEST package We are now able to start our tests. To this end, we will build the UNIT_TEST package, which will contain a small number of processing units. Its code is only for demonstration. Note that each processing unit has the exception handler according to the standard described above, i.e. using the statement: when others then mef_unit.p_exc(v_module_cod, sqlerrm); Let us see the functionality of these modules.
MODULE1, MODULE2, MODULE3: procedures that finish without problems. They set into a local variable the number of rows of an Oracle system table. This select was chosen because it takes a little time, and it allows us to check the correctness of the timing information stored in the MEF_UNIT_LOT table.
MODULE_W, MODULE_W2: these procedures contain some instructions that force an error which, thanks to the continuity flag, is non-blocking.
MODULE_A: this procedure will end in failure, as the number of rows in the table is surely longer than 1 digit, which is the constraint associated with the v_num local variable. As its continuity flag says, the error is blocking.
Remember, in your own tests, to exit and re-enter SQL*Plus between one test and the other, to reset the package variables. In addition, you must pay a lot of attention to the names of the units: if you execute the p_start of the X unit and then you start the Y unit, the elaboration log will not be reliable.
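For example, a demonstration unit in the style just described could be written like this (a sketch: the system table queried and the local variable names are assumptions based on the description):

  procedure module_a is
    v_module_cod varchar2(61) := 'unit_test.module_a';
    v_num        number(1);   -- 1 digit: the count below will not fit (ORA-06502)
  begin
    select count(*) into v_num from all_objects;   -- an Oracle system table
  exception
    when others then
      mef_unit.p_exc(v_module_cod, sqlerrm);       -- route the error to the control system
  end module_a;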
  • 16. MICRO ETL FOUNDATION Installation I remind you that all the code of the control system of the units can be downloaded from MEF_01: https://drive.google.com/folderview?id=0B2dQ0EtjqAOTN3I1MU9JQmpOUEE&usp=sharing Before you use it, you must install the base of the Micro ETL Foundation, that is the messaging system. This is the link to MEF_00: https://drive.google.com/folderview?id=0B2dQ0EtjqAOTaU5WNmc5MkVnVFE&usp=sharing Its installation is explained on slideshare: http://www.slideshare.net/jackbim/recipe-7-of-data-warehouse-a-messaging-system-for-oracle-dwh-2 Regarding the control system of the units, do this: go into SQL*Plus with the user you created/configured for the messaging system, then run: SQL> @mef_unit_install.sql You do not need to do anything else. We are ready for the test phase.
  • 17. MICRO ETL FOUNDATION Test1 (everything is all right) We enter SQL*Plus and launch the unit_test_run1.sql script. The script is very simple. First of all, it configures the units of the testing job, by inserting one row per unit into the MEF_UNIT_CFT table. Then it runs the three units that will surely end with a positive outcome (it only takes about 30 seconds). As you can see, each unit is delimited by the start/end pair. Now we can verify the result by looking at the contents of the MEF_UNIT_LOT table. You can enter SQL*Plus and run a select on the table, but for ease of viewing, I will show the (reduced) result in graphical format.
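Based on that description, the heart of unit_test_run1.sql is presumably along these lines (a sketch with assumed literal values; the real script is in the MEF_01 download):

  -- configure the job: one row per unit in the configuration table
  insert into mef_unit_cft (sched_cod, job_cod, unit_cod, continue_flg)
    values ('s1', 'job1', 'unit_test.module1', 1);
  insert into mef_unit_cft (sched_cod, job_cod, unit_cod, continue_flg)
    values ('s1', 'job1', 'unit_test.module2', 1);
  insert into mef_unit_cft (sched_cod, job_cod, unit_cod, continue_flg)
    values ('s1', 'job1', 'unit_test.module3', 1);
  commit;

  -- run each unit, delimited by the start/end pair
  exec mef_unit.p_start('s1', 'job1', 'unit_test.module1');
  exec unit_test.module1;
  exec mef_unit.p_end;
  exec mef_unit.p_start('s1', 'job1', 'unit_test.module2');
  exec unit_test.module2;
  exec mef_unit.p_end;
  exec mef_unit.p_start('s1', 'job1', 'unit_test.module3');
  exec unit_test.module3;
  exec mef_unit.p_end;

  -- verify the outcome
  select unit_cod, status_cod, return_cod, elapsed_txt from mef_unit_lot order by seq_num;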
  • 18. MICRO ETL FOUNDATION Test2 (non-blocking errors) In this second test, we show the behavior of the control system in case of non-blocking errors. We go into SQL*Plus and launch the unit_test_run2.sql script. The script is similar to the previous one, changing only the calls and the setting of the continuity flag of the units. The final result clearly shows the errors encountered at run-time by job2, and the fact that, since the errors are not blocking, module_w2 and module3 were still carried through to the successful end of their executions.
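The only conceptual difference from Test1 is in the configuration: the units that may fail are registered with the continuity flag on (again a sketch with assumed values):

  -- continue_flg = 1: an error in this unit must not stop the job
  insert into mef_unit_cft (sched_cod, job_cod, unit_cod, continue_flg)
    values ('s1', 'job2', 'unit_test.module_w', 1);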
  • 19. MICRO ETL FOUNDATION Test3 (fatal error) In this third test, we show the behavior of the control system in case of a fatal error. We go into SQL*Plus and launch the unit_test_run3.sql script. The final result clearly shows how module_a of job3, having been configured with continue_flg = 0, prevents the execution of the subsequent module2 and module3 units. Obviously, such exception situations were also automatically inserted in the log messages table.
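To analyze a failure like this afterwards, a query such as the following (using the column names from the table description above) shows at a glance which unit broke the job and why:

  select day_cod, job_cod, unit_cod, status_cod, return_cod, errmsg_txt
    from mef_unit_lot
   where job_cod = 'job3'
   order by seq_num;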
  • 20. MICRO ETL FOUNDATION Conclusions Like the messaging system, the control system of the processing units is simple and very useful to answer all those questions that, inevitably, we are asked in case of problems with the loading process of the Data Warehouse. Its simplicity is based on the fact that only three steps are needed in already existing code:
1. Configure the unit.
2. Insert the p_start and p_end procedure calls.
3. Replace the exception handler with the p_exc call.
Soon, we will also see some scheduling techniques. That is, based on the configuration table described here, we will launch a loading job without having to insert the p_start and p_end calls. It will all be automatic and dynamic. We will not have a separate main program for each job. And we will get what I theorized, years ago, in an article of mine that appeared on Data Mart Review [The Infrastructural Data Warehouse] (unfortunately the site no longer exists; it has been incorporated into the Information Management site). We will have implemented a method that, in my opinion, is essential for an ETL process: the clear separation between the infrastructure code and the business code of a Data Warehouse.