The document discusses extending ETL capabilities to support heterogeneous and unstructured data sources. It describes customizing code templates to enable extracting data from various sources like databases, XML/CSV files, PDF documents, HTML files, and Microsoft SharePoint using web services. Specific examples provided include customizing platforms for non-Oracle databases, integrating PDF documents using metadata interfaces, and parsing HTML tables to extract tabular data.
74. XML returned after Invoking the Web Service
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <GetListItemsResult>
        <listitems xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882' xmlns:dt='uuid:C2F41010-65B3-11d1-A29F-00AA00C14882' xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'>
          <rs:data ItemCount="2">
            <z:row ows_Attachments='0' ows_LinkIssueIDNoMenu='1' ows_LinkTitle='ODI-EE' ows_Status='Active' ows_Priority='(2) Normal' ows_MetaInfo='1;#' ows__ModerationStatus='0' ows__Level='1' ows_Title='ODI-EE' ows_ID='1' ows_owshiddenversion='1' ows_UniqueId='1;#{962F968C-6C61-4097-91C3-5A2899C5F8B4}' ows_FSObjType='1;#0' ows_Created_x0020_Date='1;#2010-07-23 15:51:36' ows_Created='2010-08-23 15:51:36' ows_FileLeafRef='1;#1_.000' ows_FileRef='1;#yoursite/yoursubsite/Lists/Test/1_.000' />
          </rs:data>
        </listitems>
      </GetListItemsResult>
    </GetListItemsResponse>
  </soap:Body>
</soap:Envelope>
Data in the returned XML
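The rows in a response like the one above can be pulled out with a few lines of Python. This is only a sketch of the idea, not the code used in the project: the function name `extract_rows` and the trimmed `SAMPLE` document are assumptions made for illustration, and only the `#RowsetSchema` namespace and the `ows_` attribute prefix come from the response itself.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample with the same shape as the response above.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <GetListItemsResult>
        <listitems xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema">
          <rs:data ItemCount="1">
            <z:row ows_Title="ODI-EE" ows_Status="Active" ows_ID="1" />
          </rs:data>
        </listitems>
      </GetListItemsResult>
    </GetListItemsResponse>
  </soap:Body>
</soap:Envelope>"""


def extract_rows(soap_xml):
    """Return one dict per z:row element, with the 'ows_' prefix removed from keys."""
    root = ET.fromstring(soap_xml)
    rows = []
    # ElementTree spells a namespaced tag as '{namespace-uri}localname'.
    for row in root.iter("{#RowsetSchema}row"):
        record = {}
        for name, value in row.attrib.items():
            key = name[4:] if name.startswith("ows_") else name
            record[key] = value
        rows.append(record)
    return rows


if __name__ == "__main__":
    for record in extract_rows(SAMPLE):
        print(record)
```

Each list item becomes a flat record, which is the tabular shape a mapping would expect from the source.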
The agenda for this session is to cover the new features of OWB 11gR2 that were used to meet various challenges faced while working on the project. We will be looking at …
Extracting from any source requires metadata about the objects in that source. OWB 11gR2 provides the Platform, an extensible framework for representing native data types from heterogeneous databases. A Platform supports extraction of metadata using native methods such as JDBC, and Platforms can easily be created for databases such as Sybase, PostgreSQL, and MySQL. Where custom metadata extraction techniques are required, OWB provides an open API framework, CMI, for registering custom metadata extraction methods. A CMI definition defines the custom metadata extraction methods for a given platform. For SQL-based platforms, CMI can take the form of custom (Oracle or native SQL) queries. For non-SQL platforms, CMI references a Java class that extracts the metadata using whatever means the platform provides. CMI also makes it easier to reverse engineer metadata where custom ERP adapters need to be built. In our case, it was used to define a platform for reverse engineering metadata from PDF documents.
For native support, Platforms have to be created manually to extract metadata from the sources. Platforms can be created in two ways. The first is an OMB*Plus script, which uses OMB commands: the OMB script creates definitions that map the data types of the source database to OWB generic data types, and OWB uses these definitions to translate data types among platforms in mappings that move data across platforms. The second, faster approach is to use the Experts released by the OWB team. The Experts use different techniques to create the platform definition. JDBC driver: this requires adding the JAR file to the driver path. ODI technology: the XML file in ODI that holds the platform definition is used as input to create the platform definition. These Experts generate an OMB script dynamically and execute it to create the platform definition.
This is a snippet of an OMB script for a Sybase platform. The script sets properties such as the name of the platform, driver information, the local object mask of the database, and the date mask. Other information, such as the mapping of source data types to OWB generic data types, is also set up in the script.
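The role of the data type mapping can be illustrated with a small Python sketch. It does not reproduce any OMB command: the generic type names (`GENERIC_*`), the two dictionaries, and the `translate` function are hypothetical and exist only to show how a native type on one platform can be translated to another platform through a shared generic type.

```python
# Illustrative only: the generic type names below are placeholders,
# not OWB's actual generic data type identifiers.
SYBASE_TO_GENERIC = {
    "INT": "GENERIC_INTEGER",
    "VARCHAR": "GENERIC_STRING",
    "DATETIME": "GENERIC_TIMESTAMP",
    "MONEY": "GENERIC_DECIMAL",
}

GENERIC_TO_ORACLE = {
    "GENERIC_INTEGER": "NUMBER(10)",
    "GENERIC_STRING": "VARCHAR2",
    "GENERIC_TIMESTAMP": "TIMESTAMP",
    "GENERIC_DECIMAL": "NUMBER(19,4)",
}


def translate(native_type, source_map, target_map):
    """Translate a source type to a target type via the shared generic type."""
    generic = source_map[native_type.upper()]
    return target_map[generic]


if __name__ == "__main__":
    for sybase_type in SYBASE_TO_GENERIC:
        oracle_type = translate(sybase_type, SYBASE_TO_GENERIC, GENERIC_TO_ORACLE)
        print(sybase_type, "->", oracle_type)
```

Going through a generic type means each platform needs only one mapping table, rather than one table per pair of platforms.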
One of the major new features introduced with OWB 11.2 was Code Templates. Code Templates generate native EL-T code in SQL and other languages to run on heterogeneous platforms, which brings the power of ELT into OWB. A code template defines a set of tasks to be executed. These tasks can be a mixture of SQL statements (DML, DDL) and even operating system tasks such as FTP, moving files, etc. Code templates provide the flexibility to define generic logic that is applied across different mappings. The logic can be a set of extraction tasks, a set of transformation tasks, or a set of tasks that load data into the target. With Code Templates comes Declarative Design. Declarative Design: keeps the technical implementation of a mapping separate from its logical design. Code templates make it possible to customize the way code is generated to meet different challenges, so the code can change with the situation without modifying the design, and the technical implementation is abstracted from the developer designing the mapping. Code Generation: various languages can be used to generate code. The metadata in the code is generated using the ODI substitution API. Java BeanShell provides the flexibility to generate code at runtime. Java and Jython can be used to perform different integration tasks.
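The idea of a template as an ordered set of tasks filled in from mapping metadata can be sketched in plain Python. This uses `string.Template` as a stand-in and is not the ODI substitution API; the task names, the placeholder names, and the `generate` function are assumptions chosen for illustration.

```python
from string import Template

# A hypothetical template: an ordered list of (task name, statement pattern).
TEMPLATE_TASKS = [
    ("create work table",
     Template("CREATE TABLE $work_table AS SELECT * FROM $source_table WHERE 1 = 0")),
    ("load work table",
     Template("INSERT INTO $work_table SELECT * FROM $source_table")),
    ("load target",
     Template("INSERT INTO $target_table SELECT * FROM $work_table")),
    ("drop work table",
     Template("DROP TABLE $work_table")),
]


def generate(mapping_metadata):
    """Render every task of the template for one mapping's metadata."""
    return [(name, pattern.substitute(mapping_metadata))
            for name, pattern in TEMPLATE_TASKS]


if __name__ == "__main__":
    metadata = {
        "source_table": "SRC_ORDERS",
        "work_table": "C$_ORDERS",
        "target_table": "TGT_ORDERS",
    }
    for name, statement in generate(metadata):
        print(name + ": " + statement)
```

The same task list can be rendered for any mapping, which is the sense in which the logical design stays separate from the generated code.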
One of the key requirements was to extract data incrementally from different sources. The need was to customize templates to pick up the most recently changed data from sources that provide last_update_date timestamp values. Challenges … Solution …
The filter condition uses a runtime variable whose value is substituted into the SQL at execution time. Runtime variables store data as VARCHAR. Some databases, such as SQL Server, convert a VARCHAR string to a date implicitly and need no explicit conversion, while others, such as Oracle, need an explicit conversion.
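The difference between the two kinds of database can be sketched as a small function that builds the filter condition. The function name `incremental_filter`, the dialect labels, and the assumed timestamp format `YYYY-MM-DD HH:MM:SS` are illustrative assumptions, not part of the original template.

```python
def incremental_filter(column, variable_value, dialect):
    """Build a 'changed since' filter from a VARCHAR runtime variable value.

    variable_value is assumed to look like '2010-08-23 15:51:36'.
    """
    # Double any embedded single quotes so the value is a valid SQL literal.
    literal = "'" + variable_value.replace("'", "''") + "'"
    if dialect == "oracle":
        # Oracle: convert the string to a date explicitly.
        return "{0} > TO_DATE({1}, 'YYYY-MM-DD HH24:MI:SS')".format(column, literal)
    if dialect == "sqlserver":
        # SQL Server: the string is converted implicitly in the comparison.
        return "{0} > {1}".format(column, literal)
    raise ValueError("unsupported dialect: " + dialect)


if __name__ == "__main__":
    value = "2010-08-23 15:51:36"
    print(incremental_filter("last_update_date", value, "oracle"))
    print(incremental_filter("last_update_date", value, "sqlserver"))
```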
The customized template takes two input values: the variable name and the action to be carried out on the variable. This allows runtime values to be stored in the same format as in the source database.
This template task captures the new value and updates the runtime variable for the next run, after the mapping has completed successfully.
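The capture-and-update step can be sketched end to end with an in-memory SQLite database. This is a minimal illustration of the pattern, not the OWB template task: the table names, the variable name `LAST_EXTRACT`, and both functions are hypothetical.

```python
import sqlite3


def setup_demo(conn):
    """Create hypothetical source, target, and runtime variable tables."""
    conn.executescript("""
        CREATE TABLE source_orders (id INTEGER, last_update_date TEXT);
        CREATE TABLE target_orders (id INTEGER, last_update_date TEXT);
        CREATE TABLE runtime_variables (name TEXT PRIMARY KEY, value TEXT);
        INSERT INTO source_orders VALUES (1, '2010-07-23 15:51:36');
        INSERT INTO source_orders VALUES (2, '2010-08-23 15:51:36');
        INSERT INTO runtime_variables VALUES ('LAST_EXTRACT', '2010-08-01 00:00:00');
    """)


def run_incremental(conn, variable_name):
    """Load rows changed since the stored value, then advance the variable."""
    # 'with conn' commits on success and rolls back on error, so the variable
    # only moves forward when the load itself has completed.
    with conn:
        last_value = conn.execute(
            "SELECT value FROM runtime_variables WHERE name = ?",
            (variable_name,),
        ).fetchone()[0]
        rows = conn.execute(
            "SELECT id, last_update_date FROM source_orders "
            "WHERE last_update_date > ?",
            (last_value,),
        ).fetchall()
        conn.executemany("INSERT INTO target_orders VALUES (?, ?)", rows)
        if rows:
            newest = max(row[1] for row in rows)
            conn.execute(
                "UPDATE runtime_variables SET value = ? WHERE name = ?",
                (newest, variable_name),
            )
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    setup_demo(connection)
    print(run_incremental(connection, "LAST_EXTRACT"))
    print(run_incremental(connection, "LAST_EXTRACT"))
```

Updating the variable in the same transaction as the load is one way to honor the "only after success" rule; a failed load leaves the old value in place for the next run.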
One of the common requirements is to handle huge volumes of data from the sources. Conventional extraction over JDBC/ODBC is fine for smaller volumes, but it is extremely slow for large volumes (tables with millions of rows). One of the fastest ways of extracting data is to dump the source data into a CSV file and then load the CSV file into the target table. The challenge is that every database has its own way of dumping data into a CSV file. This can easily be handled with a customized code template.
As we can see, MySQL bulk unload is a SQL statement, but PostgreSQL provides that feature through command-line syntax.
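The contrast can be made concrete with two small generator functions. This is a sketch under assumptions: the function names, the table name, and the file path are hypothetical, and the statements use MySQL's `SELECT ... INTO OUTFILE` and psql's `\copy` in a basic form rather than the exact text produced by the templates.

```python
def mysql_unload_sql(table, csv_path):
    """Return a MySQL statement that dumps a table to a CSV file on the server."""
    return (
        "SELECT * FROM {0} "
        "INTO OUTFILE '{1}' "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    ).format(table, csv_path)


def postgres_unload_argv(database, table, csv_path):
    """Return a psql command line (as an argument list) that dumps a table to CSV."""
    copy_command = "\\copy {0} TO '{1}' WITH (FORMAT csv)".format(table, csv_path)
    return ["psql", "-d", database, "-c", copy_command]


if __name__ == "__main__":
    print(mysql_unload_sql("orders", "/tmp/orders.csv"))
    print(postgres_unload_argv("sales", "orders", "/tmp/orders.csv"))
```

The MySQL result is sent over the database connection like any other statement, while the PostgreSQL result has to be run as an operating system command.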
This is a simple code template design in which an entire table is dumped to a file. A SQL query can also be used in place of the table.
This is the task within the code template that carries out bulk unloading for a MySQL database. It generates the required CSV file. It uses the ODI substitution API to extract metadata, and Jython carries out the execution.
Depending on the OS where PostgreSQL is running, the task generates the command to be executed. If PostgreSQL is on a different machine, a remote agent needs to be installed to execute the task.
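One way to picture the OS-dependent step is a helper that wraps a command line for the shell of the host it will run on. This is a hedged sketch, not the template's actual logic: the function name `wrap_for_os` and the choice of `cmd /c` versus `/bin/sh -c` are assumptions, and quoting rules for the wrapped command still differ between the two shells.

```python
import platform


def wrap_for_os(command_line, os_name=None):
    """Return an argument list that runs command_line through the host's shell.

    os_name defaults to the OS this code runs on, which is why the task
    must execute on (or through an agent on) the database host.
    """
    if os_name is None:
        os_name = platform.system()
    if os_name == "Windows":
        return ["cmd", "/c", command_line]
    return ["/bin/sh", "-c", command_line]


if __name__ == "__main__":
    command = "psql -d sales -f unload_orders.sql"  # hypothetical command
    print(wrap_for_os(command, "Windows"))
    print(wrap_for_os(command, "Linux"))
```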
When dealing with CSV files or flat files, OS tasks are required. These tasks can include moving files to an archive folder … Such tasks can easily be incorporated into code templates. Advantages: this eliminates the additional overhead of writing shell or batch scripts, which in turn simplifies the process flow because no external process needs to be invoked. The same proven design is enforced on all related sources and followed by all developers.
A simple move command moves the file to a processing directory after bulk unloading.
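Since template tasks can be written in Jython, the move can be expressed without a shell script. The sketch below is plain Python with hypothetical names (`move_to_processing` and the directory layout); it illustrates the step rather than reproducing the original task.

```python
import shutil
from pathlib import Path


def move_to_processing(csv_file, processing_dir):
    """Move an unloaded CSV file into the processing directory and return its new path."""
    target_dir = Path(processing_dir)
    # Create the processing directory on first use.
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(csv_file).name
    shutil.move(str(csv_file), str(destination))
    return destination


if __name__ == "__main__":
    import tempfile

    work_dir = Path(tempfile.mkdtemp())
    unloaded = work_dir / "orders.csv"
    unloaded.write_text("1,ODI-EE\n")
    print(move_to_processing(unloaded, work_dir / "processing"))
```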
CMI is invoked by the import wizard. Various operations are invoked, such as "what tables are in the source" and "what columns are in the source". This is where the PDF form fields are represented as columns.
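The two questions the wizard asks can be sketched as a small metadata provider. This is a Python illustration of the concept only: the real CMI references a Java class, and the class name, method names, field types, and `TYPE_MAP` below are all hypothetical.

```python
# Hypothetical mapping from PDF form field kinds to column types.
TYPE_MAP = {"text": "VARCHAR", "checkbox": "BOOLEAN", "date": "DATE"}


class PdfFormMetadata:
    """Illustrative stand-in for a metadata provider; not the OWB CMI API."""

    def __init__(self, forms):
        # forms: {pdf_file_name: {field_name: field_kind}}
        self._forms = forms

    def list_tables(self):
        """Answer 'what tables are in the source': one table per PDF form."""
        return sorted(self._forms)

    def list_columns(self, table):
        """Answer 'what columns are in the table': one column per form field."""
        fields = self._forms[table]
        return [(name, TYPE_MAP.get(kind, "VARCHAR")) for name, kind in fields.items()]


if __name__ == "__main__":
    provider = PdfFormMetadata({
        "claim_form.pdf": {"claim_id": "text", "approved": "checkbox", "filed_on": "date"},
    })
    for table in provider.list_tables():
        print(table, provider.list_columns(table))
```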
When dealing with multiple sources involving bulk loading, OS tasks are extremely helpful for carrying out OS commands like.. Because the same code template is applied to all the mappings, the additional overhead of writing customized shell or batch scripts for trivial tasks is eliminated. Advantages: simpler process flows, and the same design is enforced on all the related sources.
CAML (Collaborative Application Markup Language) is an XML-based markup language used within Web Services. It consists of groups of tags that both define and render data. CAML can be generated via the input XML format in the template or via an Expression operator in the mapping. Usage: incremental extraction or selective extraction, creating ListItems, etc.
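For incremental extraction, the CAML fragment to generate is a query that filters on a timestamp. The sketch below builds one with Python's standard library; filtering on the `Modified` field and the function name `caml_modified_since` are assumptions for illustration, and how the fragment is embedded in the web service request is left out.

```python
import xml.etree.ElementTree as ET


def caml_modified_since(timestamp_iso):
    """Build a CAML query selecting items modified after the given timestamp."""
    query = ET.Element("Query")
    where = ET.SubElement(query, "Where")
    greater_than = ET.SubElement(where, "Gt")
    # 'Modified' is assumed here; any date column of the list could be used.
    ET.SubElement(greater_than, "FieldRef", Name="Modified")
    value = ET.SubElement(greater_than, "Value", Type="DateTime")
    value.text = timestamp_iso
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(caml_modified_since("2010-08-23T15:51:36Z"))
```

The timestamp passed in would typically be the stored value of the runtime variable from the previous run.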
OWB11gr2 as compared to its previous release allow