Extend ETL to Heterogeneous and Unstructured Data Sources
Suraj Bang, BI Consultant
Databases
Non-databases: XML files, CSV files, HTML files, PDF documents, Web Services
Name of the platform
JDBC driver class
Source data type to OWB generic data type
Runtime variable name
Variable action (SQL function that calculates the new value)
TO_CHAR(MAX(LAST_UPDATE_DATE), 'MM/DD/YYYY HH24:MI:SS.FF3')
Calculates the new value
Updates the variable with the new value
SQL statement (MySQL) vs. command line (PostgreSQL)
Simple design with a code template (CT) assigned
Generates the SQL statement
SQL statement executes at the source
Generates the SQL statement
Generates the OS-specific command
Executes the command
Form Fields
Data types defined
Data types mapped to OWB generic data types
CMI_DEFINITION
Defines what kinds of objects can be imported (tables, views, etc.)
An API-based CMI implements the OWB interface oracle.wh.service.sdk.integrator.MetadataImport
Uses the location details to create metadata objects
The getColumns routine gets the PDF form field names from the PDF document. These are represented as columns in the metadata.
iText API: reader.getAcroForm().getFields() collects all form field information from the PDF document
All the form fields are generated as columns in the table
A JDBC stub driver lets the PDF location be treated as a JDBC source and registered as a location
URL field used by the CMI to reverse-engineer the PDF document
Internal names of the PDF form fields, extracted from the PDF document
Representing the PDF as a table
Code template assigned for PDF extraction
API calls to get metadata
API calls to get metadata
Inserting a row for every processed PDF
iText API to extract data from the PDF document
HTML Table (Columns & Rows)
Tags parsed by the LCT to capture metadata and data
FILENAME   HEADER_NAME   HEADER_VALUE   RW_NBR
MSERVLOC   Backup Host   testhost       1
MSERVLOC   Backup IP     1.1.1.1        1
MSERVLOC   Days          Mon,Wed,Fri    1
MSERVLOC   Start Time    10:00          1
MSERVLOC   End Time      11:00          1
MSERVLOC   Location      rosh           1
MSERVLOC   Backup Host   hosttest       2
MSERVLOC   Backup IP     2.2.2.2        2
MSERVLOC   Days          Tue,Thu        2
MSERVLOC   Start Time    11:00          2
MSERVLOC   End Time      12:30          2
MSERVLOC   Location      hugh           2
LCT to parse HTML files
UNPIVOT & AGGREGATOR get all the data into tabular format
Input to the LCT specifying the number of columns in the HTML document (table tags)
Importing Jython libraries
Using the htmllib library to parse <td> tags
Data extraction from the tags
Inserts data into the work table for every <td> tag
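The LCT logic on these slides can be sketched in Python: collect every `<td>`, assign it to a header by position, write one work-table row per cell, then collapse the rows back together by row number. The original uses Jython's `htmllib`. This sketch swaps in Python 3's `html.parser.HTMLParser`, assumes the first row of `<td>` cells holds the header names, and uses hypothetical function names. The test data is taken from the MSERVLOC table above.

```python
from html.parser import HTMLParser


class TdCollector(HTMLParser):
    """Collect the text of every <td> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # text of nested tags (e.g. <b>) is included too


def html_to_work_rows(filename, html, num_columns):
    """One work-table row per data cell: (FILENAME, HEADER_NAME, HEADER_VALUE, RW_NBR).

    Assumes the first num_columns <td> cells hold the header names.
    """
    parser = TdCollector()
    parser.feed(html)
    parser.close()
    cells = [c.strip() for c in parser.cells]
    headers, values = cells[:num_columns], cells[num_columns:]
    return [
        (filename, headers[i % num_columns], value, i // num_columns + 1)
        for i, value in enumerate(values)
    ]


def unpivot_rows(work_rows):
    """Collapse key/value work rows into one dict per RW_NBR (the UNPIVOT/AGGREGATOR step)."""
    rows = {}
    for _filename, header, value, rw_nbr in work_rows:
        rows.setdefault(rw_nbr, {})[header] = value
    return [rows[n] for n in sorted(rows)]
```

The intermediate key/value form is generic. Any number of HTML columns lands in the same four-column work table, and only the final collapse step needs to know the header names.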
Lists, List Items, Attachments
Web service lists.asmx SOAP actions: Get/Add/Update/Delete List, Get/Add/Update/Delete List Items, Add Attachments, etc.
HTTP POST operation to invoke the web service
Placeholders are replaced by values from tables in the database
CAML Definition
XML with CAML:
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <UpdateListItems xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <listName>%s</listName>
      <updates>
        <Batch>
          <Method ID='1' Cmd='New'>
            <Field Name='Title'>%s</Field>
            <Field Name='AssignedTo'>%s</Field>
            <Field Name='Status'>%s</Field>
            …
          </Method>
        </Batch>
      </updates>
    </UpdateListItems>
  </soap:Body>
</soap:Envelope>
Input parameters: web service endpoint, SOAP action, SOAP content type, parallel threads, BASE64-encoded field
Input parameter: SOAP XML format
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <AddList xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <listName>%s</listName>
      <description>%s</description>
      <templateID>%s</templateID>
    </AddList>
  </soap:Body>
</soap:Envelope>
Java BeanShell populates the XML for every row
Class to invoke the web service
Uses the input parameters to set up the web service call
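The invocation class is not shown in the source. Below is a minimal Python sketch of the same idea: fill the SOAP XML placeholders from a row, POST it with the SOAP action and content type, and run several calls in parallel. Each value is escaped before substitution, so characters such as `&` or `<` in database values cannot break the XML. The endpoint, the SOAP action URI, and the function names are assumptions. Authentication, which SharePoint usually requires, and the BASE64-encoded field parameter are not handled.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from xml.sax.saxutils import escape

# AddList request body from the slide, with %s placeholders for
# listName, description and templateID.
ADD_LIST_TEMPLATE = (
    '<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
    'xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body><AddList xmlns="http://schemas.microsoft.com/sharepoint/soap/">'
    "<listName>%s</listName><description>%s</description>"
    "<templateID>%s</templateID></AddList></soap:Body></soap:Envelope>"
)


def build_request(endpoint, soap_action, content_type, row):
    """Fill the placeholders from one row (values XML-escaped) and prepare an HTTP POST."""
    body = ADD_LIST_TEMPLATE % tuple(escape(str(v)) for v in row)
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type, "SOAPAction": f'"{soap_action}"'},
        method="POST",
    )


def invoke_all(endpoint, soap_action, content_type, rows, parallel_threads=4):
    """Invoke the web service once per row using a pool of worker threads."""

    def call(row):
        req = build_request(endpoint, soap_action, content_type, row)
        with urllib.request.urlopen(req, timeout=60) as resp:
            return resp.read().decode("utf-8")

    with ThreadPoolExecutor(max_workers=parallel_threads) as pool:
        return list(pool.map(call, rows))
```

`pool.map` returns responses in the same order as the input rows. That makes it straightforward to join each returned XML document back to the row that produced it.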
XML returned after invoking the web service:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <GetListItemsResult>
        <listitems xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882' xmlns:dt='uuid:C2F41010-65B3-11d1-A29F-00AA00C14882' xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'>
          <rs:data ItemCount="2">
            <z:row ows_Attachments='0' ows_LinkIssueIDNoMenu='1' ows_LinkTitle='ODI-EE' ows_Status='Active' ows_Priority='(2) Normal' ows_MetaInfo='1;#' ows__ModerationStatus='0' ows__Level='1' ows_Title='ODI-EE' ows_ID='1' ows_owshiddenversion='1' ows_UniqueId='1;#{962F968C-6C61-4097-91C3-5A2899C5F8B4}' ows_FSObjType='1;#0' ows_Created_x0020_Date='1;#2010-07-23 15:51:36' ows_Created='2010-08-23 15:51:36' ows_FileLeafRef='1;#1_.000' ows_FileRef='1;#yoursite/yoursubsite/Lists/Test/1_.000' />
          </rs:data>
        </listitems>
      </GetListItemsResult>
    </GetListItemsResponse>
  </soap:Body>
</soap:Envelope>
Data in the returned XML
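The returned rowset carries each list item as the attributes of a `z:row` element in the `#RowsetSchema` namespace, with `ows_`-prefixed field names. A short Python sketch of reading those rows with `xml.etree.ElementTree` follows. Stripping the `ows_` prefix and the `id;#` value prefix are conveniences assumed for this example, not steps described in the source, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# z:row elements live in the '#RowsetSchema' namespace declared on <listitems>.
Z_ROW = "{#RowsetSchema}row"


def parse_list_items(soap_xml):
    """Return one dict per <z:row>, with the 'ows_' prefix removed from attribute names."""
    root = ET.fromstring(soap_xml)
    return [
        {(k[4:] if k.startswith("ows_") else k): v for k, v in row.attrib.items()}
        for row in root.iter(Z_ROW)
    ]


def strip_id_prefix(value):
    """Drop an 'id;#' prefix such as the one in '1;#2010-07-23 15:51:36'."""
    _, sep, rest = value.partition(";#")
    return rest if sep else value
```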
Unbounded View using Inline SQL
 
LCT invokes the web service for every row
Returned XML is parsed to get the conversion rate
TO_NUMBER(EXTRACTVALUE(INGRP1.SOAP_XML, '//ConversionRateResult/text()', 'xmlns="http://www.webserviceX.NET/"'))
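The same extraction can also be done outside the database. This Python sketch is a rough counterpart of the `EXTRACTVALUE` expression above. It finds `ConversionRateResult` in the `http://www.webserviceX.NET/` namespace and converts it to a number. The response shape used in the test is an assumption about what the service returns. `Decimal` is used instead of `float` to avoid binary rounding of rates.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

NS = {"wsx": "http://www.webserviceX.NET/"}  # same namespace as in EXTRACTVALUE above


def conversion_rate(soap_xml):
    """Find ConversionRateResult anywhere in the response and return it as a Decimal."""
    node = ET.fromstring(soap_xml).find(".//wsx:ConversionRateResult", NS)
    if node is None or node.text is None:
        raise ValueError("ConversionRateResult not found in response")
    return Decimal(node.text.strip())
```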
OWB11gR2 - Extending ETL

Editor's Notes

  1. The agenda for this session is to cover the new features of OWB 11gR2 that we used to meet various challenges on the project. We will be looking at …
  2. Extracting from any source requires metadata about the objects in that source. OWB 11gR2 provides Platforms, an extensible framework for representing native data types from heterogeneous databases. A platform supports metadata extraction through native methods such as JDBC, and platforms can easily be created for databases such as Sybase, PostgreSQL, and MySQL. Where custom metadata extraction techniques are required, OWB provides an open API framework, CMI, for registering custom extraction methods. A CMI definition lets you define custom metadata extraction methods for a given platform. For SQL-based platforms, the CMI can take the form of custom (Oracle or native SQL) queries. For non-SQL platforms, the CMI references a Java class that extracts the metadata using whatever means the platform provides. CMI makes it easy to reverse-engineer metadata in cases where custom ERP adapters would otherwise have to be built. In our case, we used it to define a platform for reverse-engineering metadata from PDF documents.
  3. For native support, platforms must be created manually to extract metadata from the sources. Platforms can be created in two ways. OMB*Plus script: the script uses OMB commands to create definitions that map the source database's data types to OWB generic data types. OWB uses these definitions to translate data types between platforms in mappings that move data across platforms. Experts released by the OWB team: a faster approach to creating platforms. The experts use different techniques to create the platform definition, either a JDBC driver (this requires adding the JAR file to the driver path) or an ODI technology (the XML file in ODI that holds the platform definition is used as input). These experts generate an OMB script dynamically and execute it to create the platform definition.
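The type-mapping part of a platform definition can be pictured as two lookup tables. One maps native source types to OWB generic types, and the other maps generic types to the target's native types. The Python sketch below shows only that idea. The type names in both maps are illustrative assumptions, not the contents of any OWB platform or OMB script.

```python
# Illustrative sketch of platform type translation via a generic type.
# All type names below are assumptions chosen for the example.
SYBASE_TO_GENERIC = {
    "int": "INTEGER",
    "varchar": "VARCHAR",
    "datetime": "TIMESTAMP",
    "money": "DECIMAL",
}

GENERIC_TO_ORACLE = {
    "INTEGER": "NUMBER",
    "VARCHAR": "VARCHAR2",
    "TIMESTAMP": "TIMESTAMP",
    "DECIMAL": "NUMBER",
}


def translate_type(native_type, to_generic, from_generic):
    """Map a source-native type to a target-native type through the generic type."""
    generic = to_generic.get(native_type.lower())
    if generic is None:
        raise KeyError(f"no generic mapping for source type {native_type!r}")
    return from_generic[generic]


if __name__ == "__main__":
    for source_type in ("int", "DATETIME", "money"):
        print(source_type, "->", translate_type(source_type, SYBASE_TO_GENERIC, GENERIC_TO_ORACLE))
```

Routing every translation through the generic type means each new platform needs only one pair of maps, rather than a separate map for every source/target combination.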
  4-6. This is a snippet of an OMB script for a Sybase platform. The script sets properties such as the platform name, driver information, the database's local object mask, and the date mask. It also maps source data types to OWB generic data types.
  7. One of the major new features introduced in OWB 11gR2 was Code Templates. Code Templates generate native E-LT code in SQL and other languages to run on heterogeneous platforms, bringing the power of ELT into OWB. A code template defines a set of tasks to be executed. These tasks can mix SQL statements (DML and DDL) and even operating-system tasks such as FTP or moving files. Code templates make it possible to write generic logic once and apply it across different mappings. The logic can be a set of extraction tasks, transformation tasks, or tasks that load data into the target. Code templates also bring Declarative Design, which keeps the technical implementation of a mapping separate from its logical design. Because code templates let you customize how code is generated, you can change the generated code to suit a situation without modifying the design, and the technical implementation is abstracted away from the developer designing the mapping. Code generation: various languages can be used to generate code. The metadata in the code is generated using the ODI substitution API. Java BeanShell provides the flexibility to generate code at runtime, and Java and Jython can be used to perform different integration tasks.
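The idea of writing task logic once and expanding it per mapping can be sketched with Python's `string.Template`. This is only an analogy. The `$`-placeholders stand in for metadata that a real code template would obtain through the ODI substitution API, and the task text, table names, and column names are hypothetical.

```python
from string import Template

# Hypothetical generic extraction task; placeholders are filled per mapping.
EXTRACT_TASK = Template(
    "INSERT INTO $work_table ($columns)\n"
    "SELECT $columns FROM $source_table WHERE $filter"
)


def render_task(source_table, work_table, columns, filter_condition="1=1"):
    """Expand the generic task with one mapping's metadata."""
    return EXTRACT_TASK.substitute(
        work_table=work_table,
        source_table=source_table,
        columns=", ".join(columns),
        filter=filter_condition,
    )


if __name__ == "__main__":
    # The same template serves two different (hypothetical) mappings.
    print(render_task("SRC_ORDERS", "STG_ORDERS", ["ORDER_ID", "AMOUNT"]))
    print(render_task("SRC_CUSTOMERS", "STG_CUSTOMERS", ["CUST_ID", "NAME"], "ACTIVE = 1"))
```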
  8. One of the key requirements was to extract data incrementally from different sources. We needed to customize templates to pick up only the changed data from sources that provide last_update_date timestamp values. Challenges … Solution …
  9-10. The filter condition uses a runtime variable whose value is substituted into the SQL at execution time. Runtime variables store data as VARCHAR. Some databases, such as SQL Server, convert a VARCHAR string to a date implicitly and need no explicit conversion; others, such as Oracle, need an explicit conversion.
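The difference in conversion behaviour can be made concrete with a small generator for the filter predicate. This Python sketch assumes two things: the runtime variable holds a string in the `MM/DD/YYYY HH24:MI:SS.FF3` layout used by the variable action on the slide, and the SQL Server session uses a month/day/year date format. The function and dialect names are hypothetical.

```python
def incremental_filter(dialect, column, last_value):
    """Build the incremental-extraction predicate for one source dialect.

    last_value is the runtime variable's VARCHAR value, assumed to look like
    '07/23/2010 15:51:36.000' (MM/DD/YYYY HH24:MI:SS.FF3).
    """
    literal = "'" + last_value.replace("'", "''") + "'"  # SQL string literal
    if dialect == "oracle":
        # Oracle: convert the string to a timestamp explicitly.
        return f"{column} > TO_TIMESTAMP({literal}, 'MM/DD/YYYY HH24:MI:SS.FF3')"
    if dialect == "sqlserver":
        # SQL Server: rely on implicit string-to-datetime conversion
        # (assumes a month/day/year DATEFORMAT for the session).
        return f"{column} > {literal}"
    raise ValueError(f"unsupported dialect: {dialect}")
```

Using `>` assumes that no row with exactly the watermark timestamp arrives after the previous run. When that cannot be guaranteed, `>=` combined with de-duplication is the safer variant.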
  11-13. The customized template takes two input values: the variable name and the action to be carried out on the variable. This allows runtime values to be stored in the same format as in the source database.
  14-15. This template task captures the new value and updates the runtime variable for the next run, after the mapping has completed successfully.
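The post-load step on the slide computes `TO_CHAR(MAX(LAST_UPDATE_DATE), 'MM/DD/YYYY HH24:MI:SS.FF3')` and writes it back only after success. It can be sketched in Python as below. This sketch keeps the old value when the run fails or extracts nothing, so that a failed run is retried from the same point; that behaviour is an assumption, not something the source describes. The function names are hypothetical.

```python
from datetime import datetime


def format_watermark(ts):
    """Format like 'MM/DD/YYYY HH24:MI:SS.FF3' (this sketch truncates to milliseconds)."""
    return ts.strftime("%m/%d/%Y %H:%M:%S.") + f"{ts.microsecond // 1000:03d}"


def next_watermark(current, extracted_timestamps, mapping_succeeded):
    """Return the runtime-variable value to use on the next run."""
    if not mapping_succeeded or not extracted_timestamps:
        return current  # leave the variable unchanged
    return format_watermark(max(extracted_timestamps))
```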
  16. A common requirement is to handle huge volumes of data from the sources. Conventional extraction over JDBC/ODBC is fine for smaller volumes, but for large volumes (tables with millions of rows) it is extremely slow. One of the fastest ways to extract data is to dump the source data into a CSV file and then load the CSV file into the target table. The challenge is that every database has its own way of dumping data into a CSV file. This is easily handled with a customized code template.
  17. As we can see, MySQL bulk unload is a SQL statement, whereas PostgreSQL provides that feature through command-line syntax.
  18. This is a simple code template design in which an entire table is dumped to a file. The source can also be a SQL query.
  19-20. This is the task within the code template that carries out bulk unloading for a MySQL database and generates the required CSV file. It uses the ODI substitution API to extract metadata, and Jython carries out the execution.
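A MySQL bulk unload of this kind is a single `SELECT ... INTO OUTFILE` statement. The Python sketch below generates one from table and column metadata. The function name and paths are hypothetical, and the sketch does not reproduce the actual code template task or its ODI substitution API calls.

```python
def mysql_unload_sql(table, columns, outfile):
    """Generate a MySQL SELECT ... INTO OUTFILE statement that writes CSV."""
    cols = ", ".join(f"`{c}`" for c in columns)            # backtick-quote identifiers
    path = outfile.replace("\\", "/").replace("'", "''")   # forward slashes; escape quotes
    return (
        f"SELECT {cols} INTO OUTFILE '{path}' "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "
        f"FROM `{table}`"
    )
```

`INTO OUTFILE` writes the file on the MySQL server host, not on the client. It requires the `FILE` privilege, will not overwrite an existing file, and may be restricted by the `secure_file_priv` setting. Choose the unload directory with those constraints in mind.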
  21-23. Depending on the OS where PostgreSQL is running, the task generates the command to be executed. If PostgreSQL is on a different machine, a remote agent needs to be installed there to execute the task.
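For PostgreSQL, the unload runs as a `psql` command rather than as SQL sent over JDBC. The Python sketch below builds the argument list for a `\copy ... TO ... WITH CSV HEADER` call and renders it for either a Windows or a POSIX shell. The host, database, user, and file names are hypothetical. Only the executable name and the command-line quoting vary by OS here; a real template may need other OS-specific details. The Windows branch relies on `subprocess.list2cmdline`, which CPython provides but does not document as public API.

```python
import shlex
import subprocess


def postgres_unload_argv(table, outfile, host, dbname, user, target_os):
    """Argument list for psql's client-side \\copy to a CSV file."""
    psql = "psql.exe" if target_os == "windows" else "psql"
    path = outfile.replace("\\", "/").replace("'", "''")  # forward slashes; escape quotes
    meta_command = f"\\copy {table} TO '{path}' WITH CSV HEADER"
    return [psql, "-h", host, "-d", dbname, "-U", user, "-c", meta_command]


def render_command(argv, target_os):
    """Quote the argument list as a single command line for the target OS shell."""
    if target_os == "windows":
        return subprocess.list2cmdline(argv)  # MS C runtime quoting rules
    return shlex.join(argv)                    # POSIX shell quoting (Python 3.8+)
```

When the agent runs on the same OS, executing the list with `subprocess.run(argv, check=True)` avoids shell quoting entirely. The rendered string matters only when the command has to be handed to a shell or written into a script.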
  24. When dealing with CSV or flat files, OS tasks are required, such as moving files to an archive folder … These tasks can easily be incorporated into code templates. Advantages: it eliminates the overhead of writing shell or batch scripts; this in turn simplifies the process flow, since no external process needs to be invoked; and the same proven design is enforced on all related sources and followed by all developers.
  25. A simple move command that moves the file to a processing directory after bulk unloading.
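The move step itself is trivial, which is exactly why it fits inside a code template instead of a separate script. Below is a minimal Python 3 sketch of such a task. It assumes hypothetical file and directory names and a policy of refusing to overwrite a file that has not been processed yet. Jython 2.x, as used by the code templates, lacks `pathlib`, so an `os.path`-based equivalent would be needed there.

```python
import shutil
from pathlib import Path


def move_to_processing(unload_file, processing_dir):
    """Move an unloaded CSV file into the processing directory and return its new path."""
    src = Path(unload_file)
    dest_dir = Path(processing_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)   # create the directory on first use
    dest = dest_dir / src.name
    if dest.exists():
        # Refuse to overwrite a file that has not been processed yet.
        raise FileExistsError(f"{dest} already exists")
    return Path(shutil.move(str(src), str(dest)))
```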
  26-30. The CMI is invoked during the Import Wizard. Various operations are invoked, such as "tell me what tables are in the source" and "tell me what columns are in the source". This is where the PDF form fields are represented as columns.
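The CMI's job of turning PDF form field names into table columns can be sketched independently of iText. The Python sketch below takes a list of field names, as `getFields()` would supply them, derives valid Oracle-style column names, and de-duplicates them. The naming rules, the 30-character limit, the `VARCHAR2(4000)` default type, and the example field names are all assumptions for illustration. The sketch does not implement OWB's `MetadataImport` interface.

```python
import re

MAX_IDENT = 30  # assumed identifier limit (pre-12.2 Oracle)


def field_to_column(field_name):
    """Derive a column name from a (possibly fully qualified) form field name."""
    leaf = field_name.split(".")[-1]                   # 'form1[0].page1[0].Name[0]' -> 'Name[0]'
    leaf = re.sub(r"\[\d+\]$", "", leaf)               # drop a trailing index such as '[0]'
    col = re.sub(r"[^A-Za-z0-9_]", "_", leaf).upper()  # replace invalid characters
    if not col or not col[0].isalpha():
        col = "F_" + col                               # identifiers must start with a letter
    return col[:MAX_IDENT]


def columns_from_fields(field_names, default_type="VARCHAR2(4000)"):
    """Return (column_name, data_type, source_field) tuples with unique column names."""
    columns, seen = [], set()
    for name in field_names:
        base = col = field_to_column(name)
        n = 1
        while col in seen:  # append _2, _3, ... until the name is unique
            n += 1
            suffix = f"_{n}"
            col = base[:MAX_IDENT - len(suffix)] + suffix
        seen.add(col)
        columns.append((col, default_type, name))
    return columns
```

Keeping the original field name in the third position preserves the link back to the PDF, which the extraction step needs in order to read each field's value.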
  31-34. When dealing with multiple sources involving bulk loading, OS tasks are extremely helpful for carrying out OS commands such as… This is especially useful with multiple sources, where the same code template is applied to all the mappings. The overhead of writing customized shell or batch scripts for trivial tasks is eliminated. Advantages: simpler process flows, and the same design is enforced on all related sources.
  35. CAML (Collaborative Application Markup Language) is an XML-based markup language used within the web services. Groups of tags both define and render data. The CAML can be generated either through the input XML format in the template or through an Expression operator in the mapping. Usage: incremental or selective extraction, creating list items, etc.
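Generating the CAML from rows, rather than filling `%s` placeholders in a fixed string, keeps the XML well formed for any number of rows and fields and escapes values automatically. A minimal Python sketch with `xml.etree.ElementTree` follows. The function name and example field names are hypothetical. The sketch produces only the `<Batch>` fragment that goes inside `<updates>` of an `UpdateListItems` request.

```python
import xml.etree.ElementTree as ET


def caml_batch(rows, cmd="New"):
    """Build a CAML <Batch> with one <Method> per row; rows are dicts of field name -> value."""
    batch = ET.Element("Batch")
    for i, row in enumerate(rows, start=1):
        method = ET.SubElement(batch, "Method", {"ID": str(i), "Cmd": cmd})
        for name, value in row.items():
            field = ET.SubElement(method, "Field", {"Name": name})
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(batch, encoding="unicode")
```

For `Cmd='Update'` or `Cmd='Delete'`, each row would also need its item `ID` field so that SharePoint knows which list item to change.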
  36. Compared with its previous release, OWB 11gR2 allows