Os Pittaro
- 2. A Resource-oriented approach to data services.
! Description: This technical session will describe how a resource-
oriented approach can be used to transform data into data
services.
! Using a combination of REST, Python and RDF, we'll show you
how to create data resources which can be composed into
transformation pipelines. In this system, pipelines are also
resources, allowing incremental composition of new data services
based on existing ones.
! This session will include an overview of the SnapLogic Open
Source data integration toolkit.
! We will also take a look at some real-world examples:
  - service-enable an existing application
  - create a transformation pipeline
  - combine data from multiple pipelines to create a 'mashup' resource.
Copyright © 2007 SnapLogic, Inc.
- 3. SnapLogic Introduction
# SnapLogic is a data transformation framework
! Open Source project, GPL Version 2.0 license
! Implemented in Python
# Our goal is to provide a general solution for data access
and transformation.
! Data access and transformation is a universal problem.
! So far, there has been no consistent solution.
! The problem is getting worse, not better.
! More APIs, versions, and formats than ever before.
- 4. Fundamental Integration Problems
# Very Complicated Applications and Systems
! Tightly Coupled, Many Inter-Dependencies
! Heterogeneous environments
! Systems must continually evolve
! Upgrades, Conversions, and Consolidations are difficult
! Vendor-proprietary internal details
! Limited vendor support lifecycles
! Real knowledge of the systems is held by the implementers
# More data is being generated than ever.
! Explosion of data formats
! 'Unstructured' data, with little or no metadata
! Data quality and validity
! Data feeds and conversions everywhere
- 5. What makes Integration so Complex?
1. Multiple Access Protocols
2. Multiple Access Methods
3. Multiple Data Schemas
[Diagram: point-to-point connections between systems, each using its own protocol]
! Oracle: ODBC
! SAP: Native
! 3rd Party: SSL
! Web Services: SOAP
! Flat Files: FTP
! LDAP/AD: LDAP
- 6. Is There A Better Solution?
# Design Goals
! Scalable
! Extensible (by ordinary developers)
! Easier to use than writing code for every data interface
! Target developers, not business users
  - 'Data Crunching / Data Munging' (Greg Wilson / David Cross)
! Bridge the gap between the Web and Enterprise Data access
# To solve the problem, we need to minimize the variables
! The protocols
! The access methods
! The data formats / schemas
# We started looking for a better integration solution...
! I realized the Web seemed to be less affected by the problem
- 7. The Web and Integration
# The largest integration venture ever
! 17 million web servers
! Totally Decentralized
! Fundamentally heterogeneous model
  - operating systems, web servers
  - applications, tools, and frameworks
# It should be a nightmare of compatibility problems...
! but it's not!
# All compatible and interoperable
! Based on open standards and protocols
! HTTP and (X)HTML
! Using a common architecture
- 8. The Web has an Architecture?!
# There are deep design principles behind the web
! Based on Representational State Transfer (REST)
! Developed by Roy Fielding at UC Irvine
# The key abstraction in REST is a Resource
! A resource is any information that can be named
# The (simplified) principles of REST:
! state and functionality are divided into resources
! resources are addressable using URIs
! all resources share a uniform interface
  - constrained set of operations
  - limited set of content types
! manipulating resources is done by exchanging representations
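The REST principles above can be sketched as a tiny WSGI application using only Python's standard library: every resource is addressed by a URI path, every resource answers the same constrained set of operations (here, only GET), and clients receive a text/csv representation rather than the resource itself. The resource name and rows are hypothetical; this illustrates the REST constraints, not SnapLogic's server code.

```python
import csv
import io
from wsgiref.util import setup_testing_defaults

# Hypothetical resources: each URI path names a data set (a function returning rows).
RESOURCES = {
    "/customer_list": lambda: [("id", "name"), ("1", "Acme"), ("2", "Globex")],
}


def to_csv(rows):
    """Render rows as a text/csv representation (csv default CRLF line endings)."""
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()


def app(environ, start_response):
    """WSGI app: one uniform interface shared by every resource."""
    path = environ.get("PATH_INFO", "/")
    if path not in RESOURCES:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no such resource\n"]
    if environ.get("REQUEST_METHOD") != "GET":
        # Constrained set of operations: this sketch only implements GET.
        start_response("405 Method Not Allowed",
                       [("Allow", "GET"), ("Content-Type", "text/plain")])
        return [b"method not allowed\n"]
    body = to_csv(RESOURCES[path]()).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/csv"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(path, method="GET"):
    """Invoke the app in-process, without opening a socket."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

The same `app` could be served over a real socket with `wsgiref.simple_server.make_server(host, port, app)`; `call` only exists so the sketch can be exercised without a network.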
- 9. The SnapLogic Approach
# Apply the principles of web architecture to data access
and transformation.
[Diagram: each system (Oracle, SAP, 3rd Party, Web Services, Flat Files, LDAP/AD) is reached through a SnapLogic (SL) endpoint on each side of the connection]
Consistent Protocol - HTTP
Consistent Methods - REST 'verbs'
Consistent Data Schema - Normalized Tables
- 10. Basic Data Integration Operations
# Read from Data Sources
! Files, Databases, Applications, 'Feeds' (RSS/Atom), XML
# Write to Data Sinks (targets)
! Files, Databases, Applications, 'Feeds', XML
# Transform Data
! Filter, Sort, Aggregate, Join, Union
! string operations, formatting, general calculation
# Pipelined Operations
! It's a data flow model, not really procedural
! It's useful to cascade these operations in sequence.
! The data really should stream when possible
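The data-flow model above maps naturally onto Python generators. In this minimal sketch (function names and sample data are hypothetical, not SnapLogic components) a source, a filter, a sort, and a sink are cascaded in sequence; nothing runs until the sink pulls rows through the chain.

```python
import csv
import io


def read_csv(text):
    """Source: yield rows one at a time as dictionaries."""
    yield from csv.DictReader(io.StringIO(text))


def filter_rows(rows, predicate):
    """Filter: streams, emitting each matching row as soon as it arrives."""
    for row in rows:
        if predicate(row):
            yield row


def sort_rows(rows, key):
    """Sort: cannot stream; it must see every row before emitting the first."""
    yield from sorted(rows, key=key)


def write_csv(rows, fieldnames):
    """Sink: drain the stream and return the CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


data = "name,state\nGlobex,NY\nAcme,CA\nInitech,CA\n"
pipeline = sort_rows(
    filter_rows(read_csv(data), lambda row: row["state"] == "CA"),
    key=lambda row: row["name"],
)
result = write_csv(pipeline, ["name", "state"])
```

The filter passes rows along one at a time, while the sort has to buffer its whole input; that difference is why data can stream only "when possible".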
- 11. Resource Oriented Data Services
# Mapping Data Operations to REST
! Data set => resource
! Data description => resource description
! data format => representation (mime type)
! read => HTTP GET from a resource
[Diagram: HTTP GET /customer_list; the response carries the data]
! write => HTTP POST to a resource, with the URL to GET from
[Diagram: HTTP POST to /new_location, which then issues an HTTP GET to /customer_list; the response carries the data]
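The read and write mappings above can be modelled in a few lines of Python. To stay runnable without a network, this sketch replaces HTTP with method calls on a hypothetical `Server` object that maps URL paths to resources; in a real deployment `get` and `post` would be HTTP requests. The class names are illustrative and are not the SnapScript API.

```python
class Server:
    """Hypothetical in-process stand-in for HTTP: URL path -> resource."""

    def __init__(self):
        self.resources = {}

    def get(self, url):
        # read => HTTP GET from a resource
        return self.resources[url].read()

    def post(self, url, source_url):
        # write => HTTP POST to a resource, carrying the URL to GET from
        return self.resources[url].write(self, source_url)


class DataSet:
    """A data set exposed as a resource that can be read and written."""

    def __init__(self, rows=()):
        self.rows = list(rows)

    def read(self):
        return list(self.rows)

    def write(self, server, source_url):
        # The target pulls its new contents by issuing a GET on the source URL.
        self.rows = server.get(source_url)
        return len(self.rows)


server = Server()
server.resources["/customer_list"] = DataSet([("1", "Acme"), ("2", "Globex")])
server.resources["/new_location"] = DataSet()
copied = server.post("/new_location", "/customer_list")
```

The design point is that a write is a pull: the client only hands `/new_location` a URL, and the data moves directly between the two resources.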
- 12. Resource Oriented Pipelines
# Pipelines are a set of coordinated resources
[Diagram: /customer_list, /remove_dups, and /modified_list, linked by one HTTP POST and two HTTP GET requests]
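A pipeline of coordinated resources can be sketched by adding a transform resource that, when read, reads its upstream resource. HTTP is again simulated in-process by a hypothetical `Server` class, and `RemoveDups` is an illustrative stand-in, not a SnapLogic component. One POST to `/modified_list` triggers a GET on `/remove_dups`, which in turn GETs `/customer_list`.

```python
class Server:
    """Hypothetical in-process stand-in for HTTP: URL path -> resource."""

    def __init__(self):
        self.resources = {}

    def get(self, url):
        return self.resources[url].read(self)

    def post(self, url, source_url):
        return self.resources[url].write(self, source_url)


class DataSet:
    def __init__(self, rows=()):
        self.rows = list(rows)

    def read(self, server):
        return list(self.rows)

    def write(self, server, source_url):
        self.rows = server.get(source_url)
        return len(self.rows)


class RemoveDups:
    """Transform resource: a GET on it GETs the upstream and drops duplicate rows."""

    def __init__(self, upstream_url):
        self.upstream_url = upstream_url

    def read(self, server):
        seen, unique = set(), []
        for row in server.get(self.upstream_url):
            if row not in seen:  # keep the first occurrence, preserve order
                seen.add(row)
                unique.append(row)
        return unique


server = Server()
server.resources["/customer_list"] = DataSet([("1", "Acme"), ("2", "Globex"), ("1", "Acme")])
server.resources["/remove_dups"] = RemoveDups("/customer_list")
server.resources["/modified_list"] = DataSet()
# One POST starts the pipeline; the two GETs happen behind the scenes.
written = server.post("/modified_list", "/remove_dups")
```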
- 13. Applications for Resource Oriented Services
# Traditional Integration
! Data Interfaces between systems
! Data conversion and migration
! ETL for warehousing and analysis/BI
# Data for 'Mash up' Applications
! Expose data as a service from any application
! Allow data to be reprocessed and reused
# General Purpose Data Manipulation
! 'Data Crunching' / 'Data Munging'
- 14. Benefits of a Resource Approach
# All resources have consistent interfaces
! Easy to mix, match, and compose them together.
! Application / Interface details are hidden at the endpoints.
# All resources have a full http://... URI
! mix, match, and compose across servers easily
# Pipelines are also resources
! A pipeline has a URL
! Can be read/written like any other resource
! Simplifies the composition of complex scenarios
[Diagram: /pipe1, /pipe2, and /pipe3 composed in sequence]
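Because a pipeline is itself a resource, a new pipeline can use an existing one as its source. The sketch below, which again simulates HTTP in-process with a hypothetical `Server` class and uses paths where a real deployment would use full http:// URIs, chains `/pipe1`, `/pipe2`, and `/pipe3` that way. The `Pipeline` class and sample data are illustrative assumptions, not SnapLogic's implementation.

```python
class Server:
    """Hypothetical in-process stand-in for HTTP: URL path -> resource."""

    def __init__(self):
        self.resources = {}

    def get(self, url):
        return self.resources[url].read(self)


class DataSet:
    def __init__(self, rows):
        self.rows = list(rows)

    def read(self, server):
        return list(self.rows)


class Pipeline:
    """A pipeline is itself a resource: reading it reads its source, then runs its steps."""

    def __init__(self, source_url, *steps):
        self.source_url = source_url
        self.steps = steps

    def read(self, server):
        rows = server.get(self.source_url)
        for step in self.steps:
            rows = step(rows)
        return rows


server = Server()
server.resources["/customers"] = DataSet([
    {"name": "Globex", "state": "NY"},
    {"name": "Acme", "state": "CA"},
    {"name": "Initech", "state": "CA"},
])
# Each pipeline reads the previous one exactly like any other resource.
server.resources["/pipe1"] = Pipeline("/customers", lambda rows: [r for r in rows if r["state"] == "CA"])
server.resources["/pipe2"] = Pipeline("/pipe1", lambda rows: sorted(rows, key=lambda r: r["name"]))
server.resources["/pipe3"] = Pipeline("/pipe2", lambda rows: [r["name"] for r in rows])
names = server.get("/pipe3")
```

Nothing in `Pipeline.read` knows whether its source is a raw data set or another pipeline, which is what makes incremental composition cheap.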
- 15. What's in the Download?
# SnapLogic Data Server
! Container for components
! Coordinates pipeline execution
! Maintains the resource definition repository as an RDF store
! Provides metadata services and client tool interfaces
# Components
! Database read/write, file read/write, RSS/JSON read/write
! Salesforce read, QuickBooks read, Apache Log reader
! Transformations – Sort, Aggregate, Filter, Join, Mixer, Sequence
# Management Server
! Support for graphical web client (Flex application)
# SnapScript package
! Python classes for programmers to define and access resources
# SnapAdmin
! command line management utility
- 16. SnapLogic Development Model
# Create resource definitions and load onto the server
! Can be done with Python or through the Web client
# Create pipelines that connect the resources
! Again, via Python or the Web Client
# Execute the pipeline
! The server takes care of coordinating the HTTP operations behind
the scene.
- 17. Example 1 – Reading from SugarCRM
# Create a resource to read an account list from SugarCRM
# Using:
! Python
! The SnapLogic.SnapScript package
! The Database Reader Component
! The Connection Component
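The demo itself uses SnapScript with SnapLogic's Database Reader and Connection components, whose API is not reproduced here. As a generic stand-in only, this sketch uses Python's built-in sqlite3 module to show what a database reader does: open a connection, run a query, and stream each row out as a record. The in-memory database, the `accounts` table, and its columns are assumptions, not SugarCRM's actual schema.

```python
import sqlite3


def read_rows(conn, query):
    """Database-reader stand-in: stream the rows of a query as dictionaries."""
    cursor = conn.execute(query)
    columns = [column[0] for column in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


# An in-memory SQLite database stands in for the SugarCRM database;
# the table layout and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT, name TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a2", "Globex"), ("a1", "Acme")])
accounts = list(read_rows(conn, "SELECT id, name FROM accounts ORDER BY name"))
```

Keeping the connection separate from the reader mirrors the slide's split between a Connection component and a Database Reader component.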
- 18. Example 2 – Reading from QuickBooks
# Create a resource to read from QuickBooks
# Using:
# Using:
! The QuickBooks read component
! The Web client
- 19. Example 3 – Merge Sugar and QuickBooks
# Create a pipeline using our two resources
! Union the two streams, marking each record as customer or prospect.
# Using:
! Pipeline Component
! Mixer Component
! File Write Component
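The merge step can be sketched as a plain union of two tagged streams followed by a CSV write. This is not the SnapLogic Mixer component, whose exact semantics are not described here; it assumes, for illustration, that the QuickBooks records are customers and the SugarCRM records are prospects, and the sample rows are hypothetical.

```python
import csv
import io
from itertools import chain


def tag(rows, kind):
    """Add a 'type' field recording which stream a record came from."""
    for row in rows:
        yield {**row, "type": kind}


def mix(*streams):
    """Union: pass every row of every input stream through, in order."""
    return chain(*streams)


def write_csv(rows, out, fieldnames):
    """File-writer stand-in: write rows as CSV to any text file object."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample records; assumed: QuickBooks holds customers, SugarCRM prospects.
quickbooks_rows = [{"name": "Acme"}, {"name": "Initech"}]
sugar_rows = [{"name": "Globex"}]

out = io.StringIO()
write_csv(mix(tag(quickbooks_rows, "customer"), tag(sugar_rows, "prospect")),
          out, ["name", "type"])
merged = out.getvalue()
```

To write a real file instead of a string buffer, pass `open("merged.csv", "w", newline="")` as `out`.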
- 20. Metadata and Resource Descriptions
# Metadata is required for serious integration
! Lack of metadata is the biggest limitation of custom code
# In SnapLogic, all resource definitions use RDF
! We maintain a complete description of the resource
! The SnapLogic server repository is an RDF Store
# RDF is managed by the server and clients
! Metadata is automatically generated for the web client or
SnapScript
! You need the metadata, but don't have to deal with it directly.
# All resources can be queried for information
! GET from /url....?target=meta
! http://localhost:8088/OSCon/ReadSugarAccounts?target=meta
# SnapScript can also generate metadata
! Resource.getAsRDFString()
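The ?target=meta convention can be sketched as a query-string check that switches the response from data to an RDF description of the resource. The vocabulary URI, the component name, and the field list below are hypothetical, and the output is plain N-Triples; SnapLogic's real RDF schema and the output of `Resource.getAsRDFString()` are not reproduced here.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical vocabulary URI and resource definition, for illustration only;
# SnapLogic's real RDF schema is not reproduced here.
VOCAB = "http://example.org/vocab#"
DEFINITIONS = {
    "/OSCon/ReadSugarAccounts": {
        "component": "DatabaseReader",
        "fields": ["id", "name", "billing_city"],
    },
}


def describe(resource_uri, definition):
    """Render a resource definition as N-Triples (plain literals; no escaping handled)."""
    subject = f"<{resource_uri}>"
    lines = [f'{subject} <{VOCAB}component> "{definition["component"]}" .']
    lines += [f'{subject} <{VOCAB}field> "{name}" .' for name in definition["fields"]]
    return "\n".join(lines) + "\n"


def metadata_for(url):
    """Return the description when the query asks for target=meta, else None."""
    parts = urlsplit(url)
    if parse_qs(parts.query).get("target") != ["meta"]:
        return None
    resource_uri = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return describe(resource_uri, DEFINITIONS[parts.path])
```

Serving metadata from the same URL as the data, distinguished only by a query parameter, keeps the description addressable alongside the resource it describes.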
- 21. The URLs you need
# Everything is at http://www.snaplogic.org
! Full source, GPL V 2.0 license
! Forums, mailing lists, Wiki, and bugs
# http://packages.snaplogic.org
! A download site for SnapLogic content
! SugarCRM Data Mart
! Apache Log Reader
! Dojo / JavaScript Mashup Example
- 23. Why Open Source the Product?
# LAMP is the future of all infrastructure
# Proprietary development model is broken
# Integration remains a coding problem
# Open Source economics work for integration
! Eliminates deal-driven "adapter" development
! No vendor can support the whole connectivity matrix
! Enable reuse of data and components