SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Presented by Derek Meng
Data Integration
On the Alibaba Cloud
Big Data Platform
From OSS, RDS to
MaxCompute
01
03
02
04
General Process of Data Integration
DataWorks Basics
MaxCompute Basics
Getting Start with Alibaba Cloud
DATA INTEGRATION MAXCOMPUTE
DATAWORKS DEMO
Overview
2 /25
(Slide No. 3-9) (Slide No. 10-20)
(Slide No. 23-24)(Slide No. 21-22)
01
General Process of Data Integration
3 /25
Data Source and Type
Data Source and Type Introduction
1
2
General Process of Data Integration
4 /25
Data Integration
Data Integration
Data Acquisition Data Transformation Data Governance
5 /25
Unstructured Data
TXT
Picture
Video
Audio
….
Semi-Structured
Log
XML
JSON
….
Structured Data
Oracle
MySQL
SQLServer
PostgreSQL
…
Test Data Set
6 /25
Alibaba Cloud Big Data Architecture
General Process of Data Integration
1
2
General Process Data Integration
7 /25
01
Offline
Streaming
Real-Time
Streaming Process
Schedule / Maintain
02
03 Get Insight
Decision Support
Data Warehouse
8 /25
Data Source
Acquisition
• Database
• Local File
• OSS
Data Scrubbing
• SQL
• Custom Code
Data EDA
• Statistics
• Modeling
Data Storage
• Database
Report BI
Agent
• Console App
• Servers
• Sensors
Transfer and Buffer
• Streaming Transfer
Tools
Streaming Process
• Streaming Process
Tools
Data Storage
• Database
01
02
Unified Data Storage
• Database
Ad-Hoc
• Ad-Hoc Query
General Data Processing Workflow
Offline Data Process
9 /25
RDS
Database
OSS Data
Store
Server Load
Balancer
ECS Cluster
Table Store
Auto Scaling
MaxCompute
RDBMS
MySql, Sql Server, Oracle, DB2……
Hadoop Data
Hive, HBASE
Other Data Source
Txt File, Web logs, Vedio /
Audio
Data Source
MaxCompute Basics
02
10 /25
MaxCompute Basics
Basic Concepts of MaxCompute
MaxCompute Architecture
1
2
MaxCompute Data Channel and SQL3
11 /25
12 /25
• Project is the most basic unit for resource
isolation
• Multiple projects can share the resources of
the same cluster
• A Project is similar to Oracle’s Database
• Tables, users and jobs are all subordinate to
a project
• After authorization, various projects can
achieve data interoperability
Basic Concepts
PROJECT 2 PROJECT 4
PROJECT 3
PROJECT 1
Table
User
Security
Policy
Job
Resource
13 /25
• Most of the MaxCompute-processed data is stored in a structured bi-dimensional table
• Tables are subordinate to the project
• Tables can be partitioned
• Data types in a table include Bigint, Boolean, Double, Date/Time, String, and Decimal
• Data is managed by the Pangu storage system. The automatic multi-replica storage
policy improves the data availability and blocks underlying hardware faults
• Column-store structure, compressed storage
• Built-in data lifecycle management policy
• Storage quota-based multi-tenant management mechanism
Storage
MaxCompute Basics
Basic Concepts of MaxCompute
MaxCompute Architecture
1
2
MaxCompute Data Channel and SQL3
14 /25
MaxCompute Basics
15 /25
SQL MapReduce Graph
Machine
Learning
10000 10000 10000
Cluster 1 Cluster 2 Cluster 3
Apsara Distributed System
MaxCompute Engine
MaxCompute Basics
Basic Concepts of MaxCompute
MaxCompute Architecture
1
2
MaxCompute Data Channel and SQL3
16 /25
17 /25
Tunnel
• The channel for data to go in and out of MaxCompute
• High-concurrency upload/download
• Horizontal expansion of service capabilities
• 1P throughput supported in a single day
• Batch and Real-time modes
• The real-time mode supports pub/sub models
• ODPS Tunnel-based tools include TT, CDP, Flume, and Fluentd
18 /25
• Reads and writes to tables are supported, but views are not supported
• Writes to tables adopt the Append mode
• Concurrency is supported to improve overall throughput
• Frequent commits are avoided
• The target partition for data uploads must exist
• Real-time upload mode
Tunnel
19 /25
Data Upload/Download in Tunnel
• odps@ > tunnel upload log.txt test_project.test_table/p1="b1",p2="b2“;
• odps@ > tunnel download test_project.test_table/p1="b1",p2="b2" log.txt;
• It is a Tunnel SDK-based command line tool that can be used for uploading local text
files to ODPS or downloading table data to a local location
• The table partitions should be established
• DataX, CDP, and TT have implemented better tools based on Tunnel, and the tools
can be used to support data interaction between ODPS and relational databases
• The log data can be imported using Flume, and Fluentd tools
• Special scenario users can develop custom tools based on Tunnel
Tunnel Command
20 /25
SQL
• Applicable to process a large amount of data (terabytes to petabytes)
• High Latency: the running time of every SQL statement ranges from dozens of
seconds to several hours.
• The syntax is similar to HQL of Hive, with some extensions on the basis of the
standard SQL.
• There is no transaction, and no primary key.
• UPDATE and DELETE commands are not supported.
DataWorks Basics
03
21 /25
22 /25
DataWorks
04
Getting Started with Alibaba Cloud
23 /25
DEMO
24 /25
Q&A
Data Integration on Alibaba Cloud's Big Data Platform: MaxCompute, DataWorks Demo

Weitere ähnliche Inhalte

Was ist angesagt?

Getting Started with Elasticsearch
Getting Started with ElasticsearchGetting Started with Elasticsearch
Getting Started with ElasticsearchAlibaba Cloud
 
Building Complete Private Clouds with Apache CloudStack and Riak CS
Building Complete Private Clouds with Apache CloudStack and Riak CSBuilding Complete Private Clouds with Apache CloudStack and Riak CS
Building Complete Private Clouds with Apache CloudStack and Riak CSJohn Burwell
 
Kubernetes as Orchestrator for A10 Lightning Controller
Kubernetes as Orchestrator for A10 Lightning ControllerKubernetes as Orchestrator for A10 Lightning Controller
Kubernetes as Orchestrator for A10 Lightning ControllerAkshay Mathur
 
Better, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs reduxBetter, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs reduxJohn Burwell
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...DataStax Academy
 
Choosing the right Cloud Database
Choosing the right Cloud DatabaseChoosing the right Cloud Database
Choosing the right Cloud DatabaseJanakiram MSV
 
HBaseConAsia2018 Track3-2: HBase at China Telecom
HBaseConAsia2018 Track3-2:  HBase at China TelecomHBaseConAsia2018 Track3-2:  HBase at China Telecom
HBaseConAsia2018 Track3-2: HBase at China TelecomMichael Stack
 
Bi and AI updates in the Microsoft Data Platform stack
Bi and AI updates in the Microsoft Data Platform stackBi and AI updates in the Microsoft Data Platform stack
Bi and AI updates in the Microsoft Data Platform stackIvan Donev
 
Discovery Day 2019 Sofia - What is new in SQL Server 2019
Discovery Day 2019 Sofia - What is new in SQL Server 2019Discovery Day 2019 Sofia - What is new in SQL Server 2019
Discovery Day 2019 Sofia - What is new in SQL Server 2019Ivan Donev
 
Discovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clustersDiscovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clustersIvan Donev
 
Jelastic (PaaS + IaaS) Virtual Cluster on Google Cloud Engine
Jelastic (PaaS + IaaS) Virtual Cluster on Google Cloud EngineJelastic (PaaS + IaaS) Virtual Cluster on Google Cloud Engine
Jelastic (PaaS + IaaS) Virtual Cluster on Google Cloud EngineRuslan Synytsky
 
Keep your Metadata Repository Current with Event-Driven Updates using CDC and...
Keep your Metadata Repository Current with Event-Driven Updates using CDC and...Keep your Metadata Repository Current with Event-Driven Updates using CDC and...
Keep your Metadata Repository Current with Event-Driven Updates using CDC and...confluent
 
Caching for Microservices Architectures: Session I
Caching for Microservices Architectures: Session ICaching for Microservices Architectures: Session I
Caching for Microservices Architectures: Session IVMware Tanzu
 
HBaseConAsia2018 Track3-5: HBase Practice at Lianjia
HBaseConAsia2018 Track3-5: HBase Practice at LianjiaHBaseConAsia2018 Track3-5: HBase Practice at Lianjia
HBaseConAsia2018 Track3-5: HBase Practice at LianjiaMichael Stack
 
AWS Study Group - Chapter 10 - Matching Supply and Demand [Solution Architect...
AWS Study Group - Chapter 10 - Matching Supply and Demand [Solution Architect...AWS Study Group - Chapter 10 - Matching Supply and Demand [Solution Architect...
AWS Study Group - Chapter 10 - Matching Supply and Demand [Solution Architect...QCloudMentor
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorCask Data
 
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...HostedbyConfluent
 

Was ist angesagt? (20)

Getting Started with Elasticsearch
Getting Started with ElasticsearchGetting Started with Elasticsearch
Getting Started with Elasticsearch
 
Building Complete Private Clouds with Apache CloudStack and Riak CS
Building Complete Private Clouds with Apache CloudStack and Riak CSBuilding Complete Private Clouds with Apache CloudStack and Riak CS
Building Complete Private Clouds with Apache CloudStack and Riak CS
 
Kubernetes as Orchestrator for A10 Lightning Controller
Kubernetes as Orchestrator for A10 Lightning ControllerKubernetes as Orchestrator for A10 Lightning Controller
Kubernetes as Orchestrator for A10 Lightning Controller
 
Better, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs reduxBetter, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs redux
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
In Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging serviceIn Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging service
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
 
Aneka platform
Aneka platformAneka platform
Aneka platform
 
Choosing the right Cloud Database
Choosing the right Cloud DatabaseChoosing the right Cloud Database
Choosing the right Cloud Database
 
HBaseConAsia2018 Track3-2: HBase at China Telecom
HBaseConAsia2018 Track3-2:  HBase at China TelecomHBaseConAsia2018 Track3-2:  HBase at China Telecom
HBaseConAsia2018 Track3-2: HBase at China Telecom
 
Bi and AI updates in the Microsoft Data Platform stack
Bi and AI updates in the Microsoft Data Platform stackBi and AI updates in the Microsoft Data Platform stack
Bi and AI updates in the Microsoft Data Platform stack
 
Discovery Day 2019 Sofia - What is new in SQL Server 2019
Discovery Day 2019 Sofia - What is new in SQL Server 2019Discovery Day 2019 Sofia - What is new in SQL Server 2019
Discovery Day 2019 Sofia - What is new in SQL Server 2019
 
Discovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clustersDiscovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clusters
 
Jelastic (PaaS + IaaS) Virtual Cluster on Google Cloud Engine
Jelastic (PaaS + IaaS) Virtual Cluster on Google Cloud EngineJelastic (PaaS + IaaS) Virtual Cluster on Google Cloud Engine
Jelastic (PaaS + IaaS) Virtual Cluster on Google Cloud Engine
 
Keep your Metadata Repository Current with Event-Driven Updates using CDC and...
Keep your Metadata Repository Current with Event-Driven Updates using CDC and...Keep your Metadata Repository Current with Event-Driven Updates using CDC and...
Keep your Metadata Repository Current with Event-Driven Updates using CDC and...
 
Caching for Microservices Architectures: Session I
Caching for Microservices Architectures: Session ICaching for Microservices Architectures: Session I
Caching for Microservices Architectures: Session I
 
HBaseConAsia2018 Track3-5: HBase Practice at Lianjia
HBaseConAsia2018 Track3-5: HBase Practice at LianjiaHBaseConAsia2018 Track3-5: HBase Practice at Lianjia
HBaseConAsia2018 Track3-5: HBase Practice at Lianjia
 
AWS Study Group - Chapter 10 - Matching Supply and Demand [Solution Architect...
AWS Study Group - Chapter 10 - Matching Supply and Demand [Solution Architect...AWS Study Group - Chapter 10 - Matching Supply and Demand [Solution Architect...
AWS Study Group - Chapter 10 - Matching Supply and Demand [Solution Architect...
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
 
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...
How Kafka and MemSQL Became the Dynamic Duo (Sarung Tripathi, MemSQL) Kafka S...
 

Ähnlich wie Data Integration on Alibaba Cloud's Big Data Platform: MaxCompute, DataWorks Demo

Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learnJohn D Almon
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)Marco Gralike
 
A Successful Journey to the Cloud with Data Virtualization
A Successful Journey to the Cloud with Data VirtualizationA Successful Journey to the Cloud with Data Virtualization
A Successful Journey to the Cloud with Data VirtualizationDenodo
 
Mma 10g r2_936
Mma 10g r2_936Mma 10g r2_936
Mma 10g r2_936Alf Baez
 
Adding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemAdding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemJohn Efstathiades
 
MACHBASE_NEO
MACHBASE_NEOMACHBASE_NEO
MACHBASE_NEOMACHBASE
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationDenodo
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
History of Oracle and Databases
History of Oracle and DatabasesHistory of Oracle and Databases
History of Oracle and DatabasesConnor McDonald
 
Presentation racsig 090730
Presentation racsig 090730Presentation racsig 090730
Presentation racsig 090730maclean liu
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table NotesTimothy Spann
 
PEARC17: Live Integrated Visualization Environment: An Experiment in General...
PEARC17: Live Integrated Visualization Environment: An Experiment in General...PEARC17: Live Integrated Visualization Environment: An Experiment in General...
PEARC17: Live Integrated Visualization Environment: An Experiment in General...moneyjh
 
Ibm_IoT_Architecture_and_Capabilities
Ibm_IoT_Architecture_and_CapabilitiesIbm_IoT_Architecture_and_Capabilities
Ibm_IoT_Architecture_and_CapabilitiesIBM_Info_Management
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Saa s multitenant database architecture
Saa s multitenant database architectureSaa s multitenant database architecture
Saa s multitenant database architecturemmubashirkhan
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 

Ähnlich wie Data Integration on Alibaba Cloud's Big Data Platform: MaxCompute, DataWorks Demo (20)

Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learn
 
An AMIS overview of database 12c
An AMIS overview of database 12cAn AMIS overview of database 12c
An AMIS overview of database 12c
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)
 
A Successful Journey to the Cloud with Data Virtualization
A Successful Journey to the Cloud with Data VirtualizationA Successful Journey to the Cloud with Data Virtualization
A Successful Journey to the Cloud with Data Virtualization
 
Mma 10g r2_936
Mma 10g r2_936Mma 10g r2_936
Mma 10g r2_936
 
Adding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemAdding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded System
 
MACHBASE_NEO
MACHBASE_NEOMACHBASE_NEO
MACHBASE_NEO
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
GPA Software Overview R3
GPA Software Overview R3GPA Software Overview R3
GPA Software Overview R3
 
History of Oracle and Databases
History of Oracle and DatabasesHistory of Oracle and Databases
History of Oracle and Databases
 
Presentation racsig 090730
Presentation racsig 090730Presentation racsig 090730
Presentation racsig 090730
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
 
PEARC17: Live Integrated Visualization Environment: An Experiment in General...
PEARC17: Live Integrated Visualization Environment: An Experiment in General...PEARC17: Live Integrated Visualization Environment: An Experiment in General...
PEARC17: Live Integrated Visualization Environment: An Experiment in General...
 
Ibm_IoT_Architecture_and_Capabilities
Ibm_IoT_Architecture_and_CapabilitiesIbm_IoT_Architecture_and_Capabilities
Ibm_IoT_Architecture_and_Capabilities
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Saa s multitenant database architecture
Saa s multitenant database architectureSaa s multitenant database architecture
Saa s multitenant database architecture
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 

Mehr von Alibaba Cloud

Why a Multi-cloud Strategy is Essential
Why a Multi-cloud Strategy is EssentialWhy a Multi-cloud Strategy is Essential
Why a Multi-cloud Strategy is EssentialAlibaba Cloud
 
Alibaba Cloud’s ET City Brain - Empowering Cities to Think
Alibaba Cloud’s ET City Brain - Empowering Cities to ThinkAlibaba Cloud’s ET City Brain - Empowering Cities to Think
Alibaba Cloud’s ET City Brain - Empowering Cities to ThinkAlibaba Cloud
 
Serverless Computing: Driving Innovation and Business Value
Serverless Computing: Driving Innovation and Business ValueServerless Computing: Driving Innovation and Business Value
Serverless Computing: Driving Innovation and Business ValueAlibaba Cloud
 
Loan Default Prediction with Machine Learning
Loan Default Prediction with Machine LearningLoan Default Prediction with Machine Learning
Loan Default Prediction with Machine LearningAlibaba Cloud
 
Next Level Digital Media with Alibaba Cloud (Part 2)
Next Level Digital Media with Alibaba Cloud (Part 2)Next Level Digital Media with Alibaba Cloud (Part 2)
Next Level Digital Media with Alibaba Cloud (Part 2)Alibaba Cloud
 
An Introduction to Alibaba Cloud’s Message Service
An Introduction to Alibaba Cloud’s Message ServiceAn Introduction to Alibaba Cloud’s Message Service
An Introduction to Alibaba Cloud’s Message ServiceAlibaba Cloud
 
Next Generation Retail Part 3 - Retail Transformation Best Practices
Next Generation Retail Part 3 - Retail Transformation Best PracticesNext Generation Retail Part 3 - Retail Transformation Best Practices
Next Generation Retail Part 3 - Retail Transformation Best PracticesAlibaba Cloud
 
Cyber Security Compliance Solutions for Foreign Companies in China - Alibaba ...
Cyber Security Compliance Solutions for Foreign Companies in China - Alibaba ...Cyber Security Compliance Solutions for Foreign Companies in China - Alibaba ...
Cyber Security Compliance Solutions for Foreign Companies in China - Alibaba ...Alibaba Cloud
 
The Next Generation of Retail - Unlocking Alibaba Retail Cloud
The Next Generation of Retail - Unlocking Alibaba Retail CloudThe Next Generation of Retail - Unlocking Alibaba Retail Cloud
The Next Generation of Retail - Unlocking Alibaba Retail CloudAlibaba Cloud
 
How to Leverage ApsaraDB to Deploy Business Data on the Cloud
How to Leverage ApsaraDB to Deploy Business Data on the CloudHow to Leverage ApsaraDB to Deploy Business Data on the Cloud
How to Leverage ApsaraDB to Deploy Business Data on the CloudAlibaba Cloud
 
Big Data Quickstart Series 1: Create Powerful Data Visualization
Big Data Quickstart Series 1: Create Powerful Data VisualizationBig Data Quickstart Series 1: Create Powerful Data Visualization
Big Data Quickstart Series 1: Create Powerful Data VisualizationAlibaba Cloud
 
Introduction to Elastic Compute Service on Alibaba Cloud to Power Your Busine...
Introduction to Elastic Compute Service on Alibaba Cloud to Power Your Busine...Introduction to Elastic Compute Service on Alibaba Cloud to Power Your Busine...
Introduction to Elastic Compute Service on Alibaba Cloud to Power Your Busine...Alibaba Cloud
 
Guide to Cybersecurity Compliance in China
Guide to Cybersecurity Compliance in ChinaGuide to Cybersecurity Compliance in China
Guide to Cybersecurity Compliance in ChinaAlibaba Cloud
 
Introduction to WAF and Network Application Security
Introduction to WAF and Network Application SecurityIntroduction to WAF and Network Application Security
Introduction to WAF and Network Application SecurityAlibaba Cloud
 
How to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart LogisticsHow to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart LogisticsAlibaba Cloud
 
China Connect Webinar: ChinaConnect: How to Apply for an ICP License in 2017
China Connect Webinar: ChinaConnect: How to Apply for an ICP License in 2017China Connect Webinar: ChinaConnect: How to Apply for an ICP License in 2017
China Connect Webinar: ChinaConnect: How to Apply for an ICP License in 2017Alibaba Cloud
 

Mehr von Alibaba Cloud (16)

Why a Multi-cloud Strategy is Essential
Why a Multi-cloud Strategy is EssentialWhy a Multi-cloud Strategy is Essential
Why a Multi-cloud Strategy is Essential
 
Alibaba Cloud’s ET City Brain - Empowering Cities to Think
Alibaba Cloud’s ET City Brain - Empowering Cities to ThinkAlibaba Cloud’s ET City Brain - Empowering Cities to Think
Alibaba Cloud’s ET City Brain - Empowering Cities to Think
 
Serverless Computing: Driving Innovation and Business Value
Serverless Computing: Driving Innovation and Business ValueServerless Computing: Driving Innovation and Business Value
Serverless Computing: Driving Innovation and Business Value
 
Loan Default Prediction with Machine Learning
Loan Default Prediction with Machine LearningLoan Default Prediction with Machine Learning
Loan Default Prediction with Machine Learning
 
Next Level Digital Media with Alibaba Cloud (Part 2)
Next Level Digital Media with Alibaba Cloud (Part 2)Next Level Digital Media with Alibaba Cloud (Part 2)
Next Level Digital Media with Alibaba Cloud (Part 2)
 
An Introduction to Alibaba Cloud’s Message Service
An Introduction to Alibaba Cloud’s Message ServiceAn Introduction to Alibaba Cloud’s Message Service
An Introduction to Alibaba Cloud’s Message Service
 
Next Generation Retail Part 3 - Retail Transformation Best Practices
Next Generation Retail Part 3 - Retail Transformation Best PracticesNext Generation Retail Part 3 - Retail Transformation Best Practices
Next Generation Retail Part 3 - Retail Transformation Best Practices
 
Cyber Security Compliance Solutions for Foreign Companies in China - Alibaba ...
Cyber Security Compliance Solutions for Foreign Companies in China - Alibaba ...Cyber Security Compliance Solutions for Foreign Companies in China - Alibaba ...
Cyber Security Compliance Solutions for Foreign Companies in China - Alibaba ...
 
The Next Generation of Retail - Unlocking Alibaba Retail Cloud
The Next Generation of Retail - Unlocking Alibaba Retail CloudThe Next Generation of Retail - Unlocking Alibaba Retail Cloud
The Next Generation of Retail - Unlocking Alibaba Retail Cloud
 
How to Leverage ApsaraDB to Deploy Business Data on the Cloud
How to Leverage ApsaraDB to Deploy Business Data on the CloudHow to Leverage ApsaraDB to Deploy Business Data on the Cloud
How to Leverage ApsaraDB to Deploy Business Data on the Cloud
 
Big Data Quickstart Series 1: Create Powerful Data Visualization
Big Data Quickstart Series 1: Create Powerful Data VisualizationBig Data Quickstart Series 1: Create Powerful Data Visualization
Big Data Quickstart Series 1: Create Powerful Data Visualization
 
Introduction to Elastic Compute Service on Alibaba Cloud to Power Your Busine...
Introduction to Elastic Compute Service on Alibaba Cloud to Power Your Busine...Introduction to Elastic Compute Service on Alibaba Cloud to Power Your Busine...
Introduction to Elastic Compute Service on Alibaba Cloud to Power Your Busine...
 
Guide to Cybersecurity Compliance in China
Guide to Cybersecurity Compliance in ChinaGuide to Cybersecurity Compliance in China
Guide to Cybersecurity Compliance in China
 
Introduction to WAF and Network Application Security
Introduction to WAF and Network Application SecurityIntroduction to WAF and Network Application Security
Introduction to WAF and Network Application Security
 
How to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart LogisticsHow to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart Logistics
 
China Connect Webinar: ChinaConnect: How to Apply for an ICP License in 2017
China Connect Webinar: ChinaConnect: How to Apply for an ICP License in 2017China Connect Webinar: ChinaConnect: How to Apply for an ICP License in 2017
China Connect Webinar: ChinaConnect: How to Apply for an ICP License in 2017
 

Kürzlich hochgeladen

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Kürzlich hochgeladen (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

Data Integration on Alibaba Cloud's Big Data Platform: MaxCompute, DataWorks Demo

  • 1. Presented by Derek Meng Data Integration On the Alibaba Cloud Big Data Platform From OSS, RDS to MaxCompute
  • 2. 01 03 02 04 General Process of Data Integration DataWorks Basics MaxCompute Basics Getting Start with Alibaba Cloud DATA INTEGRATION MAXCOMPUTE DATAWORKS DEMO Overview 2 /25 (Slide No. 3-9) (Slide No. 10-20) (Slide No. 23-24)(Slide No. 21-22)
  • 3. 01 General Process of Data Integration 3 /25
  • 4. Data Source and Type Data Source and Type Introduction 1 2 General Process of Data Integration 4 /25
  • 5. Data Integration Data Integration Data Acquisition Data Transformation Data Governance 5 /25 Unstructured Data TXT Picture Video Audio …. Semi-Structured Log XML JSON …. Structured Data Oracle MySQL SQLServer PostgreSQL …
  • 7. Alibaba Cloud Big Data Architecture General Process of Data Integration 1 2 General Process Data Integration 7 /25
  • 8. 01 Offline Streaming Real-Time Streaming Process Schedule / Maintain 02 03 Get Insight Decision Support Data Warehouse 8 /25 Data Source Acquisition • Database • Local File • OSS Data Scrubbing • SQL • Custom Code Data EDA • Statistics • Modeling Data Storage • Database Report BI Agent • Console App • Servers • Sensors Transfer and Buffer • Streaming Transfer Tools Streaming Process • Streaming Process Tools Data Storage • Database 01 02 Unified Data Storage • Database Ad-Hoc • Ad-Hoc Query General Data Processing Workflow
  • 9. Offline Data Process 9 /25 RDS Database OSS Data Store Server Load Balancer ECS Cluster Table Store Auto Scaling MaxCompute RDBMS MySql, Sql Server, Oracle, DB2…… Hadoop Data Hive, HBASE Other Data Source Txt File, Web logs, Vedio / Audio Data Source
  • 11. MaxCompute Basics Basic Concepts of MaxCompute MaxCompute Architecture 1 2 MaxCompute Data Channel and SQL3 11 /25
  • 12. 12 /25 • Project is the most basic unit for resource isolation • Multiple projects can share the resources of the same cluster • A Project is similar to Oracle’s Database • Tables, users and jobs are all subordinate to a project • After authorization, various projects can achieve data interoperability Basic Concepts PROJECT 2 PROJECT 4 PROJECT 3 PROJECT 1 Table User Security Policy Job Resource
  • 13. 13 /25 • Most of the MaxCompute-processed data is stored in a structured bi-dimensional table • Tables are subordinate to the project • Tables can be partitioned • Data types in a table include Bigint, Boolean, Double, Date/Time, String, and Decimal • Data is managed by the Pangu storage system. The automatic multi-replica storage policy improves the data availability and blocks underlying hardware faults • Column-store structure, compressed storage • Built-in data lifecycle management policy • Storage quota-based multi-tenant management mechanism Storage
  • 14. MaxCompute Basics Basic Concepts of MaxCompute MaxCompute Architecture 1 2 MaxCompute Data Channel and SQL3 14 /25
  • 15. MaxCompute Basics 15 /25 SQL MapReduce Graph Machine Learning 10000 10000 10000 Cluster 1 Cluster 2 Cluster 3 Apsara Distributed System MaxCompute Engine
  • 16. MaxCompute Basics Basic Concepts of MaxCompute MaxCompute Architecture 1 2 MaxCompute Data Channel and SQL3 16 /25
  • 17. 17 /25 Tunnel • The channel for data to go in and out of MaxCompute • High-concurrency upload/download • Horizontal expansion of service capabilities • 1P throughput supported in a single day • Batch and Real-time modes • The real-time mode supports pub/sub models • ODPS Tunnel-based tools include TT, CDP, Flume, and Fluentd
  • 18. 18 /25 • Reads and writes to tables are supported, but views are not supported • Writes to tables adopt the Append mode • Concurrency is supported to improve overall throughput • Frequent commits are avoided • The target partition for data uploads must exist • Real-time upload mode Tunnel
  • 19. 19 /25 Data Upload/Download in Tunnel • odps@ > tunnel upload log.txt test_project.test_table/p1="b1",p2="b2“; • odps@ > tunnel download test_project.test_table/p1="b1",p2="b2" log.txt; • It is a Tunnel SDK-based command line tool that can be used for uploading local text files to ODPS or downloading table data to a local location • The table partitions should be established • DataX, CDP, and TT have implemented better tools based on Tunnel, and the tools can be used to support data interaction between ODPS and relational databases • The log data can be imported using Flume, and Fluentd tools • Special scenario users can develop custom tools based on Tunnel Tunnel Command
  • 20. 20 /25 SQL • Applicable to process a large amount of data (terabytes to petabytes) • High Latency: the running time of every SQL statement ranges from dozens of seconds to several hours. • The syntax is similar to HQL of Hive, with some extensions on the basis of the standard SQL. • There is no transaction, and no primary key. • UPDATE and DELETE commands are not supported.
  • 23. 04 Getting Started with Alibaba Cloud 23 /25
  • 25. Q&A

Hinweis der Redaktion

  1. (1) Cooperation with the partners of other BUs NOTE: There must be open and feasible cooperation modes. (2) Overlap with other products A: Elements that are under planning and overlap with existing products B: Elements allowing differentiated cooperation. Emphasize on the two existing differentiated elements of the other party, and then complete the whole development. Illustrate the above information in two PPT slides.
  2. Q&A