LINKEDIN USE CASES FOR
TERADATA CONNECTORS
FOR HADOOP
Eric Sun: esun@LinkedIn.com, www.linkedin.com/in/ericsun
Jason Chen: jason.chen@Teradata.com, www.linkedin.com/in/jason8chen
Agenda
• UDA (Unified Data Architecture)
> Need for high volume data movement
> Challenges with high volume data movement

• TDCH & SQL-H
> Data movement between Teradata and Hadoop
> Architecture, key features, and various packages

• LinkedIn POC
> Architecture big picture
> Use cases
> POC environment
> POC results and learning
> Wish list of enhancements

• Next steps and Q&A

TERADATA UNIFIED DATA ARCHITECTURE

[Diagram] The Unified Data Architecture joins three platforms over a data fabric: the INTEGRATED DATA WAREHOUSE (Teradata), the DISCOVERY PLATFORM (Aster), and a CAPTURE | STORE | REFINE layer (Hadoop). Connectors among them include SQL-H, the Aster Teradata Connector, the Aster Connector for Hadoop, the Teradata Connector for Hadoop, and Teradata Studio Smart Loader for Hadoop. Users span data scientists, quants, business analysts, engineers, executives, front-line workers, and customers/partners, working through languages, math & stats, data mining, and business intelligence tools, with Viewpoint for support. Data sources include operational systems, applications (CRM, SCM, ERP), images, audio & video, text, web & social, and machine logs.
Data Movement Challenges
• Data movement is supposed to be easy
- So businesses can spend more time on analytics
- But in reality it is not as easy as businesses would like
- Challenges are even greater with massively parallel systems

• Data movement between Teradata and Hadoop
> Two massively parallel systems
> Any high volume data movement
– Should exploit as much underlying parallelism as appropriate
– A single-threaded or single-node processing architecture will not cut it
– Should move data along the path in compressed form for as long as possible
> Various popular Hadoop data formats
– Should be supported to avoid the need for staging & intermediate files
– Automatic data type/format conversion to minimize manual work by users
> Constraints in a production environment
– Should be accommodated as much as possible
– E.g., limits on concurrent sessions imposed by mixed workload control
Can technologies work with each other?

Big Data Army Together

TERADATA CONNECTOR FOR HADOOP
TDCH Architecture

[Diagram] TDCH runs as a MapReduce job inside Hadoop. TD export/import tools, Sqoop, Hive, HCat, and Pig all sit on top of MapReduce. A DB I/O format (the Teradata I/O Format, among others) reads and writes the Teradata DB, while file I/O formats (Text, Sequence, RC) read and write Hadoop DFS.
TDCH Technical Highlights
• Built on MapReduce
- For execution and deployment scalability
- Proven scalability up to thousands of nodes
- For integration with various data formats
- Text, Sequence, RC, ORC (soon), Avro (soon), …
- For integration with other MR-based tools
- Sqoop, Hive, HCatalog, Pig (future), and possibly others

• Built for Usability & Integration
> Simple command-line interface
> Simple application programming interface for developers
> Metadata-based data extraction, load, and conversion
– Built-in support for reading/writing data in Hive and HCat tables
– Built-in serialization/de-serialization support for various file formats
TDCH Export Methods (Hadoop → Teradata)

Various Implementations
• Batch-insert
- Each mapper starts a session to insert data via JDBC batch execution
• Multiple-fastload
- Each mapper starts a separate fastload job to load data via JDBC fastload
• Internal-fastload
- Each mapper starts a session to load data via the JDBC fastload protocol, but all sessions operate as a single fastload job
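The export method is picked with the -method option. A minimal sketch, reusing only flags that appear in this deck's Sample Code slides; URL, credentials, paths, and table names are placeholders, and the batch.insert / multiple.fastload spellings are assumed to follow the internal.fastload form shown later:

# Sketch only: placeholder URL/credentials/paths; schema and separator options omitted
hadoop com.teradata.hadoop.tool.TeradataExportTool \
  -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8 \
  -username DataCopy -password $DEV_DATACOPY_PASS \
  -jobtype hdfs -fileformat textfile \
  -method internal.fastload \
  -nummappers 32 \
  -sourcepaths /user/$USER/tdch/example_table_name.text \
  -targettable dwh.tdch_example_table_name

Swapping -method leaves the rest of the invocation unchanged; internal.fastload coordinates all mapper sessions as one fastload job, which is why it is the bulk-load method used in the LinkedIn POC.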
TDCH Import Methods (Teradata → Hadoop)

Various Implementations
• Split-by-value
- Each mapper starts a session to retrieve data in a given value range from a source table in Teradata
• Split-by-hash
- Each mapper starts a session to retrieve data in a given hash value range from a source table in Teradata
• Split-by-partition
- Each mapper starts a session to retrieve a subset of partitions from a source table in Teradata if the source table is already partitioned
- If the source table is not partitioned, a partitioned staging table is created with a partition key that is the same as the distribution key
• Split-by-amp
- Each mapper gets data from an individual AMP
- Requires TD 14.10; this method makes use of table operators
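A matching import sketch, again reusing only flags from this deck's Sample Code slides; the split.by.amp spelling is an assumption patterned on the split.by.hash and split.by.partition values that appear later:

# Sketch only: placeholder URL/credentials/paths
hadoop com.teradata.hadoop.tool.TeradataImportTool \
  -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8 \
  -username DataCopy -password $DEV_DATACOPY_PASS \
  -jobtype hdfs -fileformat textfile \
  -method split.by.amp \
  -nummappers 32 \
  -sourcetable dwh.tdch_example_table_name \
  -targetpaths /user/$USER/tdch/example_table_name.text

split.by.hash or split.by.partition can be substituted when the system is not yet on TD 14.10.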
Various Packages for End Users
> Teradata Connector for Hadoop
– For users who would like to use a simple command-line interface

> Sqoop Connectors for Teradata
– For users who would like to use the Sqoop command-line interface
– The Sqoop connector for TD from Hortonworks uses TDCH under the covers
– The new Sqoop connector for TD from Cloudera uses TDCH under the covers

> TD Studio Smart Loader for Hadoop
– For users who would like to use the Teradata Studio GUI
– TD Studio Smart Loader for Hadoop uses TDCH under the covers for data movement between Teradata and Hadoop
TERADATA SQL-H
Teradata SQL-H
• SQL-H
> Built on table operators
> Enables dynamic SQL access to Hadoop data
> Can list existing Hadoop databases and files
> SQL requests are parsed and executed by Teradata
> Can join data in Hadoop with tables in Teradata

• Why is this important?
> Enables analysis of Hadoop data in Teradata
> Allows standard ANSI SQL access to Hadoop data
> Lowers costs by making data analysts self-sufficient
> Leverages existing BI tool investments just like Aster SQL-H does

• Released with Teradata Database 14.10
Teradata SQL-H Example
SELECT CAST(Price AS DECIMAL (8,2))
,CAST(Make AS VARCHAR(20))
,CAST(Model AS VARCHAR(20))
FROM LOAD_FROM_HCATALOG(
USING
SERVER('sdll4364.labs.teradata.com')
PORT('9083')
USERNAME ('hive')
DBNAME('default')
TABLENAME('CarPriceData')
COLUMNS('*')
TEMPLETON_PORT('1880')
) as CarPriceInfo;


Notes:
• The SQL-H table operator query is launched on the Teradata side.
• Data conversion is conducted within Teradata after the data has been transferred from Hadoop.
Connectors Designed for Two Different Audiences

[Diagram] On the Teradata side, ETL tools, BI tools, and Teradata tools speak Teradata SQL against Teradata DB, and SQL-H reaches into Hadoop through HCat. On the Hadoop side, Sqoop, Hive, HCat, and Pig run on MapReduce over HDFS file formats (Text, Sequence, RC). Between the two platforms sits the TD Connector for Hadoop (TDCH): scalable, high-performance, bi-directional data movement.
The Challenge with Hadoop
• Hadoop
> Excellent scalability
> Rapidly evolving ecosystem
> Not yet as enterprise-ready as one would like
– Lacking support for effective performance management

• Challenge (and opportunity)
> Enterprise tools and apps to fill the gap
> Provide the instrumentation and functionality
– For fine-grain (parallel systems) performance management

• TDCH is improving
> with richer instrumentation and functionality
> to fill the performance management gap as much as possible

LinkedIn Overall Data Flow

[Diagram] On the site side, member-facing products emit activity data that flows through Kafka and Camus into Hadoop, while member data in Espresso / Voldemort / Oracle streams changes through Databus and Lumos; external partner data arrives through ingest utilities. DWH ETL moves data between Hadoop and Teradata, whose core and derived data sets serve product, sciences, and enterprise analytics as well as enterprise products; computed results flow back to member-facing products.
LinkedIn Data System - Hadoop
• Most data in Avro format
• Access via Pig & Hive
• Most high-volume ETL processes run here
• Specialized batch processing
• Algorithmic data mining
LinkedIn Data System - Teradata
• Interactive querying (low latency)
• Integrated data warehouse
• Hourly ETL
• Well-modeled schemas
• Workload management
• Standard BI tools
LinkedIn Use Cases
• Hadoop is the main platform for data staging, data exploration, clickstream ETL, and machine learning
• Teradata is the main platform for data warehousing, BI, and relational data discovery
• Hadoop holds multiple PB of data; TD holds hundreds of TB
• Data needs to flow between Hadoop and Teradata
• Analytical processes and applications need to leverage the most appropriate platform to deliver data intelligence:
> Is all the needed data there? (1 week / 3 months / 3 years…)
> Which programming interfaces are available? (SQL/HQL/Pig…)
> How fast do I need it? How slow can I tolerate it?
> How are the results shared? Who will consume them?
LinkedIn TDCH POC Environment

LinkedIn Use Cases - Export
• Copy hourly/daily clickstream data from HDFS to TD
• Copy scoring & machine learning results from HDFS to TD
> Challenges: big volume and tight SLA
> Steps (sketched below):
1. Convert the data files from Avro to many ^Z-delimited *.gz files via Pig (flatten map/bag/tuple, and remove special Unicode chars)
2. Quickly load the *.gz files into a staging table using the Teradata Connector with the internal.fastload protocol
3. Execute INSERT DML to copy records from the staging table into the final fact table

> Other options:
1. Combine many *.gz into a few, download to NFS, load via TPT
2. Download many *.gz via webHDFS to NFS, load via TPT
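A minimal orchestration sketch of the three steps, assuming hypothetical script, path, and table names (avro_to_ctrlz.pig, dwh.stg_click, dwh.fact_click) and that BTEQ is available on the gateway; the TDCH flags are the ones from the Sample Code slides later in this deck:

#!/bin/bash
# Step 1: flatten Avro to ^Z-delimited gzip text via Pig (hypothetical script name)
pig -param OUT=/user/$USER/tdch/click.gz -f avro_to_ctrlz.pig
# Step 2: bulk-load the *.gz files into the staging table as one fastload job
hadoop com.teradata.hadoop.tool.TeradataExportTool \
  -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8 \
  -username DataCopy -password $DEV_DATACOPY_PASS \
  -jobtype hdfs -fileformat textfile -method internal.fastload \
  -nummappers 32 \
  -sourcepaths /user/$USER/tdch/click.gz \
  -targettable dwh.stg_click
# Step 3: INSERT...SELECT from staging into the final fact table (hypothetical tables)
bteq <<EOF
.LOGON DWDEV/DataCopy,$DEV_DATACOPY_PASS
INSERT INTO dwh.fact_click SELECT * FROM dwh.stg_click;
.LOGOFF
EOF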
LinkedIn Use Cases - Import
• Publish dimension and aggregate tables from TD to HDFS
> Challenges: heavy query workload on TD and tight SLA. A traditional JDBC data dump does not yield enough throughput to extract all the dimension tables within the limited window.
> Steps (sketched below):
1. Full dump for small- to medium-size dimension tables
2. Timestamp-based incremental dump for big dimension tables; then use an M/R job to merge the incremental file with the existing dimension file on HDFS, save the new dimension file via LiAvroStorage() as the #LATEST copy, and retire the previous version
3. Date-partition-based incremental dump for aggregate tables

> Other options:
1. Home-grown M/R job to extract using split.by.partition and write to a LiAvroStorage() directory
2. Custom TPT OUTMOD adapter to convert the EXPORT operator's data parcels to Avro and upload via webHDFS
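One way to express the timestamp predicate for step 2 without any extra TDCH flags is to point -sourcetable at a filtered view. A sketch with hypothetical view, column, and table names (dwh.dim_member_delta, last_modified_time), assuming BTEQ on the gateway:

#!/bin/bash
# Define (or refresh) a delta view exposing only recently changed rows (hypothetical names)
bteq <<EOF
.LOGON DWDEV/DataCopy,$DEV_DATACOPY_PASS
REPLACE VIEW dwh.dim_member_delta AS
  SELECT * FROM dwh.dim_member
  WHERE last_modified_time >= CAST(CURRENT_DATE - 1 AS TIMESTAMP(0));
.LOGOFF
EOF
# Pull just the delta; the M/R merge against the #LATEST copy happens afterwards
hadoop com.teradata.hadoop.tool.TeradataImportTool \
  -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8 \
  -username DataCopy -password $DEV_DATACOPY_PASS \
  -jobtype hdfs -fileformat textfile -method split.by.hash \
  -nummappers 32 \
  -sourcetable dwh.dim_member_delta \
  -targetpaths /user/$USER/tdch/dim_member_delta.text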
High Level Findings
• split.by.value was not tested
• Network latency plays a big factor in end-to-end speed
• The # of sessions is subject to TD workload rules
LinkedIn POC Benchmark Reference
Test data set:
> About 275M rows at ~250 bytes/row
> 2700MB in TD with BLC and 2044MB as gzip text in HDFS
> 64664MB as uncompressed text

• Import uses split-by-hash & export uses internal-fastload
* Import spends the first couple of minutes spooling data
* The export M/R job may combine the specified # of mappers into a smaller #

Import:  # Mappers    32          64          128
         Time         960s        758s        330s
         Throughput   67MB/sec    85MB/sec    190MB/sec

Export:  # Mappers    15          28          52
         Time         970s        870s        420s
         Throughput   67MB/sec    75MB/sec    154MB/sec

Throughput is measured against the uncompressed size (e.g., 64664MB / 330s ≈ 196MB/sec).
LinkedIn POC Findings
• TDCH is very easy to set up and use
• TDCH provides good throughput for JDBC-based bulk data movement (import and export)
• TDCH simplifies the bulk data exchange interface, so a more robust ETL system can be built on top of it
• Network latency can be a big performance factor
• In a production environment it is not practical to execute TDCH with too many mappers (e.g., 150+)
• Depending on the data set, using too many mappers may not yield a performance gain (the overhead is high)
• Many factors can impact E2E performance, and debugging is hard
> Some mappers can run much longer than others even with a similar number of records to process
> Multiple mappers can run on the same DD – is that wrong?
LinkedIn Business Benefits/Results
• With the Teradata Connector for Hadoop, LinkedIn has simpler methods to move and access data seamlessly throughout its environment.
• This reduces cost and operational complexity because
> The command is invoked from a Hadoop gateway machine, so security verification is already taken care of by the SSH session and Kerberos token.
> Data synchronization between HDFS and Teradata is faster with the help of both systems' parallel capabilities.
> Fewer ETL jobs are needed for the data movement, hence easier to support and troubleshoot.
Sample Code - Export
hadoop com.teradata.hadoop.tool.TeradataExportTool \
  -D mapred.job.queue.name=marathon \
  -D mapred.job.name=tdch.export_from_rcfile_to_td.table_name.no_blc \
  -D mapred.map.max.attempts=1 \
  -D mapred.child.java.opts="-Xmx1G -Djava.security.egd=file:/dev/./urandom" \
  -D mapreduce.job.max.split.locations=256 \
  -libjars $LIB_JARS \
  -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8 \
  -username DataCopy -password $DEV_DATACOPY_PASS \
  -classname com.teradata.jdbc.TeraDriver \
  -queryband "App=TDCH;ClientHost=$HOSTNAME;PID=$$;BLOCKCOMPRESSION=NO;" \
  -fileformat rcfile -jobtype hive -method internal.fastload \
  -sourcepaths /user/$USER/tdch/example_table_name.rc \
  -debughdfsfile /user/$USER/tdch/mapper_debug_info \
  -nummappers 32 \
  -targettable dwh.tdch_example_table_name \
  -sourcetableschema "COL_PK BIGINT, SORT_ID SMALLINT, COL_FLAG TINYINT, ORDER_ID INT, ORDER_STATE STRING, ..., ORDER_UPDATE_TIME TIMESTAMP"
Sample Code - Import
hadoop com.teradata.hadoop.tool.TeradataImportTool \
  -D mapred.job.queue.name=marathon \
  -D mapred.job.name=tdch.import_from_td_to_textfile.table_name \
  -D mapred.map.max.attempts=1 \
  -D mapred.child.java.opts="-Xmx1G -Djava.security.egd=file:/dev/./urandom" \
  -D mapred.output.compress=true \
  -D mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec \
  -libjars $LIB_JARS \
  -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8 \
  -username DataCopy -password $DEV_DATACOPY_PASS \
  -classname com.teradata.jdbc.TeraDriver \
  -queryband "App=TDCH;ClientHost=$HOSTNAME;PID=$$;" \
  -fileformat textfile -jobtype hdfs -method split.by.hash \
  -targetpaths /user/$USER/tdch/example_table_name.text \
  -debughdfsfile /user/$USER/tdch/mapper_debug_info \
  -nummappers 32 \
  -sourcetable dwh.tdch_example_table_name
Mapper Level Performance Info
-debughdfsfile option (sample output)

mapper id is: task_201307092106_634296_m_000000
initialization time of this mapper is: 1382143274810
elapsed time of connection created of this mapper is: 992
total elapsed time of query execution and first record returned is: 221472
total elapsed time of data processing and HDFS write operation is: 296364
end time of this mapper is: 1382143848579

mapper id is: task_201307092106_634273_m_000025
initialization time of this mapper is: 1382143015468
elapsed time of connection created of this mapper is: 463
total elapsed time of data processing and send it to Teradata is: 701876
end time of this mapper is: 1382143720637
Track Mapper Session in DBQL
TDCH injects task attempt id into QueryBand
Select SubStr( RegExp_SubStr(QueryBand,
'=attempt_[[:digit:]]+_[[:digit:]]+_.*[[:digit:]]+'), 10 ) MR_Task,
SessionID, QueryID,
min(StartTime), min(FirstStepTime), min(FirstRespTime),
sum(NumResultRows), cast(sum(SpoolUsage) as bigint) SpoolUsage,
sum(TotalIOCount), max(MaxIOAmpNumber)
from DBC.DBQLogTbl
where StatementGroup not like 'Other'
and NumResultRows > 0
and UserName = 'DataCopy'
and CollectTimeStamp >= timestamp '2013-10-17 09:10:11'
and QueryBand like '%attemp_id=attempt_201310161616_118033_%'
group by 1,2,3
order by 1, SessionID, QueryID;
* This feature does not work for internal-fastload yet
Adjust TCP Send/Receive Window
• A network round-trip time of 1ms to 70ms (traceroute) can indicate suboptimal latency, which will significantly affect TDCH throughput
• If the network merely has bad latency without dropping many packets, increasing the TCP window buffer from its default value of 64K to 6MB~8MB can improve TDCH performance
• The result varies with the network and the data set's size
> When the data set per mapper is small, no visible improvement is observed
> When the data set per mapper is big, on a network with high latency, TDCH throughput can improve 30% to 100% with a big TCP window
> The TCP window size has an impact on both import and export jobs
(Import) jdbc:teradata://tdpid/database=DBC,TCP=RECEIVE8192000
(Export) jdbc:teradata://tdpid/database=DBC,TCP=SEND8192000
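Applied to a TDCH job, the bigger window rides along in the -url. A sketch for an import; the TCP=RECEIVE option and size come straight from the URLs above, everything else is placeholder:

# Sketch only: placeholder URL/credentials/paths; TCP=RECEIVE sets the receive window
hadoop com.teradata.hadoop.tool.TeradataImportTool \
  -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8,TCP=RECEIVE8192000 \
  -username DataCopy -password $DEV_DATACOPY_PASS \
  -jobtype hdfs -fileformat textfile -method split.by.hash \
  -nummappers 32 \
  -sourcetable dwh.tdch_example_table_name \
  -targetpaths /user/$USER/tdch/example_table_name.text

For an export job, TCP=SEND8192000 is the counterpart.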
Wish List based on LinkedIn POC
• Compression over the wire/network protocol
> If JDBC and FASTLOAD sessions could compress records before transmitting, TDCH could be 3 to 10 times faster
> Otherwise, a pair of data import/export proxy agents could help buffer, consolidate, and compress the network traffic

• Split-by-partition enhancement
> TDCH can create many partitions for the staging table to avoid data skew (e.g., # partitions = # AMPs)
> It could then loop through these partitions with 32 or 64 sessions without consuming too much spool

• Avro format support, with a simple capability for mapping elements/attributes from map/record/array… to columns
• Auto-map TD data types to RC & Avro primitive types
• An easier way to pass special chars as command-line parameters
• More meaningful error messages for mapper failures
• Better, more granular performance trace and debug info
What is Next?
• Turn learning into TDCH enhancements
> Data formats:
– Support Avro and Optimized RC
> Metadata access:
– Use dictionary tables through views without extra privileges
> Performance management:
– Instrument for fine-grained monitoring and efficient troubleshooting

• Start proof-of-concept work with SQL-H
> Was in the original plan but ran out of time
> Will start the SQL-H POC after the release upgrade to TD 14.10
Acknowledgement
• Many have contributed to this effort …
• LinkedIn:
> Eric Sun, Jerry Wang, Mark Wagner, Mohammad Islam

• Teradata:
> Bob Hahn, Zoom Han, Ariff Kassam, David Kunz, Ming Lei, Paul Lett, Mark Li, Deron Ma, Hau Nguyen, Xu Sun, Darrick Sogabe, Rick Stellwagen, Todd Sylvester, Sherry Wang, Wenny Wang, Nick Xie, Dehui Zhang, …
Email: esun@LinkedIn.com
Email: jason.chen@Teradata.com
PARTNERS Mobile App
InfoHub Kiosks
teradata-partners.com

Copyright © Teradata 2013

Weitere ähnliche Inhalte

Was ist angesagt?

Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 
Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsDavid Portnoy
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Rittman Analytics
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...Hortonworks
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Lviv Startup Club
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2DataWorks Summit
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHortonworks
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paperSupratim Ray
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonMapR Technologies
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Jeffrey T. Pollock
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicDataWorks Summit
 

Was ist angesagt? (20)

Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop Implementations
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - Jaspersoft
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop Implementation
 
Aster getting started
Aster getting startedAster getting started
Aster getting started
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 

Andere mochten auch

Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
Working with informtiaca teradata parallel transporter
Working with informtiaca teradata parallel transporterWorking with informtiaca teradata parallel transporter
Working with informtiaca teradata parallel transporterAnjaneyulu Gunti
 
The Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedInThe Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedInrajappaiyer
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshowAccenture
 
Teradata Listener™: Radically Simplify Big Data Streaming
Teradata Listener™: Radically Simplify Big Data StreamingTeradata Listener™: Radically Simplify Big Data Streaming
Teradata Listener™: Radically Simplify Big Data StreamingTeradata
 
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...rajappaiyer
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Attunity Solutions for Teradata
Attunity Solutions for TeradataAttunity Solutions for Teradata
Attunity Solutions for TeradataAttunity
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Hortonworks
 
Modernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyModernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyCloudera, Inc.
 
How to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the CloudHow to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the CloudAttunity
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Cloudera, Inc.
 
Optimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data WarehouseOptimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data WarehouseAttunity
 
SQL - Parallel Data Warehouse (PDW)
SQL - Parallel Data Warehouse (PDW)SQL - Parallel Data Warehouse (PDW)
SQL - Parallel Data Warehouse (PDW) Karan Gulati
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomynzhang
 
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)Securing the Data Hub--Protecting your Customer IP (Technical Workshop)
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)Cloudera, Inc.
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkVivian S. Zhang
 
The Intelligent Thing -- Using In-Memory for Big Data and Beyond
The Intelligent Thing -- Using In-Memory for Big Data and BeyondThe Intelligent Thing -- Using In-Memory for Big Data and Beyond
The Intelligent Thing -- Using In-Memory for Big Data and BeyondInside Analysis
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with RedshiftAmazon Web Services
 

Andere mochten auch (20)

Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
Working with informtiaca teradata parallel transporter
Working with informtiaca teradata parallel transporterWorking with informtiaca teradata parallel transporter
Working with informtiaca teradata parallel transporter
 
The Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedInThe Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedIn
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshow
 
Teradata Listener™: Radically Simplify Big Data Streaming
Teradata Listener™: Radically Simplify Big Data StreamingTeradata Listener™: Radically Simplify Big Data Streaming
Teradata Listener™: Radically Simplify Big Data Streaming
 
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Attunity Solutions for Teradata
Attunity Solutions for TeradataAttunity Solutions for Teradata
Attunity Solutions for Teradata
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
 
Modernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyModernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data Strategy
 
How to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the CloudHow to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the Cloud
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5
 
Optimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data WarehouseOptimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data Warehouse
 
Internal Hive
Internal HiveInternal Hive
Internal Hive
 
SQL - Parallel Data Warehouse (PDW)
SQL - Parallel Data Warehouse (PDW)SQL - Parallel Data Warehouse (PDW)
SQL - Parallel Data Warehouse (PDW)
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
 
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)Securing the Data Hub--Protecting your Customer IP (Technical Workshop)
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
 
The Intelligent Thing -- Using In-Memory for Big Data and Beyond
The Intelligent Thing -- Using In-Memory for Big Data and BeyondThe Intelligent Thing -- Using In-Memory for Big Data and Beyond
The Intelligent Thing -- Using In-Memory for Big Data and Beyond
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with Redshift
 

Ähnlich wie Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop

Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyInside Analysis
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Hadoop is not an Island in the Enterprise
Hadoop is not an Island in the EnterpriseHadoop is not an Island in the Enterprise
Hadoop is not an Island in the EnterpriseDataWorks Summit
 
Data Mover for Hadoop | Diyotta
Data Mover for Hadoop | DiyottaData Mover for Hadoop | Diyotta
Data Mover for Hadoop | Diyottadiyotta
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprisesmarkgrover
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionBig Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionEtu Solution
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Modern Data Stack France
 
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasPartner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasDataWorks Summit
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataPentaho
 
Teradata - Hadoop profile
Teradata - Hadoop profileTeradata - Hadoop profile
Teradata - Hadoop profileSantosh Dandge
 
IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Pla...
IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Pla...IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Pla...
IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Pla...In-Memory Computing Summit
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeNicolas Morales
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudDataWorks Summit/Hadoop Summit
 
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with InnovationNot Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with InnovationInside Analysis
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 

Ähnlich wie Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop (20)

Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Hadoop is not an Island in the Enterprise
Hadoop is not an Island in the EnterpriseHadoop is not an Island in the Enterprise
Hadoop is not an Island in the Enterprise
 
Data Mover for Hadoop | Diyotta
Data Mover for Hadoop | DiyottaData Mover for Hadoop | Diyotta
Data Mover for Hadoop | Diyotta
 
SQL In/On/Around Hadoop
SQL In/On/Around Hadoop SQL In/On/Around Hadoop
SQL In/On/Around Hadoop
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionBig Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
 
Prashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEWPrashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEW
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasPartner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
 
Talend for big_data_intorduction
Talend for big_data_intorductionTalend for big_data_intorduction
Talend for big_data_intorduction
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big Data
 
Teradata - Hadoop profile
Teradata - Hadoop profileTeradata - Hadoop profile
Teradata - Hadoop profile
 
IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Pla...
IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Pla...IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Pla...
IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Pla...
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
 
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with InnovationNot Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 

Kürzlich hochgeladen

The-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptxThe-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptxmbikashkanyari
 
trending-flavors-and-ingredients-in-salty-snacks-us-2024_Redacted-V2.pdf
trending-flavors-and-ingredients-in-salty-snacks-us-2024_Redacted-V2.pdftrending-flavors-and-ingredients-in-salty-snacks-us-2024_Redacted-V2.pdf
trending-flavors-and-ingredients-in-salty-snacks-us-2024_Redacted-V2.pdfMintel Group
 
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...SOFTTECHHUB
 
Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Peter Ward
 
Technical Leaders - Working with the Management Team
Technical Leaders - Working with the Management TeamTechnical Leaders - Working with the Management Team
Technical Leaders - Working with the Management TeamArik Fletcher
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMVoces Mineras
 
NAB Show Exhibitor List 2024 - Exhibitors Data
NAB Show Exhibitor List 2024 - Exhibitors DataNAB Show Exhibitor List 2024 - Exhibitors Data
NAB Show Exhibitor List 2024 - Exhibitors DataExhibitors Data
 
Lucia Ferretti, Lead Business Designer; Matteo Meschini, Business Designer @T...
Lucia Ferretti, Lead Business Designer; Matteo Meschini, Business Designer @T...Lucia Ferretti, Lead Business Designer; Matteo Meschini, Business Designer @T...
Lucia Ferretti, Lead Business Designer; Matteo Meschini, Business Designer @T...Associazione Digital Days
 
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptx
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptxGo for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptx
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptxRakhi Bazaar
 
Guide Complete Set of Residential Architectural Drawings PDF
Guide Complete Set of Residential Architectural Drawings PDFGuide Complete Set of Residential Architectural Drawings PDF
Guide Complete Set of Residential Architectural Drawings PDFChandresh Chudasama
 
1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdfShaun Heinrichs
 
Effective Strategies for Maximizing Your Profit When Selling Gold Jewelry
Effective Strategies for Maximizing Your Profit When Selling Gold JewelryEffective Strategies for Maximizing Your Profit When Selling Gold Jewelry
Effective Strategies for Maximizing Your Profit When Selling Gold JewelryWhittensFineJewelry1
 
business environment micro environment macro environment.pptx
business environment micro environment macro environment.pptxbusiness environment micro environment macro environment.pptx
business environment micro environment macro environment.pptxShruti Mittal
 
WSMM Media and Entertainment Feb_March_Final.pdf
WSMM Media and Entertainment Feb_March_Final.pdfWSMM Media and Entertainment Feb_March_Final.pdf
WSMM Media and Entertainment Feb_March_Final.pdfJamesConcepcion7
 
PSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationPSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationAnamaria Contreras
 
1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdfShaun Heinrichs
 
Cyber Security Training in Office Environment
Cyber Security Training in Office EnvironmentCyber Security Training in Office Environment
Cyber Security Training in Office Environmentelijahj01012
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Americas Got Grants
 
Pitch Deck Teardown: Xpanceo's $40M Seed deck
Pitch Deck Teardown: Xpanceo's $40M Seed deckPitch Deck Teardown: Xpanceo's $40M Seed deck
Pitch Deck Teardown: Xpanceo's $40M Seed deckHajeJanKamps
 
Unveiling the Soundscape Music for Psychedelic Experiences
Unveiling the Soundscape Music for Psychedelic ExperiencesUnveiling the Soundscape Music for Psychedelic Experiences
Unveiling the Soundscape Music for Psychedelic ExperiencesDoe Paoro
 

Kürzlich hochgeladen (20)

The-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptxThe-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
 
trending-flavors-and-ingredients-in-salty-snacks-us-2024_Redacted-V2.pdf
trending-flavors-and-ingredients-in-salty-snacks-us-2024_Redacted-V2.pdftrending-flavors-and-ingredients-in-salty-snacks-us-2024_Redacted-V2.pdf
trending-flavors-and-ingredients-in-salty-snacks-us-2024_Redacted-V2.pdf
 
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
 
Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...
 
Technical Leaders - Working with the Management Team
Technical Leaders - Working with the Management TeamTechnical Leaders - Working with the Management Team
Technical Leaders - Working with the Management Team
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQM
 
NAB Show Exhibitor List 2024 - Exhibitors Data
NAB Show Exhibitor List 2024 - Exhibitors DataNAB Show Exhibitor List 2024 - Exhibitors Data
NAB Show Exhibitor List 2024 - Exhibitors Data
 
Lucia Ferretti, Lead Business Designer; Matteo Meschini, Business Designer @T...
Lucia Ferretti, Lead Business Designer; Matteo Meschini, Business Designer @T...Lucia Ferretti, Lead Business Designer; Matteo Meschini, Business Designer @T...
Lucia Ferretti, Lead Business Designer; Matteo Meschini, Business Designer @T...
 
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptx
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptxGo for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptx
Go for Rakhi Bazaar and Pick the Latest Bhaiya Bhabhi Rakhi.pptx
 
Guide Complete Set of Residential Architectural Drawings PDF
Guide Complete Set of Residential Architectural Drawings PDFGuide Complete Set of Residential Architectural Drawings PDF
Guide Complete Set of Residential Architectural Drawings PDF
 
1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf
 
Effective Strategies for Maximizing Your Profit When Selling Gold Jewelry
Effective Strategies for Maximizing Your Profit When Selling Gold JewelryEffective Strategies for Maximizing Your Profit When Selling Gold Jewelry
Effective Strategies for Maximizing Your Profit When Selling Gold Jewelry
 
business environment micro environment macro environment.pptx
business environment micro environment macro environment.pptxbusiness environment micro environment macro environment.pptx
business environment micro environment macro environment.pptx
 
WSMM Media and Entertainment Feb_March_Final.pdf
WSMM Media and Entertainment Feb_March_Final.pdfWSMM Media and Entertainment Feb_March_Final.pdf
WSMM Media and Entertainment Feb_March_Final.pdf
 
PSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationPSCC - Capability Statement Presentation
PSCC - Capability Statement Presentation
 
1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf
 
Cyber Security Training in Office Environment
Cyber Security Training in Office EnvironmentCyber Security Training in Office Environment
Cyber Security Training in Office Environment
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...
 
Pitch Deck Teardown: Xpanceo's $40M Seed deck
Pitch Deck Teardown: Xpanceo's $40M Seed deckPitch Deck Teardown: Xpanceo's $40M Seed deck
Pitch Deck Teardown: Xpanceo's $40M Seed deck
 
Unveiling the Soundscape Music for Psychedelic Experiences
Unveiling the Soundscape Music for Psychedelic ExperiencesUnveiling the Soundscape Music for Psychedelic Experiences
Unveiling the Soundscape Music for Psychedelic Experiences
 

Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop

  • 1. LINKEDIN USE CASES FOR TERADATA CONNECTORS FOR HADOOP Eric Sun: Jason Chen: esun@LinkedIn.com www.linkedin.com/in/ericsun jason.chen@Teradata.com www.linkedin.com/in/jason8chen
  • 2. Agenda • UDA > Need for high volume data movement > Challenges with high volume data movement • TDCH & SQL-H > Data movement between Teradata and Hadoop > Architecture, key features, and various packages • LinkedIn POC > Architecture big picture > Use cases > POC environment > POC results and learning > Wish list of enhancements • Next steps and Q&A Copyright © Teradata 2013
  • 3. TERADATA UNIFIED DATA ARCHITECTURE Data Scientists Quants Business Analysts Engineers LANGUAGES VIEWPOINT Customers / Partners Executives MATH & STATS DATA MINING Aster Connector for Hadoop Operational Systems BUSINESS INTELLIGENCE Aster Teradata Connector DISCOVERY PLATFORM Front-Line Workers SQL-H Teradata Connector for Hadoop SQL-H Teradata Studio Smart Loader for Hadoop CAPTURE | STORE | REFINE IMAGES TEXT WEB & SOCIAL MACHINE LOGS Copyright © Teradata 2013 SUPPORT INTEGRATED DATA WAREHOUSE Data Fabric AUDIO & VIDEO APPLICATIONS CRM SCM ERP
  • 4. Data Movement Challenges • Data Movement supposed to be Easy - So businesses can spend more time on analytics - But it is not easy as businesses would like in reality - Challenges are even greater with massively parallel systems • Data Movement between Teradata and Hadoop > Two massively parallel systems > Any high volume data movement – Should exploit as much underlying parallelism as appropriate – Single–threaded or single-node processing architecture will not cut it – Move data along the path in compressed form for as long as possible > Various popular Hadoop data formats – Should be supported to avoid the need for staging & intermediate files – Automatic data type/format conversion to minimize manual work by users > Constraints in a production environment – Should be accommodated as much as possible – E.g., limitation on concurrent sessions imposed by mixed workload control Copyright © Teradata 2013
  • 5. Can technologies work with each other? Copyright © Teradata 2013
  • 6. Big Data Army Together Copyright © Teradata 2013
  • 8. TDCH Architecture TD Export Import Tools Hive Sqoop HCat Pig MapReduce Hadoop I/O Format DB I/O Format Teradata Teradata I/O Format … File I/O Format Text Sequence RC Hadoop DFS Teradata DB Copyright © Teradata 2013
  • 9. TDCH Technical Highlights • Build on MapReduce - For execution and deployment scalability - Proven scalability for up to thousands of nodes - For integration with various data formats - Text, Sequence, RC, ORC (soon), Avro (soon), … - For integration with other MR based tools - Sqoop, Hive, Hcatalog, Pig (future), and possibly others • Built for Usability & Integration > Simple command-line interface > Simple application programming interface for developers > Metadata-based data extraction, load, and conversion – Built-in support for reading/writing data in Hive and Hcat tables – Built-in serialization/de-serialization support for various file formats Copyright © Teradata 2013
  • 10. TDCH Export Methods (Hadoop  Teradata) Various Implementations • Batch-insert - Each mapper starts a session to insert data via JDBC batch execution • Multiple-fastload - Each mapper starts a separate fastload job to load data via JDBC fastload • Internal-fastload - Each mapper starts a session to load data via JDBC fastload protocol but all sessions are operating as a single fastload job Copyright © Teradata 2013
  • 11. TDCH Import Methods (Teradata  Hadoop) Various Implementations • Split-by-value - Each mapper starts a session to retrieve data in a given value range from a source table in Teradata • Split-by-hash - Each mapper starts a session to retrieve data in a given hash value range from a source table in Teradata • Split-by-partition - Each mapper starts a session to retrieve a subset of partitions from a source table in Teradata if the source table is already a partitioned table - If the source table is not a partitioned table, a partitioned staging table will be created with a partition key that is the same as the distribution key • Split-by-amp - Each mapper gets data from an individual amp - TD 14.10 required; this method makes use of table operators Copyright © Teradata 2013
Various Packages for End Users
> Teradata Connector for Hadoop
  – For users who would like to use a simple command line interface
> Sqoop Connectors for Teradata
  – For users who would like to use the Sqoop command line interface
  – The Sqoop connector for TD from Hortonworks uses TDCH under the cover
  – The new Sqoop connector for TD from Cloudera uses TDCH under the cover
> TD Studio Smart Loader for Hadoop
  – For users who would like to use the Teradata Studio GUI
  – TD Studio Smart Loader for Hadoop uses TDCH under the cover for data movement between Teradata and Hadoop
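For illustration only, a minimal Sqoop sketch, assuming one of the vendor connectors for Teradata is installed and registered for teradata:// URLs; the table name, HDFS directory, and password file are hypothetical:

  # Hypothetical: export an HDFS directory into a Teradata table through
  # the Sqoop CLI (the vendor connector delegates to TDCH under the cover).
  sqoop export \
    --connect jdbc:teradata://DWDEV/database=DWH \
    --username DataCopy \
    --password-file /user/$USER/.td_pass \
    --table TDCH_EXAMPLE_TABLE \
    --export-dir /user/$USER/tdch/example_table_name.text \
    --num-mappers 32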
Teradata SQL-H
• SQL-H
  > Built on table operators
  > Enables dynamic SQL access to Hadoop data
  > Can list existing Hadoop databases and files
  > SQL requests are parsed and executed by Teradata
  > Can join data in Hadoop with tables in Teradata
• Why is this important?
  > Enables analysis of Hadoop data in Teradata
  > Allows standard ANSI SQL access to Hadoop data
  > Lowers costs by making data analysts self-sufficient
  > Leverages existing BI tool investments, just like Aster SQL-H does
• Released with Teradata Database 14.10
Teradata SQL-H Example

  SELECT CAST(Price AS DECIMAL(8,2))
        ,CAST(Make  AS VARCHAR(20))
        ,CAST(Model AS VARCHAR(20))
  FROM LOAD_FROM_HCATALOG(
         USING
         SERVER('sdll4364.labs.teradata.com')
         PORT('9083')
         USERNAME('hive')
         DBNAME('default')
         TABLENAME('CarPriceData')
         COLUMNS('*')
         TEMPLETON_PORT('1880')
       ) AS CarPriceInfo;

• The SQL-H Table Operator query is launched on the Teradata side.
• Data conversion is conducted within Teradata after the data has been transferred from Hadoop.
Connectors Designed for Two Different Audiences

[Architecture diagram: ETL tools, BI tools, and Teradata tools reach Hadoop data through Teradata SQL-H on the Teradata side, while Hive, Pig, Sqoop, and HCat jobs move data between HDFS file formats (Text, Sequence, RC) and the Teradata database through TDCH on MapReduce.]

TDCH: Scalable, high performance bi-directional data movement
The Challenge with Hadoop
• Hadoop
  > Excellent scalability
  > Rapidly evolving ecosystem
  > Not yet as enterprise-ready as one would like
    – Lacking support for effective performance management
• Challenge (and opportunity)
  > Enterprise tools and apps to fill the gap
  > Provide the instrumentation and functionality
    – For fine-grain (parallel systems) performance management
• TDCH is improving
  > with richer instrumentation and functionality
  > to fill the performance management gap as much as possible
LinkedIn Overall Data Flow

[Data flow diagram connecting: the LinkedIn site (member-facing products); activity data flowing through Kafka and Camus into Hadoop; member data in Espresso / Voldemort / Oracle with changes captured by Databus and Lumos; external partner data arriving via ingest utilities; DWH ETL into Teradata for product, sciences, and enterprise analytics; core and derived data sets in Hadoop; and computed results flowing back to member-facing products.]
LinkedIn Data System - Hadoop
• Most data in Avro format
• Access via Pig & Hive
• Most high-volume ETL processes run here
• Specialized batch processing
• Algorithmic data mining
LinkedIn Data System - Teradata
• Interactive querying (low latency)
• Integrated data warehouse
• Hourly ETL
• Well-modeled schemas
• Workload management
• Standard BI tools
LinkedIn Use Cases
• Hadoop is the main platform for data staging, data exploration, clickstream ETL, and machine learning
• Teradata is the main platform for the data warehouse, BI, and relational data discovery
• Hadoop holds multi-PB data; TD holds hundreds of TB
• Data needs to flow between Hadoop and Teradata
• Analytical processes and applications need to leverage the most appropriate platform to deliver the data intelligence:
  > Is all the needed data there? (1-week/3-month/3-year…)
  > Which programming interfaces are available? (SQL/HQL/Pig…)
  > How fast do I need it? How much delay can I tolerate?
  > How are the results shared? Who will consume them?
LinkedIn TDCH POC Environment

[Diagram of the POC environment]
LinkedIn Use Cases - Export
• Copy hourly/daily clickstream data from HDFS to TD
• Copy scoring & machine learning results from HDFS to TD
> Challenges: big volume and tight SLA
> Steps (see the orchestration sketch below):
  1. Convert data files from Avro to many ^Z-delimited *.gz files via Pig first (flatten map/bag/tuple, and remove special Unicode chars)
  2. Quickly load the *.gz files with the Teradata Connector into a staging table, with the help of the internal.fastload protocol
  3. TDCH executes INSERT DML to copy records from the staging table into the final fact table
> Other options:
  1. Combine many *.gz into a few, download to NFS, load via TPT
  2. Download many *.gz via webHDFS to NFS, load via TPT
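A hedged sketch of how these three steps might be chained from a Hadoop gateway. The Pig script (convert.pig), table names, and HDFS paths are hypothetical; the TDCH flags mirror the export sample later in this deck, and the deck states TDCH itself runs the final INSERT DML — it is shown here via BTEQ purely for illustration:

  #!/bin/bash
  set -e  # stop the chain if any step fails

  # Step 1 (hypothetical convert.pig): flatten Avro map/bag/tuple fields
  # and emit delimited, gzip-compressed text under an HDFS staging path.
  pig -param IN=/data/tracking/page_views/2013/10/17 \
      -param OUT=/user/$USER/tdch/page_views.gz \
      convert.pig

  # Step 2: bulk-load the *.gz files into a TD staging table using
  # internal.fastload (all mappers act as a single fastload job).
  hadoop com.teradata.hadoop.tool.TeradataExportTool \
    -libjars $LIB_JARS \
    -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8 \
    -username DataCopy -password $DEV_DATACOPY_PASS \
    -jobtype hdfs -fileformat textfile \
    -method internal.fastload -nummappers 32 \
    -sourcepaths /user/$USER/tdch/page_views.gz \
    -targettable dwh.stg_page_views

  # Step 3: copy from staging into the final fact table.
  bteq <<EOF
  .LOGON DWDEV/DataCopy,$DEV_DATACOPY_PASS
  INSERT INTO dwh.fact_page_views SELECT * FROM dwh.stg_page_views;
  .LOGOFF
  .QUIT
  EOF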
LinkedIn Use Cases - Export

[Diagram of the export data flow]
LinkedIn Use Cases - Import
• Publish dimension and aggregate tables from TD to HDFS
> Challenges: heavy query workload on TD and tight SLA. A traditional JDBC data dump does not yield enough throughput to extract all the dimension tables within the limited window.
> Steps (see the sketch below):
  1. Full dump for small to medium size dimension tables
  2. Timestamp-based incremental dump for big dimension tables
     – Then use an M/R job to merge the incremental file with the existing dimension file on HDFS
     – Save the new dimension file using LiAvroStorage() as the #LATEST copy, and retire the previous version
  3. Date-partition-based incremental dump for aggregate tables
> Other options:
  1. Home-grown M/R job to extract using split.by.partition and write to a LiAvroStorage() directory
  2. Write a custom TPT OUTMOD adapter to convert the EXPORT operator's data parcels to Avro, upload via webHDFS
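A minimal sketch of the full-dump path (step 1) plus a simplified version of the #LATEST swap, assuming dated HDFS directories; the directory layout and table names are placeholders, not LinkedIn's actual conventions, and the copy-then-replace at the end is not atomic:

  #!/bin/bash
  set -e
  DT=$(date +%Y%m%d)
  BASE=/data/databases/dwh/dim_member

  # Full dump of a small/medium dimension table with split.by.hash.
  hadoop com.teradata.hadoop.tool.TeradataImportTool \
    -libjars $LIB_JARS \
    -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8 \
    -username DataCopy -password $DEV_DATACOPY_PASS \
    -jobtype hdfs -fileformat textfile \
    -method split.by.hash -nummappers 32 \
    -sourcetable dwh.dim_member \
    -targetpaths "$BASE/$DT"

  # Publish the new dated copy as #LATEST, retiring the previous version.
  hadoop fs -rm -r -f "$BASE/#LATEST"
  hadoop fs -cp "$BASE/$DT" "$BASE/#LATEST"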
High Level Findings
• split.by.value was not tested
• Network latency plays a big role in E2E speed
• The # of sessions is subject to TD workload rules
LinkedIn POC Benchmark Reference
Test data set:
> About 275M rows at ~250 bytes/row
> 2700MB in TD with BLC, and 2044MB as gzip text in HDFS
> 64664MB as uncompressed text
• Import uses split-by-hash & export uses internal-fastload

  Export    # Mappers     32          64          128
            Time          960s        758s        330s
            Throughput    67MB/sec    85MB/sec    190MB/sec

  Import    # Mappers     15          28          52
            Time          970s        870s        420s
            Throughput    67MB/sec    75MB/sec    154MB/sec

* Import spends the first couple of minutes spooling data
* The export M/R job may combine the specified # of mappers into a smaller #
LinkedIn POC Findings
• TDCH is very easy to set up and use
• TDCH provides good throughput for JDBC-based bulk data movement (import and export)
• TDCH simplifies the bulk data exchange interface, so a more robust ETL system can be built on top of it
• Network latency can be a big performance factor
• In a production environment, it is not practical to execute TDCH with too many mappers (e.g. 150+)
• Depending on the data set, using too many mappers will not yield a performance gain (because the overhead is high)
• Many factors can impact E2E performance, and debugging is hard
  > Some mappers can run much longer than others even with a similar number of records to process
  > Multiple mappers can run on the same DD – is that wrong?
LinkedIn Business Benefits/Results
• LinkedIn has simpler methods to move and access data seamlessly throughout its environment using the Teradata Connector for Hadoop
• This leads to reduced cost and operational complexity because:
  > The command is invoked from a Hadoop gateway machine, so security verification is already taken care of by the SSH session and Kerberos token
  > Data synchronization between HDFS and Teradata is faster with the help of both systems' parallel capabilities
  > Fewer ETL jobs are needed for the data movement, hence easier to support and troubleshoot
Sample Code - Export

  hadoop com.teradata.hadoop.tool.TeradataExportTool \
    -D mapred.job.queue.name=marathon \
    -D mapred.job.name=tdch.export_from_rcfile_to_td.table_name.no_blc \
    -D mapred.map.max.attempts=1 \
    -D mapred.child.java.opts="-Xmx1G -Djava.security.egd=file:/dev/./urandom" \
    -D mapreduce.job.max.split.locations=256 \
    -libjars $LIB_JARS \
    -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8 \
    -username DataCopy -password $DEV_DATACOPY_PASS \
    -classname com.teradata.jdbc.TeraDriver \
    -queryband "App=TDCH;ClientHost=$HOSTNAME;PID=$$;BLOCKCOMPRESSION=NO;" \
    -fileformat rcfile \
    -jobtype hive \
    -method internal.fastload \
    -sourcepaths /user/$USER/tdch/example_table_name.rc \
    -debughdfsfile /user/$USER/tdch/mapper_debug_info \
    -nummappers 32 \
    -targettable dwh.tdch_example_table_name \
    -sourcetableschema "COL_PK BIGINT, SORT_ID SMALLINT, COL_FLAG TINYINT, ORDER_ID INT, ORDER_STATE STRING, ..., ORDER_UPDATE_TIME TIMESTAMP"
Sample Code - Import

  hadoop com.teradata.hadoop.tool.TeradataImportTool \
    -D mapred.job.queue.name=marathon \
    -D mapred.job.name=tdch.import_from_td_to_textfile.table_name \
    -D mapred.map.max.attempts=1 \
    -D mapred.child.java.opts="-Xmx1G -Djava.security.egd=file:/dev/./urandom" \
    -D mapred.output.compress=true \
    -D mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec \
    -libjars $LIB_JARS \
    -url jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8 \
    -username DataCopy -password $DEV_DATACOPY_PASS \
    -classname com.teradata.jdbc.TeraDriver \
    -queryband "App=TDCH;ClientHost=$HOSTNAME;PID=$$;" \
    -fileformat textfile \
    -jobtype hdfs \
    -method split.by.hash \
    -targetpaths /user/$USER/tdch/example_table_name.text \
    -debughdfsfile /user/$USER/tdch/mapper_debug_info \
    -nummappers 32 \
    -sourcetable dwh.tdch_example_table_name
Mapper Level Performance Info
-debughdfsfile option (sample output)

  mapper id is: task_201307092106_634296_m_000000
  initialization time of this mapper is: 1382143274810
  elapsed time of connection created of this mapper is: 992
  total elapsed time of query execution and first record returned is: 221472
  total elapsed time of data processing and HDFS write operation is: 296364
  end time of this mapper is: 1382143848579

  mapper id is: task_201307092106_634273_m_000025
  initialization time of this mapper is: 1382143015468
  elapsed time of connection created of this mapper is: 463
  total elapsed time of data processing and send it to Teradata is: 701876
  end time of this mapper is: 1382143720637
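For example, a quick way to rank mappers by their data-processing time from this debug file. A sketch only: it assumes the record layout shown above (a "mapper id" line followed by that mapper's timing lines) and reuses the debughdfsfile path from the sample commands:

  # Rank mappers by "data processing" elapsed time, slowest first.
  hadoop fs -cat /user/$USER/tdch/mapper_debug_info* \
    | awk -F': *' '
        /^mapper id is/                          { id = $2 }
        /total elapsed time of data processing/  { print $2, id }
      ' \
    | sort -rn | head -10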
Track Mapper Session in DBQL
TDCH injects the task attempt id into the QueryBand

  SELECT SubStr(
           RegExp_SubStr(QueryBand,
             '=attempt_[[:digit:]]+_[[:digit:]]+_.*[[:digit:]]+'), 10
         ) AS MR_Task,
         SessionID, QueryID,
         Min(StartTime), Min(FirstStepTime), Min(FirstRespTime),
         Sum(NumResultRows),
         Cast(Sum(SpoolUsage) AS BIGINT) AS SpoolUsage,
         Sum(TotalIOCount), Max(MaxIOAmpNumber)
  FROM   DBC.DBQLogTbl
  WHERE  StatementGroup NOT LIKE 'Other'
    AND  NumResultRows > 0
    AND  UserName = 'DataCopy'
    AND  CollectTimeStamp >= TIMESTAMP '2013-10-17 09:10:11'
    AND  QueryBand LIKE '%attemp_id=attempt_201310161616_118033_%'
  GROUP BY 1, 2, 3
  ORDER BY 1, SessionID, QueryID;

* This feature does not work for internal-fastload yet
Adjust TCP Send/Receive Window
• A 1ms ~ 70ms network round trip time (traceroute) can be an indicator of suboptimal latency, which will significantly affect TDCH throughput
• If the network merely has bad latency without dropping many packets, increasing the TCP window buffer from its default value of 64K to 6MB~8MB can improve TDCH performance
• The result varies with the network and the data set's size:
  > When the data set size for each mapper is small, no visible improvement is observed
  > When the data set size for each mapper is big, on a network with high latency, TDCH throughput can improve 30% ~ 100% with a big TCP window
  > TCP window size has an impact on both import and export jobs
    (Import) jdbc:teradata://tdpid/database=DBC,TCP=RECEIVE8192000
    (Export) jdbc:teradata://tdpid/database=DBC,TCP=SEND8192000
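In a TDCH invocation, this TCP option is simply appended to the JDBC URL. A sketch reusing the placeholders from the import sample above; only the URL changes:

  # High-latency-network variant of the import sample: ask the JDBC
  # driver for an ~8MB TCP receive window on each session.
  hadoop com.teradata.hadoop.tool.TeradataImportTool \
    -libjars $LIB_JARS \
    -url "jdbc:teradata://DWDEV/database=DWH,CHARSET=UTF8,TCP=RECEIVE8192000" \
    -username DataCopy -password $DEV_DATACOPY_PASS \
    -jobtype hdfs -fileformat textfile \
    -method split.by.hash -nummappers 32 \
    -sourcetable dwh.tdch_example_table_name \
    -targetpaths /user/$USER/tdch/example_table_name.text

  # For exports (Hadoop → Teradata), TCP=SEND8192000 is the corresponding
  # option on the TeradataExportTool URL.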
Wish List Based on LinkedIn POC
• Compression over the wire/network protocol
  > If JDBC and FASTLOAD sessions could compress records before transmitting, TDCH could be 3~10 times faster
  > Otherwise, a pair of data import/export proxy agents could help to buffer, consolidate, and compress the network traffic
• Split-by-partition enhancement
  > TDCH can create many partitions for the stage table to avoid data skew (e.g. # partitions = # AMPs)
  > But it could then effectively loop through these partitions with 32 or 64 sessions without consuming too much spool
• Avro format support, with a simple capability for mapping elements/attributes from map/record/array… to columns
• Auto-map TD data types to RC & Avro primitive types
• An easier way to use special chars as parameters on the command line
• More meaningful error messages for mapper failures
• Better, granular performance trace and debug info
What is Next?
• Turn learning into TDCH enhancements
  > Data formats:
    – Support Avro and Optimized RC (ORC)
  > Metadata access:
    – Use dictionary tables through views without extra privileges
  > Performance management:
    – Instrument for fine-grain monitoring and efficient troubleshooting
• Start proof-of-concept work with SQL-H
  > Was in the original plan but ran out of time
  > Will start the SQL-H POC after the release upgrade to TD 14.10
Acknowledgement
• Many have contributed to this effort …
• LinkedIn:
  > Eric Sun, Jerry Wang, Mark Wagner, Mohammad Islam
• Teradata:
  > Bob Hahn, Zoom Han, Ariff Kassam, David Kunz, Ming Lei, Paul Lett, Mark Li, Deron Ma, Hau Nguyen, Xu Sun, Darrick Sogabe, Rick Stellwagen, Todd Sylvester, Sherry Wang, Wenny Wang, Nick Xie, Dehui Zhang, …
Email: esun@LinkedIn.com
Email: jason.chen@Teradata.com

PARTNERS Mobile App · InfoHub · Kiosks · teradata-partners.com

Editor's notes

1. Then, through the lens of big data = "complexity" rather than "volume", we are seeing technology evolve that supports: new types of programming at scale, such as MapReduce and graph processing engines; better/different ways of dealing with unstructured data; and less schema dependence – the flexibility to load data quickly, store it cheaply, and process it later as needed.
4. Release of a SQL-H solution for the Teradata database with Teradata Database 14.10. Teradata SQL-H provides dynamic SQL access to Hadoop data from Teradata; with Teradata SQL-H, users can join Hadoop data with Teradata tables. Teradata SQL-H is important to customers because it enables analysis of Hadoop data in Teradata. It also allows standard ANSI SQL access to Hadoop data, leverages existing BI tool investments, and lowers costs by making data analysts self-sufficient.