Salesforce API Series
Fast Parallel Data Loading with the Bulk API
February 26, 2014
Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of
the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking
statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service
availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future
operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use
of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our
service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth,
interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with
possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and
motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial
results of salesforce.com, inc. is included in our quarterly report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. This document and
others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be
delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available.
Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.

#forcewebinar
Speakers
Steve Bobrowski
Architect Evangelist
@sbob909

Sean Regan
Architect Evangelist
@sfdcsregan
Follow Developer Force for the Latest News
@forcedotcom / #forcewebinar
Developer Force – Force.com Community
+Developer Force – Force.com Community
Developer Force
Developer Force Group
How fast can you load data into Salesforce?
How many records can you load into Salesforce in 1 hour?
Data load throughput
[Chart: records/hour on a scale of 0 to 25,000,000, with levels labeled OK, Fast, and Faster]
Parallel processing
A parallel processing analogy: digging a ditch

Serial processing

Parallel processing

Degree of parallelism: the number of processes or threads associated with an operation.
Optimal parallel processing
[Diagram: over time, a parallel load of four 5M-record streams versus a serial load of 20M records]
Sub-optimal parallel processing
[Diagram: over time, a parallel load of four 5M-record streams versus a serial load of 20M records]
Throughput inhibitors: locks, exceptions, triggers, relationships, …
[Diagram: over time, a parallel load of four 5M-record streams versus a serial load of 20M records, annotated with throughput inhibitors]
Data load case studies
§  Get hands on with the Salesforce Bulk API
§  Contrast serial data loads vs. parallel data loads
§  Measure degrees of parallelism and throughput
§  Identify and avoid throughput inhibitors
§  Achieve maximum throughput

Prep work
Salesforce Bulk API
§  Asynchronous data loading
§  Optimized for large data sets
§  REST API
§  Powers many tools
§  Use to build custom tools with any programming language (Java, etc.)

Demo schema

Bulk API Loads that …

Realize, Investigate, and Plan
Case Studies
Serial Data Load
Serial load: Expected plan
[Diagram: thread activity over time, 16 threads shown]
One job
100 batches
10,000 records/batch
1M total records
Serial load: Job configuration
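The job-configuration screenshot did not survive extraction. As a rough sketch of what a job request carries in the XML-based Bulk API (what is now called Bulk API 1.0), the Python below builds the `jobInfo` body, in which `concurrencyMode` selects Serial or Parallel processing of the job's batches. The object name, instance URL, API version, and session ID are placeholders rather than values from the demo, and the element order should be checked against the Bulk API Developer Guide for your API version.

```python
import xml.etree.ElementTree as ET

# Namespace of Bulk API 1.0 XML documents such as jobInfo and batchInfo.
ASYNC_NS = "http://www.force.com/2009/06/asyncapi/dataload"


def job_info_xml(sobject: str, concurrency_mode: str = "Serial",
                 operation: str = "insert", content_type: str = "CSV") -> str:
    """Build the XML body that creates a Bulk API 1.0 job."""
    if concurrency_mode not in ("Serial", "Parallel"):
        raise ValueError("concurrency_mode must be 'Serial' or 'Parallel'")
    root = ET.Element("jobInfo", xmlns=ASYNC_NS)
    # Elements are written in schema order: operation, object,
    # concurrencyMode, contentType.
    for tag, text in (("operation", operation),
                      ("object", sobject),
                      ("concurrencyMode", concurrency_mode),
                      ("contentType", content_type)):
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")


def job_request(instance_url: str, api_version: str, session_id: str,
                body: str) -> tuple:
    """Return (url, headers, body) for POSTing a new job; sending is left out."""
    url = f"{instance_url}/services/async/{api_version}/job"
    headers = {"X-SFDC-Session": session_id,
               "Content-Type": "application/xml; charset=UTF-8"}
    return url, headers, body


# Placeholder values, not taken from the webinar demo.
url, headers, body = job_request("https://yourInstance.salesforce.com", "30.0",
                                 "SESSION_ID", job_info_xml("Account", "Serial"))
print(url)  # https://yourInstance.salesforce.com/services/async/30.0/job
print(body)
```

In this sketch, switching between the serial run and the parallel runs that follow means changing only the `concurrency_mode` argument, for example `job_info_xml("Account", "Parallel")`.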

Serial load: Batch creation
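A sketch of the batch-creation step, assuming the source data is available as rows in memory: the helper below splits rows into CSV documents of at most 10,000 records each (the plan on the slide), so 1M records yield 100 batches. Each string would then be posted as one batch (in Bulk API 1.0, to `/services/async/<version>/job/<jobId>/batch` with a `text/csv` content type); that HTTP call is left out, and the single `Name` column is a hypothetical stand-in for the demo schema.

```python
import csv
import io
from typing import Iterable, Iterator, List, Sequence


def _to_csv(header: Sequence[str], rows: List[Sequence[str]]) -> str:
    """Render one batch as a CSV document with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def csv_batches(header: Sequence[str], rows: Iterable[Sequence[str]],
                batch_size: int = 10_000) -> Iterator[str]:
    """Yield CSV documents holding at most batch_size data rows each."""
    batch: List[Sequence[str]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield _to_csv(header, batch)
            batch = []
    if batch:  # final, possibly partial, batch
        yield _to_csv(header, batch)


# Hypothetical single-column data set matching the slide's 1M records.
records = ((f"Record {i}",) for i in range(1_000_000))
batches = list(csv_batches(["Name"], records))
print(len(batches))  # 100
```

Using a generator keeps only one batch's rows in memory while it is being built, which matters more as the source file grows.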

Serial load: Batch run
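While batches run, a client typically polls each batch's status. The sketch below parses a Bulk API 1.0 `batchInfo` XML response into a small record. The element names (`state`, `numberRecordsProcessed`, `numberRecordsFailed`, `totalProcessingTime`) are those documented for `batchInfo`; the dataclass and function names are hypothetical, the sample response is made up, and the HTTP polling itself is omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

NS = {"a": "http://www.force.com/2009/06/asyncapi/dataload"}
# Batch states after which polling can stop.
TERMINAL_STATES = {"Completed", "Failed", "Not Processed"}


@dataclass
class BatchStatus:
    batch_id: str
    state: str
    processed: int
    failed: int
    processing_ms: int  # totalProcessingTime, excluding time spent queued


def parse_batch_info(xml_text: str) -> BatchStatus:
    """Extract the fields a load monitor needs from a batchInfo document."""
    root = ET.fromstring(xml_text)

    def field(tag: str, default: str = "") -> str:
        el = root.find(f"a:{tag}", NS)
        return el.text if el is not None and el.text else default

    return BatchStatus(
        batch_id=field("id"),
        state=field("state"),
        processed=int(field("numberRecordsProcessed", "0")),
        failed=int(field("numberRecordsFailed", "0")),
        processing_ms=int(field("totalProcessingTime", "0")),
    )


def is_done(status: BatchStatus) -> bool:
    """True once the batch has reached a state it will not leave."""
    return status.state in TERMINAL_STATES


# Illustrative response body; the values are made up.
sample = """<batchInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <id>BATCH_ID</id><jobId>JOB_ID</jobId><state>Completed</state>
  <numberRecordsProcessed>10000</numberRecordsProcessed>
  <numberRecordsFailed>0</numberRecordsFailed>
  <totalProcessingTime>28000</totalProcessingTime>
</batchInfo>"""
status = parse_batch_info(sample)
print(status.state, status.processed, is_done(status))  # Completed 10000 True
```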

Demo
Serial load
Serial load summary
Concurrency Mode: Serial
Records Loaded: 1 million
Records Failed: 0
Run Time: 52 minutes
Work Completed: 48 minutes
Throughput: 19,500 records per minute
Degree of Parallelism: 0.94
Key Problem: Degree of parallelism explicitly limited to ~1.
Solution: Explore parallel load for increased throughput.
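The summary metrics can be approximated from the other rows of the summary. The sketch below assumes that throughput means records loaded per minute of run time and that degree of parallelism means work completed divided by run time. That second reading roughly matches the reported figures but is an interpretation, not a definition stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class LoadRun:
    name: str
    records_loaded: int
    records_failed: int
    run_minutes: float   # elapsed (wall-clock) time of the job
    work_minutes: float  # processing time summed over all batches

    def throughput(self) -> float:
        """Successfully loaded records per elapsed minute."""
        return self.records_loaded / self.run_minutes

    def degree_of_parallelism(self) -> float:
        """Average number of batches in process at once (assumed definition)."""
        return self.work_minutes / self.run_minutes


# Rounded figures from the serial load summary.
serial = LoadRun("Serial", 1_000_000, 0, run_minutes=52, work_minutes=48)
print(f"{serial.throughput():,.0f} records/min, "
      f"DOP {serial.degree_of_parallelism():.2f}")
# prints: 19,231 records/min, DOP 0.92
```

Because the run and work times on the slide are rounded, this gives about 19,231 records/min and 0.92 rather than the reported 19,500 and 0.94.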
Parallelism vs. throughput of a single job
[Chart: throughput in records/min (0 to 350,000) against degree of parallelism (1 to 20); series plotted: Serial]
Serial run
•  Low degree of parallelism
Parallel data loads
Parallel load: Expected plan
[Diagram: thread activity over time, 16 threads shown]
One job
100 batches
10,000 records/batch
1M total records
Parallel load: Job configuration

Things to watch for
§  Locks can significantly affect parallel loads
–  Wasted processing capacity
–  Reduced throughput
–  Failures

§  Retry logic is not all it’s cracked up to be
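One reason retries disappoint: in the first parallel run below, most records failed on lock exceptions, and resubmitting them with the same parallel plan tends to recreate the same contention. The sketch below shows a narrower retry step, assuming Bulk API 1.0 result CSVs (columns `Id`, `Success`, `Created`, `Error`, one result row per request row, in the same order). It separates rows that failed with the `UNABLE_TO_LOCK_ROW` status code from other failures, so lock failures could be resubmitted differently (for example in a Serial job) while other errors go to review. The `ParentId` column and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

LOCK_ERROR = "UNABLE_TO_LOCK_ROW"


def split_failures(request_csv: str, result_csv: str
                   ) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Pair request rows with result rows (same order) and return
    (rows that failed on a lock, rows that failed for other reasons)."""
    requests = list(csv.DictReader(io.StringIO(request_csv)))
    results = list(csv.DictReader(io.StringIO(result_csv)))
    if len(requests) != len(results):
        raise ValueError("request and result row counts differ")
    lock_failures: List[Dict[str, str]] = []
    other_failures: List[Dict[str, str]] = []
    for req, res in zip(requests, results):
        if res["Success"].strip().lower() == "true":
            continue  # loaded fine, nothing to retry
        if res["Error"].startswith(LOCK_ERROR):
            lock_failures.append(req)
        else:
            other_failures.append(req)
    return lock_failures, other_failures


# Illustrative data; the Error text is abbreviated.
request = "Name,ParentId\nA,P1\nB,P1\nC,P2\n"
result = ('"Id","Success","Created","Error"\n'
          '"ID1","true","true",""\n'
          '"","false","false","UNABLE_TO_LOCK_ROW:unable to obtain exclusive access"\n'
          '"","false","false","REQUIRED_FIELD_MISSING:Required fields are missing"\n')
locked, other = split_failures(request, result)
print([r["Name"] for r in locked], [r["Name"] for r in other])  # ['B'] ['C']
```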

Demo
Parallel 1
Parallel load 1 summary
Concurrency Mode: Parallel
Records Loaded: 125,000
Records Failed: 875,000
Run Time: 10 minutes
Work Completed: 2 hours and 30 minutes
Throughput: 20,000 records per minute
Degree of Parallelism: 15.79
Key Problem: Lock exceptions. The server worked significantly harder, but throughput did not increase.
Solution: Run the load in serial mode or manage locks.
Parallelism vs. throughput of a single job
[Chart: throughput in records/min (0 to 350,000) against degree of parallelism (1 to 20); series plotted: Serial, Parallel 1]
Parallel run 1
•  High degree of parallelism
•  Low throughput due to locks
Time to optimize
Let’s make your data load
§  Realize
–  Locks inhibit parallelism and throughput
§  Investigate
–  What is causing the locks
§  Plan
–  Manage the locks
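One way to start investigating, assuming the load file has a column holding each child record's parent ID (the `ParentId` name is hypothetical), is to list parents referenced from more than one batch. When batches run concurrently, child records in different batches that point to the same parent through a master-detail or lookup relationship can compete for a lock on that parent. This check covers only that one source of locks, not triggers, roll-up summaries, or group membership.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def parents_shared_across_batches(batches: Sequence[Sequence[Dict[str, str]]],
                                  parent_field: str) -> Dict[str, Set[int]]:
    """Map each parent ID to the batch indexes that reference it, keeping
    only parents referenced from more than one batch (contention candidates)."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for index, batch in enumerate(batches):
        for row in batch:
            parent = row.get(parent_field)
            if parent:
                seen[parent].add(index)
    return {parent: idx for parent, idx in seen.items() if len(idx) > 1}


# Hypothetical rows: the children of P1 are spread over two batches.
batches = [
    [{"Name": "A", "ParentId": "P1"}, {"Name": "B", "ParentId": "P2"}],
    [{"Name": "C", "ParentId": "P1"}, {"Name": "D", "ParentId": "P3"}],
]
print(parents_shared_across_batches(batches, "ParentId"))  # {'P1': {0, 1}}
```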

Demo
Parallel load 2
Eliminate Locks by Modifying Schema
Parallel load: Sample results
Concurrency Mode: Parallel
Records Loaded: 1 million
Records Failed: 0
Run Time: 3 minutes and 30 seconds
Work Completed: 1 hour
Throughput: 320,000 records per minute
Degree of Parallelism: 19
Key Problem: None
Solution: n/a
Parallelism vs. throughput of a single job
[Chart: throughput in records/min (0 to 350,000) against degree of parallelism (1 to 20); series plotted: Serial, Parallel 1, Parallel 2]
Parallel run 2
•  High degree of parallelism
•  High throughput
Locks can be managed by
§  Elimination
§  Ordering load file
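A sketch of the ordering approach, assuming the same hypothetical parent-ID column: sort rows by parent ID, then pack whole parent groups into batches so that one parent's children are not split across batches (a group larger than the batch size still has to be split). Sorting is the technique named on the slide; packing whole groups is an added refinement in this sketch.

```python
from itertools import groupby
from typing import Dict, Iterator, List

Row = Dict[str, str]


def batches_by_parent(rows: List[Row], parent_field: str,
                      batch_size: int = 10_000) -> Iterator[List[Row]]:
    """Sort rows by parent ID and pack whole parent groups into batches."""
    ordered = sorted(rows, key=lambda r: r[parent_field])
    batch: List[Row] = []
    for _, group in groupby(ordered, key=lambda r: r[parent_field]):
        children = list(group)
        # Start a new batch rather than split this parent's children.
        if batch and len(batch) + len(children) > batch_size:
            yield batch
            batch = []
        # A single group larger than batch_size has to be split anyway.
        while len(children) > batch_size:
            yield children[:batch_size]
            children = children[batch_size:]
        batch.extend(children)
    if batch:
        yield batch


rows = [{"ParentId": p} for p in "CABCACBCA"]
print([len(b) for b in batches_by_parent(rows, "ParentId", batch_size=5)])  # [5, 4]
```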

Demo
Parallel load 3
Avoid Locks with Ordered Data
Managing locks … a discussion while we load
§  Master-detail relationships
§  Lookup relationships
§  Roll-up summary fields
§  Triggers
§  Workflow rules
§  Group membership locks*

Parallel load: Sample results
Concurrency Mode: Parallel
Records Loaded: 1 million
Records Failed: 0
Run Time: 4 minutes
Work Completed: 1 hour
Throughput: 250,000 records per minute
Degree of Parallelism: 16.5
Key Problem: Minimal overhead due to locks
Solution: Remove all unnecessary locks
Parallelism vs. throughput of a single job
[Chart: throughput in records/min (0 to 350,000) against degree of parallelism (1 to 20); series plotted: Serial, Parallel 1, Parallel 2, Parallel 3]
Parallel run 3
•  High degree of parallelism
•  High throughput
Controlled feed/parallel data loads
Controlled feed load methodology
§  Explicit throttling on parallelism and throughput
–  Parallel extraction and loading
–  Prioritization of asynchronous processing capacity

§  Manage inhibitors in complex jobs
–  Data Skews
–  Multiple Locks
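A minimal sketch of client-side throttling, assuming `load_one` is a hypothetical callable that submits one batch and blocks until it finishes: a thread pool caps how many batches are in flight at once, which bounds the degree of parallelism the load can reach. `FakeLoader` only simulates work and records the peak concurrency it observed; it is not a Salesforce client.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def controlled_feed(batches: Sequence[T], load_one: Callable[[T], R],
                    max_in_flight: int) -> List[R]:
    """Feed batches to load_one with at most max_in_flight calls running at once.

    load_one is assumed to submit one batch and block until it finishes, so the
    pool size caps the degree of parallelism this client can drive.
    Results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(load_one, batches))


class FakeLoader:
    """Stand-in for a real batch submission; records peak concurrency."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, batch: List[int]) -> int:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # simulated processing time
        with self._lock:
            self.active -= 1
        return len(batch)


loader = FakeLoader()
counts = controlled_feed([[0] * 100 for _ in range(20)], loader, max_in_flight=4)
print(sum(counts), loader.peak)
```

The trade-off is deliberate: with `max_in_flight` set below the level at which locks or data skew start to cause failures, the load gives up some peak parallelism in exchange for predictable throughput.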

Parallelism vs. throughput of a single job
[Chart: throughput in records/min (0 to 350,000) against degree of parallelism (1 to 20); series plotted: Serial, Parallel 1, Parallel 2, Parallel 3, Controlled Feed]
Controlled feed run
•  Reduced parallelism
•  Expected throughput
Related wiki article and Architect Core Resources

Recap
Make your parallel data loads
§  Realize
–  Locks inhibit parallelism and throughput
§  Investigate
–  What is causing the locks
§  Plan
–  Manage the locks
Q&A
Steve Bobrowski
Architect Evangelist
@sbob909

Sean Regan
Architect Evangelist
@sfdcsregan

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

DataOps with Project Amaterasu
DataOps with Project AmaterasuDataOps with Project Amaterasu
DataOps with Project Amaterasu
 
Oracle Database Vaultのご紹介
Oracle Database Vaultのご紹介Oracle Database Vaultのご紹介
Oracle Database Vaultのご紹介
 
Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
 Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S... Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023
 
Oracle GoldenGate Veridata 12cR2 セットアップガイド
Oracle GoldenGate Veridata 12cR2 セットアップガイドOracle GoldenGate Veridata 12cR2 セットアップガイド
Oracle GoldenGate Veridata 12cR2 セットアップガイド
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streams
 
Confluent Tech Talk Korea
Confluent Tech Talk KoreaConfluent Tech Talk Korea
Confluent Tech Talk Korea
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Design Patterns for Asynchronous Apex
Design Patterns for Asynchronous ApexDesign Patterns for Asynchronous Apex
Design Patterns for Asynchronous Apex
 
OutSystems Lessons: Center of Excellence and Adoption Strategies
OutSystems Lessons: Center of Excellence and Adoption StrategiesOutSystems Lessons: Center of Excellence and Adoption Strategies
OutSystems Lessons: Center of Excellence and Adoption Strategies
 
Introduction to Apex Triggers
Introduction to Apex TriggersIntroduction to Apex Triggers
Introduction to Apex Triggers
 
Deep Dive into Apex Triggers
Deep Dive into Apex TriggersDeep Dive into Apex Triggers
Deep Dive into Apex Triggers
 
From Sandbox To Production: An Introduction to Salesforce Release Management
From Sandbox To Production: An Introduction to Salesforce Release ManagementFrom Sandbox To Production: An Introduction to Salesforce Release Management
From Sandbox To Production: An Introduction to Salesforce Release Management
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Creating Single Page Applications with Oracle Apex
Creating Single Page Applications with Oracle ApexCreating Single Page Applications with Oracle Apex
Creating Single Page Applications with Oracle Apex
 
Circles of success - So you have created or acquired a mess - now what (1)
Circles of success - So you have created or acquired a mess - now what (1)Circles of success - So you have created or acquired a mess - now what (1)
Circles of success - So you have created or acquired a mess - now what (1)
 
Salesforce Integration Patterns
Salesforce Integration PatternsSalesforce Integration Patterns
Salesforce Integration Patterns
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
 
Kafka Tutorial Advanced Kafka Consumers
Kafka Tutorial Advanced Kafka ConsumersKafka Tutorial Advanced Kafka Consumers
Kafka Tutorial Advanced Kafka Consumers
 

Ähnlich wie Salesforce API Series: Fast Parallel Data Loading with the Bulk API Webinar

Building Command-line Tools with the Tooling API
Building Command-line Tools with the Tooling APIBuilding Command-line Tools with the Tooling API
Building Command-line Tools with the Tooling API
Jeff Douglas
 

Ähnlich wie Salesforce API Series: Fast Parallel Data Loading with the Bulk API Webinar (20)

Fast Parallel Data Loading with the Bulk API #Forcewebinar UK Salesforce1
Fast Parallel Data Loading with the Bulk API #Forcewebinar UK Salesforce1Fast Parallel Data Loading with the Bulk API #Forcewebinar UK Salesforce1
Fast Parallel Data Loading with the Bulk API #Forcewebinar UK Salesforce1
 
Fast Parallel Data Loading with the Bulk API #Forcewebinar - Salesforce1
Fast Parallel Data Loading with the Bulk API #Forcewebinar - Salesforce1Fast Parallel Data Loading with the Bulk API #Forcewebinar - Salesforce1
Fast Parallel Data Loading with the Bulk API #Forcewebinar - Salesforce1
 
Winter 14 Release Developer Preview
Winter 14 Release Developer PreviewWinter 14 Release Developer Preview
Winter 14 Release Developer Preview
 
Mds cloud saturday 2015 salesforce intro
Mds cloud saturday 2015 salesforce introMds cloud saturday 2015 salesforce intro
Mds cloud saturday 2015 salesforce intro
 
Understanding Multitenancy and the Architecture of the Salesforce Platform
Understanding Multitenancy and the Architecture of the Salesforce PlatformUnderstanding Multitenancy and the Architecture of the Salesforce Platform
Understanding Multitenancy and the Architecture of the Salesforce Platform
 
Summer '13 Developer Preview Webinar
Summer '13 Developer Preview WebinarSummer '13 Developer Preview Webinar
Summer '13 Developer Preview Webinar
 
Apex Trigger Debugging: Solving the Hard Problems
Apex Trigger Debugging: Solving the Hard ProblemsApex Trigger Debugging: Solving the Hard Problems
Apex Trigger Debugging: Solving the Hard Problems
 
Building Command-line Tools with the Tooling API
Building Command-line Tools with the Tooling APIBuilding Command-line Tools with the Tooling API
Building Command-line Tools with the Tooling API
 
Salesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We Do
 
Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)
Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)
Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)
 
Build Consumer-Facing Apps with Heroku Connect
Build Consumer-Facing Apps with Heroku ConnectBuild Consumer-Facing Apps with Heroku Connect
Build Consumer-Facing Apps with Heroku Connect
 
Designing custom REST and SOAP interfaces on Force.com
Designing custom REST and SOAP interfaces on Force.comDesigning custom REST and SOAP interfaces on Force.com
Designing custom REST and SOAP interfaces on Force.com
 
Salesforce API Series: Release Management with the Metadata API webinar
Salesforce API Series: Release Management with the Metadata API webinarSalesforce API Series: Release Management with the Metadata API webinar
Salesforce API Series: Release Management with the Metadata API webinar
 
Architecting in the Cloud: Choosing the Right Technologies for your Solution
Architecting in the Cloud: Choosing the Right Technologies for your SolutionArchitecting in the Cloud: Choosing the Right Technologies for your Solution
Architecting in the Cloud: Choosing the Right Technologies for your Solution
 
Connect Your Clouds with Force.com
Connect Your Clouds with Force.comConnect Your Clouds with Force.com
Connect Your Clouds with Force.com
 
Understanding the Salesforce Architecture: How We Do the Magic We Do
Understanding the Salesforce Architecture: How We Do the Magic We DoUnderstanding the Salesforce Architecture: How We Do the Magic We Do
Understanding the Salesforce Architecture: How We Do the Magic We Do
 
Webinar: From Sandbox to Production: Demystifying Force.com Release Managemen...
Webinar: From Sandbox to Production: Demystifying Force.com Release Managemen...Webinar: From Sandbox to Production: Demystifying Force.com Release Managemen...
Webinar: From Sandbox to Production: Demystifying Force.com Release Managemen...
 
Apex Nuances: Transitioning to Force.com Development
Apex Nuances: Transitioning to Force.com DevelopmentApex Nuances: Transitioning to Force.com Development
Apex Nuances: Transitioning to Force.com Development
 
Salesforce's Trusted Enterprise Platform and Apache Phoenix
Salesforce's Trusted Enterprise Platform and Apache PhoenixSalesforce's Trusted Enterprise Platform and Apache Phoenix
Salesforce's Trusted Enterprise Platform and Apache Phoenix
 
Large Data Management Strategies
Large Data Management StrategiesLarge Data Management Strategies
Large Data Management Strategies
 

Mehr von Salesforce Developers

Mehr von Salesforce Developers (20)

Sample Gallery: Reference Code and Best Practices for Salesforce Developers
Sample Gallery: Reference Code and Best Practices for Salesforce DevelopersSample Gallery: Reference Code and Best Practices for Salesforce Developers
Sample Gallery: Reference Code and Best Practices for Salesforce Developers
 
Maximizing Salesforce Lightning Experience and Lightning Component Performance
Maximizing Salesforce Lightning Experience and Lightning Component PerformanceMaximizing Salesforce Lightning Experience and Lightning Component Performance
Maximizing Salesforce Lightning Experience and Lightning Component Performance
 
Local development with Open Source Base Components
Local development with Open Source Base ComponentsLocal development with Open Source Base Components
Local development with Open Source Base Components
 
TrailheaDX India : Developer Highlights
TrailheaDX India : Developer HighlightsTrailheaDX India : Developer Highlights
TrailheaDX India : Developer Highlights
 
Why developers shouldn’t miss TrailheaDX India
Why developers shouldn’t miss TrailheaDX IndiaWhy developers shouldn’t miss TrailheaDX India
Why developers shouldn’t miss TrailheaDX India
 
CodeLive: Build Lightning Web Components faster with Local Development
CodeLive: Build Lightning Web Components faster with Local DevelopmentCodeLive: Build Lightning Web Components faster with Local Development
CodeLive: Build Lightning Web Components faster with Local Development
 
CodeLive: Converting Aura Components to Lightning Web Components
CodeLive: Converting Aura Components to Lightning Web ComponentsCodeLive: Converting Aura Components to Lightning Web Components
CodeLive: Converting Aura Components to Lightning Web Components
 
Enterprise-grade UI with open source Lightning Web Components
Enterprise-grade UI with open source Lightning Web ComponentsEnterprise-grade UI with open source Lightning Web Components
Enterprise-grade UI with open source Lightning Web Components
 
TrailheaDX and Summer '19: Developer Highlights
TrailheaDX and Summer '19: Developer HighlightsTrailheaDX and Summer '19: Developer Highlights
TrailheaDX and Summer '19: Developer Highlights
 
Live coding with LWC
Live coding with LWCLive coding with LWC
Live coding with LWC
 
Lightning web components - Episode 4 : Security and Testing
Lightning web components  - Episode 4 : Security and TestingLightning web components  - Episode 4 : Security and Testing
Lightning web components - Episode 4 : Security and Testing
 
LWC Episode 3- Component Communication and Aura Interoperability
LWC Episode 3- Component Communication and Aura InteroperabilityLWC Episode 3- Component Communication and Aura Interoperability
LWC Episode 3- Component Communication and Aura Interoperability
 
Lightning web components episode 2- work with salesforce data
Lightning web components   episode 2- work with salesforce dataLightning web components   episode 2- work with salesforce data
Lightning web components episode 2- work with salesforce data
 
Lightning web components - Episode 1 - An Introduction
Lightning web components - Episode 1 - An IntroductionLightning web components - Episode 1 - An Introduction
Lightning web components - Episode 1 - An Introduction
 
Migrating CPQ to Advanced Calculator and JSQCP
Migrating CPQ to Advanced Calculator and JSQCPMigrating CPQ to Advanced Calculator and JSQCP
Migrating CPQ to Advanced Calculator and JSQCP
 
Scale with Large Data Volumes and Big Objects in Salesforce
Scale with Large Data Volumes and Big Objects in SalesforceScale with Large Data Volumes and Big Objects in Salesforce
Scale with Large Data Volumes and Big Objects in Salesforce
 
Replicate Salesforce Data in Real Time with Change Data Capture
Replicate Salesforce Data in Real Time with Change Data CaptureReplicate Salesforce Data in Real Time with Change Data Capture
Replicate Salesforce Data in Real Time with Change Data Capture
 
Modern Development with Salesforce DX
Modern Development with Salesforce DXModern Development with Salesforce DX
Modern Development with Salesforce DX
 
Get Into Lightning Flow Development
Get Into Lightning Flow DevelopmentGet Into Lightning Flow Development
Get Into Lightning Flow Development
 
Integrate CMS Content Into Lightning Communities with CMS Connect
Integrate CMS Content Into Lightning Communities with CMS ConnectIntegrate CMS Content Into Lightning Communities with CMS Connect
Integrate CMS Content Into Lightning Communities with CMS Connect
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Salesforce API Series: Fast Parallel Data Loading with the Bulk API Webinar

  • 1. Salesforce API Series Fast Parallel Data Loading with the Bulk API February 26, 2014
  • 2. Safe Harbor Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling nonsalesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. 
This documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements. #forcewebinar
  • 4. Follow Developer Force for the Latest News @forcedotcom / #forcewebinar Developer Force – Force.com Community +Developer Force – Force.com Community Developer Force Developer Force Group #forcewebinar
  • 5. How fast can you load data into Salesforce?
  • 6. How many records can you load into Salesforce in 1 hour?
  • 9. A parallel processing analogy: digging a ditch #forcewebinar
  • 12. The number of processes or threads associated with an operation.
  • 13. Optimal parallel processing 5M records Parallel 5M records 5M records 5M records Serial 20M records Time #forcewebinar
  • 14. Sub-optimal parallel processing 5M records Parallel 5M records 5M records 5M records Serial 20M records Time #forcewebinar
  • 15. Locks, exceptions, triggers, relationships, … 5M records Parallel 5M records 5M records 5M records Serial 20M records Time #forcewebinar Throughput inhibitors
  • 16. Data load case studies §  Get hands on with the Salesforce Bulk API §  Contrast serial data loads vs. parallel data loads §  Measure degrees of parallelism and throughput §  Identify and avoid throughput inhibitors §  Achieve maximum throughput #forcewebinar
  • 18. Salesforce Bulk API §  Asynchronous data loading §  Optimized for large data sets §  REST API §  Powers many tools §  Use to build custom tools with any programming language (Java, etc.) #forcewebinar
  • 20. Bulk API Loads that … ealize, nvestigate, and lan
  • 23. Serial load: Expected plan Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread •  •  •  •  Time #forcewebinar One job 100 batches 10,000 records/batch 1M total records
  • 24. Serial load: Job configuration #forcewebinar
  • 25. Serial load: Batch creation #forcewebinar
  • 26. Serial load: Batch run
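Because the Bulk API is asynchronous, a client finds out when batches finish by polling. A minimal polling sketch, assuming the Bulk API 1.0 batch states `Queued`, `InProgress`, `Completed`, `Failed`, and `Not Processed`; `fetch_states` is a hypothetical callable (for example, one that parses the batch list returned for the job), replaced here by canned snapshots:

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"Completed", "Failed", "Not Processed"}


def wait_for_batches(fetch_states: Callable[[], Dict[str, str]],
                     poll_seconds: float = 10.0,
                     max_polls: int = 360) -> Dict[str, str]:
    """Poll until every batch is in a terminal state; return the final states."""
    for _ in range(max_polls):
        states = fetch_states()  # {batch_id: state}
        if states and all(s in TERMINAL_STATES for s in states.values()):
            return states
        time.sleep(poll_seconds)
    raise TimeoutError("batches still running after max_polls checks")


# Canned snapshots standing in for successive status checks.
snapshots = iter([{"b1": "Queued", "b2": "InProgress"},
                  {"b1": "Completed", "b2": "InProgress"},
                  {"b1": "Completed", "b2": "Failed"}])
final = wait_for_batches(lambda: next(snapshots), poll_seconds=0)
print(final)  # {'b1': 'Completed', 'b2': 'Failed'}
```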
  • 28. Serial load summary. Concurrency mode: Serial; records loaded: 1 million; records failed: 0; run time: 52 minutes; work completed: 48 minutes; throughput: 19,500 records per minute; degree of parallelism: 0.94; key problem: degree of parallelism explicitly limited to ~1; solution: explore a parallel load for increased throughput.
  • 29. Chart: Parallelism vs. throughput of a single job (throughput in records/min, 0 to 350,000, against degree of parallelism, 0 to 20; runs plotted: Serial). Serial run: low degree of parallelism.
  • 31. Parallel load: Expected plan. One job, 100 batches, 10,000 records/batch, 1M total records (diagram: 16 thread lanes plotted against time)
  • 32. Parallel load: Job configuration
  • 33. Things to watch for: locks can significantly affect parallel loads (wasted processing capacity, reduced throughput, failures); retry logic is not all it's cracked up to be
  • 35. Parallel load 1 summary. Concurrency mode: Parallel; records loaded: 125,000; records failed: 875,000; run time: 10 minutes; work completed: 2 hours and 30 minutes; throughput: 20,000 records per minute; degree of parallelism: 15.79; key problem: lock exceptions (the server worked significantly harder, but throughput did not increase); solution: run the load in serial mode or manage locks.
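In this run most records failed on lock exceptions. One way to confirm that from the batch results is to tally failures by status code. A sketch, assuming result files with `Id`, `Success`, `Created`, and `Error` columns and lock failures whose `Error` value starts with the `UNABLE_TO_LOCK_ROW` status code; the helper name, sample rows, and record ID are made up:

```python
import csv
import io
from collections import Counter


def failure_codes(result_csv: str) -> Counter:
    """Count failed rows by the status code before the first ':' in Error."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(result_csv)):
        if row["Success"].lower() != "true":
            code = row["Error"].split(":", 1)[0] or "UNKNOWN"
            counts[code] += 1
    return counts


sample = '''"Id","Success","Created","Error"
"001x0000000001","true","true",""
"","false","false","UNABLE_TO_LOCK_ROW:unable to obtain exclusive access to this record:--"
'''
print(failure_codes(sample))  # Counter({'UNABLE_TO_LOCK_ROW': 1})
```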
  • 36. Chart: Parallelism vs. throughput of a single job (throughput in records/min, 0 to 350,000, against degree of parallelism, 0 to 20; runs plotted: Serial, Parallel 1). Parallel run 1: high degree of parallelism; low throughput due to locks.
  • 37. Time to optimize. Let's make your data load: Realize (locks inhibit parallelism and throughput); Investigate (what is causing the locks); Plan (manage the locks)
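For the Investigate step, a useful first check is whether batches that may run at the same time reference the same parent records, since relationships are among the lock sources the deck lists. A hypothetical helper (not from the webinar) that reports parent IDs appearing in more than one batch; `AccountId` is only an illustrative field name:

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def shared_parents(batches: Sequence[Sequence[dict]],
                   parent_field: str) -> Dict[str, Set[int]]:
    """Map each parent ID found in two or more batches to those batch indexes."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for index, batch in enumerate(batches):
        for row in batch:
            seen[row[parent_field]].add(index)
    return {parent: idx for parent, idx in seen.items() if len(idx) > 1}


batches = [[{"AccountId": "A"}, {"AccountId": "B"}],
           [{"AccountId": "A"}, {"AccountId": "C"}]]
print(shared_parents(batches, "AccountId"))  # {'A': {0, 1}}
```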
  • 38. Demo: Parallel load 2. Eliminate locks by modifying the schema
  • 39. Parallel load: Sample results. Concurrency mode: Parallel; records loaded: 1 million; records failed: 0; run time: 3 minutes and 30 seconds; work completed: 1 hour; throughput: 320,000 records per minute; degree of parallelism: 19; key problem: none; solution: n/a.
  • 40. Chart: Parallelism vs. throughput of a single job (throughput in records/min, 0 to 350,000, against degree of parallelism, 0 to 20; runs plotted: Serial, Parallel 1, Parallel 2). Parallel run 2: high degree of parallelism; high throughput.
  • 41. Locks can be managed by elimination or by ordering the load file
  • 42. Demo: Parallel load 3. Avoid locks with ordered data
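A sketch of the ordered-load-file idea, assuming the aim is to keep all children of one parent in a single batch so that concurrently running batches rarely contend for the same parent. Rather than only sorting and cutting every N rows, this variant packs whole parent groups into batches and splits a group only when it alone exceeds the batch size; the helper and field names are hypothetical:

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List


def ordered_batches(rows: Iterable[dict], parent_field: str,
                    batch_size: int = 10_000) -> List[List[dict]]:
    """Sort rows by parent and pack whole parent groups into batches."""
    key = itemgetter(parent_field)
    result: List[List[dict]] = []
    current: List[dict] = []
    for _, group in groupby(sorted(rows, key=key), key=key):
        children = list(group)
        # Start a new batch rather than split this parent across two batches.
        if current and len(current) + len(children) > batch_size:
            result.append(current)
            current = []
        # A parent with more children than batch_size still has to be split.
        while len(children) > batch_size:
            result.append(children[:batch_size])
            children = children[batch_size:]
        current.extend(children)
    if current:
        result.append(current)
    return result


rows = [{"AccountId": p} for p in "BACABCA"]
layout = [[r["AccountId"] for r in b] for b in ordered_batches(rows, "AccountId", 4)]
print(layout)  # [['A', 'A', 'A'], ['B', 'B', 'C', 'C']]
```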
  • 43. Managing locks … a discussion while we load: master-detail relationships; lookup relationships; roll-up summary fields; triggers; workflow rules; group membership locks*
  • 44. Parallel load: Sample results. Concurrency mode: Parallel; records loaded: 1 million; records failed: 0; run time: 4 minutes; work completed: 1 hour; throughput: 250,000 records per minute; degree of parallelism: 16.5; key problem: minimal overhead due to locks; solution: remove all unnecessary locks.
  • 45. Chart: Parallelism vs. throughput of a single job (throughput in records/min, 0 to 350,000, against degree of parallelism, 0 to 20; runs plotted: Serial, Parallel 1, Parallel 2, Parallel 3). Parallel run 3: high degree of parallelism; high throughput.
  • 47. Controlled feed load methodology: explicit throttling of parallelism and throughput (parallel extraction and loading; prioritization of asynchronous processing capacity); managing inhibitors in complex jobs (data skews; multiple locks)
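Explicit throttling of parallelism can be approximated on the client by capping how many batches are in flight at once. A sketch under that assumption: `submit` is a hypothetical callable that uploads one batch and blocks until it finishes, so the pool size bounds how many batches run concurrently; `fake_submit` only simulates that work and records the peak concurrency it observes:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def controlled_feed(batches: Sequence[list], submit: Callable[[list], int],
                    max_in_flight: int = 4) -> List[int]:
    """Run submit(batch) for every batch, never more than max_in_flight at once."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(submit, batches))


lock = threading.Lock()
active = 0
peak = 0


def fake_submit(batch: list) -> int:
    """Stand-in for a blocking upload; tracks how many calls overlap."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # simulated batch processing time
    with lock:
        active -= 1
    return len(batch)


results = controlled_feed([[0] * 10] * 12, fake_submit, max_in_flight=3)
print(results == [10] * 12, peak <= 3)  # True True
```

The cap trades peak throughput for predictability, in line with the reduced parallelism and expected throughput shown for the controlled feed run.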
  • 48. Chart: Parallelism vs. throughput of a single job (throughput in records/min, 0 to 350,000, against degree of parallelism, 0 to 20; runs plotted: Serial, Parallel 1, Parallel 2, Parallel 3, Controlled Feed). Controlled feed run: reduced parallelism; expected throughput.
  • 49. Related wiki article and Architect Core Resources
  • 50. Recap. Make your parallel data loads: Realize (locks inhibit parallelism and throughput); Investigate (what is causing the locks); Plan (manage the locks)