This document discusses how to efficiently replicate over 1 billion Salesforce records while minimizing API usage to avoid reaching limits. It recommends using the Bulk API to fetch large amounts of data in fewer calls, paginating queries, and only fetching changed records. Methods described include initial full fetching followed by incremental fetching of just changed records since the last sync. Error handling and dealing with unavailable objects are also covered.
3. In this session…
•Implisit – Intro & Motivation
•Salesforce API Usage & Limits – Overview
•Efficient use of Salesforce APIs
•Scale and limitations
•Other pitfalls and tips
4. Implisit – The End of CRM Data Entry
•Implisit uses data mining and machine learning to keep Salesforce updated:
–Automatically updating Salesforce with emails and calendar events
–Creating and updating Accounts, Opportunities, Contacts, Leads
–Keeping the team informed of all client communications
•Using text analysis:
–Creating meaningful business insights
–Improving forecasting and sales pipeline management
•Requires Salesforce data replication for offline processing
5. Data Replication Goals
•Minimize your API usage
–Avoid reaching the API limit
–API limits are shared between all API-connected apps, so one app can exhaust the quota and block the others
•Minimize sync cycle time
–Don’t make our customers wait too long
6. Salesforce API Limits
•Daily API limits for Salesforce Editions:
–Unlimited/Performance: # of users x 5,000, up to 1,000,000
–Enterprise/Professional: # of users x 1,000
–Developer: 15,000
–Sandbox: 5,000,000
•Concurrent (in-parallel) API calls limit: 25 in production, 5 in Developer Edition
Source & more info: https://help.salesforce.com/HTViewHelpDoc?id=integrate_api_rate_limiting.htm
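The edition table reduces to a small lookup. A minimal Python sketch, assuming the figures listed above (Salesforce has revised its limits since, so check the official documentation before relying on them); the function name `daily_api_limit` and the edition keys are hypothetical.

```python
# Daily API call limits as listed on the slide (figures may be outdated).
PER_USER_LIMIT = {
    "unlimited": 5_000,
    "performance": 5_000,
    "enterprise": 1_000,
    "professional": 1_000,
}
FLAT_LIMIT = {"developer": 15_000, "sandbox": 5_000_000}
PER_USER_CAP = {"unlimited": 1_000_000, "performance": 1_000_000}


def daily_api_limit(edition: str, users: int = 0) -> int:
    """Daily API call allowance for an org, shared by all connected apps."""
    key = edition.lower()
    if key in FLAT_LIMIT:
        return FLAT_LIMIT[key]
    if key not in PER_USER_LIMIT:
        raise ValueError(f"unknown edition: {edition!r}")
    limit = users * PER_USER_LIMIT[key]
    # Only Unlimited/Performance carry the 1,000,000 cap in the table above.
    return min(limit, PER_USER_CAP.get(key, limit))


if __name__ == "__main__":
    print(daily_api_limit("Enterprise", 50))  # 50000
    print(daily_api_limit("Unlimited", 500))  # 1000000 (capped)
```

Because the allowance is shared, a replication job should budget for only a fraction of this number.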
7. Performance Stats
•Keeping over one billion Salesforce records replicated and in sync
–27 Salesforce object types are replicated (e.g. Accounts, Contacts)
•Initial sync
–600-1000 API calls in total
•Updates sync
–200-400 API calls in total
–Performed every few hours
9. Salesforce API Types
•REST API
–Fast, synchronous queries
–Up to 2,000 records per request
–Each request counts as a single API call
–Simple usage
•Bulk (Async) API
–Large amounts of records in a single request (fewer API calls)
–Slow, requires polling for results
–Implements internal retries
–Does not support some objects (e.g. OpportunityHistory)
https://developer.salesforce.com/blogs/tech-pubs/2011/10/salesforce-apis-what-they-are-when-to-use-them.html
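The REST API's 2,000-records-per-request ceiling makes the cost of a full read easy to estimate. A back-of-the-envelope sketch, assuming a hypothetical org with 10 million records and 100 Enterprise user licenses (1,000 calls per user per day):

```python
import math

REST_RECORDS_PER_CALL = 2_000  # REST API: up to 2,000 records per request


def rest_api_calls(records: int, per_call: int = REST_RECORDS_PER_CALL) -> int:
    """Minimum number of REST API calls needed to page through `records` rows."""
    return math.ceil(records / per_call)


if __name__ == "__main__":
    calls = rest_api_calls(10_000_000)  # hypothetical org size
    daily_allowance = 100 * 1_000       # 100 Enterprise users x 1,000 calls
    print(calls)                        # 5000
    print(calls / daily_allowance)      # 0.05, i.e. 5% of the shared daily quota
```

Every full read repeats this cost, and the quota is shared with all other connected apps, which is why the deck leans on the Bulk API. The number of calls a Bulk query itself consumes (job creation, polling, result retrieval) is not modeled here.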
12. Replication method – Initial fetching
•Using Bulk API as much as possible
•Fetch all records for each relevant object type
–Lots of data
–Only non-deleted records
•Paginate by CreatedDate
•Example:
–1st query: “… ORDER BY CreatedDate LIMIT 100000”
–Subsequent: “… WHERE CreatedDate > 2014-08-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”, where the timestamp is the CreatedDate of the last record on the previous page
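The CreatedDate pagination above can be sketched as a loop. A minimal sketch, assuming a hypothetical `run_query` callable that executes a SOQL string (for example as a Bulk API query job) and returns the records as dicts; `page_query` and `fetch_all` are illustrative names, not part of any Salesforce SDK.

```python
from typing import Callable, Iterator, List, Optional

PAGE_SIZE = 100_000  # page size from the example queries above


def page_query(base: str, after: Optional[str] = None, limit: int = PAGE_SIZE) -> str:
    """One page of the CreatedDate-paginated query.

    `base` is "SELECT <fields> FROM <object>" without a WHERE clause;
    `after` is the CreatedDate of the last record on the previous page.
    """
    where = f" WHERE CreatedDate > {after}" if after else ""
    return f"{base}{where} ORDER BY CreatedDate LIMIT {limit}"


def fetch_all(base: str, run_query: Callable[[str], List[dict]],
              limit: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every (non-deleted) record of one object type, page by page."""
    after = None
    while True:
        page = run_query(page_query(base, after, limit))
        yield from page
        if len(page) < limit:
            return  # a short page means there is nothing left
        after = page[-1]["CreatedDate"]  # resume after the last record seen
```

One caveat with a strict `>`: if several records share the CreatedDate of the last record on a page, the ones not yet returned are skipped. Using `>=` and de-duplicating by Id avoids that, at the cost of re-reading the boundary records.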
13. Replication method – Changes fetching
•Fetch only records that changed since the previous fetch time
–Less data –only changes
–Take care of updates and deletions
•Use SystemModstamp as the indicator that a record has changed
•Same pagination logic as in initial fetching
•Example:
–1st query: “… WHERE SystemModstamp > 2014-07-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”
–Subsequent: “… WHERE SystemModstamp > 2014-07-31T02:29:29Z AND CreatedDate > 2014-08-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”
•Bulk changes fetching vs. getUpdated() (SOAP API)
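The incremental query differs from the initial one only in the extra SystemModstamp filter. A sketch of the query builder, assuming `since` is the start time of the previous sync; the function name `changes_query` is hypothetical.

```python
from typing import Optional

PAGE_SIZE = 100_000


def changes_query(base: str, since: str, after: Optional[str] = None,
                  limit: int = PAGE_SIZE) -> str:
    """One page of the changes-only query.

    `since` is the previous sync time; `after` is the CreatedDate of the last
    record on the previous page (None for the first page).
    """
    conditions = [f"SystemModstamp > {since}"]
    if after:
        conditions.append(f"CreatedDate > {after}")
    # Conditions are joined with AND; ORDER BY follows the WHERE clause directly.
    return (f"{base} WHERE {' AND '.join(conditions)} "
            f"ORDER BY CreatedDate LIMIT {limit}")
```

Recording `since` as the time the previous cycle started, rather than finished, is a conservative choice: records modified while a cycle was running are fetched again on the next one instead of being missed.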
14. Deleted items
•Motivation:
–Required to maintain consistent sync
•Two implementation options
–Use the getDeleted() call in the SOAP API (our choice)
–Use queryAll (WHERE IsDeleted = true) in the REST API
•Potentially more API calls
•Some records can be “undeleted” (restored from the Recycle Bin)!
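For the REST alternative, the queryAll resource returns soft-deleted records alongside live ones, so filtering on IsDeleted isolates the deletions. A sketch of building that request URL; the API version and instance URL are placeholders, and the HTTP call and OAuth header are left out.

```python
from urllib.parse import urlencode

API_VERSION = "v31.0"  # placeholder: use a version your org supports


def deleted_records_url(instance_url: str, sobject: str,
                        api_version: str = API_VERSION) -> str:
    """REST queryAll request listing the soft-deleted records of one object.

    Unlike /query, /queryAll also returns records in the Recycle Bin, so the
    IsDeleted filter leaves only the deletions.
    """
    soql = f"SELECT Id FROM {sobject} WHERE IsDeleted = true"
    return (f"{instance_url}/services/data/{api_version}/queryAll/?"
            + urlencode({"q": soql}))
```

Because records can be undeleted, a replica should not treat a deletion as final: a record that shows up again after being deleted needs to be restored locally.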
15. Getting all fields
•No “SELECT *” support in SOQL
•Get all fields of an object using a “describe” call
–Optionally, filter the fields (skip custom fields, etc.)
–Beware of non-visible fields (due to security restrictions)
•Use the field names in the query
•Limitation: query length cannot exceed 20,000 characters*
* http://www.salesforce.com/us/developer/docs/soql_sosl/Content/sforce_api_calls_soql_select.htm
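Building the field list from describe output is mechanical; the 20,000-character limit is the part that needs care. The sketch below assumes `fields` is the list of field names from a describe result. Skipping custom fields follows the slide; splitting an oversized field list into several queries that each include Id is one possible workaround added here, not something the slide prescribes.

```python
from typing import Iterable, List

MAX_QUERY_CHARS = 20_000  # SOQL statement length limit cited above


def select_queries(sobject: str, fields: Iterable[str], skip_custom: bool = False,
                   suffix: str = "", max_chars: int = MAX_QUERY_CHARS) -> List[str]:
    """Turn a describe() field list into one or more SELECT statements.

    Each statement starts with Id so partial rows can be joined back together;
    `suffix` (WHERE/ORDER BY/LIMIT) counts against the length limit too.
    """
    names = [f for f in fields
             if f != "Id" and not (skip_custom and f.endswith("__c"))]
    head, tail = "SELECT Id", f" FROM {sobject}{suffix}"
    queries: List[str] = []
    current = head
    for name in names:
        candidate = f"{current}, {name}"
        if len(candidate) + len(tail) > max_chars and current != head:
            queries.append(current + tail)  # close the full statement
            candidate = f"{head}, {name}"   # start a new one with Id
        current = candidate
    queries.append(current + tail)
    return queries
```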
16. User Access Restrictions
•Full access rights are strongly encouraged
–Full view of all objects
–Limited access rights → slower queries
•Reference Fields –special case
–Tasks / Events – WhoId, WhatId
–Attachment – ParentId
–Reference fields make access checks in Salesforce even slower
–Limited to 100,000 distinct reference values per query
–Solution: query in smaller chunks
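The slide does not say how the chunks are cut. One option, assumed here, is to split the query by CreatedDate window so each query touches fewer distinct WhoId/WhatId values; the window size is a tuning knob and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

SOQL_DATETIME = "%Y-%m-%dT%H:%M:%SZ"  # datetimes are assumed to be in UTC


def time_windows(start: datetime, end: datetime,
                 step: timedelta) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows no longer than `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        windows.append((lo, hi))
        lo = hi
    return windows


def chunked_queries(base: str, start: datetime, end: datetime,
                    step: timedelta) -> List[str]:
    """One query per window, each touching fewer distinct reference values."""
    return [
        f"{base} WHERE CreatedDate >= {lo.strftime(SOQL_DATETIME)}"
        f" AND CreatedDate < {hi.strftime(SOQL_DATETIME)}"
        for lo, hi in time_windows(start, end, step)
    ]
```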
17. Error handling
•Nothing is fail-safe
•Different APIs produce different errors
•Examples:
–Query too long (too many fields)
–Scale limitations
–Communication errors
–Salesforce maintenance windows
•Add support for anything you encounter
–“Rare” becomes “frequent” once you scale
•ABR (Always Be Retrying)
•Remember to clean up upon errors
–Close open bulk jobs
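“Always Be Retrying” and “clean up upon errors” fit in one helper. A generic sketch with exponential backoff and jitter; `op` and `cleanup` are hypothetical callables (for example, submitting a Bulk query and closing its job), and the attempt count and delays are illustrative.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T],
                 cleanup: Optional[Callable[[], None]] = None,
                 attempts: int = 5,
                 base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `op`, retrying failures with exponential backoff plus jitter.

    `cleanup` runs after every failed attempt (e.g. to close an open Bulk job);
    errors outside `retry_on`, or the last failure, are re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except Exception as exc:
            if cleanup is not None:
                cleanup()
            if attempt == attempts or not isinstance(exc, retry_on):
                raise
            # Wait 1x, 2x, 4x ... base_delay, times a random factor in [1, 2).
            sleep(base_delay * 2 ** (attempt - 1) * (1 + random.random()))
    raise AssertionError("unreachable")
```

Not every error deserves a retry (a query that is too long fails the same way every time), which is what `retry_on` is for.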
18. Unavailable Salesforce objects
•Some orgs make certain objects unavailable
–Using security restrictions
–For example, Lead or Opportunity
•Check each object using describeSObjects before fetching
•Safely skip when not supported
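Checking availability up front can be as simple as filtering the object list through describe. A sketch, assuming a hypothetical `describe` wrapper that returns the object's describe metadata as a dict, or raises LookupError when the org does not expose the object; `queryable` is a field of Salesforce's describe result.

```python
from typing import Callable, Dict, Iterable, List


def available_objects(wanted: Iterable[str],
                      describe: Callable[[str], Dict]) -> List[str]:
    """Keep only the object types this org lets us query, in input order."""
    result = []
    for name in wanted:
        try:
            meta = describe(name)
        except LookupError:
            continue  # object hidden or not enabled: skip it safely
        if meta.get("queryable", False):
            result.append(name)
    return result
```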
19. Summary
•Implisit – Intro & Motivation
•Salesforce APIs Overview
•Efficient use of API
•Scale and limitations
•Other pitfalls and tips