Attendees will learn how to build an operational data hub that can be used as a silo-buster. In this session, I will show how we developed a data hub at TD Ameritrade to provide actionable 360 views of client data using MongoDB. I will also explain why this approach suited our use case better than a Hadoop-based data lake.
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
1. Near Real-Time Analytical Data Hub with MongoDB
§ Real-time 360 Views of Advisor Client Data
June 18, 2019
Tajdar Siddiqui
Dilip Vazirani
Gary Russo @garyprusso
3. About – Presenters
● Gary Russo (TD Ameritrade):
Ø NoSQL Architect and Developer building mission-critical NoSQL solutions for the past 10+ years.
● Dilip Vazirani (TD Ameritrade):
Ø Technology Portfolio Management
● Tajdar Siddiqui (TD Ameritrade):
Ø Principal Architect at TD Ameritrade
4. About – TD Ameritrade
● Founded by Joe Ricketts on May 1st, 1975 as First Omaha Securities, Inc.
● Headquarters in Omaha, Nebraska
● Customer Types:
Ø Retail Investors – investment tools for self-directed investors
Ø Institutional Investors – a rich set of products for Investment Advisors
● 9,700 Full-time Employees
● 360+ branches in 48 states
● 11.5 million funded accounts
● Averages 900,000 trades per day
● TD Bank owns 42% of TD Ameritrade
5. What is the VEO One Platform?
● Suite of Tools for Institutional Financial Advisors
● Provides:
Ø Single unified 360 degree view of clients and business
Ø Tools for Portfolio Management and Financial Planning
Ø Convenient access to data and analytics
Ø Personalized views to track “book of business”
Ø View real-time balances, positions, and history
6. Analytical Capabilities Added Leveraging MongoDB
● Veo One Analytics Advisor Dashboard is used to track:
Ø Total AUM – assets under management
Ø Asset In-flows
Ø Asset Out-flows
Ø Total NNA (net new assets) = In-flows minus Out-flows
Ø Client Segmentation – facets by market value, generation, etc.
Ø New Accounts Opened – Client Onboarding
Ø ACATS – tracks money flowing in and out of “book of business”
Ø NNA Goals
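The NNA and client-segmentation measures above come down to simple arithmetic and bucketing. The following is a minimal Python sketch of that logic, not the dashboard's actual implementation: the bucket boundaries, labels, and function names are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical market-value bucket lower bounds (USD) and labels; not from the talk.
BUCKETS = [(0, "under 250K"), (250_000, "250K to 1M"), (1_000_000, "1M and over")]


def net_new_assets(inflows, outflows):
    """Total NNA = in-flows minus out-flows."""
    return sum(inflows) - sum(outflows)


def segment(market_value):
    """Return the label of the highest bucket whose lower bound is <= market_value."""
    label = BUCKETS[0][1]
    for lower_bound, name in BUCKETS:
        if market_value >= lower_bound:
            label = name
    return label


def facet_counts(market_values):
    """Count clients per market-value segment (one facet of the segmentation)."""
    return Counter(segment(value) for value in market_values)


nna = net_new_assets(inflows=[500_000, 250_000], outflows=[100_000])
facets = facet_counts([100_000, 300_000, 2_000_000, 250_000])
```

In a MongoDB deployment, the same grouping would more likely be expressed in the Aggregation Framework; the pure-Python version is used here only to keep the arithmetic visible.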
8. Why NoSQL (Not-Only-SQL)?
• SQL plus much more!
• Sub-second Search and Query
• Schema-agnostic
• Centralized Data Governance
• Data Level Security
• Build Apps in Weeks instead of Months
10. Sharded MongoDB Infrastructure – Future Direction
• Hybrid Infrastructure:
Ø Leverage existing Blade Servers with SAN SSD Storage
Ø Scale-out horizontally – Add Rackmount Servers with SSD Local Storage for additional shards
11. Data Sizing Calculations
#   Metric                                                  Value
1   Number of accounts                                      4,500,000
2   Number of Trade Days Per Year                           252
3   Number of Years                                         11
4   Number of documents (multiply items 1, 2, and 3)        12,474,000,000
5   Average Doc Storage Size (bytes)                        144
6   Total Disk Space (bytes) (multiply items 4 and 5)       1,796,256,000,000
7   Total Disk Space Needed (GB)                            1,673
8   Estimated Index Size (GB) – assume 25% of item 7        420
9   Total Disk Space Needed for Collection (GB)             2,093
For collection: Account_Measures_Daily
Used to store one daily account measures document per account for 4.5 million accounts for the past 11 years (and counting).
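The sizing arithmetic can be reproduced with a short Python sketch. It assumes that "GB" on the slide means 1024³ bytes, which is the interpretation that yields the 1,673 figure; the function and variable names are illustrative only.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: the slide's "GB" is 1024^3 bytes


def size_collection(accounts, trade_days_per_year, years, avg_doc_bytes, index_ratio):
    """Estimate document count and disk space for one document per account per trade day."""
    documents = accounts * trade_days_per_year * years
    data_bytes = documents * avg_doc_bytes
    data_gb = data_bytes / BYTES_PER_GB
    index_gb = data_gb * index_ratio
    return {
        "documents": documents,
        "data_bytes": data_bytes,
        "data_gb": data_gb,
        "index_gb": index_gb,
        "total_gb": data_gb + index_gb,
    }


estimate = size_collection(
    accounts=4_500_000,
    trade_days_per_year=252,
    years=11,
    avg_doc_bytes=144,
    index_ratio=0.25,
)
```

Computed without intermediate rounding, the index estimate comes to about 418 GB and the total to about 2,091 GB. The figures of 420 GB and 2,093 GB correspond to rounding the index estimate up before adding it to the 1,673 GB of data.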
13. Why MongoDB
• Scale Out Sharding
• Shared Nothing Architecture
• Aggregation Framework
• Heterogeneous Query Type Support (Faceted, Full Text, Geospatial)
• Replication
• Spread Application/Analyst workloads across Primary/Secondaries
• Intra/Inter Data Center Resiliency
• Ops Manager API to automate admin tasks
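One way to spread application and analyst workloads across primary and secondary members is to give each workload its own read preference in the connection string. The sketch below builds two such URIs using the standard `replicaSet` and `readPreference` connection-string options. The host names, the replica set name, and the choice of which workload reads from which members are assumptions for illustration, not details from the talk.

```python
from urllib.parse import urlencode

# Hypothetical replica set members and name; not from the talk.
HOSTS = ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"]
REPLICA_SET = "rs0"


def mongo_uri(read_preference):
    """Build a MongoDB connection string with the given read preference."""
    options = urlencode({"replicaSet": REPLICA_SET, "readPreference": read_preference})
    return f"mongodb://{','.join(HOSTS)}/?{options}"


# Assumption: the application reads from the primary,
# while analyst queries are routed to secondaries when one is available.
APP_URI = mongo_uri("primary")
ANALYST_URI = mongo_uri("secondaryPreferred")
```

Either URI can then be passed to a MongoDB driver client. Reads served by secondaries may lag the primary slightly, which is usually acceptable for analytical queries.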
24. Key Takeaways
● MongoDB right fit for Near Real Time Analytical Data Hub
● Scale out at App/DB tiers
● Architect for Resiliency
● Test, test, test: Unit/System/Integration/Performance/Infrastructure
● Monitor everything
● Be prepared to scale rapidly if Business likes your POC/Phase 1