The Aadhaar application stores and searches 200M residents' records containing personal and biometric information. A user can search for records by various criteria, such as a resident's personal or system information. This session discusses the approach and challenges of building a data store that handles 2M inserts/updates and 10M reads per day. You will learn how 16TB of data is stored and handled, spread over 8 shards for high availability, and how the system scales to hold information on a total of 1.2 billion residents in a form that can also be processed for analytics.
Search data store for the world's largest biometric identity system
1. Search data store for the world's largest biometric identity system
Regunath Balasubramanian Shashikant Soni
regunathb@gmail.com soni.shashikant@gmail.com
twitter @regunathb
CONFIDENTIAL: For limited circulation only Slide 1
2. India
● 1.2 billion residents
● 640,000 villages; ~60% live on under $2/day
● ~75% literacy; <3% pay income tax; <20% have banking access
● ~800 million mobile connections; ~200-300 million migrant workers
● Govt. spends about $25-40B on direct subsidies
● Residents have no standard identity document
● Most programs are plagued by ghost and duplicate identities, causing
leakage of 30-40%
3. Aadhaar
● Create a common ‘national identity’ for every ‘resident’
● Biometric-backed identity to eliminate duplicates
● ‘Verifiable online identity’ for portability
● Applications ecosystem using open APIs
● Aadhaar-enabled bank account and payment platform
● Aadhaar-enabled electronic, paperless KYC (Know Your Customer)
4. Search Requirements
● Multi-attribute queries like:
name contains ‘regunath’ AND city = ‘bangalore’ AND
address contains ‘J P Nagar’ AND YearOfBirth = ……
● Search 1.2B resident records with photo and history
● Average record size: 35 KB
● Response times in milliseconds
● Open scale-out
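The multi-attribute query above maps naturally onto a single MongoDB filter document. A minimal sketch in plain JavaScript (field names and the year value are illustrative, not the actual Aadhaar schema):

```javascript
// Filter for: name contains 'regunath' AND city = 'bangalore'
// AND address contains 'J P Nagar' AND YearOfBirth = <value>.
// Field names and the year are hypothetical, not the real schema.
const filter = {
  name:        { $regex: "regunath" },   // "contains" => regex match
  city:        "bangalore",              // exact equality
  address:     { $regex: "J P Nagar" },
  yearOfBirth: 1975                      // hypothetical value
};

// In the mongo shell this would be passed as-is to find():
//   db.residents.find(filter)
```

Each "contains" clause becomes a `$regex` condition and each equality a plain field/value pair, which is why a compound or multi-key index over these fields matters for the millisecond response targets.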
5. Why MongoDB
● Auto-sharding
● Replication
● Failover
… Essentially an AP (slaveOk) data store in CAP parlance
● Evolving schema
● Map-Reduce for analysis
● Full text search
● Compound and multi-key indexes
7. Implementation and Deployment
● Start: 4M records in 2 shards
Current: 250M records in 8 shards (8 x ~2 TB x 3 replicas)
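The slide's storage figures can be sanity-checked with back-of-envelope arithmetic. This is raw data only; on-disk files are larger because of indexes, padding, and preallocation, which is roughly where the ~2 TB per shard comes from:

```javascript
// Back-of-envelope capacity check for the numbers on this slide.
const records       = 250e6;      // 250M resident records
const avgRecordBytes = 35 * 1024; // ~35 KB average record (earlier slide)
const shards        = 8;
const replicas      = 3;

const rawTB        = records * avgRecordBytes / 1e12; // raw data in TB
const perShardTB   = rawTB / shards;                  // raw data per shard
const allCopiesTB  = rawTB * replicas;                // across 3 replicas

console.log(rawTB.toFixed(1)); // prints "9.0" — ~9 TB raw; ~2 TB/shard on disk with overhead
```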
● Performance, Reliability & Durability
● SlaveOk
● getLastError, Write Concern: availability vs durability
j = journaling
w = nodes-to-write
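The w / j trade-off above can be sketched as a toy acknowledgement model. This is illustrative only, not driver or server code; real clients set these as write-concern options such as `{ w: 2, j: true }`:

```javascript
// Toy model of write-concern acknowledgement: a write is acknowledged
// once `w` nodes have it, and (if j=true) it has been journaled.
// Illustrates the availability-vs-durability trade-off only.
function isAcknowledged(nodesWritten, journaled, concern) {
  const wOk = nodesWritten >= concern.w;
  const jOk = !concern.j || journaled;
  return wOk && jOk;
}

// w:1, j:false -> acknowledged fast, least durable
const fast    = isAcknowledged(1, false, { w: 1, j: false }); // true
// w:2, j:true  -> must reach 2 nodes AND the journal before ack
const durable = isAcknowledged(1, false, { w: 2, j: true });  // false
```

Raising `w` and enabling `j` buys durability at the cost of write latency, which is the tuning knob the slide refers to.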
● Replica-sets / Shards – how?
[Deployment diagram: each shard (RS 1, RS 2, …) is a replica set with a Primary, a Secondary, and an Arbiter; three config servers (Config 1–3) and mongos Routers front the cluster]
8. Monitoring and Troubleshooting
● Monitoring tools evaluated
● MMS
● munin
● Manual approach - daily ritual
● RS, DB, config, router - health and stats
● Problem analysis stats
● mongostat, iostat, currentOps, logs
● Client connections
● Stats for storage, shard addition
● Data file size
● Shard data distribution
● Replication
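The shard-data-distribution check in the list above reduces to comparing per-shard document counts against the mean. A minimal sketch over hypothetical counts (in practice the counts come from the cluster's shard stats):

```javascript
// Sketch of a shard-balance check: flag shards whose document count
// deviates from the mean by more than `tolerance` (a fraction).
// The counts below are hypothetical, in millions of documents.
function imbalancedShards(counts, tolerance) {
  const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
  return counts
    .map((c, i) => ({ shard: i, ratio: c / mean }))
    .filter(s => Math.abs(s.ratio - 1) > tolerance)
    .map(s => s.shard);
}

// 8 shards, shard 6 running hot:
const counts = [31, 30, 32, 29, 31, 30, 48, 31];
const hot = imbalancedShards(counts, 0.25); // -> [6]
```

A shard flagged this way is a candidate for chunk migration or a sign of a poorly distributed shard key.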
9. Key Learnings on MongoDB
● Indexing 32 fields
●Compound indexes
●Multi-keys indexes
{ … "indexes" : [ { "email" : "john.doe@email.com", "phone" : "123456789" } ] }
db.coll.find({ "indexes.email" : "john.doe@email.com" })
● Indexes use B-trees
● Many fields to index
● Performs well up to 1-2M documents
● Best if the index fits in memory
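The multi-key behaviour shown in the example above — one document with an array of sub-documents producing one index entry per array element — can be sketched in plain JavaScript. This models the B-tree as a `Map` and is illustrative only, not MongoDB internals:

```javascript
// Toy model of a multikey index on "indexes.email": each element of a
// document's `indexes` array contributes one index entry pointing back
// at the document. Not MongoDB internals; a Map stands in for the B-tree.
function buildMultikeyIndex(docs, arrayField, subField) {
  const index = new Map();
  for (const doc of docs) {
    for (const entry of doc[arrayField] || []) {
      const key = entry[subField];
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

const docs = [
  { _id: 1, indexes: [{ email: "john.doe@email.com", phone: "123456789" }] },
  { _id: 2, indexes: [{ email: "a@b.com" }, { email: "john.doe@email.com" }] },
];
const idx = buildMultikeyIndex(docs, "indexes", "email");

// Equivalent of db.coll.find({ "indexes.email": "john.doe@email.com" }):
const matches = idx.get("john.doe@email.com"); // [1, 2]
```

The fan-out — every array element becomes an index entry — is why indexing 32 fields across array sub-documents grows the index quickly and why the "index fits in memory" bullet matters.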
● Data replication, RS failover
● Rollback when an RS goes out of sync
Manual restore (physical data copy)
Restarting a very stale node
10. Questions?
Regunath Balasubramanian Shashikant Soni
regunathb@gmail.com soni.shashikant@gmail.com
twitter @regunathb