CUBRID Reference Architecture for Social Networking Service Kieun Park NHN Business Platform Corp. 2011.8
Abstract

The top-ranked Facebook celebrity has 44 million fans, and the top-ranked Twitter user has 11 million followers. Facebook holds over 900 million objects, and people send 140 million tweets per day. Numbers like these weigh heavily on the databases behind the services, so best practice in database architecture matters.

Online social networking (OSN) services have proliferated rapidly and changed the way data is stored and served. Social data is an enormous graph of small objects that are tightly interconnected. An OSN service page is a view of those small objects customized for a specific viewer at a specific time. Typically, the view is an aggregation of events connected by the social graph, which changes constantly with users' real-time interaction. Although Dunbar's number suggests that a person maintains stable social relationships with only about 150 people, celebrities on OSN sites have very large numbers of followers, so the social graph is huge. These properties of the data lead to new challenges and demand a new database architecture to handle them.

The main considerations for an OSN database architecture are scale-out and performance, with high availability as a mandatory requirement. In terms of data, the main characteristics of an OSN service are power-law scaling, data feeding frenzy, and Zipfian access distribution. The volume of data delivered grows exponentially with the popularity of the service. A cost-effective database scale-out architecture is important for business requirements as well as for technical reasons.

This presentation shows the CUBRID Reference Architecture for social networking services. The architectures presented are based on best practices developed from real business cases at NHN, the biggest portal service provider in Korea. The presentation also describes the features that support the database architecture demands of OSN services. For example, an index scan with a top-k sorting technique was developed for fast feed aggregation. The HA, automatic sharding, and clustering features of CUBRID are explained as well. Finally, nStore, a distributed database system based on CUBRID, is introduced. The concept of nStore is similar to Amazon Dynamo, but it differs in that it supports SQL.
I Am
박기은 (Kieun Park)
Service Platform Development Center
NHN Business Platform Corp.
iamyaw@nhn.com
CUBRID Open Source DBMS
nStore Distributed Database System
Contents
Characteristics of online social networking service
Challenges and demands on database architecture
    Business demands and system requirements
    Main considerations of database architecture for OSN service
    Scale-out, performance, and high availability
CUBRID unique features
    Index scan with top-k sorting technique
    High availability feature
    Automatic sharding component
    CUBRID Cluster System
    nStore, a distributed database system based on CUBRID
CUBRID reference architecture for social networking service
    CUBRID Web Reference Architecture
    CUBRID SNS Reference Architecture
Characteristics of online social networking service
Some Infographics about Online Social Networking Service

The history and evolution of OSN span the last 10 years.
Source: http://blog.skloog.com/history-social-media-history-social-media-bookmarking/

500 million Facebook users, 106 million Twitter users. Social networks have user bases larger than the population of most countries.
Source: http://www.digitalsurgeons.com/facebook-vs-twitter-infographic/

The top-ranked Twitter user, Lady Gaga, has 11 million followers. About 55 million tweets are sent per day. Twitter gets about 600 million queries every day. (http://twitaholic.com)
Source: http://www.digitalbuzzblog.com/infographic-twitter-statistics-facts-figures/

The most followed person on Facebook, Eminem, has more than 44 million fans. More than 5 billion pieces of content are shared each week. In 20 minutes on Facebook: 2,716,000 messages, 1,587,000 wall posts, and 10,208,000 comments. (http://www.independent.co.uk)
Source: http://www.digitalbuzzblog.com/facebook-statistics-facts-figures-for-2010/
Source: http://www.digitalbuzzblog.com/facebook-statistics-stats-facts-2011/

Have we reached a world of infinite information? Much like our universe, the Internet is expanding at an incredibly rapid pace, reaching new levels of information storage and content creation every second.
By 2020, roughly 25 x 10^18 (25 quintillion) information containers.
Every minute, 24 hours of video.
The growth gap between the digital content created and the available storage.
Source: http://www.flowtown.com/blog/have-we-reached-a-world-of-infinite-information
Statistics of Facebook and Twitter
Twitter: 140 million is the average number of tweets people sent per day; 6,939 is the current TPS (tweets per second) record.
Facebook: more than 750 million active users; over 900 million objects that people interact with (pages, groups, events, and community pages).
Source: http://www.facebook.com/press/info.php?statistics
Source: http://blog.twitter.com/2011/03/numbers.html
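To put the Twitter figures into database terms, the short calculation below converts the daily average into a per-second rate and compares it with the quoted TPS record. It is only back-of-the-envelope arithmetic on the two numbers from the slide; the variable names are illustrative.

```python
# Back-of-the-envelope load figures derived from the statistics quoted above.
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = 140_000_000   # average tweets per day (from the slide)
peak_tps = 6_939               # record tweets per second (from the slide)

average_tps = tweets_per_day / SECONDS_PER_DAY
peak_to_average = peak_tps / average_tps

print(f"average: {average_tps:.0f} tweets/s")
print(f"peak is {peak_to_average:.1f}x the average")
```

The gap between the average and the peak is one reason a database sized for the average write rate would fall short during bursts.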
Statistics of Me2Day
Postings per day: 278,461
Total postings: 123,456,727
Total photos: 10,638,089
Online social networking service
Social data is an enormous graph of small objects that are tightly interconnected.
The service page of an OSN is an aggregation of events connected by the social graph, which changes constantly with users' real-time interaction.
Feed Following Works
Application Layer: Feeds, Following, Contents (comment, photo, tag, …), Follower, News Feeds (personalized feeds)
Content Management Layer: Outbox, Inbox, Delivery & Aggregation Engine
Data Storage Layer: Cache, Database, Database
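The outbox, inbox, and delivery pieces named on this slide can be pictured with a small in-memory sketch. It assumes a push model (each post is copied into every follower's inbox at write time); the slide does not say which delivery model the architecture uses, and the class and method names below are hypothetical.

```python
from collections import defaultdict, deque


class FeedService:
    """Toy push-model feed: each post is copied into every follower's inbox."""

    def __init__(self, inbox_size=100):
        self.followers = defaultdict(set)   # author -> users following the author
        self.outbox = defaultdict(list)     # author -> own posts, oldest first
        # follower -> newest events first, capped at inbox_size entries
        self.inbox = defaultdict(lambda: deque(maxlen=inbox_size))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, timestamp, content):
        event = (timestamp, author, content)
        self.outbox[author].append(event)
        # Delivery step: fan the event out to every follower's inbox.
        for follower in self.followers[author]:
            self.inbox[follower].appendleft(event)
        return len(self.followers[author])  # number of inbox writes

    def news_feed(self, user, limit=20):
        # Assumes posts arrive in timestamp order, so newest events come first.
        return list(self.inbox[user])[:limit]


svc = FeedService()
svc.follow("bob", "alice")
svc.follow("carol", "alice")
writes = svc.post("alice", 1, "hello")   # one inbox write per follower
bob_feed = svc.news_feed("bob")
```

In this model the cost of a single post grows with the author's follower count, which is why accounts with millions of followers are expensive to serve.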
Characteristics of Online Social Networking Service
Power-law scaling growth: online social networks have properties of significant clustering, small diameter, and power-law degrees.
Data feeding frenzy: followers get personalized feeds that aggregate the streams produced by those they follow. The highly variable and somewhat large fan-out of the follows graph makes data feeding difficult to implement and costly to operate.
Zipfian distribution access: in Twitter activity, 5% of users account for 75% of all activity, 10% account for 86% of activity, and the top 30% account for 97.4%.
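To get a feel for what a Zipfian distribution implies, the sketch below computes how much of the total activity the most active users would produce if the user ranked r produced activity proportional to 1 / r^s. The population size and the exponent s = 1 are assumptions chosen for illustration, so the output is not expected to reproduce the Twitter percentages quoted above.

```python
def zipf_share(num_users, top_fraction, exponent=1.0):
    """Share of total activity produced by the most active `top_fraction` of users,
    assuming the user ranked r produces activity proportional to 1 / r**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, num_users + 1)]
    top_n = max(1, int(num_users * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


# Hypothetical population of 100,000 users with exponent 1.0.
for fraction in (0.05, 0.10, 0.30):
    share = zipf_share(100_000, fraction)
    print(f"top {fraction:.0%} of users -> {share:.0%} of activity")
```

A larger exponent concentrates activity further in the top ranks, which in turn makes caching the hottest users and objects more effective.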
Challenges and demands on database architecture
Challenge and Demands on Database Architecture
From business demands to technology implementation.
Today social media generates more information in a short period of time than was available in the entire world a few generations ago.
Not only the exponential growth of Facebook, Google+, and Twitter but also the growing use of rich media, such as user-generated video from smartphones, is driving big data.
Source: http://www.itu.int/net/itunews/issues/2010/06/35.aspx
Social media now produces massive amounts of data. Facebook's network, for instance, consists of 100 million entities generating tens of millions of events per second. Twitter, meanwhile, funnels 140 million public tweets a day. [GigaOM research notes]
With enterprise data volumes moving past terabytes to tens of petabytes and more, business and IT leaders face significant opportunities and challenges from big data. For a large enterprise, big data may be in the petabytes or more; for a small or mid-size enterprise, data volumes that grow into tens of terabytes may become challenging to analyze and manage.
When an application is being designed, software architects need to plan for a much greater application load to avoid major redesigns in the future. While scaling out web servers can be done quite easily, properly scaling out database servers is far more challenging.

Managing user-generated social interaction data!
Coping with the explosion in data volume!
Cost-effective scale-out to meet rapidly growing demands!
CUBRID unique features
CUBRID
Free open source is the choice of the modern world
Powerful clean architecture with rich functionality for competitive performance
Enterprise unique features for stability and reliability
Reclaim deleted space
Fast serial data (cached)
LFS (large file support) for database volume

CUBRID
July 2011: CUBRID 4.0 stable released.
October 2010: CUBRID 3.0 stable released.
October 2009: Official open source community, www.cubrid.org, opened.
September 2009: CUBRID Cluster Project started.
August 2009: CUBRID 2008 R2.0 stable released.
November 2008: CUBRID became an open source project. CUBRID 2008 R1.1 stable released.
October 2008: First internal release, CUBRID 2008 R1.0.
The development of CUBRID DBMS started.
Timeline axis: 2006 to 2012.
Features noted along the timeline:
Database volume size reduced
Multi-range scan and key limit function
Covered index
HA monitoring
Full SQL function support
CUBRID Index Scan with Top-k Sorting Technique
CUBRID does multi-range index scan.
Example: my friends' newest twenty comments
SELECT post_no FROM posts WHERE id IN (4, 15, 36, …) AND registered_date < 20000 ORDER BY registered_date DESC LIMIT 20
Single range scan with key filter: # of leaf pages accessed > # of keys of scan result (non-matching keys are filtered out); sort after scan. Disk I/O?!
Multi-range scan: # of leaf pages accessed = # of keys of scan result; on-the-fly sorting during scan.
Index keys (id, registered_date): (4,10001) (4,9999) (4,875) … (36,947) (36,120) (36,3) … (15,10000) (15,9999) (15,7467) …
SELECT * FROM tbl WHERE a IN (2, 4, 5) AND b < 'K' ORDER BY b LIMIT 3;
SELECT * FROM tbl WHERE a = 2 AND b < 'K' ORDER BY b LIMIT 3;
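The difference between sorting after the scan and sorting on the fly can be sketched in a few lines of Python. This illustrates the general top-k merge idea only and is not CUBRID's implementation; the in-memory `index` dictionary stands in for a hypothetical index on (id, registered_date), and the function names are made up for the example.

```python
import heapq
from itertools import islice

# Hypothetical index on (id, registered_date); each id's entries are kept
# newest first, mirroring the key ranges drawn on the slide.
index = {
    4: [10001, 9999, 875],
    15: [10000, 9999, 7467],
    36: [947, 120, 3],
}


def scan_range(index, key, before):
    """Yield (registered_date, id) for one id, newest first."""
    for date in index[key]:
        if date < before:
            yield (date, key)


def top_k_sort_after_scan(index, ids, before, k):
    """Baseline: read every qualifying key, sort them all, keep the first k."""
    rows = [row for key in ids for row in scan_range(index, key, before)]
    rows.sort(reverse=True)
    return rows[:k]


def top_k_multi_range(index, ids, before, k):
    """Merge the per-id ranges lazily and stop as soon as k keys are produced."""
    ranges = [scan_range(index, key, before) for key in ids]
    merged = heapq.merge(*ranges, reverse=True)
    return list(islice(merged, k))


newest = top_k_multi_range(index, [4, 15, 36], before=20000, k=4)
```

Both functions return the same rows, but the merge version pulls only as many keys from each range as it needs to fill the limit, which is the effect the slide attributes to on-the-fly sorting during the scan.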
CUBRID Test Results
Refer to http://www.cubrid.org/cubrid_mysql_sns_benchmark_test
User group 1: users with 50 or fewer friends
User group 2: users with 51 to 2,000 friends
User group 3: users with up to tens of thousands of friends
Test case 1: user group 1 only
Test case 2: user group 2 only
Test case 3: 40% of user group 1, 50% of user group 2, 10% of user group 3
Test case 4: 10% of user group 1, 50% of user group 2, 40% of user group 3
CUBRID High Availability Feature
CUBRID HA, a highly fault-resistant DBMS, enables …

CUBRID Features Optimized for Social Networking Services

  • 1. CUBRID Reference Architecture for Social Networking Service Kieun Park NHN Business Platform Corp. 2011.8
  • 2. 46 CUBRID Reference Architecture for Social Networking Service 2 /
  • 3. Abstract 46 CUBRID Reference Architecture for Social Networking Service 3 / The top ranked facebook celebrity has 44 million fans. The top ranked twitter user has 11 million followers. There are over 900 million objects in the facebook site and 140 million tweets people send per day. Needless to say, these facts heavily impact on database they have. Thus, best practice in database architecture is important. Online social networking (OSN) services have rapidly proliferated and changed the way data is stored and served. Social data is an enormous graph of small objects that are tightly interconnected. The service page of OSN is a view of those small objects customized to a specific viewers at a specific time. Typically, the view is aggregation of events connected by social graph which is changing constantly with users' realtime interaction. Even though the Dunbar's number shows that the number of people with whom one gets stable social relationship is relatively small as 150, in OSN site celebs have a large number of followers so that the social graph is very huge. These properties of the data lead to new challenges, and demands new database architecture to handle them. The main considerations of database architecture for OSN are about scale-out and performance in addition to high availability as mandatory. the main characteristics of OSN service in terms of data are power-law scaling, data feeding frenzy and Zipfian distribution access. Data being delivered are exponentially growing according to the popularity of the service. Cost-effective database scale-out architecture is important to business requirement as well as to technical issues. In this presentation, CUBRID Reference Architecture for social networking service will be shown. The presented architectures are based on best practices developed from real business cases of NHN, biggest portal service provider in Korea. 
Described are the helpful features to support the database architecture demands for OSN service. For example, index scan with top-k sorting technique is developed for fast feed aggregation. Also, HA, automatic sharding and clustering features of the CUBRID will be explained. Finally, the nStore, a distributed database system based on the CUBRID, will be introduced. Concept of the nStore is similar to Amazon Dynamo but different in that it support SQL.
  • 4.
  • 9.
  • 10. Contents 46 CUBRID Reference Architecture for Social Networking Service 6 / Characteristics of online social networking service Challenges and demands on database architecture CUBRID features CUBRID reference architecture for social networking service Business demands and system requirements Main considerations of database architecture for OSN service Scale-out, performance, and high availability
  • 11. Contents 46 CUBRID Reference Architecture for Social Networking Service 7 / Characteristics of online social networking service Challenges and demands on database architecture CUBRID unique features CUBRID reference architecture for social networking service Index scan with top-k sorting technique High availability feature Automatic sharding component CUBRID Cluster System nStore, a distributed database system based on the CUBRID
  • 12. Contents 46 CUBRID Reference Architecture for Social Networking Service 8 / Characteristics of online social networking service Challenges and demands on database architecture CUBRID features CUBRID reference architecture for social networking service CUBRID Web Reference Architecture CUBRID SNS Reference Architecture
  • 13. 46 CUBRID Reference Architecture for Social Networking Service 9 / Characteristics of online social networking service
  • 14. Some Infographics about Online Social Networking Service 46 CUBRID Reference Architecture for Social Networking Service 10 / The history and evolution of OSN are made in last 10 years. Source http://blog.skloog.com/history-social-media-history-social-media-bookmarking/
  • 15. Some Infographics about Online Social Networking Service 46 CUBRID Reference Architecture for Social Networking Service 11 / 500 million Facebook users, 106 million Twitter users Social networks with user bases larger than the population of most countries Source http://www.digitalsurgeons.com/facebook-vs-twitter-infographic/
  • 16. Some Infographics about Online Social Networking Service 46 CUBRID Reference Architecture for Social Networking Service 12 / The top ranked twitter user, Lady Gaga, has 11 million followers. About 55 million Tweets per day. Twitter gets about 600 million queries every day. (http://twitaholic.com) Source http://www.digitalbuzzblog.com/infographic-twitter-statistics-facts-figures/
  • 17. Some Infographics about Online Social Networking Service 46 CUBRID Reference Architecture for Social Networking Service 13 / The most followed person, Eminem, has more than 44 million fans. More than 5 billion pieces of content shared each week. 2,716,000 messages, 1,587,000 wall posts, 10,208,000 comments in 20 minutes on Facebook. (http://www.independent.co.uk) Source http://www.digitalbuzzblog.com/facebook-statistics-facts-figures-for-2010/ Source http://www.digitalbuzzblog.com/facebook-statistics-stats-facts-2011/
  • 18. Some Infographics about Online Social Networking Service 46 CUBRID Reference Architecture for Social Networking Service 14 / Have we reached a world of infinite information? In a similar manner to our universe, the Internet is expanding at an incredibly rapid pace, reaching new levels of information storage and content creation every second. By 2020, roughly 25x1018 (quintillion) information containers Every minute, 24 hours of video The growth gap between the digital contents created and the available storage Sourcehttp://www.flowtown.com/blog/have-we-reached-a-world-of-infinite-information
  • 19. Statistics of Facebook and Twitter 46 CUBRID Reference Architecture for Social Networking Service 15 / 140 million; the average number of Tweets people sent per day. 6,939;current TPSrecord. More than 750 million active users. There are over 900 million objects that people interact with (pages, groups, events and community pages) Source http://www.facebook.com/press/info.php?statistics Source http://blog.twitter.com/2011/03/numbers.html
  • 20. Statistics of Me2Day 46 CUBRID Reference Architecture for Social Networking Service 16 / Postings per day: 278,461 Total postings: 123,456,727 Total photos: 10,638,089
  • 21. Online social networking service: Social data is an enormous graph of small objects that are tightly interconnected. The service page of an OSN is an aggregation of events connected by the social graph, which changes constantly with users' real-time interaction.
  • 22. Feed Following Works (diagram). Application Layer: Feeds, Following, Contents (comment, photo, tag, …), Follower, News Feeds (personalized feeds). Content Management Layer: Outbox, Inbox, Delivery & Aggregation Engine. Data Storage Layer: Cache, Database, Database.
  • 24. Followers get personalized feeds that aggregate the streams produced by those they follow.
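The outbox/inbox flow can be sketched as follows. This is a minimal in-memory illustration of push-style delivery (copying each post into every follower's inbox at write time), not the implementation used by any service named in this deck; the class `FeedStore` and its method names are hypothetical.

```python
from collections import defaultdict


class FeedStore:
    """Toy outbox/inbox model; all names here are illustrative assumptions."""

    def __init__(self):
        self.followers = defaultdict(set)  # followed user -> set of followers
        self.outbox = defaultdict(list)    # author -> [(timestamp, content)]
        self.inbox = defaultdict(list)     # reader -> [(timestamp, author, content)]

    def follow(self, follower, followed):
        self.followers[followed].add(follower)

    def post(self, author, timestamp, content):
        # Write once to the author's outbox ...
        self.outbox[author].append((timestamp, content))
        # ... then deliver a copy to each follower's inbox (fan-out on write).
        for follower in self.followers[author]:
            self.inbox[follower].append((timestamp, author, content))

    def news_feed(self, user, limit=20):
        # Aggregate the inbox into a personalized feed, newest first.
        return sorted(self.inbox[user], reverse=True)[:limit]


store = FeedStore()
store.follow("bob", "alice")
store.follow("bob", "carol")
store.post("alice", 1, "hello")
store.post("carol", 2, "photo")
print(store.news_feed("bob"))
```

The cost of `post` grows with the author's follower count, which is why a user with millions of followers makes delivery expensive in this model.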
  • 25. The highly variable and sometimes large fan-out of the follows graph makes data feeding difficult to implement and costly to operate. Online social networks exhibit significant clustering, small diameter, and power-law degree distributions. Key characteristics: Zipfian-distribution access and data feeding frenzy. Twitter activity: 5% of users account for 75% of all activity, 10% account for 86% of activity, and the top 30% account for 97.4%.
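To see how a Zipfian distribution concentrates activity in a small share of users, the following sketch computes the fraction of total activity produced by the top-ranked users when activity is proportional to 1/rank^s. The function name, the user count, and the exponent are illustrative assumptions and are not fitted to the Twitter figures quoted above.

```python
def top_share(num_users, top_fraction, exponent=1.0):
    """Fraction of total activity generated by the top `top_fraction` of users,
    assuming activity of the user at rank r is proportional to 1 / r**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_users + 1)]
    top_n = max(1, round(num_users * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


for fraction in (0.05, 0.10, 0.30):
    share = top_share(10_000, fraction)
    print(f"top {fraction:.0%} of users -> {share:.1%} of activity")
```

A larger exponent makes the skew stronger, so caches and hot partitions end up serving a few very active users.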
  • 26. Challenges and demands on database architecture
  • 28. Today social media generates more information in a short period of time than was previously available in the entire world a few generations ago.
  • 29. Big data is being driven not only by the exponential growth of Facebook, Google+, and Twitter, but also by the increasing use of rich media such as user-generated video from smartphones. Source: http://www.itu.int/net/itunews/issues/2010/06/35.aspx
  • 30. Social media now produces massive amounts of data. Facebook’s network, for instance, consists of 100 million entities generating tens of millions of events per second. Twitter, meanwhile, funnels 140 million public tweets a day. [GigaOM research notes] With enterprise data volumes moving past terabytes to tens of petabytes and more, business and IT leaders face significant opportunities and challenges from big data. For a large enterprise, big data may be in the petabytes or more; for a small or mid-size enterprise, data volumes that grow into tens of terabytes may become challenging to analyze and manage. When an application is being designed, software architects need to plan for much greater application load to avoid major redesigns in the future. While scaling out web servers can be done quite easily, properly scaling out database servers is far more challenging. Challenges and Demands on Database Architecture: Managing user-generated social interaction data! Coping with the explosion in data volume! Cost-effective scale-out to meet rapidly growing demands!
  • 31. CUBRID unique features
  • 32. CUBRID: Free open source is the choice of the modern world. Powerful, clean architecture with rich functionality for competitive performance. Enterprise unique features for stability and reliability.
  • 35. Fast serial data (cached)
  • 38. Multi-range scan and key limit function
  • 41. Full SQL function support. Timeline (axis: 2006 to 2012): The development of CUBRID DBMS started. October 2008: first internal release, CUBRID 2008 R1.0. November 2008: CUBRID became an open source project; CUBRID 2008 R1.1 stable was released.
  • 42. CUBRID Index Scan with Top-k Sorting Technique: CUBRID does multi-range index scan. Example, my friends’ newest twenty comments: SELECT post_no FROM posts WHERE id IN (4, 15, 36, …) AND registered_date < 20000 ORDER BY registered_date DESC LIMIT 20. Disk I/O comparison: with a multi-range scan, the number of leaf pages accessed equals the number of keys in the scan result, and sorting happens on the fly during the scan; with a single range scan with key filter, the number of leaf pages accessed exceeds the number of keys in the scan result, non-matching keys are filtered out, and sorting happens after the scan. Index keys (id, registered_date): (4,10001) (4,9999) (4,875) … (15,10000) (15,9999) (15,7467) … (36,947) (36,120) (36,3) …
  • 43. CUBRID Index Scan with Top-k Sorting Technique: SELECT * FROM tbl WHERE a IN (2, 4, 5) AND b < 'K' ORDER BY b LIMIT 3; SELECT * FROM tbl WHERE a = 2 AND b < 'K' ORDER BY b LIMIT 3;
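The idea of sorting on the fly across several key ranges can be sketched as a lazy k-way merge. This is only an illustration of the technique, not CUBRID's internal implementation: the `index` dictionary stands in for a B-tree on (id, registered_date) whose entries per id are already in descending date order, and the function name is hypothetical.

```python
import heapq
from itertools import islice


def multi_range_top_k(index, ids, upper_bound, k):
    """Return the k newest (registered_date, id) pairs among the given ids.

    `index` maps id -> list of registered_date values sorted descending,
    a stand-in for the leaf entries of an index on (id, registered_date).
    """
    def scan_range(user_id):
        # One range scan per id, already ordered by date descending.
        for date in index.get(user_id, []):
            if date < upper_bound:
                yield (date, user_id)

    # heapq.merge is lazy: it pulls only as many keys as the LIMIT needs,
    # so no full sort of all matching keys is performed.
    merged = heapq.merge(*(scan_range(i) for i in ids), reverse=True)
    return list(islice(merged, k))


index = {
    4: [10001, 9999, 875],
    15: [10000, 9999, 7467],
    36: [947, 120, 3],
}
print(multi_range_top_k(index, [4, 15, 36], upper_bound=20000, k=4))
```

Because each range is consumed lazily, roughly k keys plus one per range are read, which mirrors the slide's point that the keys accessed track the size of the result.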
  • 44. CUBRID Test Results: Refer to http://www.cubrid.org/cubrid_mysql_sns_benchmark_test. User group 1: users with 50 or fewer friends. User group 2: users with 51 to 2,000 friends. User group 3: users with up to tens of thousands of friends. Test case 1: user group 1 only. Test case 2: user group 2 only. Test case 3: 40% of user group 1, 50% of user group 2, 10% of user group 3. Test case 4: 10% of user group 1, 50% of user group 2, 40% of user group 3.
  • 48. Various access modes (read-write, read-only). Diagram: the Application connects through CUBRID Drivers, sending UPDATE and SELECT requests to the Broker tier (Active Broker and Backup Broker, with automatic switch-over; Read-Write Mode and Read-Only Mode). The Database Server tier consists of the Active Server (Master DB), the Standby-1 Server (Slave DB), and the Standby-2 Server at a remote IDC (Slave DB), with automatic fail-over/fail-back.
  • 49. CUBRID High Availability Feature: The HA feature is based on database replication with a transaction log multiplication technique. Statement-based replication could cause data inconsistency. Diagram: the A-Node (Active Server Node, Master DB) serves UPDATE and SELECT; the S1-Node (Standby Server Node, Slave DB) and the S2-Node (Slave DB) serve SELECT. Each node runs a CUBRID Server with a Log Writer and a Log Applier. The transaction log is shipped (synchronously or asynchronously) into a replication log on the standby nodes and then applied to the Slave DB. Nodes monitor each other by heartbeat.
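One common reason statement-based replication can diverge is a non-deterministic statement that evaluates differently on each node. The sketch below contrasts re-executing such a statement on a replica with applying the logged row change. It is a toy model with hypothetical names (`run_statement`, `replicate_demo`), it uses two differently seeded random generators to stand in for two nodes, and it does not reproduce CUBRID's log format or replication code.

```python
import random


def run_statement(db, key, rng):
    """Stand-in for a non-deterministic statement, e.g. one that calls RANDOM().
    Returns the resulting row change, as a transaction log record would."""
    db[key] = rng.random()
    return (key, db[key])


def replicate_demo():
    master, stmt_replica, log_replica = {}, {}, {}
    master_rng = random.Random(1)    # assumption: each node has its own state
    replica_rng = random.Random(2)

    # The master executes the statement and records the row change.
    log_record = run_statement(master, "token", master_rng)

    # Statement-based replication: the replica re-executes the statement.
    run_statement(stmt_replica, "token", replica_rng)

    # Log-based replication: the replica applies the shipped row change.
    key, value = log_record
    log_replica[key] = value

    return master, stmt_replica, log_replica


master, stmt_replica, log_replica = replicate_demo()
print("log-based replica consistent:", log_replica == master)
print("statement-based replica consistent:", stmt_replica == master)
```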
  • 56. Additionally, linear scalability. Diagram: the Application issues SELECT * FROM gtable WHERE part_key=2 AND … and INSERT INTO gtable … through the Broker (load balancing) to the Cluster Server (global schema / distributed partition). The partitions of gtable are spread over four nodes: Node #1 holds part_01 and part_05, Node #2 holds part_02 and part_06, Node #3 holds part_03 and part_07, and Node #4 holds part_04 and part_08.
  • 57. CUBRID Cluster System: The global schema is a single representation, or a global view, of all nodes, where each node has its own database and schema. Users can access any database through a single schema, regardless of and without knowing the location of the distributed data. Diagram: a Global Schema User issues statements such as SELECT * FROM contents WHERE auth = (SELECT name FROM author WHERE …), SELECT * FROM contents WHERE …, SELECT * FROM info, code WHERE info.id = code.id, and INSERT INTO contents…; a Local Schema User issues UPDATE local …. The Global Schema (info, contents, author, code, level) spans Local Schema #1 to #4, which reside in Database #1 to #4.
  • 58. CUBRID Cluster (diagram labels): Global Schema; Logical View; Physical View; Schema; System Catalog; Data; Index.
  • 59. CUBRID Cluster: The distributed partition maps the global schema onto table partitioning. Partitions reside on different nodes but are accessed through the global schema. Example: SELECT * FROM gtable, info WHERE gtable.part_key=02 AND info.id = gtable.id, where gtable is declared with PARTITION BY HASH (part_key). Diagram: in the Global Schema, gtable consists of part_01 to part_08 alongside the info table; the partition data and info are distributed over Database #1 to Database #4.
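How a partition key might be routed to a partition and then to a node can be sketched as below. The modulo "hash" and the function names are assumptions made for illustration, and CUBRID's actual hash function is not described in these slides; only the partition-to-node layout (part_01 and part_05 on Node #1, and so on) follows the diagram on the linear scalability slide.

```python
NUM_PARTITIONS = 8
NUM_NODES = 4


def partition_for_key(part_key):
    """Map a partition key to a partition number 1..8.
    Assumption: a plain modulo stands in for the real hash function."""
    return part_key % NUM_PARTITIONS + 1


def node_for_partition(partition):
    """Map partition number to node number 1..4, following the slide layout:
    Node #1 holds part_01 and part_05, Node #2 holds part_02 and part_06, ..."""
    return (partition - 1) % NUM_NODES + 1


def route(part_key):
    partition = partition_for_key(part_key)
    return f"part_{partition:02d}", f"Node #{node_for_partition(partition)}"


for key in (2, 7, 12):
    print(key, route(key))
```

With a global schema, this routing happens inside the cluster, so the application only names gtable and the partition key.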
  • 63. nStore, a distributed database system based on CUBRID (diagram): The Application accesses nStore through a REST API. nStore consists of a Management Node and Container Servers, with a distribution layer on top of the RDBMS. Each container (for example ckey=iamyaw or ckey=kieun_park) holds Tables A, B, and C, which can be equi-joined on indexed columns; Global Table G is also shown.
  • 64. nStore Test Results: Tested using YCSB (http://research.yahoo.com/Web_Information_Management/YCSB). INSERT: 50,000,000 records (1K size). READ: Zipfian distribution. READ w/ compaction: after SSTable compaction (Cassandra, HBase). READ/UPDATE: 50:50 (50,000,000 records DB). READ/INSERT: 50:50 (50,000,000 records DB).
  • 65. CUBRID reference architecture for social networking service
  • 66. CUBRID Web Reference Architecture (diagram). Small-size web service: Web Server with CUBRID HA (master and slave). Mid-size web service: Web Server (User Interface), Web Application Server (Business Logic), Cache Server, RW and RO access, DB Sharding over CUBRID HA master/slave pairs, and CUNITOR.
  • 67. Social Networking Service Architecture (diagram). Web Servers (User Interface); Cache Layer; Web Application Servers (Business Logic). Engines: Social Query Engine, Aggregation Engine, Delivery Engine, Search Engine, Recommendation Engine. Data stores: User Profile DB, Social Relation DB, Analytics DB, Feed Outbox DB, Feed Inbox DB, Search Index.
  • 68. CUBRID SNS Reference Architecture (diagram). Application servers and a cache server farm sit in front of the data tier. User profile DB: sharded by user-id. Social relation DB: sharded by user-id. Both use DB Sharding with RW/RO brokers over CUBRID HA (masters and slaves). Inbox/Outbox storage: distributed according to user-id, on nStore w/ CUBRID (containers plus a management node). Analytic DB: partitioned for OLAP, fed by ETL, on a CUBRID Cluster (node #1, node #2, … node #n). OAM: monitoring server with CUNITOR.
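Sharding by user-id combined with read-write and read-only access can be sketched as a small routing function. The `Shard` type, the host names, and the modulo shard selection are hypothetical and chosen for illustration; they do not describe CUBRID's broker or its sharding configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    master: str          # host serving read-write (RW) traffic
    slaves: tuple = ()   # hosts serving read-only (RO) traffic


def pick_host(shards, user_id, read_only):
    """Choose the shard by user-id, then the master for writes
    or one of the slaves for read-only queries."""
    shard = shards[user_id % len(shards)]
    if read_only and shard.slaves:
        # Spread readers of the same shard across its slaves.
        return shard.slaves[(user_id // len(shards)) % len(shard.slaves)]
    return shard.master


shards = [
    Shard("master-0", ("slave-0a", "slave-0b")),
    Shard("master-1", ("slave-1a",)),
]
print(pick_host(shards, user_id=4, read_only=False))
print(pick_host(shards, user_id=4, read_only=True))
```

All data for one user lands on one shard, so per-user queries such as profile or relation lookups touch a single master/slave group.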
  • 69. Best Practices: A highly available database architecture is a basic business requirement and no longer a technical barrier. Automatic sharding is an effective way to scale out a DB system that stores relational model data. nStore is a solution for petabyte-scale data, with the benefits of a highly available and scalable distributed store.