Jordi Nin – Hermes: Distributed social network monitoring system - NoSQL matters Barcelona 2014

Nowadays, social network services play a very important role in the way people interact with each other and with the world. This generates large amounts of data that can be used to study social relationships and extract useful information about preferences and trends. Analysing this information raises two main problems: the need to aggregate data coming from multiple sources, and the hardware limitations of traditional systems, which cannot cope with large amounts of data. To address these problems, Hermes aims to implement a distributed, scalable social media analysis tool that connects to and gathers data from multiple sources and shows the aggregated results in real time using Couchbase, Elasticsearch and Kibana.

  1. Hermes: Distributed social network monitoring system. Daniel Cea and Jordi Nin, Barcelona Supercomputing Center (BSC), Universitat Politècnica de Catalunya (UPC). {dcea, nin}@ac.upc.edu
  2. Index: 1. Introduction; 2. Technologies; 3. Implementation; 4. Results; 5. Conclusions
  3. 1. Introduction: Problem formulation; Objectives
  4. Problem formulation
     • Social network: a platform to build social relations among people who share interests, activities, backgrounds or real-life connections.
     • New issues arise: privacy, child safety, addiction.
  5. Problem formulation
     • Rise of social networks -> large amounts of social data.
     • Two main problems: multiple sources + hardware limitations.
     • Solution: implement a distributed, scalable social media analyser, ready to gather data from multiple sources and show the aggregated results in real time.
  6. Objectives. Input web interface:
     • Start a new query.
     • Control the data collection.
     • Query history.
  7. Objectives. Backend:
     • Render interfaces.
     • Gather data from external APIs.
     • Enrich and store data in a NoSQL database.
  8. Objectives. Output web interface:
     • See aggregated results.
     • Filter results.
     • Customize how the results are displayed.
  9. 2. Current Technologies: Data Access; Data Process; Data Storage
  10. Data Access: Twitter Stream API (ready to add other sources)
  11. Data Process: JavaScript (client and server side)
     • Platform: Node.js
     • Web framework: Express
     • Sentiment analysis: dictionaries obtained from Amazon Mechanical Turk*
     * Amy Beth Warriner, Victor Kuperman, Marc Brysbaert. "Norms of valence, arousal, and dominance for 13,915 English lemmas". December 2013, Ghent University.
  12. Data Storage: Couchbase (storage) + Elasticsearch (indexing)
  13. 3. Implementation: Description; Data access layer; Business logic layer; Enrichers
  14. Description. Implementation structured in 3 layers, following a Model View Controller pattern:
     • Data access: storage and indexing of documents (JSON) and queries.
     • Business logic: start query, manage data stream, process + enrich tweets, send them to storage.
     • User interface: allow user control of the system.
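The split between business logic and data access described above can be sketched as follows. This is a minimal illustration, not the Hermes code: the in-memory `createStore` stands in for Couchbase and Elasticsearch, and `createQueryRunner`, `lengthEnricher` and the message fields `id` and `text` are hypothetical names chosen for the example.

```javascript
// In-memory stand-in for the data access layer. The talk uses Couchbase and
// Elasticsearch here; this Map is only a placeholder for illustration.
function createStore() {
  const docs = new Map();
  return {
    save(doc) { docs.set(doc.id, doc); },
    get(id) { return docs.get(id); },
    count() { return docs.size; },
  };
}

// Business logic: keep messages that match the query text, run each enricher
// in order, and hand the enriched document to the data access layer.
function createQueryRunner(store, enrichers) {
  return function runQuery(query, messages) {
    let stored = 0;
    for (const message of messages) {
      if (!message.text.includes(query.text)) continue;
      const enriched = enrichers.reduce((doc, enrich) => enrich(doc), { ...message });
      store.save(enriched);
      stored += 1;
    }
    return stored;
  };
}

// Hypothetical enricher used only to show the plug-in shape.
const lengthEnricher = (doc) => ({ ...doc, textLength: doc.text.length });

const store = createStore();
const runQuery = createQueryRunner(store, [lengthEnricher]);
const storedCount = runQuery({ text: '9N' }, [
  { id: 't1', text: 'Voting today #9N' },
  { id: 't2', text: 'Unrelated message' },
]);
console.log(storedCount, store.get('t1'));
```

Keeping the store behind a small `save`/`get` interface is what would let a different backend be swapped in without touching the enrichment code.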
  17. Enrichers. Stream slots implement the following data enrichers:
     • Device enricher: determines the device used to write the message.
     • Geo enricher: filters messages by geo-location.
     • Spain enricher: for messages coming from Spain, determines the autonomous community.
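A device enricher and a geo filter of the kind listed above might look like the sketch below. It assumes the classic Twitter tweet object, where `source` is an HTML anchor naming the client and `coordinates` is a GeoJSON point in `[longitude, latitude]` order. The function names and the bounding box values are illustrative assumptions, not taken from the talk.

```javascript
// Device enricher: derive a device label from the tweet's `source` field,
// which arrives as an HTML anchor such as
// '<a href="..." rel="nofollow">Twitter for iPhone</a>'.
function deviceEnricher(tweet) {
  const match = /<a[^>]*>([^<]*)<\/a>/.exec(tweet.source || '');
  const client = match ? match[1] : 'unknown';
  return { ...tweet, device: client.replace(/^Twitter for\s+/i, '') };
}

// Geo filter: true when the tweet carries a GeoJSON point ([lon, lat]) that
// falls inside the given bounding box.
function insideBoundingBox(tweet, box) {
  if (!tweet.coordinates) return false;
  const [lon, lat] = tweet.coordinates.coordinates;
  return lon >= box.west && lon <= box.east && lat >= box.south && lat <= box.north;
}

// Rough bounding box around Catalonia (assumed values, for illustration only).
const CATALONIA_BOX = { west: 0.15, south: 40.5, east: 3.35, north: 42.9 };

const sampleTweet = {
  text: 'hola',
  source: '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  coordinates: { type: 'Point', coordinates: [2.17, 41.38] }, // near Barcelona
};

console.log(deviceEnricher(sampleTweet).device, insideBoundingBox(sampleTweet, CATALONIA_BOX));
```

A bounding box is only a coarse approximation; mapping a point to an autonomous community, as the Spain enricher does, would need real region polygons.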
  18. Enrichers
     • Stopwords enricher: removes stop words from the text.
     • Stemmer enricher: stems the words that remain after stop-word filtering.
     • Sentiment enricher: determines the sentiment and arousal of the stemmed message.
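The three text enrichers above form a chain: stop-word removal, then stemming, then a lexicon lookup. The sketch below shows that chain under loud assumptions: the stop-word list is a tiny sample, the stemmer is a naive suffix stripper rather than a real algorithm such as Porter's, and the lexicon scores are made up for the example and are not values from the Warriner et al. norms.

```javascript
// Tiny illustrative stop-word list (assumption; real lists are much longer).
const STOPWORDS = new Set(['the', 'a', 'is', 'and', 'of', 'to', 'was']);

// Hypothetical lexicon on a 1-9 scale. These numbers are invented for the
// example and are NOT taken from the Warriner et al. dictionaries.
const LEXICON = new Map([
  ['vote', { valence: 6.0, arousal: 5.0 }],
  ['happy', { valence: 8.0, arousal: 6.0 }],
]);

// Lowercase the text and split it into alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

function removeStopwords(tokens) {
  return tokens.filter((token) => !STOPWORDS.has(token));
}

// Naive suffix stripper standing in for a real stemmer.
function stem(token) {
  for (const suffix of ['ing', 'ed', 's']) {
    if (token.length > suffix.length + 2 && token.endsWith(suffix)) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

// Average valence and arousal over the stems found in the lexicon;
// returns null when no stem is known.
function scoreSentiment(stems) {
  const hits = stems.map((s) => LEXICON.get(s)).filter(Boolean);
  if (hits.length === 0) return null;
  const mean = (key) => hits.reduce((sum, hit) => sum + hit[key], 0) / hits.length;
  return { valence: mean('valence'), arousal: mean('arousal') };
}

function enrichText(tweet) {
  const stems = removeStopwords(tokenize(tweet.text)).map(stem);
  return { ...tweet, stems, sentiment: scoreSentiment(stems) };
}

const enrichedSample = enrichText({ text: 'The votes and the happy people' });
console.log(enrichedSample.stems, enrichedSample.sentiment);
```

Averaging over matched stems is one simple scoring choice; the talk does not say how Hermes combines word-level scores into a message-level score.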
  19. 4. Results: Use case: 9N referendum
  20. Use case: 9N referendum
     • What? The 9N unofficial Catalan independence referendum.
     • When? From 7 Nov. 2014 to 11 Nov. 2014.
     • Where? Catalonia.
  21. Use case: 9N referendum
     • How? Storing all tweets with filters:
       • Location: none.
       • Language: none.
       • Text: contains "9N".
       • Time: from Nov 7 at 00:00 to Nov 11 at 23:59.
     • Why? To analyse the reactions in the world before, during and after the referendum.
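The filter set for this use case can be written as a small query object plus a predicate, as in the sketch below. Several details are assumptions: the slides do not state a time zone, so the window is read as UTC, and the upper bound is taken as 23:59:59 to cover the whole final minute. The names `QUERY_9N`, `matchesQuery`, `createdAt` and `lang` are hypothetical.

```javascript
// Query for the 9N use case. Assumption: the window is interpreted in UTC,
// and it runs to 23:59:59 so that the whole last minute is included.
const QUERY_9N = {
  text: '9N',
  from: Date.parse('2014-11-07T00:00:00Z'),
  to: Date.parse('2014-11-11T23:59:59Z'),
  language: null, // no language filter in this use case
};

// True when the tweet falls inside the time window, passes the optional
// language filter, and contains the query text. Location filtering is left
// out because the use case sets none.
function matchesQuery(tweet, query) {
  const time = Date.parse(tweet.createdAt);
  if (Number.isNaN(time) || time < query.from || time > query.to) return false;
  if (query.language && tweet.lang !== query.language) return false;
  return tweet.text.includes(query.text);
}

const duringVote = { text: 'Cues per votar #9N', createdAt: '2014-11-09T10:00:00Z', lang: 'ca' };
const tooEarly = { text: 'Falta poc pel #9N', createdAt: '2014-11-06T23:59:59Z', lang: 'ca' };

console.log(matchesQuery(duringVote, QUERY_9N), matchesQuery(tooEarly, QUERY_9N));
```

A plain substring match will also accept tokens that merely contain "9N"; a stricter rule such as whole-token matching would be a design decision the slides do not cover.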
  22. 5. Conclusions: General conclusions; Future work
  23. General conclusions
     • NoSQL technologies are crucial for the project. Couchbase + Elasticsearch + Kibana work well together.
     • Elasticsearch is flexible enough to allow fast development and real-time queries.
     • Kibana allows us to create rich plots with little effort.
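As an illustration of the kind of real-time aggregated query mentioned above, the sketch below builds an Elasticsearch search body that counts matching documents per region with a `terms` aggregation. The field names `text`, `created_at` and `region`, and the helper `buildRegionAggregation`, are assumptions made for the example; the talk does not show the actual index mapping or queries.

```javascript
// Build an Elasticsearch search request body: match the query text, restrict
// to a time range, and bucket the hits by a (hypothetical) `region` field.
// `size: 0` asks for aggregation results only, without the documents.
function buildRegionAggregation({ text, from, to, maxBuckets = 20 }) {
  return {
    size: 0,
    query: {
      bool: {
        must: [
          { match: { text } },
          { range: { created_at: { gte: from, lte: to } } },
        ],
      },
    },
    aggs: {
      by_region: { terms: { field: 'region', size: maxBuckets } },
    },
  };
}

const searchBody = buildRegionAggregation({
  text: '9N',
  from: '2014-11-07T00:00:00Z',
  to: '2014-11-11T23:59:59Z',
});
console.log(JSON.stringify(searchBody, null, 2));
```

The resulting object would be sent as the JSON body of a search request; a dashboard tool such as Kibana generates comparable aggregation requests behind its plots.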
  24. Future work
     • More data sources.
     • Better data enrichment.
     • Add user data context.
     • Percolation queries.
  25. Hermes. Thank you for your attention.
