Jordi Nin – Hermes: Distributed social network monitoring system
Social network services now play a central role in how people interact with each other and with the world. They generate large amounts of data that can be used to study social relationships and extract useful information about preferences and trends. Analysing this information raises two main problems: the need to aggregate data coming from multiple sources, and the hardware limitations of traditional systems, which cannot cope with large amounts of data. To address both problems, Hermes aims to implement a distributed, scalable social media analysis tool that connects to and gathers data from multiple sources and shows the aggregated results in real time using CouchDB, Elasticsearch and Kibana.
Jordi Nin – Hermes: Distributed social network monitoring system - NoSQL matters Barcelona 2014
1. Hermes
Distributed social network monitoring system
Daniel Cea and Jordi Nin
Barcelona Supercomputing Center (BSC)
Universitat Politècnica de Catalunya (UPC)
{dcea, nin}@ac.upc.edu
4. Problem formulation
Platform to build social relations among people
who share interests, activities, backgrounds or
real-life connections.
New issues arise:
privacy, child safety,
addiction.
5. Problem formulation
§ Rise of social networks -> Large amounts of
social data.
§ Two main problems: Multiple sources +
Hardware limitations.
§ Solution: Implement a distributed, scalable
social media analyser ready to gather data from
multiple sources and show the aggregated
results in real time.
6. Objectives
Input web interface:
§ Start a new query.
§ Control the data
collection.
§ Query history.
7. Objectives
Backend:
§ Render interfaces.
§ Gather data from external
APIs.
§ Enrich and store data into a
NoSQL database.
8. Objectives
Output web interface:
§ See aggregated
results.
§ Filter results.
§ Customize how the
results are displayed.
11. Data Process
JavaScript (client and server side)
§ Platform: Node.js
§ Web framework: Express
§ Sentiment analysis:
Dictionaries obtained from Amazon Mechanical Turk*
* Amy Beth Warriner, Victor Kuperman, Marc Brysbaert. "Norms of valence, arousal, and dominance for 13,915 English
lemmas". December 2013, Ghent University.
14. Description
Implementation structured in 3 layers, following a Model
View Controller pattern:
• Data access -> Storage and indexing of documents
(JSON) and queries.
• Business logic -> Start query, manage data stream,
process + enrich tweets, send them to storage.
• User interface -> Allow user control of the system.
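The three layers can be sketched in a few lines of Node.js. This is only an illustration under stated assumptions: `DocumentStore`, `QueryService`, `QueryController` and `addLength` are hypothetical names that do not appear in the talk, and the in-memory array stands in for the real NoSQL storage and indexing.

```javascript
// Data access layer: hypothetical in-memory stand-in for the document store.
class DocumentStore {
  constructor() { this.docs = []; }
  save(doc) { this.docs.push(doc); return doc; }
  find(predicate) { return this.docs.filter(predicate); }
}

// Business logic layer: runs every enricher over a message, then stores it.
class QueryService {
  constructor(store, enrichers) {
    this.store = store;
    this.enrichers = enrichers;
  }
  handleMessage(message) {
    const enriched = this.enrichers.reduce(
      (doc, enrich) => enrich(doc),
      { ...message }
    );
    return this.store.save(enriched);
  }
}

// User interface layer: a thin controller that lets the user start and stop
// a query; messages arriving while stopped are dropped.
class QueryController {
  constructor(service) {
    this.service = service;
    this.running = false;
  }
  start() { this.running = true; }
  stop() { this.running = false; }
  onMessage(message) {
    return this.running ? this.service.handleMessage(message) : null;
  }
}

// Usage: one toy enricher (assumed) that records the text length.
const store = new DocumentStore();
const addLength = (doc) => ({ ...doc, length: doc.text.length });
const controller = new QueryController(new QueryService(store, [addLength]));

controller.onMessage({ text: 'ignored' });  // dropped: query not started yet
controller.start();
controller.onMessage({ text: 'hello 9N' }); // enriched and stored
```

Keeping the enrichers as plain functions passed into the business layer means new ones can be added without touching storage or the interface.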
17. Enrichers
Stream slots implement the following data enrichers:
§ Device enricher: Determines the device used to
write the message.
§ Geo enricher: Filters messages by geolocation.
§ Spain enricher: For messages coming from Spain,
determines the autonomous community.
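A minimal sketch of how the device and geo enrichers might look follows. It assumes each message carries a `source` string containing the client name wrapped in HTML (as Twitter's API does) and an optional `coordinates` object with `lat` and `lon`. The field names, the device categories and the bounding box are illustrative assumptions, not details from the talk.

```javascript
// Device enricher: derives a coarse device label from the client name.
function deviceEnricher(message) {
  // Strip any HTML tags around the client name, e.g. <a ...>Twitter for iPhone</a>.
  const label = (message.source || '').replace(/<[^>]*>/g, '').trim();
  let device = 'unknown';
  if (/iphone|ipad|ios/i.test(label)) device = 'ios';
  else if (/android/i.test(label)) device = 'android';
  else if (/web/i.test(label)) device = 'web';
  return { ...message, device };
}

// Geo filter: keeps only messages whose coordinates fall inside a bounding box.
function insideBox(message, box) {
  const c = message.coordinates;
  if (!c) return false; // messages without geolocation are filtered out
  return c.lat >= box.minLat && c.lat <= box.maxLat &&
         c.lon >= box.minLon && c.lon <= box.maxLon;
}

// Rough, illustrative box around Catalonia (assumed values, not authoritative).
const ROUGH_BOX = { minLat: 40.5, maxLat: 42.9, minLon: 0.15, maxLon: 3.35 };

const tweet = deviceEnricher({
  text: 'Bon dia',
  source: '<a href="http://twitter.com/download/iphone">Twitter for iPhone</a>',
  coordinates: { lat: 41.39, lon: 2.17 },
});
const kept = insideBox(tweet, ROUGH_BOX);
```

A region-level enricher such as the Spain one would need real boundary data for each autonomous community, which is not reproduced here.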
18. Enrichers
§ Stopwords enricher: Removes stop words from
the text.
§ Stemmer enricher: Stems the previously
filtered words.
§ Sentiment enricher: Determines the sentiment
and arousal of the stemmed message.
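The three text enrichers chain naturally: stop words are removed, the remaining words are stemmed, and the stems are looked up in a dictionary. The sketch below is a toy version under stated assumptions: the stop-word list is tiny, the stemmer is a naive suffix stripper (not a real stemming algorithm such as Porter's), and the `LEXICON` scores are made-up values on a 1-9 scale, not entries from the cited norms.

```javascript
// Tiny illustrative stop-word list (assumed).
const STOPWORDS = new Set(['the', 'a', 'is', 'and', 'of', 'to']);

// Made-up valence/arousal scores keyed by stem; NOT taken from the cited norms.
const LEXICON = {
  cheer: { valence: 7, arousal: 6 },
  vot: { valence: 5, arousal: 4 },
};

// Stopwords enricher: lowercases, tokenizes and drops stop words.
function removeStopwords(text) {
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return tokens.filter((word) => !STOPWORDS.has(word));
}

// Stemmer enricher: naive suffix stripping, for illustration only.
function stem(word) {
  for (const suffix of ['ing', 'ed', 's']) {
    if (word.endsWith(suffix) && word.length > suffix.length + 2) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Sentiment enricher: averages the scores of the stems found in the lexicon.
function sentimentEnricher(message) {
  const stems = removeStopwords(message.text).map(stem);
  const hits = stems.filter((s) => s in LEXICON).map((s) => LEXICON[s]);
  if (hits.length === 0) {
    return { ...message, stems, valence: null, arousal: null };
  }
  const mean = (key) => hits.reduce((sum, h) => sum + h[key], 0) / hits.length;
  return { ...message, stems, valence: mean('valence'), arousal: mean('arousal') };
}

const scored = sentimentEnricher({ text: 'The crowd is cheering and voting' });
```

Averaging over matched stems is one simple aggregation choice; the talk does not say how the per-word scores are combined.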
20. Use case: 9N referendum
§ What? -> The 9N
unofficial Catalan
independence referendum.
§ When? -> From 7 Nov.
2014 to 11 Nov. 2014.
§ Where? -> Catalonia.
21. Use case: 9N referendum
§ How? -> Storing all tweets with filters:
§ Location: none.
§ Language: none.
§ Text: Contains “9N”.
§ Time: From Nov 7 at 00:00 to Nov 11 at 23:59.
§ Why? -> Analyse the reactions in the world before,
during and after the referendum.
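The filter set above can be expressed as a small predicate. This is a sketch under assumptions the slides do not settle: the time window is interpreted in UTC and runs to the end of the 23:59 minute, the text match is case-insensitive, and `createdAt` is a hypothetical field holding an ISO 8601 timestamp.

```javascript
// Filter for the 9N use case: no location or language restriction.
const NINE_N_FILTER = {
  text: '9N',
  from: Date.parse('2014-11-07T00:00:00Z'), // assumed UTC
  to: Date.parse('2014-11-11T23:59:59Z'),   // assumed UTC, end of the 23:59 minute
};

// Returns true when the message text contains the keyword and the
// timestamp falls inside the window (both ends inclusive).
function matchesFilter(message, filter) {
  const time = Date.parse(message.createdAt);
  const hasText = message.text.toLowerCase().includes(filter.text.toLowerCase());
  return hasText && time >= filter.from && time <= filter.to;
}

const sample = [
  { text: 'Votem! #9N', createdAt: '2014-11-09T10:00:00Z' },
  { text: 'Nothing to see here', createdAt: '2014-11-09T10:00:00Z' },
  { text: 'Looking back at 9N', createdAt: '2014-11-12T08:00:00Z' },
];
const stored = sample.filter((m) => matchesFilter(m, NINE_N_FILTER));
```

Only the first sample message passes: the second lacks the keyword and the third falls outside the window.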
23. General conclusions
§ NoSQL technologies are crucial for the project.
Couchbase + Elasticsearch + Kibana work
perfectly together.
§ Elasticsearch is flexible enough to allow fast
development and real-time queries.
§ Kibana allows us to create fancy plots with little
effort.
24. Future work
§ More data sources.
§ Better data enrichment.
§ Add user data context.
§ Percolation queries.