3. Building Indicators via Web Scraping
Analytical Crawling & Text Annotation + Web Search Engines and Information
Retrieval
4. Goal
To build a similarity indicator for cities, on the
basis of TripAdvisor.com reviews
Reviews are related to main city attractions and
not to restaurants or hotels
Exploit both token and entity analysis
9. Reducing the weight of frequent
tokens/entities
Token and entity frequencies are weighted by
log(N(t) / N)
where
N(t) is the number of reviews in which a
token/entity occurs at least once
N is the number of reviews
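
The weighting above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `idf_weights` and the toy reviews are hypothetical, and the sketch assumes the sign convention log(N / N(t)), i.e. the negative of the expression as printed, so that tokens occurring in many reviews receive a smaller, non-negative weight.

```python
import math
from collections import Counter


def idf_weights(reviews):
    """Return {token: log(N / N(t))} for a list of tokenized reviews.

    N is the number of reviews and N(t) is the number of reviews in
    which token t occurs at least once. The sign convention is an
    assumption made for this sketch.
    """
    n_reviews = len(reviews)
    doc_freq = Counter()
    for tokens in reviews:
        # set() makes sure a token is counted once per review
        doc_freq.update(set(tokens))
    return {t: math.log(n_reviews / n_t) for t, n_t in doc_freq.items()}


# Hypothetical toy data: three tokenized reviews
reviews = [
    ["beautiful", "cathedral", "square"],
    ["beautiful", "museum"],
    ["beautiful", "square", "crowded"],
]
weights = idf_weights(reviews)
```

With this toy data, "beautiful" occurs in every review and gets weight zero, while a token seen in a single review such as "museum" gets the largest weight.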
17. Search Engine Output
The user is able to search for
Keywords in the recipe’s title
Difficulty
Keywords in the ingredients list
Country
Optional output
Recipe’s full description
Ingredients for n persons
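
The search criteria listed above could be combined roughly as in the sketch below. It is only an illustration under stated assumptions: the `Recipe` record, the `search_recipes` function and the sample recipes are hypothetical, and a plain linear scan stands in for whatever index the actual search engine uses.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    # Hypothetical record layout, not taken from the project
    title: str
    difficulty: str
    country: str
    ingredients: list = field(default_factory=list)
    description: str = ""


def search_recipes(recipes, title_kw=None, ingredient_kw=None,
                   difficulty=None, country=None):
    """Return the recipes matching every criterion that is not None."""
    results = []
    for r in recipes:
        if title_kw and title_kw.lower() not in r.title.lower():
            continue
        if ingredient_kw and not any(
            ingredient_kw.lower() in ing.lower() for ing in r.ingredients
        ):
            continue
        if difficulty and r.difficulty.lower() != difficulty.lower():
            continue
        if country and r.country.lower() != country.lower():
            continue
        results.append(r)
    return results


# Hypothetical sample data
recipes = [
    Recipe("Spaghetti Carbonara", "easy", "Italy",
           ["spaghetti", "eggs", "guanciale"]),
    Recipe("Paella", "medium", "Spain", ["rice", "saffron", "seafood"]),
    Recipe("Risotto alla Milanese", "medium", "Italy",
           ["rice", "saffron", "butter"]),
]
hits = search_recipes(recipes, ingredient_kw="rice", country="Italy")
titles = [r.title for r in hits]
```

Here the two filters are applied together, so only the Italian recipe containing rice is returned.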
19. An Interactive Data Scientist Job Map
Big data sources, crowdsourcing, crowdsensing + Data Visualization & Visual
analytics
20. Goal
To build a map of the cities offering jobs for Data
Scientists
To make the map interactive for users
To show additional info to users
21. Tools
Python for
scraping the web site, data preparation
and data integration
building the API for querying the DB
MySQL for storing data
jQuery and D3.js for visualization
XAMPP for local server simulation, with phpMyAdmin
28. The goal
Building a query suggestion application exploiting
the information observed on the AOL WebLog.
Constraints:
1) the application relies on observed queries
2) the application must be fast (milliseconds!)
29. The approach
Exploiting the relation between submitted queries
and clicked URLs:
If two queries share “a lot of
URLs” then they are strongly
related to each other
30. Idea 1/2
Let q(i) be the i-th query and
u(k) be the k-th clicked URL.
A Bipartite Graph can be
built such that for each q(i)
belonging to the query set, a
link to a subsequently clicked
URL u(k) can be defined
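
A minimal sketch of such a bipartite graph, assuming the log has already been reduced to (query, clicked URL) pairs. The function name `build_bipartite` and the sample pairs are hypothetical and only illustrate the data structure.

```python
from collections import defaultdict


def build_bipartite(click_log):
    """Build both sides of a query-URL bipartite graph.

    click_log is an iterable of (query, clicked_url) pairs.
    Returns two adjacency maps: query -> URLs and URL -> queries.
    """
    query_to_urls = defaultdict(set)
    url_to_queries = defaultdict(set)
    for query, url in click_log:
        query_to_urls[query].add(url)
        url_to_queries[url].add(query)
    return query_to_urls, url_to_queries


# Hypothetical sample log
click_log = [
    ("cheap flights", "u1.example"),
    ("cheap flights", "u2.example"),
    ("flight deals", "u2.example"),
    ("pasta recipe", "u3.example"),
]
q2u, u2q = build_bipartite(click_log)
```

Keeping both adjacency maps makes it cheap to go from a query to its URLs and from a URL back to every query that led to it.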
31. Idea 2/2
Once the Bipartite Graph has
been built, a relation
between any two queries
belonging to the query set
can be established
according to the clicked
URLs they share.
An Affinity Graph over the
query set can be defined
consequently
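
One way to derive an affinity graph from the click data is sketched below. It is an assumption-laden illustration rather than the authors' method: edge weights are simply the number of shared clicked URLs, and the names `build_affinity` and `suggest` as well as the sample log are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_affinity(click_log):
    """Return {query: {other_query: number_of_shared_clicked_urls}}."""
    url_to_queries = defaultdict(set)
    for query, url in click_log:
        url_to_queries[url].add(query)

    affinity = defaultdict(lambda: defaultdict(int))
    for queries in url_to_queries.values():
        # Every pair of queries that clicked this URL shares one more URL
        for q1, q2 in combinations(sorted(queries), 2):
            affinity[q1][q2] += 1
            affinity[q2][q1] += 1
    return affinity


def suggest(affinity, query, k=3):
    """Return up to k related queries, most shared URLs first."""
    neighbours = affinity.get(query, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [q for q, _ in ranked[:k]]


# Hypothetical sample log of (query, clicked URL) pairs
click_log = [
    ("cheap flights", "u1.example"),
    ("cheap flights", "u2.example"),
    ("flight deals", "u1.example"),
    ("flight deals", "u2.example"),
    ("airline tickets", "u2.example"),
    ("pasta recipe", "u3.example"),
]
affinity = build_affinity(click_log)
suggestions = suggest(affinity, "cheap flights")
```

Because the affinity map can be computed offline, answering a suggestion request reduces to a dictionary lookup and a small sort, which is one plausible way to approach a millisecond-level response time.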
44. What else?
- Social Network Analysis: Static
Analysis of Social and
Geographic Distances in
On-Line Social Networks
- Mobility Data Analysis: Space
Dynamics in On-line Social
Networks
- Data Journalism: Immigration
Stories