Chenxi Yang, Yang Chen, Qingyuan Gong, Xinlei He, Yu Xiao, Yuhuan Huang, Xiaoming Fu. Understanding the Behavioral Differences Between American and German Users: A Data-Driven Study. Big Data Mining and Analytics, 2018, 1(4):284-296.
1. Nov/14/2018
Understanding the Behavioral Differences Between American and German Users: A Data-Driven Study
Chenxi Yang¹, Yang Chen¹, Qingyuan Gong¹, Xinlei He¹, Yu Xiao², Yuhuan Huang³, Xiaoming Fu⁴
Fudan Univ.¹, Aalto Univ.², Guangdong Univ. of Foreign Studies³, Univ. of Göttingen⁴
Big Data Mining and Analytics, 2018, 1(4):284-296.
4. Related Work & Problems
✤ Facebook Survey [Krasnova et al., HICSS’10]:
- Explores the differences in self-disclosure between American and German users
- Scale not large enough to capture a cultural impact
- Lack of data comprehensiveness: only the answers to survey questions or users’ posts; no movement patterns or points-of-interest (POIs)
5. Solution: LBSNs
✤ Location-Based Social Networks
- Location-centric activities
- Social interactions
- Viable data source
Case Study: American & German Users on Yelp
6. Contributions
✤ Provide a comprehensive demographic analysis of American and German users’ behavior
- Friends’ distribution
- Daily schedule
- Collectivism & individualism
- A classification model detecting cultural background
✤ Verify the feasibility of applying big data analysis in the context of cultural behavior
7. Dataset Introduction
✤ Yelp
The world’s largest online “urban guide” and business review site
- 4,700,000 reviews
- 156,000 businesses
- 1,100,000 users
✤ Yelp Open Dataset
8. Data Analysis: Social Graph
✤ More friends on Yelp: American
✤ Higher proportion of influential users: American
Nation | Avg. CC | Var. CC | Avg. Degree | Var. Degree | Avg. PageRank | Var. PageRank
USA | 6.045 | 0.252 | 2.298 | 1.845 | 0.087 | 0.438
Germany | 4.688 | 0.191 | 0.965 | 2.842 | 0.017 | 0.288
9. Data Analysis: Social Graph
✤ German users’ friends are gathered in fewer cities
✤ The location distribution of American users’ friends is slightly wider
Location distributions of friends of American and German users
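This is a minimal sketch of one way to quantify how concentrated a user's friends are. It counts the distinct cities among the friends and computes the Shannon entropy of that city distribution. The slide does not say which measure the study used, so this choice is an assumption for illustration. `friend_city_stats` and the sample friend lists are hypothetical.

```python
import math
from collections import Counter


def friend_city_stats(friend_cities):
    """Return (distinct cities, Shannon entropy in bits) of a user's friends'
    cities; lower values mean the friends are gathered in fewer cities.
    (Hypothetical measure, not necessarily the study's.)"""
    counts = Counter(friend_cities)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    # p * log2(1/p) avoids a negative zero when all friends share one city.
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return len(counts), entropy


# Hypothetical friend lists for two users.
concentrated = ["Stuttgart", "Stuttgart", "Stuttgart", "Karlsruhe"]
spread = ["Phoenix", "Las Vegas", "Charlotte", "Pittsburgh"]
print(friend_city_stats(concentrated))
print(friend_city_stats(spread))  # (4, 2.0)
```

Averaging these values per nation would give one number per group to compare.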
10. Data Analysis: Writing Styles
✤ More affective: American
✤ Collectivism: German; Individualism: American
Occurrence Frequency of Different Categories of Words in Reviews
Dimension Values of American and German Users
Nation | Affect | Anger | Tentat | Certain | Swear | Friends
USA | 6.045 | 0.252 | 2.298 | 1.845 | 0.087 | 0.438
Germany | 4.688 | 0.191 | 0.965 | 2.842 | 0.017 | 0.288
Nation | Category | I | We
USA | Beauty & Spas | 7.02 | 0.53
USA | Health & Medical | 6.85 | 0.61
USA | Home Services | 4.89 | 1.61
USA | Nightlife | 3.72 | 1.79
USA | Restaurants | 4.08 | 1.49
USA | Shopping | 5.47 | 0.88
USA | Avg. | 5.34 | 1.15
Germany | Beauty & Spas | 4.17 | 0.27
Germany | Health & Medical | 3.58 | 0.31
Germany | Home Services | 2.16 | 1.05
Germany | Nightlife | 1.73 | 1.29
Germany | Restaurants | 1.78 | 1.34
Germany | Shopping | 3.04 | 0.46
Germany | Avg. | 2.74 | 0.79
11. Data Analysis: Business Categories
Category Pattern
✤ “Food”, “Nightlife” and “Shopping”: Similar
✤ “Restaurants” and “Public Services”: German
12. Data Analysis: Rating
✤ Most users: Good ratings (3-5 stars on a 5-star scale)
✤ American: Wilder (more extreme) ratings; German: Milder ratings
Distribution of the Number of Stars
13. Data Analysis: Check-in Patterns
✤ Differences between the noon peak and the night peak
✤ Mealtime & Bedtime
Here-now Count Patterns
14. Cultural Background Classification
✤ Category: 7 features
✤ Social Graph: 4 features
✤ Writing Style: 10 features
✤ Visit & Rating: 4 features
Overview of the Classification Model
15. Cultural Background Classification
✤ The writing style-related feature set: Pivotal
✤ The social graph-related feature set: Significant
Rank | χ² | Feature | Category
1 | 969.876 | Pronoun | Writing Style
2 | 650.939 | Preps | Writing Style
3 | 366.716 | Tentat | Writing Style
4 | 268.432 | Certain | Writing Style
5 | 199.665 | CC | Social Graph
6 | 99.615 | Friends Distribution | Social Graph
7 | 85.471 | PageRank | Social Graph
8 | 73.282 | Swear | Writing Style
9 | 60.701 | Beauty & Spas | Business
Category | Precision | Recall | F1-score | AUC
Writing Style | 0.879 | 0.878 | 0.878 | 0.937
Social Graph | 0.741 | 0.741 | 0.741 | 0.823
Business | 0.623 | 0.612 | 0.617 | 0.660
Visit & Rating | 0.602 | 0.603 | 0.601 | 0.619
16. Conclusions
✤ Use the behavioral information of a massive number of users to explore the differences between American and German users from a cultural perspective
✤ Validate our analysis results with a cultural background classification model and gain a better understanding of the importance of various feature sets in forming a human behavior pattern
17. Future Work
We aim to build an overall online behavior pattern set of cultural consequences applied to people with fine-grained cultural backgrounds.
Editor’s Notes
Compared with rather static textual data, situation-aware interactive information like user reviews and comments provides more accessible opinions and thoughts related to daily life. Therefore, we turn our attention to online social networks.
Online social networks, e.g., Facebook, Instagram, Pinterest, Yelp, LinkedIn, Foursquare
#
Many users -> large-scale data
Interactive -> real-time thoughts
We approach OSNs from a cultural perspective.
We searched the related work and found only one previous study.
#
However, given that most cultural phenomena have evolved over many years and developed from generation to generation, the scale of online survey-based research is still not large enough to reflect a cultural impact.
Further, data comprehensiveness is of great consequence to cultural analysis.
Apart from the answers to survey questions or the text posted by users, the movement patterns and points-of-interest (POIs), which are closely related to the cultural impact on a user, also matter.
#
LBSN is short for location-based social network.
#
LBSNs, which allow users to undertake location-centric activities in addition to social interactions, offer a viable data source for such cultural studies.
#
Case study
The USA and Germany, which have the largest populations in North America and Western Europe, respectively, are two important culture clusters in the world. They have different languages, traditions, and geographical conditions but also share the same Anglo-Saxon origins.
Therefore, we select these two countries as the object of our case study.
//
Therefore, we select the USA and Germany as the examples to understand online behavior from a cultural perspective and make comparisons between these two representative cultural clusters in North America and Western Europe respectively.
Contributions
#
We give the analysis results of…
#
To verify, we build a model…: the F1-score reaches 0.891 and the AUC reaches 0.949.
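The classifier is scored with precision, recall, F1-score and AUC. Below is a minimal sketch of these metrics for a binary American-vs-German task. It assumes label 1 = American and 0 = German, which is a hypothetical encoding. AUC is computed as the probability that a random positive outscores a random negative. The study may have averaged the metrics over both classes, and this sketch does not do that.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1-score, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores, positive=1):
    """AUC as the chance that a random positive scores higher than a random
    negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = American, 0 = German) and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(precision_recall_f1(y_true, y_pred))
print(roc_auc(y_true, scores))
```

As a sanity check on the writing-style row of the slide 15 table: 2 × 0.879 × 0.878 / (0.879 + 0.878) ≈ 0.878, which matches the reported F1-score.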
Details about the analysis follow.
Before introducing the analysis results, I will first give a general introduction to the dataset.
#
Yelp is similar to Dianping (大众点评) in China.
On the right-hand side: first search, filter by category/rating/price/location, check in, then see the reviews.
#
We study the Yelp Open Dataset, which was used in the Yelp Dataset Challenge and is updated by Yelp itself every year.
The dataset covers over 4 million reviews, 150 thousand businesses, and 1.1 million users. Each review contains text and/or rating attributes.
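Here is a minimal sketch for producing counts like these. It assumes the review file is line-delimited JSON (one object per line) with `user_id` and `business_id` fields. Check the field names against the dataset version you download. `summarize_reviews` is a hypothetical helper, and the in-memory sample stands in for the real file.

```python
import io
import json


def summarize_reviews(lines):
    """Count reviews, distinct users and distinct businesses in line-delimited
    JSON review records (assumed fields: user_id, business_id)."""
    n_reviews = 0
    users, businesses = set(), set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        n_reviews += 1
        users.add(record["user_id"])
        businesses.add(record["business_id"])
    return {"reviews": n_reviews, "users": len(users), "businesses": len(businesses)}


# Hypothetical two-line sample in place of the real review file.
sample = io.StringIO(
    '{"user_id": "u1", "business_id": "b1", "stars": 5, "text": "Great!"}\n'
    '{"user_id": "u2", "business_id": "b1", "stars": 3, "text": "Okay."}\n'
)
print(summarize_reviews(sample))  # {'reviews': 2, 'users': 2, 'businesses': 1}
```

For the real file, pass an open file handle instead, e.g. `open("review.json", encoding="utf-8")` (the path is hypothetical). Counting users from reviews only covers users who wrote at least one review, so it can differ from the size of the user file.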
#
The clustering coefficient (CC) measures the cliquishness of a typical friendship circle.
We can see that the average CC and average degree of American users are larger than those of German users.
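Here is a minimal sketch of the two metrics. It assumes an undirected friendship graph stored as a dict of neighbor sets, which is a hypothetical layout. A user's local CC is the fraction of pairs of their friends who are also friends with each other, so it lies between 0 and 1. Users with fewer than two friends get a CC of 0 here. That is one common convention, not necessarily the study's.

```python
from itertools import combinations


def local_clustering(graph, node):
    """Fraction of pairs of the node's friends that are friends themselves."""
    friends = graph[node]
    k = len(friends)
    if k < 2:
        return 0.0  # convention: fewer than two friends -> CC of 0
    links = sum(1 for u, v in combinations(friends, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def average_cc_and_degree(graph):
    """Average local CC and average degree over all users in the graph."""
    n = len(graph)
    avg_cc = sum(local_clustering(graph, v) for v in graph) / n
    avg_degree = sum(len(friends) for friends in graph.values()) / n
    return avg_cc, avg_degree


# Hypothetical friendship graph: a triangle a-b-c plus d attached to a.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(average_cc_and_degree(graph))
```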
//
We also use PageRank values to define the top 0.1% most influential users in the social graph as the set P.
We find that 0.31% of the American users belong to P, whereas for German users that number is 0.18%.
Thus, the proportion of influential users is higher among American users than among German users.
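Here is a minimal sketch of the influential-user step. It assumes PageRank by power iteration with the common damping factor 0.85 on the undirected friendship graph. The notes do not give the study's settings. `pagerank`, `top_fraction` and `share_in_top` are hypothetical helpers, and `top_fraction` keeps at least one user so the sketch also works on toy graphs.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; each undirected friendship counts as a link
    in both directions. Rank held by users without friends is spread evenly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * rank[v] / len(graph[v])
                for u in graph[v]:
                    new[u] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def top_fraction(rank, fraction=0.001):
    """The top `fraction` of users by PageRank (at least one user)."""
    k = max(1, round(len(rank) * fraction))
    return set(sorted(rank, key=rank.get, reverse=True)[:k])


def share_in_top(users, top):
    """Fraction of a group of users that belongs to the influential set."""
    return sum(1 for u in users if u in top) / len(users)


# Hypothetical star graph: one hub befriended by nine other users.
leaves = [f"u{i}" for i in range(9)]
graph = {"hub": set(leaves), **{u: {"hub"} for u in leaves}}
rank = pagerank(graph)
P = top_fraction(rank)
print(P)  # {'hub'}
print(share_in_top(["hub", "u0", "u1", "u2"], P))  # 0.25
```

Applying `share_in_top` to the American and the German user lists gives the two proportions compared above.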
American students behave in a more affective way than German students
#
This conclusion conforms to our results.
German users frequently mention certainty-related words like “always” and “never”.
German users are less likely to use swear words than their American counterparts when reviewing on Yelp.
Words such as “buddy” and “neighbor”, which belong to the “friends” category, appear more frequently in American users’ reviews than in German users’ reviews.
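The category frequencies come from LIWC-style word counting: the share of a review's words that fall into each category. Here is a minimal sketch that assumes a tiny hypothetical word list per category. The real LIWC dictionary is proprietary, much larger, and also matches word stems, so these numbers only illustrate the mechanics.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary; real LIWC categories are far larger
# and also match word stems such as "friend*".
CATEGORIES = {
    "certain": {"always", "never", "definitely"},
    "swear": {"damn", "hell"},
    "friends": {"buddy", "neighbor", "friend"},
}


def category_frequencies(text, categories=CATEGORIES):
    """Percentage of the words in `text` that belong to each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in categories}
    counts = Counter(
        name for token in tokens for name, words in categories.items() if token in words
    )
    return {name: 100.0 * counts[name] / len(tokens) for name in categories}


print(category_frequencies("We always bring a buddy here and never leave hungry"))
# {'certain': 20.0, 'swear': 0.0, 'friends': 10.0}
```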
//
American users use the pronoun “I” in their reviews about twice as often as German users.
When reviews use “We”, the most likely category for American users is “Nightlife”, whereas for German users it is “Restaurants”. We believe the frequency of “We” is positively related to how often people go to a particular category of business together. Therefore, “Nightlife” and “Restaurants” are the favorite categories of American and German users, respectively, when they go out for social gatherings.
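This is a quick check of both claims against the I/We values on slide 10, which are copied into the code below. The only computation is averaging and taking a `max`. The helpers `average_i` and `top_we_category` are hypothetical names.

```python
# ("I", "We") values per category, copied from the I/We table on slide 10.
PRONOUNS = {
    "USA": {
        "Beauty & Spas": (7.02, 0.53),
        "Health & Medical": (6.85, 0.61),
        "Home Services": (4.89, 1.61),
        "Nightlife": (3.72, 1.79),
        "Restaurants": (4.08, 1.49),
        "Shopping": (5.47, 0.88),
    },
    "Germany": {
        "Beauty & Spas": (4.17, 0.27),
        "Health & Medical": (3.58, 0.31),
        "Home Services": (2.16, 1.05),
        "Nightlife": (1.73, 1.29),
        "Restaurants": (1.78, 1.34),
        "Shopping": (3.04, 0.46),
    },
}


def average_i(nation):
    """Mean "I" value over the six categories."""
    values = list(PRONOUNS[nation].values())
    return sum(i for i, _ in values) / len(values)


def top_we_category(nation):
    """Category with the highest "We" value."""
    categories = PRONOUNS[nation]
    return max(categories, key=lambda c: categories[c][1])


print(round(average_i("USA") / average_i("Germany"), 2))  # 1.95
print(top_we_category("USA"), top_we_category("Germany"))  # Nightlife Restaurants
```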
#
For relaxing points of interest, American users are more frequent visitors.
#
Most ratings are 3-5 stars on a 5-star scale.
Most German users prefer to give 3-4 stars: not too good, not too bad.
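One way to make "wilder" versus "milder" ratings concrete is to look at the spread of star ratings. It also helps to compare the shares of middle (3-4 star) and extreme (1 or 5 star) ratings. Here is a minimal sketch; both the measures and the sample ratings are assumptions for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def rating_profile(stars):
    """Summarize 1-5 star ratings: mean, spread (population std. dev.),
    and the shares of middle (3-4 star) and extreme (1 or 5 star) ratings."""
    counts = Counter(stars)
    n = len(stars)
    return {
        "mean": mean(stars),
        "std": pstdev(stars),
        "share_3_to_4": (counts[3] + counts[4]) / n,
        "share_extreme": (counts[1] + counts[5]) / n,
    }


# Hypothetical ratings from a "wild" and a "mild" rater.
wild = [5, 5, 5, 1, 4, 5, 1, 5]
mild = [4, 3, 4, 4, 3, 5, 4, 3]
print(rating_profile(wild))
print(rating_profile(mild))
```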
This is the weekly check-in pattern of American and German users.
The y-axis represents the here-now count percentage.
American and German users both conduct more check-ins between Monday and Saturday and far fewer on Sunday.
We also found some interesting results about users’ daily schedules.
#
#
We found that the difference between German users’ everyday noon peak (the first peak of a day) and night peak (the second peak of a day) can reach 6%, which is much more pronounced than that of American users (0.03%).
Looking at the exact times of day, we find that the lunch and dinner peaks of German users are around 11 a.m. and 6 p.m., respectively, whereas those of American users are around 1 p.m. and 10 p.m., respectively.
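Here is a minimal sketch of how the two daily peaks and their gap can be read off hourly check-in data. It assumes check-ins are already bucketed by local hour (0-23). It also assumes the noon peak falls between 10:00 and 15:59 and the night peak between 16:00 and 23:59. These windows and the sample hours are hypothetical.

```python
from collections import Counter


def hourly_share(checkin_hours):
    """Percentage of check-ins falling in each local hour 0-23."""
    counts = Counter(checkin_hours)
    total = len(checkin_hours)
    return [100.0 * counts[h] / total for h in range(24)]


def daily_peaks(hourly, noon_hours=range(10, 16), night_hours=range(16, 24)):
    """Hour of the noon peak, hour of the night peak, and the gap between
    them in percentage points. The hour windows are assumptions."""
    noon = max(noon_hours, key=lambda h: hourly[h])
    night = max(night_hours, key=lambda h: hourly[h])
    return noon, night, hourly[noon] - hourly[night]


# Hypothetical check-in hours for one group of users.
hours = [11] * 5 + [12] * 2 + [18] * 3
print(daily_peaks(hourly_share(hours)))  # (11, 18, 20.0)
```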
2100 users
We feed four feature sets with 25 features in total into the algorithm to predict whether a user’s cultural background is American or German.
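The notes leave the algorithm unnamed. The sketch below therefore swaps in plain logistic regression, trained by batch gradient descent, as a stand-in; it is not the study's classifier. It also shows how the 25-feature vector is assembled from the four sets (7 + 4 + 10 + 4). `FEATURE_SETS`, `assemble`, `train_logistic`, `predict_proba` and the two toy users are hypothetical.

```python
import math

# Feature-set sizes from slide 14: 7 + 4 + 10 + 4 = 25 features.
FEATURE_SETS = {"category": 7, "social_graph": 4, "writing_style": 10, "visit_rating": 4}


def assemble(user_features):
    """Concatenate a user's four feature sets, in a fixed order, into one vector."""
    vector = []
    for name, size in FEATURE_SETS.items():
        values = user_features[name]
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(values)
    return vector


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that the user is American (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean logistic loss (1 = American, 0 = German)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Two hypothetical users who differ only in the first writing-style feature.
american = {"category": [0.0] * 7, "social_graph": [0.0] * 4,
            "writing_style": [1.0] + [0.0] * 9, "visit_rating": [0.0] * 4}
german = {"category": [0.0] * 7, "social_graph": [0.0] * 4,
          "writing_style": [0.0] * 10, "visit_rating": [0.0] * 4}
X = [assemble(american), assemble(german)]
w, b = train_logistic(X, [1, 0])
print(len(X[0]), round(predict_proba(w, b, X[0]), 2), round(predict_proba(w, b, X[1]), 2))
```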
#
For the feature sets, xxxx
#
The most discriminating feature is “Pronoun”, which represents words like “I” and “You”.
Meanwhile, the features from writing style analysis such as “Preps (preposition)”, “Tentat (tentative)” and “Certain (certainty)” are more important than other features.
The writing style-related feature set plays an important role in distinguishing between American and German users on Yelp.
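The ranking on slide 15 uses a χ² score per feature. Here is a minimal sketch that assumes the score is computed the way scikit-learn's `feature_selection.chi2` does for non-negative features. In that scheme, observed is the per-class sum of the feature and expected is the class share × the overall sum. The slides do not say which implementation was used, and the toy data is hypothetical.

```python
def chi2_scores(X, y):
    """Chi-squared score of each non-negative feature against the class label:
    observed = per-class sum of the feature, expected = class share * total."""
    classes = sorted(set(y))
    n = len(y)
    class_share = {c: sum(1 for label in y if label == c) / n for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_share[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


# Hypothetical data: feature 0 (say, a pronoun rate) differs by class,
# feature 1 is constant. Labels: 1 = American, 0 = German.
X = [[4, 1], [6, 1], [0, 1], [2, 1]]
y = [1, 1, 0, 0]
scores = chi2_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

Features with a higher score separate the two groups more strongly. This is how "Pronoun" can top the ranking on slide 15.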
The cultural causes of user online behavior
Expanding the variety of social platforms
Differences between West Coast and East Coast users in the USA