How to do quick user assign in kanban in Odoo 17 ERP
Computational Social Science, Lecture 03: Counting at Scale, Part I
1. Counting @ Scale
Sharad Goel
Columbia University
Computational Social Science: Lecture 3
February 8, 2013
2. Descriptive statistics
(as opposed to inferential statistics)
is about counting
contingency tables
means, variances, quantiles
summaries of conditional distributions
3. Long tail
video consumption on YouTube
Digital divide
time spent across various online properties
Viral diffusion
propagation of tweets on Twitter
5. I/O bound
difficult to read terabytes of data
Network bound
hard to transfer terabytes of data
Memory bound
cannot randomly access data points
CPU bound
even simple manipulations add up
6. Rank videos by popularity
local video store
1K movies, 100K viewings
7. Rank videos by popularity
local video store
1K movies, 100K viewings
Load dataset into memory
8. Rank videos by popularity
Netflix
100K movies, 1B viewings
9. Rank videos by popularity
Netflix
100K movies, 1B viewings
store counter for each movie in memory
and stream through the dataset
10. Rank videos by popularity
YouTube
10B videos, 10T viewings
11. Rank videos by popularity
YouTube
10B videos, 10T viewings
Trouble, with a capital ‘T’
13. 10 parallel workers
1T views per worker
maybe 5B unique videos on each
100 parallel workers
100B views per worker
maybe 1B videos on each
14. split
count
sort by video
merge
sort by popularity
15. Core problem
the same movie appears on multiple machines
Solution
do not split viewing data at random
ensure individual movies are never split apart
16. split
count
sort by video
merge
sort by popularity
17. Shuffle (1st attempt)
create a new file for every movie
append viewing data to the appropriate file
18. Shuffle (2nd attempt)
First time you see a movie,
append it randomly to one of 10K files
Next time you see the movie, append it to same file
19. Shuffle (3rd attempt)
Hash the movie ID to determine which
file to append it to
( Hash function
maps large input space to small output space
approximately uniformly
)
21. Map
assign each input line to one or more groups
Shuffle
aggregate groups
Reduce
operate on grouped data
22. Map
assign each input line to one or more groups
v [(k1, v1), …, (km, vm)]
Shuffle
aggregate groups
Reduce
operate on grouped data
(k, [v1, …, vn]) [w1, …, wp]
23. The Insight of MapReduce
One can efficiently group identical items
Many tasks are computationally easier on grouped data
24. Word Count
Input
text corpus
Output
number of occurrences of each word
25. Word Count
Map
line words
Reduce
word group size of group