2. [Map-Reduce] Workflow
- Master splits a job into small chunks (SIMD model).
- Master assigns chunks to slaves with available mapper slots, taking data locality into account.
- Mapper collects the required data and runs it through the user-defined mapper function.
- Mapper writes intermediate results to local disk and reports the location of the results to Master.
- Master records the status, picks slaves with available reducer slots, and pushes the location info to them for the reduce phase. (*locality? Yes!)
- Reducer copies data from the mappers via RPC, waits for all mappers to finish, sorts by intermediate key, then runs the data through the user-defined reducer function.
- Reducer writes the final output to DFS and reports to Master.
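
The locality-aware assignment step can be sketched roughly as follows. This is a minimal illustration, not Hadoop's or Google's actual scheduler: the function name `assign_chunks`, the input dictionaries, and the "first candidate wins" tie-break are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def assign_chunks(chunk_locations: Dict[str, List[str]],
                  free_slots: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Assign each chunk to a worker with a free mapper slot.

    chunk_locations maps a chunk id to the workers holding a replica of it.
    free_slots maps a worker to its number of free mapper slots.
    Both structures are hypothetical, chosen only for this sketch.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment: Dict[str, Optional[str]] = {}
    for chunk, replicas in chunk_locations.items():
        # First choice: a worker that already stores the chunk (data locality).
        local = [w for w in replicas if slots.get(w, 0) > 0]
        # Fallback: any worker with a free slot; the data then crosses the network.
        candidates = local or [w for w, n in slots.items() if n > 0]
        if not candidates:
            assignment[chunk] = None  # no free slot: the chunk stays queued
            continue
        worker = candidates[0]
        slots[worker] -= 1
        assignment[chunk] = worker
    return assignment


plan = assign_chunks(
    {"c1": ["w1"], "c2": ["w1"], "c3": ["w3"]},
    {"w1": 1, "w2": 1},
)
print(plan)
```

Here `c1` lands on the worker that stores it, `c2` falls back to a remote worker because the local one is full, and `c3` has to wait.
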
3. [Map-Reduce] Data flow
- Raw Map(k1, v1) -> list(k2, v2)
- Reduce(k2, list(v2)) -> list(v2)
- *Why not v3?
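
To make the signatures concrete, here is a small single-process sketch of the same data flow using word count. The driver `run_map_reduce` and the two user functions are hypothetical names for this example; a real framework distributes these phases across machines.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Run Map(k1, v1) -> list(k2, v2), group by k2, then Reduce(k2, list(v2))."""
    intermediate = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            intermediate[k2].append(v2)  # group values by intermediate key
    # Reducers see intermediate keys in sorted order.
    return {k2: reducer(k2, intermediate[k2]) for k2 in sorted(intermediate)}


def word_count_mapper(doc_id, text):
    # (k1, v1) = (document id, document text); emits (word, 1) pairs.
    return [(word, 1) for word in text.split()]


def word_count_reducer(word, counts):
    # Input values and output values are both counts, i.e. both of type v2.
    return [sum(counts)]


result = run_map_reduce(
    [("d1", "a b a"), ("d2", "b c")],
    word_count_mapper,
    word_count_reducer,
)
print(result)
```

In this example the reducer's output values have the same type as its input values (integer counts), which matches the `list(v2)` on both sides of the Reduce signature.
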
5. [Map-Reduce] To-Dos
- Splitting:
  - When: upon arrival or upon reaching the head of the queue?
  - How is size M determined? (based on chunk size)
  - “can be processed in parallel by different machines”
- Cost of re-execution (map & reduce)
6. [Fair Scheduler] 3-phase allocation
- Fully satisfy each pool whose min share >= demand.
- Allocate resources to the other pools up to their min share.
- Give the residual to the unfilled pools, starting with the least fulfilled.
Notes
- Resource allocation is pool-based instead of job-based.
- Pool: min share is user-specified.
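
The three phases can be sketched as below. This is an illustration under stated assumptions, not the Hadoop Fair Scheduler code: the function name `allocate`, the `(min_share, demand)` tuple layout, and the choice to measure "least fulfilled" as allocation divided by demand are all assumptions for this example.

```python
def allocate(pools, total_slots):
    """Three-phase slot allocation across pools.

    pools maps a pool name to (min_share, demand), both in slots.
    Returns a dict of pool name -> allocated slots.
    """
    alloc = {name: 0 for name in pools}
    remaining = total_slots

    # Phase 1: pools whose demand fits within their min share get their full demand.
    for name, (min_share, demand) in pools.items():
        if demand <= min_share:
            grant = min(demand, remaining)
            alloc[name] = grant
            remaining -= grant

    # Phase 2: the other pools are filled up to their min share.
    for name, (min_share, demand) in pools.items():
        if demand > min_share:
            grant = min(min_share, remaining)
            alloc[name] = grant
            remaining -= grant

    # Phase 3: hand out the residual one slot at a time to the least fulfilled
    # pool that still has unmet demand (assumed metric: allocation / demand).
    while remaining > 0:
        unfilled = [n for n, (_, d) in pools.items() if alloc[n] < d]
        if not unfilled:
            break
        neediest = min(unfilled, key=lambda n: alloc[n] / pools[n][1])
        alloc[neediest] += 1
        remaining -= 1
    return alloc


shares = allocate({"A": (10, 4), "B": (5, 20), "C": (5, 10)}, total_slots=19)
print(shares)
```

Pool A is satisfied in phase 1 because its demand is below its min share; B and C reach their min share in phase 2; the residual then goes to B, which is further from its demand than C.
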
7. [Fair Scheduler] Reschedule
- Policy: wait & kill
- Algorithm:
  - Wait Tmin. If min share is not achieved, kill others.
  - Wait Tfair. If fair share is not achieved, kill more.
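
A rough sketch of the wait & kill decision for a single pool follows. The function name `slots_to_reclaim` and its parameters are hypothetical, and the sketch assumes Tmin is shorter than Tfair; it only computes how many slots the pool may reclaim by killing other pools' tasks, not which tasks are chosen.

```python
def slots_to_reclaim(waited, allocated, min_share, fair_share, t_min, t_fair):
    """Return how many slots a starved pool may reclaim from other pools.

    waited     -- seconds the pool has been below its share
    allocated  -- slots the pool currently holds
    t_min      -- timeout before enforcing min share (Tmin)
    t_fair     -- timeout before enforcing fair share (Tfair)
    """
    target = allocated
    if waited >= t_min:
        # After Tmin the pool is entitled to at least its min share.
        target = max(target, min_share)
    if waited >= t_fair:
        # After Tfair the pool is entitled to its fair share as well.
        target = max(target, fair_share)
    return target - allocated


for waited in (30, 90, 400):
    print(waited, slots_to_reclaim(waited, allocated=2, min_share=5,
                                   fair_share=8, t_min=60, t_fair=300))
```

Before Tmin nothing is killed; between Tmin and Tfair only the gap to the min share is reclaimed; after Tfair the gap to the fair share is reclaimed.
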
9. [Fair Scheduler] Tradeoffs
- Batch response time: fairness vs. utilization tradeoff (throughput)
- Average response time
- Space usage with intermediate data
- User isolation: “ability to provide worst-case performance comparable to owning a small private cluster regardless of user workload”
10. [Fair Scheduler] To-Dos (done)
- Reschedule/Reassignment: every UPDATE_INTERVAL, FairScheduler checks all pools for tasks to preempt, sets the status of those tasks, and places them in an action queue. The next heartbeat picks up the changes in task status and carries out the kills.
- Relationship between batch response time and throughput: they measure the same thing.
- Relationship between average response time (ART) and user isolation: they can be correlated, but not always. ART is not a quantitative measurement of user isolation.
11. [Quincy] Model the problem as a flow network
- Flow network: a directed graph in which each edge e is annotated with a non-negative integer capacity and a cost, and each node v is annotated with an integer “supply”; the total supply of the graph equals zero.
- To construct the simplest graph, with no starvation as the only hard constraint.
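
The definition above maps onto a small data structure. The sketch below only encodes the annotations named in the definition (capacity, cost, supply) and checks that supplies sum to zero; the class names, the node names, and the example costs are assumptions for illustration and are not taken from the Quincy paper's graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    capacity: int  # non-negative integer capacity
    cost: int      # cost per unit of flow


@dataclass
class FlowNetwork:
    supply: Dict[str, int] = field(default_factory=dict)  # node -> integer supply
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, name: str, supply: int = 0) -> None:
        self.supply[name] = supply

    def add_edge(self, src: str, dst: str, capacity: int, cost: int) -> None:
        if capacity < 0:
            raise ValueError("capacity must be non-negative")
        if src not in self.supply or dst not in self.supply:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(src, dst, capacity, cost))

    def is_balanced(self) -> bool:
        # A valid flow network has total supply equal to zero.
        return sum(self.supply.values()) == 0


# Hypothetical toy graph: two tasks each supply one unit of flow,
# which drains through one machine into a sink.
net = FlowNetwork()
net.add_node("task1", supply=1)
net.add_node("task2", supply=1)
net.add_node("machine1")
net.add_node("sink", supply=-2)
net.add_edge("task1", "machine1", capacity=1, cost=3)
net.add_edge("task2", "machine1", capacity=1, cost=5)
net.add_edge("machine1", "sink", capacity=2, cost=0)
print(net.is_balanced())
```

A min-cost flow solver run over such a graph would then pick the cheapest routing of each task's unit of flow to the sink; that solver is outside the scope of this sketch.
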
13. Readings
- MapReduce. Jeffrey Dean*
- Google: Cluster Computing and MR
- Job Scheduling for Multi-User. Matei Zaharia*
- Max-min fairness. Wikipedia + algo*
- Quincy. Michael Isard*
- An update on Google’s infrastructure
14. Topic
- Before: Existing systems use a predetermined, fixed allocation of resources/slots to queries/tasks. Intuitively, if resources can be allocated to tasks dynamically, they can be better utilized.
- After: Enable the scheduler to make resource-aware decisions (IO, CPU, memory) and bring the fair scheduler from pool level to job level.
15. Tips from Prof Tan
- Keep references for all literature reviewed and note where each is published.