SlideShare ist ein Scribd-Unternehmen logo
1 von 59
Multithreaded Programming in Cilk L ECTURE  1 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
Cilk A C language for programming dynamic multithreaded applications on shared-memory multiprocessors. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Example applications:
Shared-Memory Multiprocessor In particular, over the next decade, chip multiprocessors (CMP’s) will be an increasingly important platform! P P P Network … Memory I/O $ $ $
Cilk Is Simple ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Minicourse Outline ,[object Object],[object Object],[object Object],[object Object]
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Fibonacci Cilk is a  faithful   extension of C.  A Cilk program’s  serial elision  is always a legal implementation of Cilk semantics.  Cilk provides  no   new data types. int fib (int n) { if (n<2) return (n); else { int x,y; x = fib(n-1); y = fib(n-2); return (x+y); } } C elision cilk  int fib (int n) { if (n<2) return (n); else { int x,y; x =  spawn  fib(n-1); y =  spawn  fib(n-2); sync; return (x+y); } } Cilk code
Basic Cilk Keywords cilk  int fib (int n) { if (n<2) return (n); else { int x,y; x =  spawn  fib(n-1); y =  spawn  fib(n-2); sync; return (x+y); } } Identifies a function as a  Cilk procedure , capable of being spawned in parallel. The named  child  Cilk procedure can execute in parallel with the  parent  caller. Control cannot pass this point until all spawned children have returned.
Dynamic Multithreading cilk   int fib (int n) { if (n<2) return (n); else { int x,y; x =  spawn  fib(n-1); y =  spawn  fib(n-2); sync ; return (x+y); } } The  computation dag  unfolds dynamically. Example:   fib(4) “ Processor   oblivious” 4 3 2 2 1 1 1 0 0
Multithreaded Computation ,[object Object],[object Object],[object Object],spawn edge return edge continue edge initial thread final thread
Cactus Stack Cilk supports C’s rule for pointers:   A pointer to stack space can be passed from parent to child, but not from child to parent.  (Cilk also supports  malloc .) Cilk’s  cactus stack  supports several views in parallel. B A C E D A A B A C A C D A C E Views of stack C B A D E
LECTURE 1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Algorithmic Complexity Measures T P  =  execution time on  P  processors
Algorithmic Complexity Measures T P  =  execution time on  P  processors T 1  =  work
Algorithmic Complexity Measures T P  =  execution time on  P  processors T 1  =  work T 1  =  span * * Also called  critical-path length  or  computational depth .
Algorithmic Complexity Measures T P  =  execution time on  P  processors T 1  =  work ,[object Object],[object Object],[object Object],* Also called  critical-path length  or  computational depth . T 1  =  span *
Speedup Definition:   T 1 /T P  =  speedup   on  P  processors. If  T 1 /T P =   ( P )  ·   P ,  we have  linear speedup ; =  P , we have  perfect linear speedup ; >  P , we have  superlinear speedup , which is not possible in our model, because of the lower bound  T P   ¸   T 1 / P .
Parallelism ,[object Object],[object Object],[object Object]
Example:  fib(4) Span:   T 1  = ? Work:   T 1  =  ? Assume for simplicity that each Cilk thread in  fib()  takes unit time to execute. Span:   T 1  = 8 3 4 5 6 1 2 7 8 Work:   T 1  = 17
Example:  fib(4) Parallelism:   T 1 / T 1  = 2.125 Span:   T 1  = ? Work:   T 1  =  ? Assume for simplicity that each Cilk thread in  fib()  takes unit time to execute. Span:   T 1  = 8 Work:   T 1  = 17 Using many more than  2  processors makes little sense.
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Parallelizing Vector Addition void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } C
Parallelizing Vector Addition C C if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { void vadd (real *A, real *B, int n){ vadd (A, B, n/2); vadd (A+n/2, B+n/2, n-n/2); ,[object Object],[object Object],void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } } }
Parallelizing Vector Addition if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { C ,[object Object],[object Object],[object Object],void vadd (real *A, real *B, int n){ cil k spawn vadd (A, B, n/2; vadd (A+n/2, B+n/2, n-n/2; spawn Side benefit:   D&C is generally good for caches! sync ; C ilk void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } } }
Vector Addition cil k   void vadd (real *A, real *B, int n){ if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { spawn  vadd (A, B, n/2); spawn  vadd (A+n/2, B+n/2, n-n/2); sync ; } }
Vector Addition Analysis ,[object Object],[object Object],[object Object],[object Object], ( n /lg  n )  ( n )  (lg  n ) BASE
Another Parallelization C void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { vadd1(A+j, B+j, min(BASE, n-j)); } } Cilk cilk  void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } cilk  void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { spawn  vadd1(A+j, B+j, min(BASE, n-j)); } sync; }
Analysis ,[object Object], (1)  ( n ) … …  ( n ) BASE ,[object Object],[object Object],[object Object],PUNY!
Optimal Choice of BASE ,[object Object],[object Object], ( √ n  ) … ,[object Object], ( n ) BASE … ,[object Object], (BASE +  n /BASE) ,[object Object]
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Scheduling ,[object Object],[object Object],[object Object],P P P Network … Memory I/O $ $ $
Greedy Scheduling ,[object Object],Definition:   A thread is  ready  if all its predecessors have  executed .
Greedy Scheduling ,[object Object],[object Object],[object Object],[object Object],Definition:   A thread is  ready  if all its predecessors have  executed . P  = 3
Greedy Scheduling ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Definition:   A thread is  ready  if all its predecessors have  executed . P  = 3
Greedy-Scheduling Theorem Theorem  [Graham ’68 & Brent ’75]. Any greedy scheduler achieves T P      T 1 / P   + T  . ,[object Object],[object Object],[object Object],P  = 3
Optimality of Greedy ,[object Object],Proof .  Let  T P *  be the execution time produced by the optimal scheduler.  Since  T P *  ¸  max{ T 1 / P ,  T 1 }  (lower bounds), we have T P ·   T 1 / P  +  T 1  ·  2 ¢ max{ T 1 / P ,  T 1 } ·  2 T P *  .  ■
Linear Speedup ,[object Object],Proof.   Since  P  ¿   T 1 / T 1  is equivalent to  T 1   ¿   T 1 / P , the Greedy Scheduling Theorem gives us   T P ·   T 1 / P  +  T 1 ¼  T 1 / P  . Thus, the speedup is  T 1 / T P   ¼   P .  ■ Definition.  The quantity  ( T 1 / T 1  )/ P  is called the  parallel slackness .
Cilk Performance ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cilk Chess Programs ,[object Object],[object Object],[object Object],[object Object]
 Socrates Normalized Speedup T P  =  T 1 / P  +  T  measured speedup 0.01 0.1 1 0.01 0.1 1 T P  =  T  T P  =  T 1 / P T 1 /T P T 1 /T  P T 1 /T 
Developing   Socrates  ,[object Object],[object Object],[object Object],[object Object]
 Socrates Speedup Paradox T P      T 1 / P  +  T  T 32 = 2048/32 + 1   = 65  seconds   = 40  seconds T  32 = 1024/32 + 8 Original program Proposed program T 32 = 65  seconds T  32 = 40  seconds T 1 = 2048  seconds T    = 1  second T  1 = 1024  seconds T     = 8  seconds T 512 = 2048/512 + 1   = 5  seconds T  512 = 1024/512 + 8   = 10  seconds
Lesson Work  and  span  can predict performance on large machines better than running times on small machines can.
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque   of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn!
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! Spawn!
Cilk’s Work-Stealing Scheduler Each processor maintains a   work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Performance of Work-Stealing Theorem : Cilk’s work-stealing scheduler achieves an expected running time of T P      T 1 / P   + O ( T 1 ) on  P  processors. Pseudoproof . A processor is either  working  or  stealing .  The total time all processors spend working is  T 1 .  Each steal has a  1/ P  chance of reducing the span by  1 .  Thus, the expected cost of all steals is  O ( PT 1 ) .  Since there are  P  processors, the expected time is  ( T 1  +  O ( PT 1 ))/ P  =   T 1 / P  +  O ( T 1 )  .  ■
Space Bounds Theorem.   Let  S 1  be the stack space required by a serial execution of a Cilk program.  Then, the space required by a  P -processor execution is at most  S P   ·   PS 1  . Proof  (by induction). The work-stealing algorithm maintains the  busy-leaves property :  every extant procedure frame with no extant descendents has a processor working on it.   ■   P  = 3 P P P S 1
Linguistic Implications ,[object Object],for (i=1; i<1000000000; i++) { spawn  foo(i); } sync; M ORAL Better to steal parents than children!
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Key Ideas ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Minicourse Outline ,[object Object],[object Object],[object Object],[object Object]

Weitere ähnliche Inhalte

Was ist angesagt?

Symbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesSymbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesQuoc-Sang Phan
 
14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic Programming14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic ProgrammingNeeldhara Misra
 
Lecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithmsLecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithmsVivek Bhargav
 
Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis Dr. Pankaj Agarwal
 
Data Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study NotesData Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study NotesHaitham El-Ghareeb
 
02 order of growth
02 order of growth02 order of growth
02 order of growthHira Gul
 
Algorithm Analyzing
Algorithm AnalyzingAlgorithm Analyzing
Algorithm AnalyzingHaluan Irsad
 
Design and Analysis of algorithms
Design and Analysis of algorithmsDesign and Analysis of algorithms
Design and Analysis of algorithmsDr. Rupa Ch
 
Algorithm analysis
Algorithm analysisAlgorithm analysis
Algorithm analysissumitbardhan
 
asymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisasymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisAnindita Kundu
 
Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to AlgorithmsVenkatesh Iyer
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysisDr. Rajdeep Chatterjee
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)swapnac12
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihmSajid Marwat
 
Data Structures- Part2 analysis tools
Data Structures- Part2 analysis toolsData Structures- Part2 analysis tools
Data Structures- Part2 analysis toolsAbdullah Al-hazmy
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Yusuke Izawa
 

Was ist angesagt? (19)

Symbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesSymbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo Theories
 
14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic Programming14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic Programming
 
Lecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithmsLecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithms
 
Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis
 
Data Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study NotesData Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study Notes
 
02 order of growth
02 order of growth02 order of growth
02 order of growth
 
Algorithm Analyzing
Algorithm AnalyzingAlgorithm Analyzing
Algorithm Analyzing
 
Design and Analysis of algorithms
Design and Analysis of algorithmsDesign and Analysis of algorithms
Design and Analysis of algorithms
 
Algorithm analysis
Algorithm analysisAlgorithm analysis
Algorithm analysis
 
Analysis of Algorithm
Analysis of AlgorithmAnalysis of Algorithm
Analysis of Algorithm
 
asymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisasymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysis
 
Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to Algorithms
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysis
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihm
 
Big o
Big oBig o
Big o
 
Data Structures- Part2 analysis tools
Data Structures- Part2 analysis toolsData Structures- Part2 analysis tools
Data Structures- Part2 analysis tools
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
 
Algorithms Question bank
Algorithms Question bankAlgorithms Question bank
Algorithms Question bank
 

Ähnlich wie Lecture 1

Data Structure & Algorithms - Mathematical
Data Structure & Algorithms - MathematicalData Structure & Algorithms - Mathematical
Data Structure & Algorithms - Mathematicalbabuk110
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.pptebinazer1
 
008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdf008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdfomchoubey297
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptxKokilaK25
 
Unit i basic concepts of algorithms
Unit i basic concepts of algorithmsUnit i basic concepts of algorithms
Unit i basic concepts of algorithmssangeetha s
 
Towards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information FlowTowards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information FlowQuoc-Sang Phan
 
A peek on numerical programming in perl and python e christopher dyken 2005
A peek on numerical programming in perl and python  e christopher dyken  2005A peek on numerical programming in perl and python  e christopher dyken  2005
A peek on numerical programming in perl and python e christopher dyken 2005Jules Krdenas
 
Basic_analysis.ppt
Basic_analysis.pptBasic_analysis.ppt
Basic_analysis.pptSoumyaJ3
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxPJS KUMAR
 
Data structures notes for college students btech.pptx
Data structures notes for college students btech.pptxData structures notes for college students btech.pptx
Data structures notes for college students btech.pptxKarthikVijay59
 
Algorithm And analysis Lecture 03& 04-time complexity.
 Algorithm And analysis Lecture 03& 04-time complexity. Algorithm And analysis Lecture 03& 04-time complexity.
Algorithm And analysis Lecture 03& 04-time complexity.Tariq Khan
 
Lec03 04-time complexity
Lec03 04-time complexityLec03 04-time complexity
Lec03 04-time complexityAbbas Ali
 
Introduction to design and analysis of algorithm
Introduction to design and analysis of algorithmIntroduction to design and analysis of algorithm
Introduction to design and analysis of algorithmDevaKumari Vijay
 
Time and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptxTime and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptxdudelover
 

Ähnlich wie Lecture 1 (20)

Data Structure & Algorithms - Mathematical
Data Structure & Algorithms - MathematicalData Structure & Algorithms - Mathematical
Data Structure & Algorithms - Mathematical
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
 
parallel
parallelparallel
parallel
 
008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdf008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdf
 
Matlab Nn Intro
Matlab Nn IntroMatlab Nn Intro
Matlab Nn Intro
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptx
 
Cs2251 daa
Cs2251 daaCs2251 daa
Cs2251 daa
 
AA_Unit 1_part-I.pptx
AA_Unit 1_part-I.pptxAA_Unit 1_part-I.pptx
AA_Unit 1_part-I.pptx
 
Unit i basic concepts of algorithms
Unit i basic concepts of algorithmsUnit i basic concepts of algorithms
Unit i basic concepts of algorithms
 
Towards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information FlowTowards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information Flow
 
A peek on numerical programming in perl and python e christopher dyken 2005
A peek on numerical programming in perl and python  e christopher dyken  2005A peek on numerical programming in perl and python  e christopher dyken  2005
A peek on numerical programming in perl and python e christopher dyken 2005
 
Basic_analysis.ppt
Basic_analysis.pptBasic_analysis.ppt
Basic_analysis.ppt
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptx
 
Data structures notes for college students btech.pptx
Data structures notes for college students btech.pptxData structures notes for college students btech.pptx
Data structures notes for college students btech.pptx
 
Algorithm And analysis Lecture 03& 04-time complexity.
 Algorithm And analysis Lecture 03& 04-time complexity. Algorithm And analysis Lecture 03& 04-time complexity.
Algorithm And analysis Lecture 03& 04-time complexity.
 
Lec03 04-time complexity
Lec03 04-time complexityLec03 04-time complexity
Lec03 04-time complexity
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
 
Introduction to design and analysis of algorithm
Introduction to design and analysis of algorithmIntroduction to design and analysis of algorithm
Introduction to design and analysis of algorithm
 
Time and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptxTime and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptx
 

Kürzlich hochgeladen

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Lecture 1

  • 1. Multithreaded Programming in Cilk L ECTURE 1 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
  • 2.
  • 3. Shared-Memory Multiprocessor In particular, over the next decade, chip multiprocessors (CMP’s) will be an increasingly important platform! P P P Network … Memory I/O $ $ $
  • 4.
  • 5.
  • 6.
  • 7. Fibonacci Cilk is a faithful extension of C. A Cilk program’s serial elision is always a legal implementation of Cilk semantics. Cilk provides no new data types. int fib (int n) { if (n<2) return (n); else { int x,y; x = fib(n-1); y = fib(n-2); return (x+y); } } C elision cilk int fib (int n) { if (n<2) return (n); else { int x,y; x = spawn fib(n-1); y = spawn fib(n-2); sync; return (x+y); } } Cilk code
  • 8. Basic Cilk Keywords cilk int fib (int n) { if (n<2) return (n); else { int x,y; x = spawn fib(n-1); y = spawn fib(n-2); sync; return (x+y); } } Identifies a function as a Cilk procedure , capable of being spawned in parallel. The named child Cilk procedure can execute in parallel with the parent caller. Control cannot pass this point until all spawned children have returned.
  • 9. Dynamic Multithreading cilk int fib (int n) { if (n<2) return (n); else { int x,y; x = spawn fib(n-1); y = spawn fib(n-2); sync ; return (x+y); } } The computation dag unfolds dynamically. Example: fib(4) “ Processor oblivious” 4 3 2 2 1 1 1 0 0
  • 10.
  • 11. Cactus Stack Cilk supports C’s rule for pointers: A pointer to stack space can be passed from parent to child, but not from child to parent. (Cilk also supports malloc .) Cilk’s cactus stack supports several views in parallel. B A C E D A A B A C A C D A C E Views of stack C B A D E
  • 12.
  • 13. Algorithmic Complexity Measures T P = execution time on P processors
  • 14. Algorithmic Complexity Measures T P = execution time on P processors T 1 = work
  • 15. Algorithmic Complexity Measures T P = execution time on P processors T 1 = work T 1 = span * * Also called critical-path length or computational depth .
  • 16.
  • 17. Speedup Definition: T 1 /T P = speedup on P processors. If T 1 /T P =  ( P ) · P , we have linear speedup ; = P , we have perfect linear speedup ; > P , we have superlinear speedup , which is not possible in our model, because of the lower bound T P ¸ T 1 / P .
  • 18.
  • 19. Example: fib(4) Span: T 1 = ? Work: T 1 = ? Assume for simplicity that each Cilk thread in fib() takes unit time to execute. Span: T 1 = 8 3 4 5 6 1 2 7 8 Work: T 1 = 17
  • 20. Example: fib(4) Parallelism: T 1 / T 1 = 2.125 Span: T 1 = ? Work: T 1 = ? Assume for simplicity that each Cilk thread in fib() takes unit time to execute. Span: T 1 = 8 Work: T 1 = 17 Using many more than 2 processors makes little sense.
  • 21.
  • 22. Parallelizing Vector Addition void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } C
  • 23.
  • 24.
  • 25. Vector Addition cil k void vadd (real *A, real *B, int n){ if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { spawn vadd (A, B, n/2); spawn vadd (A+n/2, B+n/2, n-n/2); sync ; } }
  • 26.
  • 27. Another Parallelization C void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { vadd1(A+j, B+j, min(BASE, n-j)); } } Cilk cilk void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } cilk void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { spawn vadd1(A+j, B+j, min(BASE, n-j)); } sync; }
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.  Socrates Normalized Speedup T P = T 1 / P + T  measured speedup 0.01 0.1 1 0.01 0.1 1 T P = T  T P = T 1 / P T 1 /T P T 1 /T  P T 1 /T 
  • 42.
  • 43.  Socrates Speedup Paradox T P  T 1 / P + T  T 32 = 2048/32 + 1 = 65 seconds = 40 seconds T  32 = 1024/32 + 8 Original program Proposed program T 32 = 65 seconds T  32 = 40 seconds T 1 = 2048 seconds T  = 1 second T  1 = 1024 seconds T   = 8 seconds T 512 = 2048/512 + 1 = 5 seconds T  512 = 1024/512 + 8 = 10 seconds
  • 44. Lesson Work and span can predict performance on large machines better than running times on small machines can.
  • 45.
  • 46. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn!
  • 47. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! Spawn!
  • 48. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
  • 49. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
  • 50. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 51. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 52. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 53. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 54. Performance of Work-Stealing Theorem : Cilk’s work-stealing scheduler achieves an expected running time of T P  T 1 / P + O ( T 1 ) on P processors. Pseudoproof . A processor is either working or stealing . The total time all processors spend working is T 1 . Each steal has a 1/ P chance of reducing the span by 1 . Thus, the expected cost of all steals is O ( PT 1 ) . Since there are P processors, the expected time is ( T 1 + O ( PT 1 ))/ P = T 1 / P + O ( T 1 ) . ■
  • 55. Space Bounds Theorem. Let S 1 be the stack space required by a serial execution of a Cilk program. Then, the space required by a P -processor execution is at most S P · PS 1 . Proof (by induction). The work-stealing algorithm maintains the busy-leaves property : every extant procedure frame with no extant descendents has a processor working on it. ■ P = 3 P P P S 1
  • 56.
  • 57.
  • 58.
  • 59.