The theory of
  concurrent programming
for a seasoned programmer
© Roman Elizarov, Devexperts, 2012
What? For whom?

• Practical experience in writing concurrent programs is
  assumed
   - Here, concurrent == using shared memory
   - The audience is assumed to know and have used locks, synchronized
     sections, compare-and-set, etc. in practice
   - Knowledge of “Java Concurrency in Practice” is a plus!
• The theory behind the practical constructs will be explained
   - Formal models
   - Key definitions
   - Important facts and theorems (without proofs)
   - Practical corollaries
• But some concepts are simplified
Just a reminder: the free lunch is over




                                    http://www.gotw.ca/publications/concurrency-ddj.htm
Basic definitions

• A process owns memory and other resources in the OS
• A thread of execution defines the current instruction pointer, stack
  pointer and other registers
   - Threads execute program code
   - Multiple threads of the same process share the same memory
• However, both terms are often used interchangeably in theory
   - “Process” seems to be used more often for historical reasons
   - They are typically named P, Q, R, … in papers
Why model?


• Formal models of computation let you
  define and prove certain desired
  properties of your programs
• The models let you prove the impossibility
  of achieving certain results under specific
  constraints
   - Saving the time you would otherwise spend
     looking for a working solution
The model with shared objects

                 [Shared] Memory
    Thread 1

                              [Shared] Object 1

    Thread 2
                              [Shared] Object 2




                              [Shared] Object M
    Thread N
Concurrency




              http://www.nassaulibrary.org/ncla/nclacler_files/LILC7.JPG
Shared objects

• Threads (or processes) perform operations on shared memory
  objects
• This model doesn’t care about operations that are internal to
  threads:
   - Computations performed by threads
   - Updates to threads’ CPU registers
   - Updates to threads’ stacks
   - Updates to any “thread local” memory regions
• Only inter-thread communication matters
• The only type of inter-thread communication in this model is via
  shared objects
[Shared] Registers

• Not to be confused with CPU registers (eax, ebx, etc. on x86)
   - They are just part of the “thread state” in concurrent programming theory
• In concurrent programming a [shared] register is the simplest kind of
  shared object:
   - It has some value type (typically boolean or integer)
   - With read and write operations
• Registers are basic building blocks for many practical concurrent
  algorithms
• The model of threads + shared registers is a decent abstraction for
  modern multicore hardware systems
   - It abstracts away enough actual complexity to make theoretical
     reasoning possible
Message passing models

• We can model parallel computing by letting threads send
  messages to each other, instead of giving them shared registers
  (or other shared objects)
   - It is closer to how the hardware memory bus actually works at a low
     level (CPUs send messages to memory via interconnects)
   - But it is farther from how programs actually work with memory
• Message passing is typically used to model distributed programs
• Both models are theoretically equivalent in their power
   - But the practical performance of various algorithms will differ
   - We use the shared objects model where performance matters
     (minimizing the number of shared objects and the number of
     operations on them is close to real-world optimization)
Parallel




       Concurrent                       Distributed
    [shared memory]                  [message passing]


* NOTE: There is no general consensus on this terminology
Properties of concurrent programs

• Serial programs are usually deterministic
   - Unless explicit calls to a random number generator are present
   - Their properties are established by analyzing their state, invariants,
     pre- and post- conditions
• Concurrent programs are inherently nondeterministic
   - Even when the code for each thread is fully deterministic
   - The outcome depends on the actual execution history: which operations
     on shared objects were performed by which threads, and in what order
   - When you say “program A has property P” it actually means “program
     A has property P in any execution”
Modeling executions

                 S
               /   \
            f /     \ g
             /       \
          f(S)       g(S)

• S is a global state, which includes:
   - The state of all threads
   - The state of all shared objects, or all “in flight” messages (in a
     distributed system)
• f and g are operations on shared objects
   - For registers it can be either ri.read(value) or ri.write(value)
   - There are as many possible operations in each state as there are
     active threads
      • Not as simple for the distributed case
• f(S) is the new state after operation f was performed in state S
Example

 shared int x              // initially 0

 thread P:                 thread Q:
  0: x = 1                  0: x = 2
  1: print x                1: print x
  2: stop                   2: stop

 A total of 17 states. Each state shows the program counters of P and Q,
 the value of x, and (output of P, output of Q):

  Step 0:  P0,Q0 x=0 (-, -)
  Step 1:  P1,Q0 x=1 (-, -)    P0,Q1 x=2 (-, -)
  Step 2:  P2,Q0 x=1 (1, -)    P1,Q1 x=2 (-, -)    P1,Q1 x=1 (-, -)    P0,Q2 x=2 (-, 2)
  Step 3:  6 states not shown (1 + 2 + 2 + 1 successors of the states above)
  Step 4:  P2,Q2 x=2 (1, 2)    P2,Q2 x=2 (2, 2)    P2,Q2 x=1 (1, 1)    P2,Q2 x=1 (1, 2)
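The count of 17 states can be checked mechanically. Below is a minimal sketch with hypothetical names, not taken from the slides. A global state is encoded as the program counters of P and Q, the value of x, and what each thread has printed (0 meaning nothing printed yet). A depth-first search then applies one operation of each active thread to every state it reaches and records each distinct state once.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Global state S: program counters of P (0) and Q (1), shared x, and the
   value each thread printed (0 = nothing printed yet). */
typedef struct { int pc[2], x, out[2]; } state_t;

#define MAX_STATES 64
static state_t seen[MAX_STATES];
static int n_seen;

static bool already_seen(const state_t *s) {
    for (int i = 0; i < n_seen; i++)
        if (memcmp(&seen[i], s, sizeof *s) == 0) return true;
    return false;
}

/* Performs the next operation of thread t; returns false if t has stopped. */
static bool step(state_t s, int t, state_t *next) {
    if (s.pc[t] == 2) return false;               /* 2: stop */
    if (s.pc[t] == 0) s.x = (t == 0) ? 1 : 2;     /* 0: P writes 1, Q writes 2 */
    else s.out[t] = s.x;                          /* 1: print x */
    s.pc[t]++;
    *next = s;
    return true;
}

static void explore(state_t s) {
    if (already_seen(&s)) return;
    assert(n_seen < MAX_STATES);
    seen[n_seen++] = s;
    for (int t = 0; t < 2; t++) {                 /* one successor per active thread */
        state_t next;
        if (step(s, t, &next)) explore(next);
    }
}

/* Returns the number of distinct reachable global states. */
int count_states(void) {
    state_t init;
    memset(&init, 0, sizeof init);                /* P0,Q0 x=0 (-, -) */
    n_seen = 0;
    explore(init);
    return n_seen;
}
```

count_states() returns 17. Two different interleavings that reach the same state (for example P2,Q2 x=2 (2, 2)) are counted once.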
Discussion of the execution model with states

• This model is not truly “parallel”
   - All operations happen serially (albeit in undefined order)
• In reality (on a modern CPU)
   - A read or write operation is not instantaneous. It takes time
   - There are multiple memory banks that work in parallel, so multiple
     read or write operations can be in progress at the same time.
• However, you can safely use this model for atomic registers
   - Atomic (linearizable) registers work as if each write or read is
     instantaneous and as if there is no parallelism
   - What this means is defined precisely later
• A more general model of execution is needed to analyze a wider
  class of primitives
Lamport’s happens before (occurs before) model

• An execution history is a pair (H, →H)
   - “H” is a set of operations e, f, g, … that happened during execution
   - “→H” is a transitive, irreflexive, antisymmetric relation on the set of
     operations H (a strict partial order)
   - “e →H f” means “e happens before f [in H]” or “e occurs before f”
       • The subscript H is omitted where it is not ambiguous
• In the global time model of execution, each operation e has
   - s(e) and f(e): the times when it started and finished
   - $e \to f \iff f(e) < s(f)$
   - Albeit convenient to visualize, in reality there is no global time (no
     central clock) in a modern system (so formal proofs cannot use time)
Legal executions

• An execution is legal if it satisfies the specifications of all objects


                         x.w(1)
          P
                                        x.r(1)               LEGAL
          Q



                         x.w(1)
          P
                                        x.r(2)               ILLEGAL
          Q
Serial executions

• An execution is serial if “happens before” is a total order


                         x.w(1)
         P
                                          x.r(1)                 SERIAL
         Q



                         x.w(1)
         P
                              x.r(1)                             NON-SERIAL
         Q

               e and f are called parallel when $\neg(e \to f) \land \neg(f \to e)$
Linearizable executions

• An execution is linearizable if its history (“happens before” relation)
  can be extended to a legal and serial (total) history

                        x.w(1)
         P
                            x.r(1)                   LINEARIZABLE
         Q



                        x.w(1)
         P
                            x.r(2)                   NON-LINEARIZABLE
         Q
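For tiny histories the definition can be applied literally. The sketch below uses hypothetical names and assumes a single register initialized to 0. Global-time intervals are used only to derive happens-before (e → f when e finishes before f starts). The search looks for a total order that extends happens-before and in which every read returns the latest written value. It is exponential, so it serves only as an illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* One operation on a shared register with global-time interval [start, finish]. */
typedef struct {
    bool is_write;
    int value;            /* value written, or value returned by the read */
    double start, finish;
} op_t;

/* e happens before f iff e finished before f started */
static bool happens_before(const op_t *e, const op_t *f) {
    return e->finish < f->start;
}

/* Tries to extend the already placed prefix to a legal serial history;
   reg is the register value after the prefix. */
static bool search(const op_t *ops, int n, bool *placed, int n_placed, int reg) {
    if (n_placed == n) return true;
    for (int i = 0; i < n; i++) {
        if (placed[i]) continue;
        bool ready = true;   /* no unplaced operation may happen before ops[i] */
        for (int j = 0; j < n; j++)
            if (!placed[j] && j != i && happens_before(&ops[j], &ops[i])) ready = false;
        if (!ready) continue;
        if (!ops[i].is_write && ops[i].value != reg) continue;   /* illegal read */
        placed[i] = true;
        bool ok = search(ops, n, placed, n_placed + 1,
                         ops[i].is_write ? ops[i].value : reg);
        placed[i] = false;
        if (ok) return true;
    }
    return false;
}

bool is_linearizable(const op_t *ops, int n) {
    bool placed[16] = { false };
    assert(n <= 16);
    return search(ops, n, placed, 0, 0);   /* register initially holds 0 */
}
```

With the histories drawn above, an x.w(1) that overlaps x.r(1) is reported as linearizable, while the same shape with x.r(2) is not.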
Linearizable (atomic) objects

• Object is called linearizable (atomic) if all execution histories with
  respect to this object are linearizable
• Linearizability is composable: a system execution on linearizable
  objects is linearizable.
• In the global time model, each operation in a linearizable execution has
  a linearization point T(e):
       $\forall e: s(e) \le T(e) \le f(e)$
       $\forall e, f: e \to f \Rightarrow T(e) < T(f)$, and $e \ne f \Rightarrow T(e) \ne T(f)$

                        x.w(1)
         P
                            x.r(1)
         Q
Atomic registers and other objects

• Atomic register == linearizable register
   - They work as if read/write operations happen instantaneously at
     linearization point and in some specific serial order
   - Thus we can use “global state” model of execution to analyze
     behavior of a program whose threads are working with shared atomic
     registers (or with other atomic objects)
• volatile fields in Java work like atomic registers
   - AtomicXXX classes are atomic registers, too (with additional ops)
• Thread-safe classes (synchronized, ConcurrentXXX) are atomic
  (linearizable) unless explicitly specified otherwise
   - “Thread-safe” in practice means “linearizable”, i.e. designed to work
     as if all operations happen in some serial order without any outside
     synchronization, even if accessed concurrently
http://www.flickr.com/photos/xserve/368758286/
Mutual exclusion (lock)

The mutex protocol

 thread Pid:
   loop forever:
      nonCriticalSection
      mutex.lock
      criticalSection
      mutex.unlock

• The main desired property of the protocol is mutual exclusion: two
  executions of the critical section cannot be parallel:
      $\forall i, j: i \ne j \Rightarrow CS_i \to CS_j \lor CS_j \to CS_i$
• It is also known as the correctness requirement for a mutual exclusion
  protocol
Mutex attempt #1

 threadlocal int   id // 0 or 1
 shared boolean    want[2]

 def lock:
   want[id] = true
   while want[1 - id]: pass

 def unlock:
   want[id] = false

• This protocol does guarantee mutual exclusion
• But there is no guarantee of progress. It can get into a live-lock
  (both threads spinning forever in lock)
• So, the other desired property is progress: the critical section should
  get entered infinitely often
Mutex attempt #2

 threadlocal int   id // 0 or 1
 shared int        victim

 def lock:
   victim = id
   while victim == id: pass

 def unlock:
   pass

• This protocol does guarantee mutual exclusion and progress
• But the critical section can be entered in a turn-by-turn fashion only.
  One thread working in isolation will starve.
• So, a stronger progress property is desired. Freedom from starvation:
  if one (or more) threads want to enter the critical section, then each
  of them will enter it in a finite number of steps
Peterson’s mutual exclusion algorithm

 threadlocal int   id // 0 or 1
 shared boolean    want[2]
 shared int        victim

 def lock:
   want[id] = true
   victim = id
   while want[1-id] and victim == id:
      pass

 def unlock:
   want[id] = false

• This protocol does guarantee mutual exclusion, progress and
  freedom from starvation
• The order of operations in this pseudo-code is important
• Not the first one invented (1981), but the simplest 2-thread one
• Hard to generalize to N threads (it can be done, but the result is complex)
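Below is a sketch of Peterson’s algorithm in C. It assumes C11 <stdatomic.h> and POSIX threads. The shared registers are sequentially consistent atomics, which is the default memory order of atomic_load and atomic_store. With plain variables, the compiler or the CPU may reorder the writes and reads whose order the algorithm depends on. The counter and peterson_demo are hypothetical additions that exercise the lock. sched_yield in the spin loop is only there so the demo also finishes quickly on a single core.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Shared registers of the protocol (seq_cst atomics keep their program order). */
static atomic_bool want[2];
static atomic_int victim;

static long counter;     /* protected by the lock; deliberately not atomic */
static int iterations;

static void peterson_lock(int id) {
    atomic_store(&want[id], true);
    atomic_store(&victim, id);
    while (atomic_load(&want[1 - id]) && atomic_load(&victim) == id)
        sched_yield();   /* spin, letting the other thread run */
}

static void peterson_unlock(int id) {
    atomic_store(&want[id], false);
}

static void *worker(void *arg) {
    int id = *(int *)arg;
    for (int i = 0; i < iterations; i++) {
        peterson_lock(id);
        counter++;       /* critical section */
        peterson_unlock(id);
    }
    return NULL;
}

/* Two threads each increment the counter n times under the lock. */
long peterson_demo(int n) {
    pthread_t threads[2];
    int ids[2] = { 0, 1 };
    counter = 0;
    iterations = n;
    atomic_store(&want[0], false);
    atomic_store(&want[1], false);
    for (int t = 0; t < 2; t++) {
        int rc = pthread_create(&threads[t], NULL, worker, &ids[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < 2; t++)
        pthread_join(threads[t], NULL);
    return counter;
}
```

peterson_demo(n) should return exactly 2 * n. If the lock calls are removed, lost updates on counter will typically make the result smaller.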
Lamport’s [bakery] mutual exclusion algorithm

 threadlocal int   id // 0 to N-1
 shared boolean    want[N]
 shared int        label[N]

 def lock:
   want[id] = true                     // doorway
   label[id] = max(label) + 1          // doorway
   while exists k: k != id and
          want[k] and
          (label[k], k) < (label[id], id) :
      pass

 def unlock:
   want[id] = false

• This protocol does guarantee mutual exclusion, progress and freedom
  from starvation for N threads
• This protocol has an additional first-come, first-served (FCFS)
  property: the first thread to finish the doorway gets the lock first
• But it relies on unbounded labels. They can be replaced with
  “concurrent bounded timestamps”
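Below is the bakery pseudocode translated to C for a fixed number of threads. It assumes C11 sequentially consistent atomics for want and label, and POSIX threads. A hypothetical counter serves as the critical section. Labels are plain ints that keep growing, which mirrors the unbounded labels on the slide. The wait loop rescans every k, which implements the “while exists k” condition. NTHREADS, bakery_demo and the other names are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NTHREADS 4

static atomic_bool want[NTHREADS];
static atomic_int label[NTHREADS];   /* grows without bound, as on the slide */

static long counter;                 /* protected by the lock */
static int iterations;

/* Lexicographic (label[k], k) < (label[id], id) */
static bool precedes(int label_k, int k, int label_id, int id) {
    return label_k < label_id || (label_k == label_id && k < id);
}

static void bakery_lock(int id) {
    atomic_store(&want[id], true);              /* doorway */
    int max = 0;
    for (int k = 0; k < NTHREADS; k++) {
        int l = atomic_load(&label[k]);
        if (l > max) max = l;
    }
    int my = max + 1;
    atomic_store(&label[id], my);               /* end of doorway */
    for (;;) {                                  /* while exists k: ... */
        bool blocked = false;
        for (int k = 0; k < NTHREADS && !blocked; k++)
            if (k != id && atomic_load(&want[k]) &&
                precedes(atomic_load(&label[k]), k, my, id))
                blocked = true;
        if (!blocked) return;
        sched_yield();
    }
}

static void bakery_unlock(int id) {
    atomic_store(&want[id], false);
}

static void *worker(void *arg) {
    int id = *(int *)arg;
    for (int i = 0; i < iterations; i++) {
        bakery_lock(id);
        counter++;                              /* critical section */
        bakery_unlock(id);
    }
    return NULL;
}

/* NTHREADS threads each increment the counter n times under the lock. */
long bakery_demo(int n) {
    pthread_t threads[NTHREADS];
    int ids[NTHREADS];
    counter = 0;
    iterations = n;
    for (int t = 0; t < NTHREADS; t++) {
        ids[t] = t;
        int rc = pthread_create(&threads[t], NULL, worker, &ids[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);
    return counter;
}
```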
Pros and cons of locks

• With mutual exclusion any serial object can be turned into a
  linearizable shared object.
   - Just protect all operations as critical sections with a mutex
   - Using two phase locking (2PL) you can build complex linearizable
     objects out of smaller building blocks
   - Shared registers alone are enough to build a mutex
   - Profit!
• But
   - By using multiple locks you can get into a deadlock
   - Locks can lead to priority inversion
   - Locks limit concurrency of code by ensuring that critical sections are
     executed strictly serially with respect to each other
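Here is a minimal sketch of the first bullet, assuming POSIX threads. A hypothetical serial bounded stack becomes linearizable when each of its operations runs as a critical section under one pthread mutex. Each operation then takes effect at a single moment while the mutex is held, and that moment serves as its linearization point. It also shows the cost named above: all operations are executed strictly one after another.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define CAP 16

/* A plain serial stack: correct only when used by one thread at a time. */
typedef struct { int items[CAP]; int size; } serial_stack_t;

static bool serial_push(serial_stack_t *s, int v) {
    if (s->size == CAP) return false;
    s->items[s->size++] = v;
    return true;
}

static bool serial_pop(serial_stack_t *s, int *out) {
    if (s->size == 0) return false;
    *out = s->items[--s->size];
    return true;
}

/* Linearizable wrapper: every operation is a critical section. */
typedef struct { pthread_mutex_t mutex; serial_stack_t stack; } locked_stack_t;

void locked_init(locked_stack_t *s) {
    int rc = pthread_mutex_init(&s->mutex, NULL);
    assert(rc == 0);
    (void)rc;
    s->stack.size = 0;
}

bool locked_push(locked_stack_t *s, int v) {
    pthread_mutex_lock(&s->mutex);
    bool ok = serial_push(&s->stack, v);
    pthread_mutex_unlock(&s->mutex);
    return ok;
}

bool locked_pop(locked_stack_t *s, int *out) {
    pthread_mutex_lock(&s->mutex);
    bool ok = serial_pop(&s->stack, out);
    pthread_mutex_unlock(&s->mutex);
    return ok;
}
```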
Amdahl’s Law for parallelization

• The maximal speedup of code with N threads when a portion S of it
  is serial:
      $\text{speedup} \le \dfrac{1}{S + \frac{1 - S}{N}}, \qquad \lim_{N \to \infty} \dfrac{1}{S + \frac{1 - S}{N}} = \dfrac{1}{S}$

• Even when just 5% of the code is serial (S = 0.05), the maximal
  possible speedup of the code is 20.
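The bound is easy to evaluate. The helper below has a hypothetical name and simply evaluates the formula. With S = 0.05 the result approaches 20 as N grows, but with N = 20 threads it is only about 10.3.

```c
#include <assert.h>

/* Amdahl's law: upper bound on the speedup with n threads when a fraction s
   of the work is serial, 1 / (s + (1 - s) / n). */
double amdahl_speedup(double s, double n) {
    assert(s >= 0.0 && s <= 1.0 && n >= 1.0);
    return 1.0 / (s + (1.0 - s) / n);
}
```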
Non-blocking algorithms (objects)

• What happens if OS scheduler pauses a thread that is working
  inside a critical section (is holding a lock)?
   - No other operation on the corresponding object can proceed
• Lock-free: an object or operation (method) is lock-free if some
  active (non-paused) thread is guaranteed to complete an operation in
  a finite number of steps.
   - Some threads may starve, but only while other threads complete
     their operations
• Wait-free: an object or operation (method) is wait-free if every
  active (non-paused) thread completes its operation in a finite
  number of steps
   - No starvation is allowed
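The two progress conditions can be illustrated with a counter, assuming C11 atomics. An increment written as a CAS retry loop is lock-free. Its CAS can fail only because another thread's CAS succeeded in the meantime, so the system as a whole makes progress, but one unlucky thread may keep retrying. atomic_fetch_add finishes in a bounded number of steps on hardware with a native fetch-and-add instruction (for example LOCK XADD on x86), which makes it wait-free there. On other targets the compiler may itself emit a retry loop. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Lock-free getAndIncrement: retry until our CAS wins. */
long increment_lock_free(atomic_long *counter) {
    long old = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &old, old + 1)) {
        /* failure stored the current value into old; try again */
    }
    return old;   /* value before our increment */
}

/* A single read-modify-write; wait-free where the hardware provides it. */
long increment_fetch_add(atomic_long *counter) {
    return atomic_fetch_add(counter, 1);
}
```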
Non-atomic registers
• Physical register (SRAM) is not atomic
    - However, it is wait-free, but…
    - It stores only boolean (bit) values
    - It can have only a single reader (SR)
      and single writer (SW)
    - Trying to read and write at the same
      time leads to unpredictable results
    - But it is a safe register
         • A read that does not overlap a write returns the most recently
            written value
Through a chain of software constructions on top of safe boolean SRSW
registers it is possible to build a wait-free atomic multi-valued, multi-reader
(MR), multi-writer (MW) register
Atomic snapshot

• Just reading the values of N registers in a loop and returning them
  is not an atomic snapshot (“read N registers atomically”) operation

                      r1.w(1)       r2.w(2)
 P
            r1.r(0)                           r2.r(2)
 Q
               Q tries to take a snapshot:
               this execution cannot be linearized

   System states (r1, r2):   (0, 0) → (1, 0) → (1, 2)
   State read by Q (r1, r2): (0, 2), which never existed
Lock-free atomic snapshot

• Add a version to each register
   - On write, atomically write a pair (new_version, new_value) to the register
     where new_version = old_version + 1
• To take an atomic snapshot
   - Read all versions and values in a loop
   - Reread them to check if the versions are still the same
      • If still the same -> the snapshot was atomic, return it
      • If changed -> the snapshot was not atomic, repeat
• It can loop trying to take a snapshot forever (starvation), thus it is
  not a wait-free algorithm
• But it is lock-free: the system as a whole makes progress. Each retry
  of a snapshot means that some write has completed
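Below is a sketch of this “double collect” snapshot in C. It assumes C11 atomics, a fixed number of registers, and one writer per register (each thread updates only its own register). The (version, value) pair is packed into one 64-bit word, so it is written atomically without a double-width CAS. 32-bit versions are assumed never to wrap during a scan, and values are unsigned 32-bit. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NREG 4

/* High 32 bits = version, low 32 bits = value. */
static _Atomic uint64_t reg[NREG];

static uint64_t pack(uint32_t version, uint32_t value) {
    return ((uint64_t)version << 32) | value;
}

/* Called only by the single writer (owner) of register i. */
void snapshot_update(int i, uint32_t value) {
    assert(i >= 0 && i < NREG);
    uint32_t version = (uint32_t)(atomic_load(&reg[i]) >> 32) + 1;
    atomic_store(&reg[i], pack(version, value));   /* version and value together */
}

/* Collect twice; if no version changed, the collected values were all
   present at once between the two collects, so they form an atomic snapshot. */
void snapshot_scan(uint32_t out[NREG]) {
    uint64_t first[NREG], second[NREG];
    for (int i = 0; i < NREG; i++) first[i] = atomic_load(&reg[i]);
    for (;;) {
        bool same = true;
        for (int i = 0; i < NREG; i++) {
            second[i] = atomic_load(&reg[i]);
            if ((second[i] >> 32) != (first[i] >> 32)) same = false;
        }
        if (same) break;
        for (int i = 0; i < NREG; i++) first[i] = second[i];   /* retry */
    }
    for (int i = 0; i < NREG; i++) out[i] = (uint32_t)first[i];
}
```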
Wait-free atomic snapshot

• Yes, it is possible to make it wait-free, so that every operation
  (including snapshot) is guaranteed to complete in a finite number
  of steps under all circumstances
   - Threads will have to cooperate
   - Each updating thread will have to take a snapshot and store it in its
     own per-thread register to help complete concurrent snapshots
      • O(N²) storage requirement, O(N) time for each operation
• Not practical
   - This is true about all wait-free algorithms
   - There are no practical wait-free algorithms
      • But certain individual non-modifying operations in some algorithms
        can be implemented wait-free
Wait-free synchronization and consensus

The consensus protocol

 threadlocal int proposal

 thread Pid:
   print consensus
   stop

• What other wait-free objects can we build using atomic wait-free
  registers as our primitive?
   - The question was definitively answered by M. Herlihy in 1991
   - He considered wait-free implementations of the consensus protocol
• In a consensus protocol all threads have to reach agreement on a value.
   - It has to be non-trivial (the agreed value must be one of the
     proposed values)
   - The protocol must be wait-free
Consensus number

• The consensus number of a shared object or class of objects is the
  largest number N such that a [wait-free] consensus protocol for
  N threads can be implemented using these objects as primitive
  building blocks.
• The consensus number of atomic registers is 1 (one, uno, один)
   - Even a two-threaded [wait-free] consensus protocol cannot be
     implemented using any number of atomic registers
   - However, it’s trivial with locks!

 Lock-based (blocking) consensus protocol

 threadlocal int   proposal // != 0
 shared int        value

 def consensus:
   lock
   if value == 0:
      value = proposal
   unlock
   return value
Read-Modify-Write (RMW) registers

• It’s a register that is augmented with additional RMW operation(s)
   - Each RMW operation has a kernel function F and is typically named
     “getAndF”
• Common2 class of RMW kernels: for any two kernels F1 and F2 in the
  class, either
   - F1(F2(x)) == F1(x) (one overwrites the other), or
   - F1(F2(x)) == F2(F1(x)) (they commute)
• Common2 examples:
   - F(x)=a   // set to const
   - F(x)=x+a // add const

 RMW register

 shared int value

 def getAndF:        // the whole operation is atomic
   old = value       // read
   value = F(old)    // modify, write
   return old

Non-trivial Common2 RMW registers have consensus number 2
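Here is a sketch of how a getAndSet register (C11 atomic_exchange) solves consensus for two threads. Each thread first publishes its proposal in its own register and then swaps 1 into a shared flag. The thread that gets 0 back went first, and both threads decide on that thread's proposal. The loser reads the winner's proposal, which was published before the winner's swap. Every thread takes a fixed number of steps, so the protocol is wait-free. The names, and the use of 0 as the untouched flag value, are assumptions of this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int proposed[2];   /* per-thread registers */
static atomic_int flag;          /* getAndSet register, initially 0 */

/* Wait-free two-thread consensus; id is 0 or 1. */
int consensus2(int id, int proposal) {
    assert(id == 0 || id == 1);
    atomic_store(&proposed[id], proposal);   /* publish before competing */
    if (atomic_exchange(&flag, 1) == 0)
        return proposal;                     /* we were first */
    return atomic_load(&proposed[1 - id]);   /* the other thread was first */
}

typedef struct { int id, proposal, decision; } arg_t;

static void *run(void *p) {
    arg_t *a = p;
    a->decision = consensus2(a->id, a->proposal);
    return NULL;
}

/* Runs both threads once and reports their decisions. */
void consensus2_demo(int p0, int p1, int *d0, int *d1) {
    arg_t args[2] = { { 0, p0, 0 }, { 1, p1, 0 } };
    pthread_t t[2];
    atomic_store(&flag, 0);
    for (int i = 0; i < 2; i++) pthread_create(&t[i], NULL, run, &args[i]);
    for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
    *d0 = args[0].decision;
    *d1 = args[1].decision;
}
```

The same trick does not extend to three threads. A loser learns only that it lost, not which of the other two threads won.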
Consensus hierarchy

  Objects and operations                                    Consensus number

  Atomic register with get (read), set (write) operations   1
  Atomic snapshot of N registers                            1

  Common2 read-modify-write registers:                      2
    getAndAdd (atomic inc/dec), getAndSet (atomic swap),
    queue and stack (with enqueue/dequeue, push/pop only)

  Atomic assignment of any N registers                      2N-2

  Universal operations:                                     ∞
    compareAndSet/compareAndSwap (CAS), queue with
    peek operation, memory-to-memory swap
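The following shows why CAS sits at the top of the table. Suppose a shared register starts with a value that no thread proposes (0 here, which is an assumption of this sketch). Each thread then performs a single compareAndSet from 0 to its own proposal. The first CAS wins, and every later one fails and observes the winner's value. As a result, any number of threads agree after one step each. The sketch uses C11 atomics and POSIX threads, and its names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 8

static atomic_int decided;   /* 0 = undecided, so proposals must be non-zero */

/* Wait-free consensus for any number of threads: one CAS per thread. */
int consensus_cas(int proposal) {
    assert(proposal != 0);
    int expected = 0;
    if (atomic_compare_exchange_strong(&decided, &expected, proposal))
        return proposal;     /* our CAS fixed the value */
    return expected;         /* on failure, expected holds the decided value */
}

static int proposals[NTHREADS], decisions[NTHREADS];

static void *run(void *arg) {
    int i = *(int *)arg;
    decisions[i] = consensus_cas(proposals[i]);
    return NULL;
}

/* Each thread proposes its own value; returns 1 if all threads decided
   the same value and that value is one of the proposals. */
int consensus_cas_demo(void) {
    pthread_t threads[NTHREADS];
    int ids[NTHREADS];
    atomic_store(&decided, 0);
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        proposals[i] = 100 + i;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, run, &ids[i]);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    int agreed = decisions[0] >= 100 && decisions[0] < 100 + NTHREADS;
    for (int i = 1; i < NTHREADS; i++)
        agreed = agreed && decisions[i] == decisions[0];
    return agreed;
}
```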
Universality of consensus

• Any object can be turned into a concurrent wait-free linearizable
  object for N threads using a universal construction, if we have a
  consensus protocol for N threads
   - Corollary: the consensus hierarchy is strict.
   - However, the universal construction is not efficient enough for
     real-life use
• Lock-free universal construction via CAS is easy and practical

  shared register<MyObject> value

  def concurrentOperationX:
    loop:
       oldval = value.get
       newval = oldval.deepCopy
       newval.serialOperationX
    until value.CAS(oldval, newval) is successful

  (MyObject is a pointer if its state does not fit into a CAS-able
  machine word)
Implementing lock-free algorithms

• Let’s try to implement CAS-based universal construction in C:
 #include <stdlib.h> // malloc, free
 #include <string.h> // memcpy

 typedef struct object { /* my object’s state is here */ } object_t;

 void serial_operation_X(object_t *ptr); // updates state pointed to by ptr

 void concurrent_operation_X(object_t **ptr) {
   object_t *oldval, *newval = malloc(sizeof(object_t));
   do {
      oldval = *ptr;
      memcpy(newval, oldval, sizeof(object_t));
      serial_operation_X(newval);
   } while (! __sync_bool_compare_and_swap(ptr, oldval, newval));
   free(oldval);
 }

   Problem: it can copy garbage from memory that another thread has already freed, and serial_operation_X may crash
Implementing lock-free algorithms (attempt #2)

• Let’s try to implement CAS-based universal construction in C:
 #include <stdlib.h> // malloc, free
 #include <string.h> // memcpy

 typedef struct object { /* my object’s state is here */ } object_t;

 void serial_operation_X(object_t *ptr); // updates state pointed to by ptr

 void concurrent_operation_X(object_t **ptr) {
   object_t *oldval, *newval = malloc(sizeof(object_t));
   do {
      oldval = *ptr;
      memcpy(newval, oldval, sizeof(object_t)); // assume no segfault here
      __sync_synchronize();             // make sure we see changes of *ptr
      if (oldval != *ptr) continue;
      serial_operation_X(newval);
   } while (! __sync_bool_compare_and_swap(ptr, oldval, newval));
   free(oldval);
 }
Still doesn’t work: ABA problem

 A, B and C are memory locations; start with *ptr == A

        Thread P:                                   Thread Q:

 1: oldval is A                              1: oldval is A
 2: (newval = malloc()) is B                 2: (newval = malloc()) is C
 3: CAS(ptr, A, B) is successful
 4: free(A)
    // performs operation_X again               // sleeps or is slow all that time
 5: oldval is B
 6: (newval = malloc()) is A
 7: CAS(ptr, B, A) is successful
 8: free(B)                                  3: CAS(ptr, A, C) is successful

      *ptr went A, B, A, so Q’s CAS succeeds although C was computed from
      the old contents of A, and P’s two updates are lost
Solving ABA problem

• Attach a version to a pointer and increment it on every operation
    - Need to CAS two words at the same time
    - That’s why CPUs have ops like CMPXCHG8B (for 32-bit mode) and
      CMPXCHG16B (for 64-bit mode)
• Rely on a garbage collector (GC) for memory management
    - In a GC runtime environment the ABA problem caused by memory reuse
      does not exist: memory is not reclaimed while any thread still references it
    - Makes your non-blocking concurrent programming much easier!
• Use other schemes that rely on coordination between threads
  (hazard pointers)
• Use special hardware support (LL/SC or hardware memory
  transactions)
• Still, universal construction is efficient only if object state is small
Tree-like persistent data structures
           oldval                            newval


           Root             Update B          Root’




   NodeA            NodeB                             NodeB’




             NodeC          NodeD



    Reallocate and update only the path from the updated node to the root (unchanged subtrees are shared)
http://liveearth.org/en/liveearthblog/run-for-water?page=7
Lock-free stacks
• Use universal construction on linked-list representation of the stack
  (it’s a trivial tree-like structure!)
   - root is pointing to the top of stack
   - push and pop have trivial implementation with minimal overhead
• With a lot of cores, the root becomes a bottleneck. Use elimination backoff
   - Threads trying to push and pop at the same time meet elsewhere and
     exchange values without touching the root
• But linked data structures are slow on modern machines
   - No memory locality
   - Next memory address is not known before reading previous node –
     code must pay memory latency penalty on each access
   - Array-based single-threaded stack is many times faster than linked
     one
• Alas, no practical & efficient array-based lock-free algorithms are known
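Below is a sketch of the linked-list stack with CAS on the root, in C11. It makes two assumptions that differ from the plain-pointer version on the earlier slides. First, nodes come from a fixed pool and are recycled rather than freed, so reading a stale node is always safe. Second, the root is a 64-bit word that holds a 32-bit node index plus a 32-bit version tag, and the tag is incremented on every successful CAS. This is the “attach a version to a pointer” fix for ABA, done with a single-word CAS instead of CMPXCHG16B. The names and the pool size are hypothetical, and a 32-bit tag could in theory wrap around.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_SIZE 1024
#define NIL UINT32_MAX                /* "null" node index */

typedef struct {
    int value;
    _Atomic uint32_t next;            /* may be read concurrently by a stale pop */
} node_t;

static node_t pool[POOL_SIZE];

/* A root word: high 32 bits = version tag, low 32 bits = node index. */
typedef _Atomic uint64_t root_t;

static root_t top;                    /* the stack */
static root_t free_list;              /* unused nodes */

static uint64_t pack(uint32_t tag, uint32_t idx) { return ((uint64_t)tag << 32) | idx; }
static uint32_t index_of(uint64_t root) { return (uint32_t)root; }
static uint32_t tag_of(uint64_t root) { return (uint32_t)(root >> 32); }

static void push_node(root_t *root, uint32_t i) {
    uint64_t old = atomic_load(root);
    do {
        atomic_store(&pool[i].next, index_of(old));
    } while (!atomic_compare_exchange_weak(root, &old, pack(tag_of(old) + 1, i)));
}

static uint32_t pop_node(root_t *root) {
    uint64_t old = atomic_load(root);
    for (;;) {
        uint32_t i = index_of(old);
        if (i == NIL) return NIL;
        uint32_t next = atomic_load(&pool[i].next);
        /* fails if the root changed in any way, including A -> B -> A */
        if (atomic_compare_exchange_weak(root, &old, pack(tag_of(old) + 1, next)))
            return i;
    }
}

void stack_init(void) {
    atomic_store(&top, pack(0, NIL));
    atomic_store(&free_list, pack(0, NIL));
    for (uint32_t i = 0; i < POOL_SIZE; i++) push_node(&free_list, i);
}

bool stack_push(int v) {
    uint32_t i = pop_node(&free_list);
    if (i == NIL) return false;       /* pool exhausted */
    pool[i].value = v;                /* node is private until pushed */
    push_node(&top, i);
    return true;
}

bool stack_pop(int *out) {
    uint32_t i = pop_node(&top);
    if (i == NIL) return false;       /* empty */
    *out = pool[i].value;
    push_node(&free_list, i);         /* recycle, never free */
    return true;
}
```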
Lock-free queues
• Michael & Scott algo for lock-free unbounded linked queue
   - Great implementation in java.util.concurrent.ConcurrentLinkedQueue
• Array-based bounded cyclic queues cannot be practically &
  efficiently made lock-free
   - But limiting to a single producer and single consumer helps (in case of
     a bounded array-based queue)
   - You don’t even need CAS for an SPSC queue
   - Use N of them for MP or MC
   - Can do MP and MC queue (and even deque) if you additionally keep
     a version of every slot in the array
      • but this is not really practical
   - Or reallocate memory when array is filled (unrolled linked list)
      • a really practical alternative if needed
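The following sketch, using C11 atomics, shows why a bounded SPSC queue needs no CAS. The producer is the only writer of tail and the consumer is the only writer of head. A release store of its own counter therefore publishes each slot, and an acquire load of the other side's counter makes the slot contents visible, or guarantees that a slot has been consumed before it is overwritten. The capacity and names are hypothetical. The capacity is a power of two so that the free-running counters keep mapping to slots consistently when they wrap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SPSC_CAPACITY 8   /* a power of two */

typedef struct {
    int buf[SPSC_CAPACITY];
    _Atomic size_t head;  /* items taken so far; written only by the consumer */
    _Atomic size_t tail;  /* items added so far; written only by the producer */
} spsc_queue_t;

void spsc_init(spsc_queue_t *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer only. Returns false when the queue is full. */
bool spsc_offer(spsc_queue_t *q, int v) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);  /* own counter */
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == SPSC_CAPACITY) return false;
    q->buf[tail % SPSC_CAPACITY] = v;
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);     /* publish slot */
    return true;
}

/* Consumer only. Returns false when the queue is empty. */
bool spsc_poll(spsc_queue_t *q, int *out) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);  /* own counter */
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) return false;
    *out = q->buf[head % SPSC_CAPACITY];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);     /* free slot */
    return true;
}
```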
More practical notes

• Strict FIFO queue will always get contended
   - Multiple producers will contend for tail
   - Multiple consumers will contend for head
   - Does not scale to a lot of cores
• In practice, strict FIFO queue is rarely needed
   - Usually, it does not really matter if first in is really first out
      • but it needs to be eventually out
   - See java.util.concurrent.ForkJoinPool for one alternative
• Lock-free algorithms can be faster (and scale better) than their
  lock-based counterparts, but are always slower than serial algorithms

        Avoid unnecessary synchronization between threads
Data structures for search

• Ordered
   - Balanced trees are hard to make lock-free (not practical)
   - But Bill Pugh’s skip lists are practical in lock-free case
      • Because they are based on ordered linked sets
          • which support lock-free implementation
      • See java.util.concurrent.ConcurrentSkipListMap (and ConcurrentSkipListSet) for an implementation
• Unordered
   - Fixed-size hash-tables are trivial in concurrent case
   - Resizable hash-table can be implemented lock-free, too
      • As either ordered linked set with lookup hash-table
        (recursive split-ordering)
      • Or fully based on arrays
        (Cliff Click’s high-scale hash-table)
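The “fixed-size hash tables are trivial” point can be illustrated under strong simplifying assumptions. This set is insert-only and holds non-zero 32-bit keys (0 marks an empty slot). It uses linear probing and supports neither resizing nor deletion. A slot changes at most once, from 0 to a key, by CAS, so a lookup that reaches an empty slot can stop. Both operations finish within TABLE_SIZE probes, so in this insert-only form they are even wait-free. The names and the table size are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_BITS 6
#define TABLE_SIZE (1u << TABLE_BITS)            /* fixed capacity: 64 slots */

static _Atomic uint32_t slots[TABLE_SIZE];       /* 0 marks an empty slot */

/* Multiplicative (Fibonacci) hashing: keep the top TABLE_BITS bits. */
static uint32_t slot_of(uint32_t key) {
    return (uint32_t)(key * 2654435761u) >> (32 - TABLE_BITS);
}

/* Returns true if this call added the key, false if it was already
   present or the table is full. */
bool set_insert(uint32_t key) {
    assert(key != 0);
    uint32_t start = slot_of(key);
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        _Atomic uint32_t *slot = &slots[(start + i) & (TABLE_SIZE - 1)];
        uint32_t cur = atomic_load(slot);
        if (cur == 0) {
            uint32_t expected = 0;
            if (atomic_compare_exchange_strong(slot, &expected, key))
                return true;                     /* we claimed the empty slot */
            cur = expected;                      /* another thread claimed it first */
        }
        if (cur == key) return false;            /* already present */
    }
    return false;                                /* table full */
}

bool set_contains(uint32_t key) {
    assert(key != 0);
    uint32_t start = slot_of(key);
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        uint32_t cur = atomic_load(&slots[(start + i) & (TABLE_SIZE - 1)]);
        if (cur == key) return true;
        if (cur == 0) return false;              /* keys are never removed */
    }
    return false;
}
```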
Hardware transactional memory (HTM)

• Is scheduled to debut in Intel Haswell processors
   - Allows code to begin a transaction, perform it inside the processor
     cache, then either commit its effects to main memory or abort
   - Enhances existing cache infrastructure
   - While tracking interference between threads on top of existing
     cache-coherence protocols
• It makes more efficient lock-free algorithms practical
   - Like LIFO stacks and FIFO queues with any number of participants
   - Like concurrent hash tables without pain
   - Hardware just automatically detects conflicts without a code overhead
     to manage them and rolls back allowing code to start transaction
     again (just like you’d do in CAS universal construction)
Software Transactional Memory (STM)

• Is a simplified programming model
   - Similar to locks, but uses an atomic section instead of synchronized
   - Solves the same problems as locks, but
      • Without worrying about taking the right lock
      • Without worrying about deadlocks
      • A conflicting transaction is transparently restarted by the
        transaction manager
• It has poor performance, but makes life easier
   - when there are only a few places where threads have to coordinate
     through shared objects
   - It is inefficient if there are a lot of shared objects and/or they are
     accessed very often
There’s much more to it.

     It is an active
    area of research
Further reading
Thank you for your attention!




           Slides will be posted to elizarov.livejournal.com

RNNs for Timeseries AnalysisRNNs for Timeseries Analysis
RNNs for Timeseries Analysis
 
JUnit PowerUp
JUnit PowerUpJUnit PowerUp
JUnit PowerUp
 
A x86-optimized rank&select dictionary for bit sequences
A x86-optimized rank&select dictionary for bit sequencesA x86-optimized rank&select dictionary for bit sequences
A x86-optimized rank&select dictionary for bit sequences
 
TensorFlow for IITians
TensorFlow for IITiansTensorFlow for IITians
TensorFlow for IITians
 
Knowledge engg using & in fol
Knowledge engg using & in folKnowledge engg using & in fol
Knowledge engg using & in fol
 
Least squares support Vector Machine Classifier
Least squares support Vector Machine ClassifierLeast squares support Vector Machine Classifier
Least squares support Vector Machine Classifier
 
MODELS 2019: Querying and annotating model histories with time-aware patterns
MODELS 2019: Querying and annotating model histories with time-aware patternsMODELS 2019: Querying and annotating model histories with time-aware patterns
MODELS 2019: Querying and annotating model histories with time-aware patterns
 
Visualizing, Modeling and Forecasting of Functional Time Series
Visualizing, Modeling and Forecasting of Functional Time SeriesVisualizing, Modeling and Forecasting of Functional Time Series
Visualizing, Modeling and Forecasting of Functional Time Series
 
Flink Batch Processing and Iterations
Flink Batch Processing and IterationsFlink Batch Processing and Iterations
Flink Batch Processing and Iterations
 
RNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataRNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential Data
 
DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis
DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier AnalysisDSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis
DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis
 
(Kpi summer school 2015) theano tutorial part2
(Kpi summer school 2015) theano tutorial part2(Kpi summer school 2015) theano tutorial part2
(Kpi summer school 2015) theano tutorial part2
 
A Brief History of Stream Processing
A Brief History of Stream ProcessingA Brief History of Stream Processing
A Brief History of Stream Processing
 
Can recurrent neural networks warp time
Can recurrent neural networks warp timeCan recurrent neural networks warp time
Can recurrent neural networks warp time
 
Control System toolbox in Matlab
Control System toolbox in MatlabControl System toolbox in Matlab
Control System toolbox in Matlab
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 

Andere mochten auch

Effective websites development
Effective websites developmentEffective websites development
Effective websites developmentDevexperts
 
Codefreeze rus
Codefreeze rusCodefreeze rus
Codefreeze rusDevexperts
 
Drd secr final1_3
Drd secr final1_3Drd secr final1_3
Drd secr final1_3Devexperts
 
20130420 bitbyte
20130420 bitbyte20130420 bitbyte
20130420 bitbyteDevexperts
 
How to improve java performance
How to improve java performanceHow to improve java performance
How to improve java performanceDevexperts
 
Browsers. Magic is inside.
Browsers. Magic is inside.Browsers. Magic is inside.
Browsers. Magic is inside.Devexperts
 
Windows, doors and secret passages: approaches to the space organization in t...
Windows, doors and secret passages: approaches to the space organization in t...Windows, doors and secret passages: approaches to the space organization in t...
Windows, doors and secret passages: approaches to the space organization in t...Devexperts
 
Dynamic data race detection in concurrent Java programs
Dynamic data race detection in concurrent Java programsDynamic data race detection in concurrent Java programs
Dynamic data race detection in concurrent Java programsDevexperts
 
Distributed File System
Distributed File SystemDistributed File System
Distributed File SystemNtu
 
ACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems ReviewACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems ReviewRoman Elizarov
 

Andere mochten auch (10)

Effective websites development
Effective websites developmentEffective websites development
Effective websites development
 
Codefreeze rus
Codefreeze rusCodefreeze rus
Codefreeze rus
 
Drd secr final1_3
Drd secr final1_3Drd secr final1_3
Drd secr final1_3
 
20130420 bitbyte
20130420 bitbyte20130420 bitbyte
20130420 bitbyte
 
How to improve java performance
How to improve java performanceHow to improve java performance
How to improve java performance
 
Browsers. Magic is inside.
Browsers. Magic is inside.Browsers. Magic is inside.
Browsers. Magic is inside.
 
Windows, doors and secret passages: approaches to the space organization in t...
Windows, doors and secret passages: approaches to the space organization in t...Windows, doors and secret passages: approaches to the space organization in t...
Windows, doors and secret passages: approaches to the space organization in t...
 
Dynamic data race detection in concurrent Java programs
Dynamic data race detection in concurrent Java programsDynamic data race detection in concurrent Java programs
Dynamic data race detection in concurrent Java programs
 
Distributed File System
Distributed File SystemDistributed File System
Distributed File System
 
ACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems ReviewACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems Review
 

Ähnlich wie Codefreeze eng

Presentation of GetTogether on Functional Programming
Presentation of GetTogether on Functional ProgrammingPresentation of GetTogether on Functional Programming
Presentation of GetTogether on Functional ProgrammingFilip De Sutter
 
MODEL OF A PROGRAM AS MULTITHREADED STOCHASTIC AUTOMATON AND ITS EQUIVALENT T...
MODEL OF A PROGRAM AS MULTITHREADED STOCHASTIC AUTOMATON AND ITS EQUIVALENT T...MODEL OF A PROGRAM AS MULTITHREADED STOCHASTIC AUTOMATON AND ITS EQUIVALENT T...
MODEL OF A PROGRAM AS MULTITHREADED STOCHASTIC AUTOMATON AND ITS EQUIVALENT T...Sergey Staroletov
 
20170714 concurrency in julia
20170714 concurrency in julia20170714 concurrency in julia
20170714 concurrency in julia岳華 杜
 
Ch-7-Part-2-Distributed-System.pptx
Ch-7-Part-2-Distributed-System.pptxCh-7-Part-2-Distributed-System.pptx
Ch-7-Part-2-Distributed-System.pptxKabindra Koirala
 
Programming Language Memory Models: What do Shared Variables Mean?
Programming Language Memory Models: What do Shared Variables Mean?Programming Language Memory Models: What do Shared Variables Mean?
Programming Language Memory Models: What do Shared Variables Mean?greenwop
 
A Proposition for Business Process Modeling
A Proposition for Business Process ModelingA Proposition for Business Process Modeling
A Proposition for Business Process ModelingAng Chen
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues listsJames Wong
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
StacksqueueslistsFraboni Ec
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsTony Nguyen
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsHarry Potter
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsYoung Alista
 
An Intuitive Approach to Fourier Optics
An Intuitive Approach to Fourier OpticsAn Intuitive Approach to Fourier Optics
An Intuitive Approach to Fourier Opticsjose0055
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Scientific Applications of The Data Distribution Service
Scientific Applications of The Data Distribution ServiceScientific Applications of The Data Distribution Service
Scientific Applications of The Data Distribution ServiceAngelo Corsaro
 
3 concurrencycontrolone
3 concurrencycontrolone3 concurrencycontrolone
3 concurrencycontroloneKamal Shrish
 
Concurrency in Distributed Systems : Leslie Lamport papers
Concurrency in Distributed Systems : Leslie Lamport papersConcurrency in Distributed Systems : Leslie Lamport papers
Concurrency in Distributed Systems : Leslie Lamport papersSubhajit Sahu
 

Ähnlich wie Codefreeze eng (20)

Presentation of GetTogether on Functional Programming
Presentation of GetTogether on Functional ProgrammingPresentation of GetTogether on Functional Programming
Presentation of GetTogether on Functional Programming
 
MODEL OF A PROGRAM AS MULTITHREADED STOCHASTIC AUTOMATON AND ITS EQUIVALENT T...
MODEL OF A PROGRAM AS MULTITHREADED STOCHASTIC AUTOMATON AND ITS EQUIVALENT T...MODEL OF A PROGRAM AS MULTITHREADED STOCHASTIC AUTOMATON AND ITS EQUIVALENT T...
MODEL OF A PROGRAM AS MULTITHREADED STOCHASTIC AUTOMATON AND ITS EQUIVALENT T...
 
20170714 concurrency in julia
20170714 concurrency in julia20170714 concurrency in julia
20170714 concurrency in julia
 
Computational models
Computational models Computational models
Computational models
 
Ch-7-Part-2-Distributed-System.pptx
Ch-7-Part-2-Distributed-System.pptxCh-7-Part-2-Distributed-System.pptx
Ch-7-Part-2-Distributed-System.pptx
 
Programming Language Memory Models: What do Shared Variables Mean?
Programming Language Memory Models: What do Shared Variables Mean?Programming Language Memory Models: What do Shared Variables Mean?
Programming Language Memory Models: What do Shared Variables Mean?
 
A Proposition for Business Process Modeling
A Proposition for Business Process ModelingA Proposition for Business Process Modeling
A Proposition for Business Process Modeling
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues lists
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
Stacksqueueslists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Concur15slides
Concur15slidesConcur15slides
Concur15slides
 
Clojure intro
Clojure introClojure intro
Clojure intro
 
An Intuitive Approach to Fourier Optics
An Intuitive Approach to Fourier OpticsAn Intuitive Approach to Fourier Optics
An Intuitive Approach to Fourier Optics
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Scientific Applications of The Data Distribution Service
Scientific Applications of The Data Distribution ServiceScientific Applications of The Data Distribution Service
Scientific Applications of The Data Distribution Service
 
3 concurrencycontrolone
3 concurrencycontrolone3 concurrencycontrolone
3 concurrencycontrolone
 
Concurrency in Distributed Systems : Leslie Lamport papers
Concurrency in Distributed Systems : Leslie Lamport papersConcurrency in Distributed Systems : Leslie Lamport papers
Concurrency in Distributed Systems : Leslie Lamport papers
 

Kürzlich hochgeladen

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

Codefreeze eng

  • 1. The theory of concurrent programming for a seasoned programmer © Roman Elizarov, Devexperts, 2012
  • 2. What? For whom? • Practical experience in writing concurrent programs is assumed - Here, concurrent == using shared memory - The audience is assumed to know, and to have used in practice, locks, synchronized sections, compare-and-set, etc. - Knowledge of “Java Concurrency in Practice” is a plus! • The theory behind the practical constructs will be explained - Formal models - Key definitions - Important facts and theorems (without proofs) - Practical corollaries • But some concepts are simplified
  • 3. Just a reminder: the free lunch is over http://www.gotw.ca/publications/concurrency-ddj.htm
  • 4. Basic definitions • A process owns memory and other resources in the OS • A thread of execution defines the current instruction pointer, stack pointer and other registers - Threads execute program code - Multiple threads of one process share the same memory • However, both terms are often used interchangeably in theory - “Process” seems to be used more often, for historical reasons - Processes are typically named P, Q, R, … in papers
  • 5. Why model? • Formal models of computation let you define and prove certain desired properties of your programs • The models let you prove the impossibility of achieving certain results under specific constraints - This saves the time you would otherwise spend trying to find a working solution that does not exist
  • 6. The model with shared objects: threads 1 … N work with [shared] objects 1 … M, which reside in [shared] memory
  • 7. Concurrency http://www.nassaulibrary.org/ncla/nclacler_files/LILC7.JPG
  • 8. Shared objects • Threads (or processes) perform operations on shared memory objects • This model doesn’t care about operations that are internal to threads: - Computations performed by threads - Updates to threads’ CPU registers - Updates to threads’ stacks - Updates to any “thread local” memory regions • Only inter-thread communication matters • The only type of inter-thread communication in this model is via shared objects
  • 9. [Shared] Registers • Don’t confuse them with CPU registers (eax, ebx, etc. on x86) - CPU registers are just part of the “thread state” in concurrent programming theory • In concurrent programming, a [shared] register is the simplest kind of shared object: - It has some value type (typically boolean or integer) - It supports read and write operations • Registers are the basic building blocks of many practical concurrent algorithms • The model of threads + shared registers is a decent abstraction of modern multicore hardware - It abstracts away enough of the actual complexity to make theoretical reasoning possible
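A minimal Java sketch of a shared register, assuming integer values. `IntRegister` and `RegisterDemo` are hypothetical names; the field is `volatile` because, as the deck states later, volatile fields in Java work like atomic registers. The only operations offered are read and write.

```java
// Hypothetical shared integer register: the only operations are read and write.
final class IntRegister {
    private volatile int value; // volatile: each read/write acts as an atomic register operation

    IntRegister(int initial) { value = initial; }

    int read() { return value; }

    void write(int newValue) { value = newValue; }
}

final class RegisterDemo {
    // A writer thread writes 42; the main thread reads after the writer has finished.
    static int writeThenRead() {
        IntRegister r = new IntRegister(0);
        Thread writer = new Thread(() -> r.write(42));
        writer.start();
        try {
            writer.join(); // join() orders the write before the read below
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return r.read();
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead()); // prints 42
    }
}
```

All inter-thread communication here goes through the register; the writer's local computation is invisible to the model.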
  • 10. Message passing models • We can model parallel computing by letting threads send messages to each other instead of giving them shared registers (or other shared objects) - This is closer to how the hardware memory bus actually works at a low level (CPUs send messages to memory via interconnects) - But it is farther from how programs actually work with memory • Message passing is typically used to model distributed programs • Both models are theoretically equivalent in power - But the practical performance of various algorithms will differ - We use the shared objects model where performance matters: optimizing the number of shared objects and the number of operations on them is close to real-world performance optimization
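To contrast with the register sketch, here is a minimal message-passing sketch in Java, assuming a single mailbox between two threads. The sender and receiver exchange values only through a `BlockingQueue` used as a channel (`put` is send, `take` is receive); `MessagePassingDemo` and the `STOP` marker are hypothetical, and the queue is of course itself a shared object under the hood.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sender and receiver communicate only by messages through a mailbox.
final class MessagePassingDemo {
    static final int STOP = -1; // hypothetical end-of-stream marker (values must not be -1)

    static int sumViaMessages(int... values) {
        BlockingQueue<Integer> mailbox = new ArrayBlockingQueue<>(16);
        int[] result = new int[1]; // read by the caller only after join()
        Thread sender = new Thread(() -> {
            try {
                for (int v : values) mailbox.put(v); // send
                mailbox.put(STOP);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread receiver = new Thread(() -> {
            try {
                int sum = 0;
                for (int m = mailbox.take(); m != STOP; m = mailbox.take()) sum += m; // receive
                result[0] = sum;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();
        receiver.start();
        try {
            sender.join();
            receiver.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return result[0];
    }
}
```

For example, `sumViaMessages(1, 2, 3)` returns 6.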
  • 11. Parallel Concurrent Distributed [shared memory] [message passing] * NOTE: There is no general consensus on this terminology
  • 12. Properties of concurrent programs • Serial programs are usually deterministic - Unless explicit calls to a random number generator are present - Their properties are established by analyzing their state, invariants, and pre- and post-conditions • Concurrent programs are inherently nondeterministic - Even when the code of each thread is fully deterministic - The outcome depends on the actual execution history: which operations on shared objects were performed by which threads, and in what order - When you say “program A has property P”, it actually means “program A has property P in any execution”
  • 13. Modeling executions • S is a global state, which includes: - The state of all threads - The state of all shared objects (or of all “in flight” messages, in a distributed system) • f and g are operations on shared objects; performed in state S, they lead to the states f(S) and g(S) - For registers, an operation is either ri.read(value) or ri.write(value) - There are as many possible operations in each state as there are active threads (it is not as simple in the distributed case) • f(S) is the new state after operation f was performed in state S
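  • 14. Example • Program: shared int x (initially 0); thread P: 0: x = 1, 1: print x, 2: stop; thread Q: 0: x = 2, 1: print x, 2: stop • The state graph has a total of 17 states; each state is written as the program counters of P and Q, the value of x, and the pair (value printed by P, value printed by Q) - Initial state: P0,Q0, x=0, (-, -) - After one step: P1,Q0, x=1, (-, -) or P0,Q1, x=2, (-, -) - After two steps: P2,Q0, x=1, (1, -); P1,Q1, x=2, (-, -); P1,Q1, x=1, (-, -); P0,Q2, x=2, (-, 2) - 6 more intermediate states are not shown - Final states, all P2,Q2: x=2, (1, 2); x=2, (2, 2); x=1, (1, 1); x=1, (1, 2)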
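The state graph of this example can be generated mechanically. The following is a minimal Java sketch (records need Java 16+) that does a breadth-first search over every interleaving of P and Q. It assumes each numbered line is one atomic step, and it encodes "nothing printed yet" as 0, which is safe only because this program never prints 0. `ExampleStates` is a hypothetical name.

```java
import java.util.*;

// Explores all interleavings of:  P: x = 1; print x   and   Q: x = 2; print x
final class ExampleStates {
    // (pc of P, pc of Q, x, value printed by P, value printed by Q); 0 = not printed yet
    record State(int pcP, int pcQ, int x, int outP, int outQ) {}

    // One atomic step of thread t (0 = P, 1 = Q), or null if t has stopped.
    static State step(State s, int t) {
        int pc = (t == 0) ? s.pcP() : s.pcQ();
        if (pc == 2) return null;                    // 2: stop
        if (pc == 0) {                               // 0: x = 1 (P) or x = 2 (Q)
            return t == 0 ? new State(1, s.pcQ(), 1, s.outP(), s.outQ())
                          : new State(s.pcP(), 1, 2, s.outP(), s.outQ());
        }
        return t == 0 ? new State(2, s.pcQ(), s.x(), s.x(), s.outQ())   // 1: print x
                      : new State(s.pcP(), 2, s.x(), s.outP(), s.x());
    }

    static Set<State> reachable() {
        Set<State> seen = new LinkedHashSet<>();
        Deque<State> work = new ArrayDeque<>();
        State init = new State(0, 0, 0, 0, 0);
        seen.add(init);
        work.add(init);
        while (!work.isEmpty()) {
            State s = work.poll();
            for (int t = 0; t < 2; t++) {
                State n = step(s, t);
                if (n != null && seen.add(n)) work.add(n);
            }
        }
        return seen;
    }

    // Printed pairs "(P, Q)" of all final states.
    static Set<String> finalOutputs() {
        Set<String> out = new TreeSet<>();
        for (State s : reachable())
            if (s.pcP() == 2 && s.pcQ() == 2) out.add("(" + s.outP() + ", " + s.outQ() + ")");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reachable().size()); // 17
        System.out.println(finalOutputs());     // [(1, 1), (1, 2), (2, 2)]
    }
}
```

The search confirms the 17 states and shows that the output (2, 1) is impossible: it would require P's write to precede Q's write and Q's write to precede P's write at the same time.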
  • 15. Discussion of the execution model with states • This model is not truly “parallel” - All operations happen serially (albeit in an undefined order) • In reality (on a modern CPU) - A read or write operation is not instantaneous; it takes time - There are multiple memory banks that work in parallel, so multiple read or write operations can be in progress at the same time • However, you can safely use this model for atomic registers - Atomic (linearizable) registers work as if each read or write were instantaneous and as if there were no parallelism - What this means precisely is defined later • A more general model of execution is needed to analyze a wider class of primitives
  • 16. Lamport’s happens-before (occurs-before) model • An execution history is a pair (H, →H) - “H” is the set of operations e, f, g, … that happened during the execution - “→H” is a transitive, irreflexive, antisymmetric relation on the set of operations H (a strict partial order) - “e →H f” means “e happens before f [in H]” (or “e occurs before f”) • H is omitted where it is not ambiguous • In the global time model of execution, each operation e has - s(e) and f(e): the times at which it started and finished - $e \rightarrow f \iff f(e) < s(f)$ - Albeit convenient for visualization, global time does not exist in a modern system (there is no central clock), so formal proofs cannot use time
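In the global time model, the rule e → f ⇔ f(e) < s(f) can be written down directly. The following is a minimal Java sketch (records need Java 16+). It assumes abstract integer ticks for s(e) and f(e), which only serve as a visualization aid, and a hypothetical `TimedOp` record. It also checks by brute force that the induced relation is a strict partial order.

```java
import java.util.List;

// Hypothetical operation with start time s(e) and finish time f(e), in abstract ticks.
record TimedOp(String name, long start, long finish) {
    // e -> f  iff  f(e) < s(f): e finished before f started
    boolean happensBefore(TimedOp other) { return finish < other.start; }
}

final class HappensBeforeCheck {
    // Brute-force check that "->" is irreflexive, antisymmetric and transitive on ops.
    static boolean isStrictPartialOrder(List<TimedOp> ops) {
        for (TimedOp e : ops) {
            if (e.happensBefore(e)) return false;                           // irreflexive
            for (TimedOp f : ops) {
                if (e.happensBefore(f) && f.happensBefore(e)) return false; // antisymmetric
                for (TimedOp g : ops)
                    if (e.happensBefore(f) && f.happensBefore(g) && !e.happensBefore(g))
                        return false;                                        // transitive
            }
        }
        return true;
    }
}
```

With a = [0, 2], b = [1, 3] and c = [4, 5], both a → c and b → c hold, while a and b are unordered, so the relation is partial rather than total.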
  • 17. Legal executions • An execution is legal if it satisfies the specifications of all objects - LEGAL: P performs x.w(1) and Q performs x.r(1) - ILLEGAL: P performs x.w(1) and Q performs x.r(2), a value that was never written
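  • 18. Serial executions • An execution is serial if “happens before” is a total order - SERIAL: P’s x.w(1) and Q’s x.r(1) do not overlap in time - NON-SERIAL: P’s x.w(1) and Q’s x.r(1) overlap in time • e and f are called parallel when $\neg(e \rightarrow f) \land \neg(f \rightarrow e)$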
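The definitions of parallel operations and serial executions translate into a few lines of Java. The following is a minimal sketch that again uses global-time intervals, with a hypothetical `Interval` record (Java 16+). An execution is serial exactly when no two distinct operations are parallel.

```java
import java.util.List;

// Hypothetical operation occupying [start, finish] in global time.
record Interval(String name, long start, long finish) {
    boolean before(Interval other) { return finish < other.start; } // e -> f
}

final class SerialCheck {
    // e and f are parallel when neither happens before the other.
    static boolean parallel(Interval e, Interval f) {
        return !e.before(f) && !f.before(e);
    }

    // Serial: "happens before" is total, i.e. no two distinct operations are parallel.
    static boolean isSerial(List<Interval> ops) {
        for (int i = 0; i < ops.size(); i++)
            for (int j = i + 1; j < ops.size(); j++)
                if (parallel(ops.get(i), ops.get(j))) return false;
        return true;
    }
}
```

Usage: P's x.w(1) on [0, 1] followed by Q's x.r(1) on [2, 3] is serial, while the same operations on [0, 2] and [1, 3] overlap, so they are parallel and the execution is not serial.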
  • 19. Linearizable executions • An execution is linearizable if its history (the “happens before” relation) can be extended to a legal and serial (total) history - LINEARIZABLE: P performs x.w(1) and Q performs x.r(1) - NON-LINEARIZABLE: P performs x.w(1) and Q performs x.r(2)
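For tiny histories, the definition can be checked by brute force. The following is a minimal Java sketch for a single integer register, assuming the register starts at 0 and every operation has global start and finish times; `Linearizability` and `RegOp` are hypothetical names. It builds a total order one operation at a time. It picks only operations with no unplaced predecessor, so the order extends "happens before". It also rejects any read that would be illegal at that point.

```java
import java.util.List;

// Brute-force linearizability check for one integer register (initial value 0 assumed).
final class Linearizability {
    record RegOp(boolean isWrite, int value, long start, long finish) {
        static RegOp w(int v, long s, long f) { return new RegOp(true, v, s, f); }
        static RegOp r(int v, long s, long f) { return new RegOp(false, v, s, f); }
        boolean before(RegOp o) { return finish < o.start; } // happens-before
    }

    static boolean isLinearizable(List<RegOp> history) {
        return search(history, new boolean[history.size()], 0, 0);
    }

    // Extends the partial total order with an op whose predecessors are all placed.
    private static boolean search(List<RegOp> h, boolean[] placed, int count, int regValue) {
        if (count == h.size()) return true;                         // legal serial order found
        for (int i = 0; i < h.size(); i++) {
            if (placed[i] || !allPredecessorsPlaced(h, placed, i)) continue;
            RegOp op = h.get(i);
            if (!op.isWrite() && op.value() != regValue) continue;  // illegal read here
            placed[i] = true;
            boolean ok = search(h, placed, count + 1, op.isWrite() ? op.value() : regValue);
            placed[i] = false;
            if (ok) return true;
        }
        return false;
    }

    private static boolean allPredecessorsPlaced(List<RegOp> h, boolean[] placed, int i) {
        for (int j = 0; j < h.size(); j++)
            if (!placed[j] && j != i && h.get(j).before(h.get(i))) return false;
        return true;
    }
}
```

The search is exponential, which is fine for slide-sized examples only. A read of 0 that overlaps a write of 1 is accepted, because the read can be ordered before the write. A read of 1 that finishes before the write starts is rejected.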
  • 20. Linearizable (atomic) objects • An object is called linearizable (atomic) if all execution histories with respect to this object are linearizable • Linearizability is composable: a system execution on linearizable objects is linearizable • In the global time model, each operation e in a linearizable execution has a linearization point T(e), such that $\forall e:\ s(e) \le T(e) \le f(e)$ and $\forall e, f:\ (e \rightarrow f \Rightarrow T(e) < T(f)) \land (e \ne f \Rightarrow T(e) \ne T(f))$ - Example: P performs x.w(1) and Q performs x.r(1)
  • 21. Atomic registers and other objects • Atomic register == linearizable register - Reads and writes behave as if they happened instantaneously at their linearization points, in some specific serial order - Thus we can use the “global state” model of execution to analyze the behavior of a program whose threads work with shared atomic registers (or with other atomic objects) • volatile fields in Java work like atomic registers - AtomicXXX classes are atomic registers, too (with additional operations) • Thread-safe classes (synchronized, ConcurrentXXX) are atomic (linearizable) unless explicitly specified otherwise - “Thread-safe” in practice means “linearizable”, i.e. designed to work as if all operations happened in some serial order, without outside synchronization, even when accessed concurrently
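A short Java sketch of an AtomicXXX class used as an atomic register with an additional operation. `AtomicInteger.incrementAndGet()` is a single linearizable read-modify-write, so concurrent increments are never lost. `AtomicCounterDemo` and its parameters are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Several threads increment one AtomicInteger; every increment is a single atomic operation.
final class AtomicCounterDemo {
    static int countWithAtomic(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) counter.incrementAndGet();
            });
            ts[t].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return counter.get(); // always threads * perThread
    }
}
```

If the `AtomicInteger` were replaced with a `volatile int` field and `value++`, the result could come out smaller. A volatile field is an atomic register, but `++` on it is a read followed by a separate write, and another thread's operations can fall between the two.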
  • 23. Mutual exclusion (lock)
    • The main desired property of the protocol is mutual exclusion: two executions of the critical section cannot be parallel
       - $\forall i, j: i \neq j \Rightarrow CS_i \to CS_j \vee CS_j \to CS_i$
    • It is also known as the correctness requirement for a mutual exclusion protocol
    • The mutex protocol:
        thread P_id:
          loop forever:
            nonCriticalSection
            mutex.lock
            criticalSection
            mutex.unlock
  • 24. Mutex attempt #1
        threadlocal int id  // 0 or 1
        shared boolean want[2]

        def lock:
          want[id] = true
          while want[1 - id]: pass

        def unlock:
          want[id] = false
    • This protocol does guarantee mutual exclusion
    • But there is no guarantee of progress: it can get into a live-lock (both threads spinning forever in lock)
    • So, the other desired property is progress: the critical section should get entered infinitely often
  • 25. Mutex attempt #2
        threadlocal int id  // 0 or 1
        shared int victim

        def lock:
          victim = id
          while victim == id: pass

        def unlock:
          pass
    • This protocol does guarantee mutual exclusion and progress
    • But the critical section can only be entered in a turn-by-turn fashion: one thread working in isolation will starve
    • So, a stronger progress property is desired. Freedom from starvation: if one (or more) threads want to enter the critical section, each of them will enter it in a finite number of steps
  • 26. Peterson’s mutual exclusion algorithm
        threadlocal int id  // 0 or 1
        shared boolean want[2]
        shared int victim

        def lock:
          want[id] = true
          victim = id
          while want[1 - id] and victim == id: pass

        def unlock:
          want[id] = false
    • This protocol does guarantee mutual exclusion, progress and freedom from starvation
    • The order of operations in this pseudo-code is important
    • Not the first one invented (1981), but the simplest 2-thread one
    • Hard to generalize to N threads (it can be done, but the result is complex)
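The pseudo-code translates almost line for line into C11. The sketch below assumes POSIX threads. The shared variables are `_Atomic` with the default sequentially consistent ordering, because with plain variables the compiler or CPU may reorder the writes and reads in `lock`. The `sched_yield()` in the spin loop only keeps the demo fast on machines with few cores. `peterson_demo` is a hypothetical harness in which two threads increment a non-atomic counter under the lock.

```c
/* Sketch: Peterson's lock with C11 seq_cst atomics.
   All names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool want[2];
static atomic_int victim;

static void peterson_lock(int id) {
    atomic_store(&want[id], true);   /* I want to enter */
    atomic_store(&victim, id);       /* but let the other one go first */
    while (atomic_load(&want[1 - id]) && atomic_load(&victim) == id)
        sched_yield();               /* spin (politely) */
}

static void peterson_unlock(int id) {
    atomic_store(&want[id], false);
}

static long counter;     /* protected by the lock, intentionally non-atomic */
static long iterations;

static void *worker(void *arg) {
    int id = *(int *)arg;
    for (long i = 0; i < iterations; i++) {
        peterson_lock(id);
        counter++;               /* critical section */
        peterson_unlock(id);
    }
    return NULL;
}

long peterson_demo(long n) {
    pthread_t t[2];
    int ids[2] = {0, 1};
    counter = 0;
    iterations = n;
    for (int i = 0; i < 2; i++) pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
    return counter;
}
```

If mutual exclusion holds, no increment is lost and the demo returns exactly 2n.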
  • 27. Lamport’s [bakery] mutual exclusion algorithm
        threadlocal int id  // 0 to N-1
        shared boolean want[N]
        shared int label[N]

        def lock:
          want[id] = true
          label[id] = max(label) + 1
          while exists k: k != id and want[k] and (label[k], k) < (label[id], id): pass  // lexicographic order

        def unlock:
          want[id] = false
    • This protocol does guarantee mutual exclusion, progress and freedom from starvation for N threads
    • It has an additional first-come, first-served (FCFS) property: the first thread to finish the doorway (setting want[id] and taking a label) gets the lock first
    • But it relies on unbounded labels. They can be replaced with “concurrent bounded timestamps”
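Here is a C11 sketch of the bakery lock for a fixed number of threads, assuming POSIX threads and sequentially consistent atomics. As on the slide, `unlock` only clears `want` and labels only grow. The sketch waits on the other threads one at a time instead of re-evaluating the whole "exists k" condition in each pass; this per-thread waiting is the form usually given in textbooks. `NTHREADS`, `bakery_demo` and the other names are hypothetical.

```c
/* Sketch: Lamport's bakery lock for NTHREADS threads.
   All names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NTHREADS 4   /* N in the slide */

static atomic_bool want[NTHREADS];
static atomic_long label[NTHREADS];

/* Lexicographic order on (label, id): ties are broken by thread id. */
static bool earlier(long lk, int k, long li, int i) {
    return lk < li || (lk == li && k < i);
}

static void bakery_lock(int id) {
    atomic_store(&want[id], true);
    long max = 0;                            /* doorway: take a label */
    for (int k = 0; k < NTHREADS; k++) {
        long l = atomic_load(&label[k]);
        if (l > max) max = l;
    }
    long mine = max + 1;
    atomic_store(&label[id], mine);
    /* Wait while any other interested thread holds an earlier (label, id). */
    for (int k = 0; k < NTHREADS; k++) {
        if (k == id) continue;
        while (atomic_load(&want[k]) &&
               earlier(atomic_load(&label[k]), k, mine, id))
            sched_yield();
    }
}

static void bakery_unlock(int id) {
    atomic_store(&want[id], false);
}

static long counter;      /* protected by the lock */
static long iterations;

static void *worker(void *arg) {
    int id = *(int *)arg;
    for (long i = 0; i < iterations; i++) {
        bakery_lock(id);
        counter++;
        bakery_unlock(id);
    }
    return NULL;
}

long bakery_demo(long n) {
    pthread_t t[NTHREADS];
    int ids[NTHREADS];
    counter = 0;
    iterations = n;
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
    return counter;
}
```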
  • 28. Pros and cons of locks
    • With mutual exclusion any serial object can be turned into a linearizable shared object
       - Just protect all operations as critical sections with a mutex
       - Using two-phase locking (2PL) you can build complex linearizable objects out of smaller building blocks
       - Nothing more than shared registers is needed to build a mutex
       - Profit!
    • But
       - By using multiple locks you can get into a deadlock
       - Locks can lead to priority inversion
       - Locks limit the concurrency of code by ensuring that critical sections are executed strictly serially with respect to each other
  • 29. Amdahl’s Law for parallelization
    • The maximal speedup of code with N threads when a portion S of it is serial:
       - $\text{speedup} \le \frac{1}{S + \frac{1 - S}{N}}$
       - $\lim_{N \to \infty} \frac{1}{S + \frac{1 - S}{N}} = \frac{1}{S}$
    • Even when just 5% of the code is serial (S = 0.05), the maximal possible speedup of the code is 20
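The bound is easy to evaluate directly. This tiny C helper (the name `amdahl_speedup` is hypothetical) reproduces the numbers on the slide.

```c
/* Sketch: Amdahl's law upper bound; the function name is hypothetical. */
#include <assert.h>

/* Maximal speedup with n threads when a fraction s (0 <= s <= 1)
   of the work is serial: 1 / (s + (1 - s) / n). */
double amdahl_speedup(double s, double n) {
    return 1.0 / (s + (1.0 - s) / n);
}
```

For example, with S = 0.05 and N = 64 the bound is only about 15.4. It approaches 20 only as N grows without limit.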
  • 30. Non-blocking algorithms (objects)
    • What happens if the OS scheduler pauses a thread that is working inside a critical section (is holding a lock)?
       - No other operation on the corresponding object can proceed
    • Lock-free: an object or operation (method) is lock-free if some active (non-paused) thread can complete an operation in a finite number of steps
       - Some threads may starve, but only while other threads complete their operations
    • Wait-free: an object or operation (method) is wait-free if every active (non-paused) thread can complete an operation in a finite number of steps
       - No starvation is allowed
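The difference shows up even in a shared counter. The C11 sketch below builds one increment from a CAS retry loop, which is lock-free, and one from fetch-and-add, which is wait-free only if the platform implements `atomic_fetch_add` as a single instruction. x86 does this with LOCK XADD. On LL/SC machines the compiler may emit a retry loop instead. The function names are hypothetical.

```c
/* Sketch: lock-free vs. wait-free increment; names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>

/* Lock-free: apart from spurious failures of the weak form, a failed CAS
   means another thread changed the counter, so the system as a whole
   progresses, but this particular thread may in principle retry forever. */
long lock_free_increment(atomic_long *counter) {
    long old = atomic_load(counter);
    /* On failure, old is reloaded with the current value and we retry. */
    while (!atomic_compare_exchange_weak(counter, &old, old + 1))
        ;
    return old;   /* value before the increment */
}

/* Wait-free on hardware with an atomic fetch-and-add instruction:
   every call completes in a bounded number of steps. */
long wait_free_increment(atomic_long *counter) {
    return atomic_fetch_add(counter, 1);
}
```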
  • 32. Non-atomic registers
    • A physical register (SRAM) is not atomic
       - However, it is wait-free, but…
       - It stores only boolean (bit) values
       - It can have only a single reader (SR) and a single writer (SW)
       - Trying to read and write at the same time leads to unpredictable results
       - But it is a safe register: when reading after a write completes, the most recently written value is returned
    • Through a chain of software constructions on top of safe boolean SRSW registers it is possible to build a wait-free atomic multi-valued, multi-reader (MR), multi-writer (MW) register
  • 33. Atomic snapshot
    • Just reading the values of N registers in a loop and returning them is not an atomic snapshot (“read N registers atomically”) operation
    • Example: P performs r1.w(1), then r2.w(2); Q tries to take a snapshot and performs r1.r(0), then r2.r(2)
       - System states (r1, r2): (0, 0), then (1, 0), then (1, 2)
       - State read by Q: (0, 2), which never existed, so this execution cannot be linearized
  • 34. Lock-free atomic snapshot
    • Add a version to each register
       - On write, atomically write a pair (new_version, new_value) to the register, where new_version = old_version + 1
    • To take an atomic snapshot
       - Read all versions and values in a loop
       - Reread them to check whether the versions are still the same
          • If still the same -> the snapshot was atomic, return it
          • If changed -> the snapshot was not atomic, repeat
    • It can loop trying to take a snapshot forever (starvation), thus it is not a wait-free algorithm
    • But it is lock-free: the system as a whole makes progress, since every repeated attempt means some write has completed
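The double-collect snapshot can be sketched in C11 as follows. The assumption is that each register holds a 32-bit value. A 32-bit version is packed next to it in one 64-bit atomic word, so the (version, value) pair is written atomically. Version wraparound after 2^32 writes to one register is ignored. `NREG`, `snap_write` and `snap_take` are hypothetical names.

```c
/* Sketch: lock-free double-collect snapshot; names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NREG 3

/* Each word is (version << 32) | value; static storage starts at zero. */
static _Atomic uint64_t reg[NREG];

static uint64_t pack(uint32_t version, uint32_t value) {
    return ((uint64_t)version << 32) | value;
}

/* Write with new_version = old_version + 1. The CAS loop also tolerates
   concurrent writers to one register; with a single writer it succeeds
   on the first attempt. */
void snap_write(int i, uint32_t value) {
    uint64_t old = atomic_load(&reg[i]);
    while (!atomic_compare_exchange_strong(&reg[i], &old,
                                           pack((uint32_t)(old >> 32) + 1, value)))
        ;
}

/* Collect twice; if nothing changed in between, every register held its
   first-collect value at the moment between the two collects. */
void snap_take(uint32_t out[NREG]) {
    uint64_t first[NREG], second[NREG];
    for (;;) {
        for (int i = 0; i < NREG; i++) first[i] = atomic_load(&reg[i]);
        for (int i = 0; i < NREG; i++) second[i] = atomic_load(&reg[i]);
        bool same = true;
        for (int i = 0; i < NREG; i++)
            if (first[i] != second[i]) { same = false; break; }
        if (same) break;   /* otherwise some write completed: retry */
    }
    for (int i = 0; i < NREG; i++) out[i] = (uint32_t)first[i];   /* low 32 bits */
}
```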
  • 35. Wait-free atomic snapshot
    • Yes, it is possible to make it wait-free, so that every operation (including snapshot) is guaranteed to complete in a finite number of steps under all circumstances
       - Threads will have to cooperate
       - Each updating thread will have to take a snapshot and store it in its own per-thread register to help complete concurrent snapshots
    • O(N²) storage requirement, O(N) time for each operation
    • Not practical
       - This is true of all wait-free algorithms
       - There are no practical wait-free algorithms
    • But certain individual non-modifying operations in some algorithms can be implemented wait-free
  • 36. Wait-free synchronization and consensus
    • What other wait-free objects can we build using atomic wait-free registers as our primitive?
       - The question was definitively answered by M. Herlihy in 1991
       - He considered wait-free implementations of the consensus protocol
    • In a consensus protocol all threads have to reach agreement on a value
       - It has to be non-trivial (the agreed value must be one of the proposed values)
       - The protocol must be wait-free
    • The consensus protocol:
        threadlocal int proposal

        thread P_id:
          print consensus
          stop
  • 37. Consensus number
    • The consensus number of a shared object or class of objects is the largest number N such that a [wait-free] consensus protocol for N threads can be implemented using these objects as primitive building blocks
    • The consensus number of atomic registers is 1 (one, uno, один)
       - Even a two-thread [wait-free] consensus protocol cannot be implemented using any number of atomic registers
       - However, it’s trivial with locks!
    • Lock-based (blocking) consensus protocol:
        threadlocal int proposal  // != 0
        shared int value

        def consensus:
          lock
          if value == 0:
            value = proposal
          unlock
          return value
  • 38. Read-Modify-Write (RMW) registers
    • It’s a register that is augmented with additional RMW operation(s)
       - Each RMW operation has a kernel function F and is typically named “getAndF”
    • Common2 class of RMW kernels: for any two kernels F1, F2 in the class
       - F1(F2(x)) == F1(x) (F1 overwrites F2), or
       - F1(F2(x)) == F2(F1(x)) (F1 and F2 commute)
    • Common2 examples:
       - F(x) = a  // set to const
       - F(x) = x + a  // add const
    • Non-trivial Common2 RMW registers have consensus number 2
    • RMW register:
        shared int value

        def getAndF:  // the whole operation is atomic
          old = value      // read
          value = F(old)   // modify, write
          return old
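Why is the consensus number at least 2? The classic construction (a sketch, not taken from these slides) uses one getAndSet register and two atomic registers. Each of the two threads first announces its proposal in its own register and then calls getAndSet(1) on a shared flag. The thread that gets back 0 has won and decides its own proposal; the other thread adopts the winner's. C11 `atomic_exchange` plays the role of getAndSet. `consensus2`, `consensus2_demo` and the other names are hypothetical, and POSIX threads are assumed for the demo.

```c
/* Sketch: wait-free 2-thread consensus from getAndSet.
   All names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int proposed[2];  /* atomic registers: announced proposals */
static atomic_int taken;        /* Common2 RMW register used via getAndSet */

int consensus2(int id, int proposal) {
    atomic_store(&proposed[id], proposal);    /* 1. announce */
    if (atomic_exchange(&taken, 1) == 0)      /* 2. getAndSet(1): first caller sees 0 */
        return proposal;                      /*    winner decides its own value */
    return atomic_load(&proposed[1 - id]);    /*    loser adopts the winner's value */
}

typedef struct { int id, proposal, decision; } arg_t;

static void *run(void *p) {
    arg_t *a = p;
    a->decision = consensus2(a->id, a->proposal);
    return NULL;
}

/* Hypothetical harness: threads 0 and 1 propose 10 and 20. */
void consensus2_demo(int out[2]) {
    arg_t args[2] = {{0, 10, 0}, {1, 20, 0}};
    pthread_t t[2];
    atomic_store(&taken, 0);
    atomic_store(&proposed[0], 0);
    atomic_store(&proposed[1], 0);
    for (int i = 0; i < 2; i++) pthread_create(&t[i], NULL, run, &args[i]);
    for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
    out[0] = args[0].decision;
    out[1] = args[1].decision;
}
```

Each call takes a fixed number of steps, so the protocol is wait-free. The announcement must come before the exchange, so the loser always finds the winner's proposal already written.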
  • 39. Consensus hierarchy
    • Consensus number 1:
       - Atomic register with get (read), set (write) operations
       - Atomic snapshot of N registers
    • Consensus number 2:
       - Common2 Read-Modify-Write registers: getAndAdd (atomic inc/dec), getAndSet (atomic swap)
       - Queue and stack (with enqueue/dequeue, push/pop only)
    • Consensus number 2N-2:
       - Atomic assignment of any N registers
    • Consensus number ∞ (universal operations):
       - compareAndSet/compareAndSwap (CAS), queue with peek operation, memory-to-memory swap
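CAS sits at the top of the hierarchy because a single CAS solves consensus for any number of threads. The sketch below uses C11 `atomic_compare_exchange_strong`. As in the lock-based protocol, it assumes that proposals are non-zero, with 0 meaning "not decided yet". `consensus_cas`, `consensus_demo` and `NTHREADS` are hypothetical names, and POSIX threads are assumed for the demo.

```c
/* Sketch: wait-free N-thread consensus with one CAS.
   All names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NONE 0        /* "not decided yet"; proposals must differ from it */
#define NTHREADS 8

static atomic_int decided;

/* Wait-free: a single CAS, no loop. */
int consensus_cas(int proposal) {
    int expected = NONE;
    if (atomic_compare_exchange_strong(&decided, &expected, proposal))
        return proposal;   /* our CAS installed the decision */
    return expected;       /* the failed CAS loaded the decided value */
}

static void *run(void *p) {
    int *slot = p;         /* in: proposal, out: decision */
    *slot = consensus_cas(*slot);
    return NULL;
}

void consensus_demo(int out[NTHREADS]) {
    pthread_t t[NTHREADS];
    atomic_store(&decided, NONE);
    for (int i = 0; i < NTHREADS; i++) {
        out[i] = i + 1;    /* distinct non-zero proposals */
        pthread_create(&t[i], NULL, run, &out[i]);
    }
    for (int i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
}
```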
  • 40. Universality of consensus
    • Any object can be turned into a concurrent wait-free linearizable object for N threads using a universal construction, if we have a consensus protocol for N threads
       - Corollary: the consensus hierarchy is strict
       - However, the universal construction is not really efficient in real life
    • Lock-free universal construction via CAS is easy and practical:
        shared register<MyObject> value  // MyObject is a pointer if its state does not fit into a CAS-able machine word

        def concurrentOperationX:
          loop:
            oldval = value.get
            newval = oldval.deepCopy
            newval.serialOperationX
          until value.CAS(oldval, newval) is successful
  • 41. Implementing lock-free algorithms
    • Let’s try to implement the CAS-based universal construction in C:
        #include <stdlib.h>   // malloc, free
        #include <string.h>   // memcpy

        typedef struct object { /* my object’s state is here */ } object_t;

        void serial_operation_X(object_t *ptr); // updates state pointed to by ptr

        void concurrent_operation_X(object_t **ptr) {
            object_t *oldval, *newval = malloc(sizeof(object_t));
            do {
                oldval = *ptr;
                memcpy(newval, oldval, sizeof(object_t));
                serial_operation_X(newval);
            } while (! __sync_bool_compare_and_swap(ptr, oldval, newval));
            free(oldval);
        }
    • Problem: it can copy trash that another thread has already freed, and serial_operation_X will crash
  • 42. Implementing lock-free algorithms (attempt #2)
    • Let’s try to implement the CAS-based universal construction in C:
        #include <stdlib.h>   // malloc, free
        #include <string.h>   // memcpy

        typedef struct object { /* my object’s state is here */ } object_t;

        void serial_operation_X(object_t *ptr); // updates state pointed to by ptr

        void concurrent_operation_X(object_t **ptr) {
            object_t *oldval, *newval = malloc(sizeof(object_t));
            do {
                oldval = *ptr;
                memcpy(newval, oldval, sizeof(object_t)); // assume no segfault here
                __sync_synchronize(); // make sure we see changes of *ptr
                if (oldval != *ptr)
                    continue; // in a do-while, jumps to the CAS below, which is expected to fail since *ptr changed
                serial_operation_X(newval);
            } while (! __sync_bool_compare_and_swap(ptr, oldval, newval));
            free(oldval);
        }
  • 43. Still doesn’t work: ABA problem
    • A, B and C are memory locations; start with *ptr == A
    • Thread Q:
       1: oldval == A
       2: (newval = malloc()) == C
       ... sleeps/is slow all that time ...
    • Thread P, meanwhile:
       1: oldval is A
       2: (newval = malloc()) is B
       3: CAS(ptr, A, B) is successful
       4: free(A)
       // performs operation_X again
       5: oldval is B
       6: (newval = malloc()) is A
       7: CAS(ptr, B, A) is successful
       8: free(B)
    • Thread Q wakes up:
       3: CAS(ptr, A, C) is successful
    • *ptr has gone A, B, A, so Q’s CAS succeeds even though C was computed from the old contents of A, and both of P’s operations are lost
  • 44. Solving the ABA problem
    • Attach a version to a pointer and increment it on every operation
       - Need to CAS two words at the same time
       - That’s why CPUs have ops like CMPXCHG8B (for 32-bit mode) and CMPXCHG16B (for 64-bit mode)
    • Rely on a garbage collector (GC) for memory management
       - In a GC runtime environment the ABA problem caused by memory reuse simply does not exist
       - Makes your non-blocking concurrent programming much easier!
    • Use other schemes that rely on coordination between threads (hazard pointers)
    • Use special hardware support (LL/SC or hardware memory transactions)
    • Still, the universal construction is efficient only if the object state is small
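The "attach a version" idea can be shown without a double-width CAS. In the sketch below, the shared word stores a 32-bit index into a hypothetical preallocated object pool instead of a raw pointer. A 32-bit version sits next to it and is incremented on every successful update, so one 64-bit CAS compares both. This is a variant chosen for portability and is not the slide's CMPXCHG16B approach. It assumes versions do not wrap around during a stall. `tagged_t`, `tagged_cas` and the helpers are hypothetical names.

```c
/* Sketch: version-tagged index defeating ABA with one 64-bit CAS.
   All names here are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t tagged_t;   /* (version << 32) | pool index */

static inline tagged_t make_tagged(uint32_t version, uint32_t index) {
    return ((uint64_t)version << 32) | index;
}
static inline uint32_t tag_version(tagged_t t) { return (uint32_t)(t >> 32); }
static inline uint32_t tag_index(tagged_t t) { return (uint32_t)t; }

/* Replace the value observed as `seen` by new_index, bumping the version.
   Fails if anything was installed in between, even the same index again. */
bool tagged_cas(_Atomic tagged_t *word, tagged_t seen, uint32_t new_index) {
    tagged_t next = make_tagged(tag_version(seen) + 1, new_index);
    return atomic_compare_exchange_strong(word, &seen, next);
}
```

Replaying the trace from slide 43 with indices 0, 1 and 2 standing for A, B and C: P's two updates bring the index back to A, but the version has moved from 0 to 2, so Q's stale CAS now fails instead of silently losing P's work.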
  • 45. Tree-like persistent data structures
    • (Figure: “Update B” turns oldval (Root with NodeA, NodeB, NodeC, NodeD) into newval (Root’ with NodeB’), sharing the unchanged nodes)
    • Reallocate and update only the path from the updated node to the root
  • 47. Lock-free stacks
    • Use the universal construction on a linked-list representation of the stack (it’s a trivial tree-like structure!)
       - root is pointing to the top of the stack
       - push and pop have trivial implementations with minimal overhead
    • With a lot of cores, root becomes a bottleneck. Use elimination backoff
       - Threads trying to push and pop at the same time meet elsewhere
    • But linked data structures are slow on modern machines
       - No memory locality
       - The next memory address is not known before reading the previous node, so code must pay the memory latency penalty on each access
       - An array-based single-threaded stack is many times faster than a linked one
    • Alas, no practical & efficient array-based lock-free algorithms are known
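Here is a C11 sketch of the linked-list stack from the first bullet, commonly known as the Treiber stack. To sidestep the ABA and use-after-free problems from the previous slides, it assumes that popped nodes are never freed or pushed again while other threads may still use the stack. That is roughly the guarantee a garbage collector would give. `node_t`, `lf_stack`, `lf_push` and `lf_pop` are hypothetical names.

```c
/* Sketch: Treiber-style lock-free stack; names are hypothetical.
   Assumes nodes are never freed or reused while the stack is shared. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node_t;

typedef struct {
    _Atomic(node_t *) top;   /* the "root" pointing to the top of the stack */
} lf_stack;

void lf_push(lf_stack *s, node_t *n) {
    node_t *old = atomic_load(&s->top);
    do {
        n->next = old;       /* link the new node above the current top */
    } while (!atomic_compare_exchange_weak(&s->top, &old, n));
}

/* Returns NULL when the stack is empty. Reading old->next is safe only
   because, by assumption, popped nodes are never freed or reused. */
node_t *lf_pop(lf_stack *s) {
    node_t *old = atomic_load(&s->top);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&s->top, &old, old->next))
        ;                    /* old is refreshed on each failed CAS */
    return old;
}
```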
  • 48. Lock-free queues
    • Michael & Scott algorithm for a lock-free unbounded linked queue
       - Great implementation in java.util.concurrent.ConcurrentLinkedQueue
    • Array-based bounded cyclic queues cannot practically & efficiently be made lock-free
       - But limiting to a single producer and a single consumer helps (in the case of a bounded array-based queue)
       - You don’t even need CAS for an SPSC queue
       - Use N of them for MP or MC
       - You can do an MP and MC queue (and even a deque) if you additionally keep a version for every slot in the array
          • but this is not really practical
       - Or reallocate memory when the array is filled (unrolled linked list)
          • a really practical alternative if needed
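Here is a minimal sketch of the "no CAS needed" SPSC case as a bounded ring buffer in C11. Only the producer writes `tail` and only the consumer writes `head`, so plain atomic loads and stores with acquire/release ordering are enough. The counters grow without bound and are reduced modulo a power-of-two capacity. `CAPACITY`, `spsc_queue`, `spsc_offer`, `spsc_poll` and `spsc_demo` are hypothetical names, and POSIX threads are assumed for the demo.

```c
/* Sketch: single-producer single-consumer bounded queue without CAS.
   All names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY 1024   /* power of two, so % stays consistent on wraparound */

typedef struct {
    int buf[CAPACITY];
    atomic_size_t head;   /* next slot to read; written only by the consumer */
    atomic_size_t tail;   /* next slot to write; written only by the producer */
} spsc_queue;

/* Producer only. Returns false if the queue is full. */
bool spsc_offer(spsc_queue *q, int v) {
    size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h == CAPACITY) return false;
    q->buf[t % CAPACITY] = v;
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);   /* publish slot */
    return true;
}

/* Consumer only. Returns false if the queue is empty. */
bool spsc_poll(spsc_queue *q, int *out) {
    size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (h == t) return false;
    *out = q->buf[h % CAPACITY];
    atomic_store_explicit(&q->head, h + 1, memory_order_release);   /* free slot */
    return true;
}

static spsc_queue q;   /* static storage: head and tail start at zero */
static long n_items;

static void *producer(void *arg) {
    (void)arg;
    for (long i = 1; i <= n_items; i++)
        while (!spsc_offer(&q, (int)i)) sched_yield();
    return NULL;
}

static void *consumer(void *arg) {
    long long *sum = arg;
    int v;
    for (long i = 0; i < n_items; i++) {
        while (!spsc_poll(&q, &v)) sched_yield();
        *sum += v;
    }
    return NULL;
}

/* Hypothetical harness: sends 1..n through the queue and sums them. */
long long spsc_demo(long n) {
    long long sum = 0;
    pthread_t p, c;
    n_items = n;
    atomic_store(&q.head, 0);
    atomic_store(&q.tail, 0);
    pthread_create(&c, NULL, consumer, &sum);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```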
  • 49. More practical notes
    • A strict FIFO queue will always get contended
       - Multiple producers will contend for the tail
       - Multiple consumers will contend for the head
       - It does not scale to a lot of cores
    • In practice, a strict FIFO queue is rarely needed
       - Usually, it does not really matter whether the first in is really the first out
          • but it needs to be eventually out
       - See java.util.concurrent.ForkJoinPool for one alternative
    • Lock-free algorithms can be faster (and scale better) than their lock-based counterparts, but are always slower than serial algorithms
    • Avoid unnecessary synchronization between threads
  • 50. Data structures for search
    • Ordered
       - Balanced trees are hard to make lock-free (not practical)
       - But Bill Pugh’s skip lists are practical in the lock-free case
          • Because they are based on ordered linked sets
          • which support a lock-free implementation
       - See java.util.concurrent.ConcurrentSkipListMap (and ConcurrentSkipListSet) for an implementation
    • Unordered
       - Fixed-size hash tables are trivial in the concurrent case
       - A resizable hash table can be implemented lock-free, too
          • As either an ordered linked set with a lookup hash table (recursive split-ordering)
          • Or fully based on arrays (Cliff Click’s high-scale hash table)
  • 51. Hardware transactional memory (HTM)
    • Is scheduled to debut in Intel Haswell processors
       - Allows code to begin a transaction, perform it inside the processor cache, then either commit its effects to main memory or abort
       - Enhances the existing cache infrastructure
       - Tracks interference between threads on top of existing cache-coherence protocols
    • It makes more efficient lock-free algorithms practical
       - Like LIFO stacks and FIFO queues with any number of participants
       - Like concurrent hash tables without pain
       - Hardware automatically detects conflicts without any code overhead to manage them, and rolls back, allowing code to start the transaction again (just like you’d do in the CAS universal construction)
  • 52. Software Transactional Memory (STM)
    • Is a simplified programming model
       - Similar to locks, but uses an atomic section instead of synchronized
       - Same problems as locks, but
          • Without worrying about taking the right lock
          • Without worrying about deadlocks
          • A conflicting transaction is transparently restarted by the transaction manager
    • It has poor performance, but makes life easier
       - when there are only a few places where threads have to coordinate through shared objects
       - It is inefficient if there are a lot of shared objects and/or they are accessed very often
  • 53. There’s much more to it. It is an active area of research
  • 55. Thank you for your attention! Slides will be posted to elizarov.livejournal.com