SlideShare ist ein Scribd-Unternehmen logo
1 von 66
Downloaden Sie, um offline zu lesen
Simon Peyton Jones (Microsoft Research)

         FP Exchange, April 2011
 The free lunch is over. Muticores are here. We have
  to program them. This is hard. Yada-yada-yada.
 Programming parallel computers
    Plan A. Start with a language whose computational fabric is
     by-default sequential, and by heroic means make the program
     parallel
    Plan B. Start with a language whose computational fabric is
     by-default parallel
 Every successful large-scale application of parallelism
  has been largely declarative and value-oriented
        SQL Server
        LINQ
        Map/Reduce
        Scientific computation
 Plan B will win. Parallel programming will increasingly
  mean functional programming
Parallel functional programming was tried in
       the 80’s, and basically failed to deliver
               Then                                    Now
Uniprocessors were getting faster     Uniprocessors are stalled
really, really quickly.
Our compilers were crappy naive, so   Compilers are pretty good
constant factors were bad
The parallel guys were a dedicated    They are regular Joe Developers
band of super-talented programmers
who would burn any number of cycles
to make their supercomputer smoke.
Parallel computers were really        Everyone will has 8, 16, 32 cores,
expensive, so you needed 95%          whether they use them or not. Even
utilisation                           using 4 of them (with little effort) would
                                      be a Jolly Good Thing
Parallel functional programming was tried in
        the 80’s, and basically failed to deliver
                Then                            Now
We had no story about            Lots of progress
(a) locality,                    • Software transactional memory
(b) exploiting regularity, and   • Distributed memory
(c) granularity                  • Data parallelism
                                 • Generating code for GPUs

                                 This talk
 “Just use a functional language and your
  troubles are over”
 Right idea:
   No side effects Limited side effects
   Strong guarantees that sub-computations do not
    interfere
 But far too starry eyed. No silver bullet:
   one size does not fit all
   need to “think parallel”: if the algorithm has
    sequential data dependencies, no language will
    save you!
A “cost model” gives
 Different problems need                   the programmer some
  different solutions.                         idea of what an
     Shared memory vs distributed memory      operation costs,
                                            without burying her in
     Transactional memory
                                                   details
     Message passing
     Data parallelism                      Examples:
     Locality                              • Send message: copy
                                              data or swing a
     Granularity
                                              pointer?
     Map/reduce                            • Memory fetch:
     ...on and on and on...                  uniform access or do
                                              cache effects
 Common theme:                               dominate?
   the cost model matters – you can’t      • Thread spawn: tens
    just say “leave it to the system”         of cycles or tens of
                                              thousands of cycles?
   no single cost model is right for all   • Scheduling: can a
                                              thread starve?
 Goal: express the “natural structure” of a program
  involving lots of concurrent I/O (eg a web serer, or
  responsive GUI, or download lots of URLs in parallel)
     Makes perfect sense with or without multicore
     Most threads are blocked most of the time

   Usually done with
     Thread pools
     Event handler
     Message pumps

   Really really hard to get right, esp when combined with
    exceptions, error handling

NB: Significant steps forward in F#/C# recently: Async<T> See
http://channel9.msdn.com/blogs/pdc2008/tl11
 Sole goal: performance using multiple cores
   …at the cost of a more complicated program

 #include “StdTalk.io”
     Clock speeds not increasing
     Transistor count still increasing
     Delivered in the form of more cores
     Often with an inadequate memory bandwidth

 No alternative: the only way to ride Moore’s
  law is to write parallel code
 Use a functional language
 But offer many different approaches to
  parallel/concurrent programming, each with
  a different cost model
 Do not force an up-front choice:
   Better one language offering many abstractions
   …than many languages offer one each
   (HPF, map/reduce, pthreads…)
This talk
                                  Lots of different concurrent/parallel
Multicore                         programming paradigms (cost models)
                                                in Haskell


                               Use Haskell!




   Task parallelism            Semi-implicit             Data parallelism
     Explicit threads,          parallelism            Operate simultaneously on
  synchronised via locks,       Evaluate pure                  bulk data
    messages, or STM             functions in
                                   parallel
Modest parallelism                                      Massive parallelism
Hard to program             Modest parallelism          Easy to program
                            Implicit synchronisation    Single flow of control
                            Easy to program             Implicit synchronisation

     Slogan: no silver bullet: embrace diversity
Multicore
                              Parallel
                           programming
                             essential



  Task parallelism
    Explicit threads,
 synchronised via locks,
   messages, or STM
 Lots of threads, all performing I/O
   GUIs
   Web servers (and other servers of course)
   BitTorrent clients
 Non-deterministic by design
 Needs
   Lightweight threads
   A mechanism for threads to coordinate/share
   Typically: pthreads/Java threads +
    locks/condition variables
 Very very lightweight threads
   Explicitly spawned, can perform I/O
   Threads cost a few hundred bytes each
   You can have (literally) millions of them
   I/O blocking via epoll => OK to have hundreds of
    thousands of outstanding I/O requests
   Pre-emptively scheduled
 Threads share memory
 Coordination via Software Transactional
  Memory (STM)
main = do { putStr (reverse “yes”)
               ; putStr “no” }
• Effects are explicit in the type system
  – (reverse “yes”) :: String      -- No effects
  – (putStr “no”) :: IO ()      -- Can have effects
• The main program is an effect-ful
  computation
  – main :: IO ()
newRef :: a -> IO (Ref a)
                         readRef :: Ref a -> IO a
                         writeRef :: Ref a -> a -> IO ()


main = do { r <- newRef 0         Reads and
          ; incR r                writes are
          ; s <- readRef r        100% explicit!
          ; print s }

incR :: Ref Int -> IO ()          You can’t say
incR r = do { v <- readRef r      (r + 6), because
            ; writeRef r (v+1)    r :: Ref Int
            }
forkIO :: IO () -> IO ThreadId
  forkIO spawns a thread
  It takes an action as its argument
webServer :: RequestPort -> IO ()
webServer p = do { conn <- acceptRequest p
                 ; forkIO (serviceRequest conn)
                 ; webServer p }

serviceRequest :: Connection -> IO ()
serviceRequest c = do { … interact with client … }
         No event-loop spaghetti!
 How do threads coordinate with each other?

main = do { r <- newRef 0
          ; forkIO (incR r)
          ; incR r
          ; ... }                  Aargh!
                                   A race
incR :: Ref Int -> IO ()
incR r = do { v <- readRef r
            ; writeRef r (v+1) }
A 10-second review:
 Races: due to forgotten locks

 Deadlock: locks acquired in “wrong” order.

 Lost wakeups: forgotten notify to condition
  variable
 Diabolical error recovery: need to restore
  invariants and release locks in exception
  handlers

   These are serious problems. But even worse...
Scalable double-ended queue: one lock per cell




             No interference if
             ends “far enough”
                   apart

        But watch out when the queue
          is 0, 1, or 2 elements long!
Difficulty of concurrent
 Coding style
                            queue
Sequential code       Undergraduate
Difficulty of concurrent
 Coding style
                            queue

Sequential code       Undergraduate
  Locks and
                    Publishable result at
  condition
                  international conference
  variables
Difficulty of concurrent
  Coding style
                            queue
Sequential code       Undergraduate
   Locks and
                    Publishable result at
   condition
                  international conference
   variables


Atomic blocks Undergraduate
atomically { ... sequential get code ... }

    To a first approximation, just write the
     sequential code, and wrap atomically around it
    All-or-nothing semantics: Atomic commit
    Atomic block executes in Isolation
    Cannot deadlock (there are no locks!)     ACID
    Atomicity makes error recovery easy
     (e.g. exception thrown inside the get code)
atomically :: IO a -> IO a

  main = do { r <- newRef 0
             ; forkIO (atomically (incR r))
             ; atomically (incR r)
             ; ... }

 atomically is a function, not a syntactic
  construct
 A worry: what stops you doing incR
  outside atomically?
atomically :: STM a -> IO a
 Better idea:   newTVar :: a -> STM (TVar a)
                 readTVar :: TVar a -> STM a
                 writeTVar :: TVar a -> a -> STM ()

incT :: TVar Int -> STM ()
incT r = do { v <- readTVar r; writeTVar r (v+1) }
main = do { r <- atomically (newTVar 0)
          ; forkIO (atomically (incT r))
          ; atomic (incT r)
          ; ... }
atomic :: STM a -> IO a
                    newTVar :: a -> STM (TVar a)
                    readTVar :: TVar a -> STM a
                    writeTVar :: TVar a -> a -> STM ()


 Can’t fiddle with TVars outside atomic
  block [good]
 Can’t do IO inside atomic block [sad,
  but also good]
 No changes to the compiler
  (whatsoever). Only runtime system and
  primops.
 ...and, best of all...
incT :: TVar Int -> STM ()
  incT r = do { v <- readTVar r; writeTVar r (v+1) }   Composition
  incT2 :: TVar Int -> STM ()                          is THE way
  incT2 r = do { incT r; incT r }                      we build big
                                                        programs
  foo :: IO ()                                           that work
  foo = ...atomically (incT2 r)...
 An STM computation is always executed atomically
  (e.g. incT2). The type tells you.
 Simply glue STMs together arbitrarily; then wrap with
  atomic
 No nested atomic. (What would it mean?)
 MVars for efficiency in (very common)
  special cases
 Blocking (retry) and choice (orElse) in STM
 Exceptions in STM
 A very simple web server written in Haskell
     full HTTP 1.0 and 1.1 support,
     handles chunked transfer encoding,
     uses sendfile for optimized static file serving,
     allows request bodies and response bodies to be
      processed in constant space

 Protection for all the basic attack vectors:
  overlarge request headers and slow-loris
  attacks
 500 lines of Haskell (building on some amazing
  libraries: bytestring, blaze-builder, iteratee)
 A new thread for each user request
 Fast, fast

     Pong requests/sec
 Again, lots of threads: 400-600 is typical
 Significantly bigger program: 5000 lines of
  Haskell – but            (Not shown: Vuse 480k lines)
  way smaller       80,000
                     loc
  than the
  competition


                                       Erlang
                             Haskell
 Performance:
  roughly
  competitive
 Built on STM
 Heavy use of combinator parsers: “reads like
  the protocol specification if you squint your
  eyes”
        decodeMsg :: Parser Message
        decodeMsg = do
          m <- getWord8
          case m of
            0 -> return Choke
            1 -> return Unchoke
            2 -> return Interested
            3 -> return NotInterested
            4 -> Have     <$> gw32
            5 -> BitField <$> getRestLazyByteString
            6 -> Request <$> gw32 <*> gw32 <*> gw32
            7 -> Piece    <$> gw32 <*> gw32 <*> getRestLazyByteString
            8 -> Cancel   <$> gw32 <*> gw32 <*> gw32
            9 -> Port     <$> (fromIntegral <$> getWord16be)
            _ -> fail "Incorrect message parse"
         where gw32 = fromIntegral <$> getWord32bea
 So far everything is shared memory
 Distributed memory has a different cost model




 Think message passing…
 Think Erlang…
 Processes share nothing; independent GC;
  independent failure
 Communicate over channels
 Message communication = serialise to
  bytestream, transmit, deserialise
 Comprehensive failure model
     A process P can “link to” another Q
     If Q crashes, P gets a message
     Use this to build process monitoring apparatus
     Key to Erlang’s 5-9’s reliability
 Provide Erlang as a library – no language
     extensions needed
           newChan :: PM (SPort a, RPort a)
           send    :: Serialisable a => SPort a -> a -> PM a
           receive :: Serialisable a => RPort a -> PM a
           spawn :: NodeId -> PM a -> PM PId

 Process



                                   Channels



 May contain many
  Haskell threads,
which share via STM
 Many static guarantees for cost model:
   (SPort a) is serialisable, but not (RPort a)
    => you always know where to send your message
   (TVar a) not serialisable
    => no danger of multi-site STM
The k-means clustering algorithm takes a set of data points
     and groups them into clusters by spatial proximity.




Initial clusters have    After first iteration                              After third iteration
                                                  After second iteration
random centroids


                        ●Start with Z lots of data points in N-dimensional space
                        ●Randomly choose k points as ”centroid candidates”
                        ●Repeat:
                            1. For each data point, find the nearerst ”centroid candidate”
                            2. For each candidate C, find the centroid of all points nearest to C
                            3. Make those the new centroid candidates, and repeat
   Converged
●Start with Z lots of data points in N-dimensional space
●Randomly choose k points as ”centroid candidates”
●Repeat:
    1. For each data point, find the nearerst ”centroid candidate”
    2. For each candidate C, find the centroid of all points nearest to C
    3. Make those the new centroid candidates, and repeat if necessary

                                          Step 1
          MapReduce
                                                       Step 2
                             Mapper 1                                        Step 3

                                               Reducer
                             Mapper 2             1                 conver
      Master                                       …                 ged?         Result
                             Mapper 3          Reducer
                                                  k

                             Mapper n




       Running today in Haskell on an Amazon EC2 cluster [current work]
Highly concurrent
applications are a killer
    app for Haskell
Highly concurrent
applications are a killer
    app for Haskell


But wait… didn’t you say
  that Haskell was a
 functional language?
 Side effects are inconvenient
     do { v <- readTVar r; writeTVar r (v+1) }
  vs
      r++
 Result: almost all the code is functional,
  processing immutable data
 Great for avoiding bugs: no aliasing, no race
  hazards, no cache ping-ponging.
 Great for efficiency: only TVar access are
  tracked by STM
Multicore

                  Use Haskell!



                  Semi-implicit
                   parallelism
                   Evaluate pure
                    functions in
                      parallel

               Modest parallelism
               Implicit synchronisation
               Easy to program

  Slogan: no silver bullet: embrace diversity
Place n queens on an n x n board
                             such that no queen attacks any
                            other, horizontally, vertically, or
                                       diagonally
 Sequential code
nqueens :: Int -> [[Int]]
nqueens n = subtree n []

subtree :: Int -> [Int] -> [[Int]]
subtree 0 b = [b]
subtree c b = concat $
              map (subtree (c-1)) (children b)

children :: [Int] -> [[Int]]
children b = [ (q:b) | q <- [1..n],
                       safe q b ]
Place n queens on an n x n board
                   such that no queen attacks any
                  other, horizontally, vertically, or
                             diagonally
[1,3,1]   [1,1]

[2,3,1]
          [2,1]         [1]
[3,3,1]

                                                Start
[4,3,1]   [3,1]                         []      here

[5,3,1]   [4,1]
                        [2]
[6,3,1]
           ...

 ...
                         ...
Place n queens on an n x n board
                             such that no queen attacks any
                            other, horizontally, vertically, or
                                       diagonally
 Sequential code
nqueens :: Int -> [[Int]]
nqueens n = subtree n []

subtree :: Int -> [Int] -> [[Int]]
subtree 0 b = [b]
subtree c b = concat $
              map (subtree (c-1)) (children b)

children :: [Int] -> [[Int]]
children b = [ (q:b) | q <- [1..n],
                       safe q b ]
Place n queens on an n x n board
                              such that no queen attacks any
                             other, horizontally, vertically, or
                                        diagonally
 Parallel code
                                             Works on the
nqueens :: Int -> [[Int]]                    sub-trees in
nqueens n = subtree n []                       parallel

subtree :: Int -> [Int] -> [[Int]]
subtree 0 b = [b]
subtree c b = concat $
              parMap (subtree (c-1)) (children b)

children :: [Int] -> [[Int]]
children b = [ (q:b) | q <- [1..n],
                       safe q b ]

 Speedup: 3.5x on 6 cores
map       :: (a->b) -> [a] -> [b]
    parMap :: (a->b) -> [a] -> [b]

Good things
 Parallel program guaranteed not to change
  the result
 Deterministic: same result every run
 Very low barrier to entry
 “Strategies” to separate algorithm from
  parallel structure
Bad things
 Poor cost model; all too easy to fail to
  evaluate something and lose all parallelism
 Not much locality; shared memory
 Over-fine granularity can be a big issue
Profiling tools can help a lot
 As usual, watch out for Amdahl’s law!
 Find authentication or secrecy failures in
  cryptographic protocols. (Famous example: authentication
    failure in the Needham-Schroeder public key protocol. )

 About 6,500 lines of Haskell
   “I think it would be moronic to code CPSA in C or Python. The
    algorithm is very complicated, and the leap between the
    documented design and the Haskell code is about as small as one
    can get, because the design is functional.”

 One call to parMap
 Speedup of 3x on a quad-core --- worthwhile when
  many problems take 24 hrs to run.
 Modest but worthwhile speedups (3-10) for
  very modest investment
 Limited to shared memory; 10’s not 1000’s of
  processors
 You still have to think about a parallel
  algorithm! (Eg John Ramsdell had to
  refactor his CPSA algorithm a bit.)
Multicore

                  Use Haskell!




                                   Data parallelism
                                 Operate simultaneously on
                                         bulk data


                                  Massive parallelism
                                  Easy to program
                                  Single flow of control
                                  Implicit synchronisation

  Slogan: no silver bullet: embrace diversity
Data parallelism
     The key to using multicores at scale


Flat data parallel         Nested data parallel
  Apply sequential             Apply parallel
operation to bulk data      operation to bulk data

    Very widely used             Research project
e.g. Fortran(s), *C
                                    MPI, map/reduce
 The brand leader: widely used, well
  understood, well supported
          foreach i in 1..N {
             ...do something to A[i]...
          }

 BUT: “something” is sequential
 Single point of concurrency
 Easy to implement:
  use “chunking”
 Good cost model
  (both granularity and 1,000,000’s of (small) work items
                              P1         P2      P3

  locality)
Faces are compared by computing a distance               1 R         v              v
                                              dist(A,B)  
                                                                               A         B
between their multi-region histograms.                               h              h
                                                         R r1             r          r
                                                                                               1




                                     
                                             Multi-region histogram for
                                             candidate face as an array.
    A             r=1    r=2   r=3    r=4
1 R                            v                v
                                                                                           dist(A,B)  
                                                  v                                                                                            A             B
                                                                                                                                                      h
                                                        A 

                            replicate             h
                                                                                                      R r1                          h       r              r
                                                                                                                                                                     1
     v   A
                                           
     h
                                        zipWith                                     reduce                                  reduce                    map

                                                                                         v   A     v   B                                                 1
                                                                                                                                     
                                                                                                                                         R
                                                    v        A     v   B             h            h
                                                                                                                                         r 1
                                                                      h                                                                                             R
               v   B                            h                                                                   1


               h
                                                                                                                                                   
                                            
1         v                v
                                                                                                                                 r 1
                                                     v                                                                          R                  A         B
                                                                                              dist(A,B)                                                  hr
                                                           A 

                                 replicate           h
                                                                                                                               R         h       r
                                                                                                                                                                     1
                v   A
                                               
                h
                                             zipWi                                     reduce                                  reduce                     map
                                              th
                                                                                            v   A     v   B                                              1
                                                                                                                                         
                                                                                                                                             R
                                                       v        A     v   B             h            h
                                                                                                                                             r 1
                                                                         h                                                                                          R
                          v   B                    h                                                                   1


                          h
                                                                                                                                                       
                                                
           



     distances :: Array DIM2 Float -> Array DIM3 Float
       v 
       B
       h
                  -> Array DIM1 Float
     distances histA histBs = dists
       where

            histAs = replicate (constant (All, All, f)) histA
            diffs = zipWith (-) histAs histBs
            l1norm = reduce (¥a b -> abs a + abs b) (0) diffs
            regSum = reduce (+) (0) l1norm
            dists = map (/ r) regSum

             (h, r, f) = shape histBs
 Arrays as values: virtually no element-wise
  programming (for loops).
 Think APL, but with much more
  polymorphism
 Performance is (currently) significantly less
  than C
 BUT it
  auto-parallelises

                      Warning: take all such figures with buckets of salt
 GPUs are massively parallel processors, and
   are rapidly de-specialising from graphics
  Idea: your program (when run) generates a
   GPU program
distances :: Acc (Array DIM2 Float)
          -> Acc (Array DIM3 Float)
          -> Acc (Array DIM1 Float)
distances histA histBs = dists
  where
    histAs = replicate (constant (All, All, f)) histA
    diffs = zipWith (-) histAs histBs
    l1norm = reduce (¥a b -> abs a + abs b) (0) diffs
    regSum = reduce (+) (0) l1norm
    dists = map (/ r) regSum
 An (Acc a) is a syntax tree for a program
   computing a value of type a, ready to be
   compiled for GPU
  The key trick: (+) :: Num a => a –> a -> a
distances :: Acc (Array DIM2 Float)
          -> Acc (Array DIM3 Float)
          -> Acc (Array DIM1 Float)
distances histA histBs = dists
  where
    histAs = replicate (constant (All, All, f)) histA
    diffs = zipWith (-) histAs histBs
    l1norm = reduce (¥a b -> abs a + abs b) (0) diffs
    regSum = reduce (+) (0) l1norm
    dists = map (/ r) regSum
 An (Acc a) is a syntax tree for a program
  computing a value of type a, ready to be
  compiled for GPU
CUDA.run :: Acc (Array a b) -> Array a b

 CUDA.run
     takes the syntax tree
     compiles it to CUDA
     loads the CUDA into GPU
     marshals input arrays into GPU memory
     runs it
     marshals the result array back into Haskell memory
 The code for Repa (multicore) and
  Accelerate (GPU) is virtually identical
 Only the types change


 Other research projects with similar
  approach
     Nicola (Harvard)
     Obsidian/Feldspar (Chalmers)
     Accelerator (Microsoft .NET)
     Recursive islands (MSR/Columbia)
Data parallelism
The key to using multicores at scale


                  Nested data parallel
                      Apply parallel
                   operation to bulk data

                        Research project
 Main idea: allow “something” to be parallel
      foreach i in 1..N {
         ...do something to A[i]...
      }

 Now the parallelism
  structure is recursive,
  and un-balanced
 Much more expressive
 Much harder to implement

                         Still 1,000,000’s of (small) work items
Nested data                     Flat data
       parallel                      parallel
      program        Compiler       program
  (the one we want              (the one we want
      to write)                       to run)


 Invented by Guy Blelloch in the 1990s
 We are now working on embodying it in GHC:
  Data Parallel Haskell
 Turns out to be jolly difficult in practice
  (but if it was easy it wouldn’t be research).
  Watch this space.
 No single cost model suits all programs /
  computers. It’s a complicated world. Get used
  to it.
 For concurrent programming, functional
  programming is already a huge win
 For parallel programming at scale, we’re going to
  end up with data parallel functional programming
 Haskell is super-great because it hosts multiple
  paradigms. Many cool kids hacking in this space.
 But other functional programming languages are
  great too: Erlang, Scala, F#

Weitere ähnliche Inhalte

Was ist angesagt?

Presentation on Shared Memory Parallel Programming
Presentation on Shared Memory Parallel ProgrammingPresentation on Shared Memory Parallel Programming
Presentation on Shared Memory Parallel ProgrammingVengada Karthik Rangaraju
 
Concurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System DiscussionConcurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System DiscussionCherryBerry2
 
Towards JVM Dynamic Languages Toolchain
Towards JVM Dynamic Languages ToolchainTowards JVM Dynamic Languages Toolchain
Towards JVM Dynamic Languages ToolchainAttila Szegedi
 
Whats New in Visual Studio 2012 for C++ Developers
Whats New in Visual Studio 2012 for C++ DevelopersWhats New in Visual Studio 2012 for C++ Developers
Whats New in Visual Studio 2012 for C++ DevelopersRainer Stropek
 
A Distributed Simulation of P-Systems
A Distributed Simulation of P-SystemsA Distributed Simulation of P-Systems
A Distributed Simulation of P-SystemsApostolos Syropoulos
 
Programming Languages #devcon2013
Programming Languages #devcon2013Programming Languages #devcon2013
Programming Languages #devcon2013Iván Montes
 
Java & low latency applications
Java & low latency applicationsJava & low latency applications
Java & low latency applicationsRuslan Shevchenko
 
Algoritmi e Calcolo Parallelo 2012/2013 - OpenMP
Algoritmi e Calcolo Parallelo 2012/2013 - OpenMPAlgoritmi e Calcolo Parallelo 2012/2013 - OpenMP
Algoritmi e Calcolo Parallelo 2012/2013 - OpenMPPier Luca Lanzi
 
Siggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentialsSiggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentialsTristan Lorach
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchNatasha Latysheva
 

Was ist angesagt? (20)

OpenMp
OpenMpOpenMp
OpenMp
 
Open mp directives
Open mp directivesOpen mp directives
Open mp directives
 
Presentation on Shared Memory Parallel Programming
Presentation on Shared Memory Parallel ProgrammingPresentation on Shared Memory Parallel Programming
Presentation on Shared Memory Parallel Programming
 
Concurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System DiscussionConcurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System Discussion
 
Introduction to OpenMP
Introduction to OpenMPIntroduction to OpenMP
Introduction to OpenMP
 
Openmp
OpenmpOpenmp
Openmp
 
Towards JVM Dynamic Languages Toolchain
Towards JVM Dynamic Languages ToolchainTowards JVM Dynamic Languages Toolchain
Towards JVM Dynamic Languages Toolchain
 
Whats New in Visual Studio 2012 for C++ Developers
Whats New in Visual Studio 2012 for C++ DevelopersWhats New in Visual Studio 2012 for C++ Developers
Whats New in Visual Studio 2012 for C++ Developers
 
A Distributed Simulation of P-Systems
A Distributed Simulation of P-SystemsA Distributed Simulation of P-Systems
A Distributed Simulation of P-Systems
 
Introducing Parallel Pixie Dust
Introducing Parallel Pixie DustIntroducing Parallel Pixie Dust
Introducing Parallel Pixie Dust
 
Programming Languages #devcon2013
Programming Languages #devcon2013Programming Languages #devcon2013
Programming Languages #devcon2013
 
Nug2004 yhe
Nug2004 yheNug2004 yhe
Nug2004 yhe
 
Java & low latency applications
Java & low latency applicationsJava & low latency applications
Java & low latency applications
 
OpenMP
OpenMPOpenMP
OpenMP
 
Khan and morrison_dq207
Khan and morrison_dq207Khan and morrison_dq207
Khan and morrison_dq207
 
Algoritmi e Calcolo Parallelo 2012/2013 - OpenMP
Algoritmi e Calcolo Parallelo 2012/2013 - OpenMPAlgoritmi e Calcolo Parallelo 2012/2013 - OpenMP
Algoritmi e Calcolo Parallelo 2012/2013 - OpenMP
 
2CPP02 - C++ Primer
2CPP02 - C++ Primer2CPP02 - C++ Primer
2CPP02 - C++ Primer
 
Siggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentialsSiggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentials
 
OpenMP And C++
OpenMP And C++OpenMP And C++
OpenMP And C++
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
 

Andere mochten auch

Type-checking Polymorphic Units for Astrophysics Research in Haskell
Type-checking Polymorphic Units for Astrophysics Research in HaskellType-checking Polymorphic Units for Astrophysics Research in Haskell
Type-checking Polymorphic Units for Astrophysics Research in HaskellTakayuki Muranushi
 
Reactive Programming With Akka - Lessons Learned
Reactive Programming With Akka - Lessons LearnedReactive Programming With Akka - Lessons Learned
Reactive Programming With Akka - Lessons LearnedDaniel Sawano
 
Peyton jones-2011-type classes
Peyton jones-2011-type classesPeyton jones-2011-type classes
Peyton jones-2011-type classesTakayuki Muranushi
 
Peyton jones-2009-fun with-type_functions-slide
Peyton jones-2009-fun with-type_functions-slidePeyton jones-2009-fun with-type_functions-slide
Peyton jones-2009-fun with-type_functions-slideTakayuki Muranushi
 
MongoDB as Search Engine Repository @ MongoTokyo2011
MongoDB as Search Engine Repository @ MongoTokyo2011MongoDB as Search Engine Repository @ MongoTokyo2011
MongoDB as Search Engine Repository @ MongoTokyo2011Preferred Networks
 
Succinct Data Structure for Analyzing Document Collection
Succinct Data Structure for Analyzing Document CollectionSuccinct Data Structure for Analyzing Document Collection
Succinct Data Structure for Analyzing Document CollectionPreferred Networks
 
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011Preferred Networks
 
LCCC2010:Learning on Cores, Clusters and Cloudsの解説
LCCC2010:Learning on Cores,  Clusters and Cloudsの解説LCCC2010:Learning on Cores,  Clusters and Cloudsの解説
LCCC2010:Learning on Cores, Clusters and Cloudsの解説Preferred Networks
 
ウェーブレット木の世界
ウェーブレット木の世界ウェーブレット木の世界
ウェーブレット木の世界Preferred Networks
 
tensor-decomposition
tensor-decompositiontensor-decomposition
tensor-decompositionKenta Oono
 
Jubatusにおける大規模分散オンライン機械学習
Jubatusにおける大規模分散オンライン機械学習Jubatusにおける大規模分散オンライン機械学習
Jubatusにおける大規模分散オンライン機械学習Preferred Networks
 
MapReduceによる大規模データを利用した機械学習
MapReduceによる大規模データを利用した機械学習MapReduceによる大規模データを利用した機械学習
MapReduceによる大規模データを利用した機械学習Preferred Networks
 
大規模データ時代に求められる自然言語処理
大規模データ時代に求められる自然言語処理大規模データ時代に求められる自然言語処理
大規模データ時代に求められる自然言語処理Preferred Networks
 
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)Shirou Maruyama
 
Development and Experiment of Deep Learning with Caffe and maf
Development and Experiment of Deep Learning with Caffe and mafDevelopment and Experiment of Deep Learning with Caffe and maf
Development and Experiment of Deep Learning with Caffe and mafKenta Oono
 

Andere mochten auch (20)

Type-checking Polymorphic Units for Astrophysics Research in Haskell
Type-checking Polymorphic Units for Astrophysics Research in HaskellType-checking Polymorphic Units for Astrophysics Research in Haskell
Type-checking Polymorphic Units for Astrophysics Research in Haskell
 
Reactive Programming With Akka - Lessons Learned
Reactive Programming With Akka - Lessons LearnedReactive Programming With Akka - Lessons Learned
Reactive Programming With Akka - Lessons Learned
 
Peyton jones-2011-type classes
Peyton jones-2011-type classesPeyton jones-2011-type classes
Peyton jones-2011-type classes
 
Peyton jones-2009-fun with-type_functions-slide
Peyton jones-2009-fun with-type_functions-slidePeyton jones-2009-fun with-type_functions-slide
Peyton jones-2009-fun with-type_functions-slide
 
MongoDB as Search Engine Repository @ MongoTokyo2011
MongoDB as Search Engine Repository @ MongoTokyo2011MongoDB as Search Engine Repository @ MongoTokyo2011
MongoDB as Search Engine Repository @ MongoTokyo2011
 
jubatus pressrelease
jubatus pressreleasejubatus pressrelease
jubatus pressrelease
 
Succinct Data Structure for Analyzing Document Collection
Succinct Data Structure for Analyzing Document CollectionSuccinct Data Structure for Analyzing Document Collection
Succinct Data Structure for Analyzing Document Collection
 
rcast_20140411
rcast_20140411rcast_20140411
rcast_20140411
 
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011
 
Herd
HerdHerd
Herd
 
LCCC2010:Learning on Cores, Clusters and Cloudsの解説
LCCC2010:Learning on Cores,  Clusters and Cloudsの解説LCCC2010:Learning on Cores,  Clusters and Cloudsの解説
LCCC2010:Learning on Cores, Clusters and Cloudsの解説
 
ウェーブレット木の世界
ウェーブレット木の世界ウェーブレット木の世界
ウェーブレット木の世界
 
tensor-decomposition
tensor-decompositiontensor-decomposition
tensor-decomposition
 
Jubatusにおける大規模分散オンライン機械学習
Jubatusにおける大規模分散オンライン機械学習Jubatusにおける大規模分散オンライン機械学習
Jubatusにおける大規模分散オンライン機械学習
 
bigdata2012ml okanohara
bigdata2012ml okanoharabigdata2012ml okanohara
bigdata2012ml okanohara
 
MapReduceによる大規模データを利用した機械学習
MapReduceによる大規模データを利用した機械学習MapReduceによる大規模データを利用した機械学習
MapReduceによる大規模データを利用した機械学習
 
大規模データ時代に求められる自然言語処理
大規模データ時代に求められる自然言語処理大規模データ時代に求められる自然言語処理
大規模データ時代に求められる自然言語処理
 
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)
 
bigdata2012nlp okanohara
bigdata2012nlp okanoharabigdata2012nlp okanohara
bigdata2012nlp okanohara
 
Development and Experiment of Deep Learning with Caffe and maf
Development and Experiment of Deep Learning with Caffe and mafDevelopment and Experiment of Deep Learning with Caffe and maf
Development and Experiment of Deep Learning with Caffe and maf
 

Ähnlich wie Parallel Functional Programming Makes a Comeback

Unmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeUnmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeDmitri Nesteruk
 
ParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel ProgrammingParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel Programmingkhstandrews
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsZvi Avraham
 
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft..."Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...Dataconomy Media
 
Os Worthington
Os WorthingtonOs Worthington
Os Worthingtonoscon2007
 
Concurrency Constructs Overview
Concurrency Constructs OverviewConcurrency Constructs Overview
Concurrency Constructs Overviewstasimus
 
MERIMeeting du 27 mai 2014 - Parallel Programming
MERIMeeting du 27 mai 2014 - Parallel ProgrammingMERIMeeting du 27 mai 2014 - Parallel Programming
MERIMeeting du 27 mai 2014 - Parallel ProgrammingOlivier NAVARRE
 
Thinking in parallel ab tuladev
Thinking in parallel ab tuladevThinking in parallel ab tuladev
Thinking in parallel ab tuladevPavel Tsukanov
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...VAISHNAVI MADHAN
 
A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...ChangWoo Min
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...KRamasamy2
 
Trends in Programming Technology you might want to keep an eye on af Bent Tho...
Trends in Programming Technology you might want to keep an eye on af Bent Tho...Trends in Programming Technology you might want to keep an eye on af Bent Tho...
Trends in Programming Technology you might want to keep an eye on af Bent Tho...InfinIT - Innovationsnetværket for it
 
Java Core | Modern Java Concurrency | Martijn Verburg & Ben Evans
Java Core | Modern Java Concurrency | Martijn Verburg & Ben EvansJava Core | Modern Java Concurrency | Martijn Verburg & Ben Evans
Java Core | Modern Java Concurrency | Martijn Verburg & Ben EvansJAX London
 
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...Maarten Balliauw
 
An introduction to erlang
An introduction to erlangAn introduction to erlang
An introduction to erlangMirko Bonadei
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterTim Ellison
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceTim Ellison
 

Ähnlich wie Parallel Functional Programming Makes a Comeback (20)

Unmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeUnmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/Invoke
 
ParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel ProgrammingParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel Programming
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft..."Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
 
IN4308 1
IN4308 1IN4308 1
IN4308 1
 
Os Worthington
Os WorthingtonOs Worthington
Os Worthington
 
Concurrency Constructs Overview
Concurrency Constructs OverviewConcurrency Constructs Overview
Concurrency Constructs Overview
 
MERIMeeting du 27 mai 2014 - Parallel Programming
MERIMeeting du 27 mai 2014 - Parallel ProgrammingMERIMeeting du 27 mai 2014 - Parallel Programming
MERIMeeting du 27 mai 2014 - Parallel Programming
 
Thinking in parallel ab tuladev
Thinking in parallel ab tuladevThinking in parallel ab tuladev
Thinking in parallel ab tuladev
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
 
A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...
 
Concurrent programming1
Concurrent programming1Concurrent programming1
Concurrent programming1
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
 
Trends in Programming Technology you might want to keep an eye on af Bent Tho...
Trends in Programming Technology you might want to keep an eye on af Bent Tho...Trends in Programming Technology you might want to keep an eye on af Bent Tho...
Trends in Programming Technology you might want to keep an eye on af Bent Tho...
 
Java Core | Modern Java Concurrency | Martijn Verburg & Ben Evans
Java Core | Modern Java Concurrency | Martijn Verburg & Ben EvansJava Core | Modern Java Concurrency | Martijn Verburg & Ben Evans
Java Core | Modern Java Concurrency | Martijn Verburg & Ben Evans
 
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
 
Rust presentation convergeconf
Rust presentation convergeconfRust presentation convergeconf
Rust presentation convergeconf
 
An introduction to erlang
An introduction to erlangAn introduction to erlang
An introduction to erlang
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark faster
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark Performance
 

Kürzlich hochgeladen

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Kürzlich hochgeladen (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Parallel Functional Programming Makes a Comeback

  • 1. Simon Peyton Jones (Microsoft Research) FP Exchange, April 2011
  • 2.  The free lunch is over. Muticores are here. We have to program them. This is hard. Yada-yada-yada.  Programming parallel computers  Plan A. Start with a language whose computational fabric is by-default sequential, and by heroic means make the program parallel  Plan B. Start with a language whose computational fabric is by-default parallel  Every successful large-scale application of parallelism has been largely declarative and value-oriented  SQL Server  LINQ  Map/Reduce  Scientific computation  Plan B will win. Parallel programming will increasingly mean functional programming
  • 3. Parallel functional programming was tried in the 80’s, and basically failed to deliver Then Now Uniprocessors were getting faster Uniprocessors are stalled really, really quickly. Our compilers were crappy naive, so Compilers are pretty good constant factors were bad The parallel guys were a dedicated They are regular Joe Developers band of super-talented programmers who would burn any number of cycles to make their supercomputer smoke. Parallel computers were really Everyone will has 8, 16, 32 cores, expensive, so you needed 95% whether they use them or not. Even utilisation using 4 of them (with little effort) would be a Jolly Good Thing
  • 4. Parallel functional programming was tried in the 80’s, and basically failed to deliver Then Now We had no story about Lots of progress (a) locality, • Software transactional memory (b) exploiting regularity, and • Distributed memory (c) granularity • Data parallelism • Generating code for GPUs This talk
  • 5.  “Just use a functional language and your troubles are over”  Right idea:  No side effects Limited side effects  Strong guarantees that sub-computations do not interfere  But far too starry eyed. No silver bullet:  one size does not fit all  need to “think parallel”: if the algorithm has sequential data dependencies, no language will save you!
  • 6. A “cost model” gives  Different problems need the programmer some different solutions. idea of what an  Shared memory vs distributed memory operation costs, without burying her in  Transactional memory details  Message passing  Data parallelism Examples:  Locality • Send message: copy data or swing a  Granularity pointer?  Map/reduce • Memory fetch:  ...on and on and on... uniform access or do cache effects  Common theme: dominate?  the cost model matters – you can’t • Thread spawn: tens just say “leave it to the system” of cycles or tens of thousands of cycles?  no single cost model is right for all • Scheduling: can a thread starve?
  • 7.  Goal: express the “natural structure” of a program involving lots of concurrent I/O (eg a web serer, or responsive GUI, or download lots of URLs in parallel)  Makes perfect sense with or without multicore  Most threads are blocked most of the time  Usually done with  Thread pools  Event handler  Message pumps  Really really hard to get right, esp when combined with exceptions, error handling NB: Significant steps forward in F#/C# recently: Async<T> See http://channel9.msdn.com/blogs/pdc2008/tl11
  • 8.  Sole goal: performance using multiple cores  …at the cost of a more complicated program  #include “StdTalk.io”  Clock speeds not increasing  Transistor count still increasing  Delivered in the form of more cores  Often with an inadequate memory bandwidth  No alternative: the only way to ride Moore’s law is to write parallel code
  • 9.  Use a functional language  But offer many different approaches to parallel/concurrent programming, each with a different cost model  Do not force an up-front choice:  Better one language offering many abstractions  …than many languages offer one each  (HPF, map/reduce, pthreads…)
  • 10. This talk Lots of different concurrent/parallel Multicore programming paradigms (cost models) in Haskell Use Haskell! Task parallelism Semi-implicit Data parallelism Explicit threads, parallelism Operate simultaneously on synchronised via locks, Evaluate pure bulk data messages, or STM functions in parallel Modest parallelism Massive parallelism Hard to program Modest parallelism Easy to program Implicit synchronisation Single flow of control Easy to program Implicit synchronisation Slogan: no silver bullet: embrace diversity
  • 11. Multicore Parallel programming essential Task parallelism Explicit threads, synchronised via locks, messages, or STM
  • 12.  Lots of threads, all performing I/O  GUIs  Web servers (and other servers of course)  BitTorrent clients  Non-deterministic by design  Needs  Lightweight threads  A mechanism for threads to coordinate/share  Typically: pthreads/Java threads + locks/condition variables
  • 13.  Very very lightweight threads  Explicitly spawned, can perform I/O  Threads cost a few hundred bytes each  You can have (literally) millions of them  I/O blocking via epoll => OK to have hundreds of thousands of outstanding I/O requests  Pre-emptively scheduled  Threads share memory  Coordination via Software Transactional Memory (STM)
  • 14. main = do { putStr (reverse “yes”) ; putStr “no” } • Effects are explicit in the type system – (reverse “yes”) :: String -- No effects – (putStr “no”) :: IO () -- Can have effects • The main program is an effect-ful computation – main :: IO ()
  • 15. newRef :: a -> IO (Ref a) readRef :: Ref a -> IO a writeRef :: Ref a -> a -> IO () main = do { r <- newRef 0 Reads and ; incR r writes are ; s <- readRef r 100% explicit! ; print s } incR :: Ref Int -> IO () You can’t say incR r = do { v <- readRef r (r + 6), because ; writeRef r (v+1) r :: Ref Int }
  • 16. forkIO :: IO () -> IO ThreadId  forkIO spawns a thread  It takes an action as its argument webServer :: RequestPort -> IO () webServer p = do { conn <- acceptRequest p ; forkIO (serviceRequest conn) ; webServer p } serviceRequest :: Connection -> IO () serviceRequest c = do { … interact with client … } No event-loop spaghetti!
  • 17.  How do threads coordinate with each other? main = do { r <- newRef 0 ; forkIO (incR r) ; incR r ; ... } Aargh! A race incR :: Ref Int -> IO () incR r = do { v <- readRef r ; writeRef r (v+1) }
  • 18. A 10-second review:  Races: due to forgotten locks  Deadlock: locks acquired in “wrong” order.  Lost wakeups: forgotten notify to condition variable  Diabolical error recovery: need to restore invariants and release locks in exception handlers  These are serious problems. But even worse...
  • 19. Scalable double-ended queue: one lock per cell No interference if ends “far enough” apart But watch out when the queue is 0, 1, or 2 elements long!
  • 20. Difficulty of concurrent Coding style queue Sequential code Undergraduate
  • 21. Difficulty of concurrent Coding style queue Sequential code Undergraduate Locks and Publishable result at condition international conference variables
  • 22. Difficulty of concurrent Coding style queue Sequential code Undergraduate Locks and Publishable result at condition international conference variables Atomic blocks Undergraduate
  • 23. atomically { ... sequential get code ... }  To a first approximation, just write the sequential code, and wrap atomically around it  All-or-nothing semantics: Atomic commit  Atomic block executes in Isolation  Cannot deadlock (there are no locks!) ACID  Atomicity makes error recovery easy (e.g. exception thrown inside the get code)
  • 24. atomically :: IO a -> IO a main = do { r <- newRef 0 ; forkIO (atomically (incR r)) ; atomically (incR r) ; ... }  atomically is a function, not a syntactic construct  A worry: what stops you doing incR outside atomically?
  • 25. atomically :: STM a -> IO a  Better idea: newTVar :: a -> STM (TVar a) readTVar :: TVar a -> STM a writeTVar :: TVar a -> a -> STM () incT :: TVar Int -> STM () incT r = do { v <- readTVar r; writeTVar r (v+1) } main = do { r <- atomically (newTVar 0) ; forkIO (atomically (incT r)) ; atomic (incT r) ; ... }
  • 26. atomic :: STM a -> IO a newTVar :: a -> STM (TVar a) readTVar :: TVar a -> STM a writeTVar :: TVar a -> a -> STM ()  Can’t fiddle with TVars outside atomic block [good]  Can’t do IO inside atomic block [sad, but also good]  No changes to the compiler (whatsoever). Only runtime system and primops.  ...and, best of all...
  • 27. incT :: TVar Int -> STM () incT r = do { v <- readTVar r; writeTVar r (v+1) } Composition incT2 :: TVar Int -> STM () is THE way incT2 r = do { incT r; incT r } we build big programs foo :: IO () that work foo = ...atomically (incT2 r)...  An STM computation is always executed atomically (e.g. incT2). The type tells you.  Simply glue STMs together arbitrarily; then wrap with atomic  No nested atomic. (What would it mean?)
  • 28.  MVars for efficiency in (very common) special cases  Blocking (retry) and choice (orElse) in STM  Exceptions in STM
  • 29.  A very simple web server written in Haskell  full HTTP 1.0 and 1.1 support,  handles chunked transfer encoding,  uses sendfile for optimized static file serving,  allows request bodies and response bodies to be processed in constant space  Protection for all the basic attack vectors: overlarge request headers and slow-loris attacks  500 lines of Haskell (building on some amazing libraries: bytestring, blaze-builder, iteratee)
  • 30.  A new thread for each user request  Fast, fast Pong requests/sec
  • 31.  Again, lots of threads: 400-600 is typical  Significantly bigger program: 5000 lines of Haskell – but (Not shown: Vuse 480k lines) way smaller 80,000 loc than the competition Erlang Haskell  Performance: roughly competitive
  • 32.  Built on STM  Heavy use of combinator parsers: “reads like the protocol specification if you squint your eyes” decodeMsg :: Parser Message decodeMsg = do m <- getWord8 case m of 0 -> return Choke 1 -> return Unchoke 2 -> return Interested 3 -> return NotInterested 4 -> Have <$> gw32 5 -> BitField <$> getRestLazyByteString 6 -> Request <$> gw32 <*> gw32 <*> gw32 7 -> Piece <$> gw32 <*> gw32 <*> getRestLazyByteString 8 -> Cancel <$> gw32 <*> gw32 <*> gw32 9 -> Port <$> (fromIntegral <$> getWord16be) _ -> fail "Incorrect message parse" where gw32 = fromIntegral <$> getWord32bea
  • 33.  So far everything is shared memory  Distributed memory has a different cost model  Think message passing…  Think Erlang…
  • 34.  Processes share nothing; independent GC; independent failure  Communicate over channels  Message communication = serialise to bytestream, transmit, deserialise  Comprehensive failure model  A process P can “link to” another Q  If Q crashes, P gets a message  Use this to build process monitoring apparatus  Key to Erlang’s 5-9’s reliability
  • 35.  Provide Erlang as a library – no language extensions needed newChan :: PM (SPort a, RPort a) send :: Serialisable a => SPort a -> a -> PM a receive :: Serialisable a => RPort a -> PM a spawn :: NodeId -> PM a -> PM PId Process Channels May contain many Haskell threads, which share via STM
  • 36.  Many static guarantees for cost model:  (SPort a) is serialisable, but not (RPort a) => you always know where to send your message  (TVar a) not serialisable => no danger of multi-site STM
  • 37. The k-means clustering algorithm takes a set of data points and groups them into clusters by spatial proximity. Initial clusters have After first iteration After third iteration After second iteration random centroids ●Start with Z lots of data points in N-dimensional space ●Randomly choose k points as ”centroid candidates” ●Repeat: 1. For each data point, find the nearerst ”centroid candidate” 2. For each candidate C, find the centroid of all points nearest to C 3. Make those the new centroid candidates, and repeat Converged
  • 38. ●Start with Z lots of data points in N-dimensional space ●Randomly choose k points as ”centroid candidates” ●Repeat: 1. For each data point, find the nearerst ”centroid candidate” 2. For each candidate C, find the centroid of all points nearest to C 3. Make those the new centroid candidates, and repeat if necessary Step 1 MapReduce Step 2 Mapper 1 Step 3 Reducer Mapper 2 1 conver Master … ged? Result Mapper 3 Reducer k Mapper n Running today in Haskell on an Amazon EC2 cluster [current work]
  • 39. Highly concurrent applications are a killer app for Haskell
  • 40. Highly concurrent applications are a killer app for Haskell But wait… didn’t you say that Haskell was a functional language?
  • 41.  Side effects are inconvenient do { v <- readTVar r; writeTVar r (v+1) } vs r++  Result: almost all the code is functional, processing immutable data  Great for avoiding bugs: no aliasing, no race hazards, no cache ping-ponging.  Great for efficiency: only TVar access are tracked by STM
  • 42. Multicore Use Haskell! Semi-implicit parallelism Evaluate pure functions in parallel Modest parallelism Implicit synchronisation Easy to program Slogan: no silver bullet: embrace diversity
  • 43. Place n queens on an n x n board such that no queen attacks any other, horizontally, vertically, or diagonally  Sequential code nqueens :: Int -> [[Int]] nqueens n = subtree n [] subtree :: Int -> [Int] -> [[Int]] subtree 0 b = [b] subtree c b = concat $ map (subtree (c-1)) (children b) children :: [Int] -> [[Int]] children b = [ (q:b) | q <- [1..n], safe q b ]
  • 44. Place n queens on an n x n board such that no queen attacks any other, horizontally, vertically, or diagonally [1,3,1] [1,1] [2,3,1] [2,1] [1] [3,3,1] Start [4,3,1] [3,1] [] here [5,3,1] [4,1] [2] [6,3,1] ... ... ...
  • 45. Place n queens on an n x n board such that no queen attacks any other, horizontally, vertically, or diagonally  Sequential code nqueens :: Int -> [[Int]] nqueens n = subtree n [] subtree :: Int -> [Int] -> [[Int]] subtree 0 b = [b] subtree c b = concat $ map (subtree (c-1)) (children b) children :: [Int] -> [[Int]] children b = [ (q:b) | q <- [1..n], safe q b ]
  • 46. Place n queens on an n x n board such that no queen attacks any other, horizontally, vertically, or diagonally  Parallel code Works on the nqueens :: Int -> [[Int]] sub-trees in nqueens n = subtree n [] parallel subtree :: Int -> [Int] -> [[Int]] subtree 0 b = [b] subtree c b = concat $ parMap (subtree (c-1)) (children b) children :: [Int] -> [[Int]] children b = [ (q:b) | q <- [1..n], safe q b ]  Speedup: 3.5x on 6 cores
  • 47. map :: (a->b) -> [a] -> [b] parMap :: (a->b) -> [a] -> [b] Good things  Parallel program guaranteed not to change the result  Deterministic: same result every run  Very low barrier to entry  “Strategies” to separate algorithm from parallel structure
  • 48. Bad things  Poor cost model; all too easy to fail to evaluate something and lose all parallelism  Not much locality; shared memory  Over-fine granularity can be a big issue Profiling tools can help a lot
  • 49.  As usual, watch out for Amdahl’s law!
  • 50.  Find authentication or secrecy failures in cryptographic protocols. (Famous example: authentication failure in the Needham-Schroeder public key protocol. )  About 6,500 lines of Haskell  “I think it would be moronic to code CPSA in C or Python. The algorithm is very complicated, and the leap between the documented design and the Haskell code is about as small as one can get, because the design is functional.”  One call to parMap  Speedup of 3x on a quad-core --- worthwhile when many problems take 24 hrs to run.
  • 51.  Modest but worthwhile speedups (3-10) for very modest investment  Limited to shared memory; 10’s not 1000’s of processors  You still have to think about a parallel algorithm! (Eg John Ramsdell had to refactor his CPSA algorithm a bit.)
  • 52. Multicore Use Haskell! Data parallelism Operate simultaneously on bulk data Massive parallelism Easy to program Single flow of control Implicit synchronisation Slogan: no silver bullet: embrace diversity
  • 53. Data parallelism The key to using multicores at scale Flat data parallel Nested data parallel Apply sequential Apply parallel operation to bulk data operation to bulk data Very widely used Research project
  • 54. e.g. Fortran(s), *C MPI, map/reduce  The brand leader: widely used, well understood, well supported foreach i in 1..N { ...do something to A[i]... }  BUT: “something” is sequential  Single point of concurrency  Easy to implement: use “chunking”  Good cost model (both granularity and 1,000,000’s of (small) work items P1 P2 P3 locality)
  • 55. Faces are compared by computing a distance 1 R v v dist(A,B)   A B between their multi-region histograms. h h R r1 r r 1  Multi-region histogram for candidate face as an array. A r=1 r=2 r=3 r=4
  • 56. 1 R v v dist(A,B)   v A B h A  replicate h R r1 h r r 1 v A  h zipWith reduce reduce map   v A  v B  1  R v A  v B  h h r 1 h R v B  h 1 h    
  • 57. 1 v v r 1 v R A B dist(A,B)  hr A  replicate h R h r 1 v A  h zipWi reduce reduce map th   v A  v B  1  R v A  v B  h h r 1 h R v B  h 1 h      distances :: Array DIM2 Float -> Array DIM3 Float v  B h -> Array DIM1 Float distances histA histBs = dists where  histAs = replicate (constant (All, All, f)) histA diffs = zipWith (-) histAs histBs l1norm = reduce (¥a b -> abs a + abs b) (0) diffs regSum = reduce (+) (0) l1norm dists = map (/ r) regSum (h, r, f) = shape histBs
  • 58.  Arrays as values: virtually no element-wise programming (for loops).  Think APL, but with much more polymorphism  Performance is (currently) significantly less than C  BUT it auto-parallelises Warning: take all such figures with buckets of salt
  • 59.  GPUs are massively parallel processors, and are rapidly de-specialising from graphics  Idea: your program (when run) generates a GPU program distances :: Acc (Array DIM2 Float) -> Acc (Array DIM3 Float) -> Acc (Array DIM1 Float) distances histA histBs = dists where histAs = replicate (constant (All, All, f)) histA diffs = zipWith (-) histAs histBs l1norm = reduce (¥a b -> abs a + abs b) (0) diffs regSum = reduce (+) (0) l1norm dists = map (/ r) regSum
  • 60.  An (Acc a) is a syntax tree for a program computing a value of type a, ready to be compiled for GPU  The key trick: (+) :: Num a => a –> a -> a distances :: Acc (Array DIM2 Float) -> Acc (Array DIM3 Float) -> Acc (Array DIM1 Float) distances histA histBs = dists where histAs = replicate (constant (All, All, f)) histA diffs = zipWith (-) histAs histBs l1norm = reduce (¥a b -> abs a + abs b) (0) diffs regSum = reduce (+) (0) l1norm dists = map (/ r) regSum
  • 61.  An (Acc a) is a syntax tree for a program computing a value of type a, ready to be compiled for GPU CUDA.run :: Acc (Array a b) -> Array a b  CUDA.run  takes the syntax tree  compiles it to CUDA  loads the CUDA into GPU  marshals input arrays into GPU memory  runs it  marshals the result array back into Haskell memory
  • 62.  The code for Repa (multicore) and Accelerate (GPU) is virtually identical  Only the types change  Other research projects with similar approach  Nicola (Harvard)  Obsidian/Feldspar (Chalmers)  Accelerator (Microsoft .NET)  Recursive islands (MSR/Columbia)
  • 63. Data parallelism The key to using multicores at scale Nested data parallel Apply parallel operation to bulk data Research project
  • 64.  Main idea: allow “something” to be parallel foreach i in 1..N { ...do something to A[i]... }  Now the parallelism structure is recursive, and un-balanced  Much more expressive  Much harder to implement Still 1,000,000’s of (small) work items
  • 65. Nested data Flat data parallel parallel program Compiler program (the one we want (the one we want to write) to run)  Invented by Guy Blelloch in the 1990s  We are now working on embodying it in GHC: Data Parallel Haskell  Turns out to be jolly difficult in practice (but if it was easy it wouldn’t be research). Watch this space.
  • 66.  No single cost model suits all programs / computers. It’s a complicated world. Get used to it.  For concurrent programming, functional programming is already a huge win  For parallel programming at scale, we’re going to end up with data parallel functional programming  Haskell is super-great because it hosts multiple paradigms. Many cool kids hacking in this space.  But other functional programming languages are great too: Erlang, Scala, F#