Cluster Computing with
      DryadLINQ
          Mihai Budiu
Microsoft Research, Silicon Valley

  Cloudera, February 12, 2010
Goal




       2
Design Space
[Figure: design-space chart. Horizontal axis: latency to throughput. Vertical axis: private data center to Internet. Regions marked: shared memory, data-parallel.]
Data-Parallel Computation

Application   SQL                  Sawzall         ≈SQL        LINQ, SQL
Language                           Sawzall         Pig, Hive   DryadLINQ, Scope
Execution     Parallel Databases   Map-Reduce      Hadoop      Dryad; Cosmos, HPC, Azure
Storage                            GFS, BigTable   HDFS, S3    Cosmos, Azure, SQL Server
Software Stack
[Figure: layered software stack, listed here from top to bottom.]
• Applications
• Analytics, Machine Learning, Graphs, Data mining, Optimization
• SQL, C#
• legacy code, PSQL, Scope, .Net, Distributed Data Structures, SSIS, SQL server
• Distributed Shell, DryadLINQ, C++
• Dryad
• Cosmos FS, Azure XStore, SQL Server, Tidy FS, NTFS
• Cosmos, Azure XCompute, Windows HPC
• Windows Server
•   Introduction
•   Dryad
•   DryadLINQ
•   Building on DryadLINQ
•   Conclusions




                            6
Dryad
•   Continuously deployed since 2006
•   Running on >> 10^4 machines
•   Sifting through > 10 PB of data daily
•   Runs on clusters of > 3000 machines
•   Handles jobs with > 10^5 processes each
•   Platform for rich software ecosystem
•   Used by >> 100 developers

• Written at Microsoft Research, Silicon Valley
                                                  7
Dryad = Execution Layer

Job (application)  ≈  Pipeline
Dryad              ≈  Shell
Cluster            ≈  Machine
2-D Piping
• Unix Pipes: 1-D
     grep | sed | sort | awk | perl



• Dryad: 2-D
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50




                                                   9
Virtualized 2-D Pipelines
     • 2D DAG
     • multi-machine
     • virtualized




                            14
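Dryad Job Structure
[Figure: a job graph. Input files feed three grep vertices, followed by sed, sort, awk and perl vertices that write the output files. Vertices are processes, edges are channels, and a stage is a column of vertices running the same program (the sort vertices in the figure).]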
Channels
Finite streams of items
• distributed filesystem files (persistent)
• SMB/NTFS files (temporary)
• TCP pipes (inter-machine)
• memory FIFOs (intra-machine)
[Figure: vertex X sends items over a channel to vertex M.]
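Dryad System Architecture
[Figure: the job manager holds the job schedule and uses the control plane to reach the cluster services (NS, Sched) and a PD on each cluster machine. The vertices (V) started on those machines exchange data over the data plane: files, TCP, FIFO, network.]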
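Fault Tolerance
Policy Managers
[Figure: stage R and stage X are each a row of vertices, joined by the connection R-X. Inside the job manager, an R manager, an X manager and an R-X manager each handle their own part of the graph.]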
Dynamic Graph Rewriting
[Figure: X[0], X[1] and X[3] are completed vertices. X[2] is a slow vertex, and X'[2] is its duplicate vertex.]

Duplication Policy = f(running times, data volumes)
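
The slide only says that the duplication policy is a function of running times and data volumes. The following is a minimal sketch of one such function, not Dryad's actual policy. The rule (compare a running vertex against the median seconds-per-byte of completed vertices), the `slack` factor and all names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicationPolicy
{
    // Hypothetical rule: duplicate a running vertex when it has already run
    // longer than 'slack' times the duration predicted from completed vertices.
    public static bool ShouldDuplicate(
        IList<double> completedSeconds, IList<double> completedBytes,
        double runningSeconds, double runningBytes, double slack)
    {
        // Seconds per byte observed on completed vertices; skip empty inputs.
        List<double> rates = completedSeconds
            .Zip(completedBytes, (s, b) => new { s, b })
            .Where(v => v.b > 0)
            .Select(v => v.s / v.b)
            .OrderBy(r => r)
            .ToList();
        if (rates.Count == 0) return false; // nothing to compare against yet

        double medianRate = rates[rates.Count / 2];
        double expectedSeconds = medianRate * runningBytes;
        return runningSeconds > slack * expectedSeconds;
    }

    public static void Main()
    {
        var seconds = new List<double> { 10, 12, 11 };
        var bytes = new List<double> { 100, 100, 100 };
        // Expected duration is about 11 s, so the threshold is about 22 s.
        Console.WriteLine(ShouldDuplicate(seconds, bytes, 30, 100, 2.0));
        Console.WriteLine(ShouldDuplicate(seconds, bytes, 15, 100, 2.0));
    }
}
```

Normalizing by input size keeps a vertex with a large input from being flagged as slow just because it has more data to read.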
Cluster network topology
[Figure: each rack of machines connects to a top-of-rack switch, and the top-of-rack switches connect to a top-level switch.]
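Dynamic Aggregation
[Figure, static plan: six S vertices all feed a single T vertex.]
[Figure, dynamic plan: each S vertex is tagged with its rack number (#1, #2 or #3). One aggregation vertex per rack (#1A, #2A, #3A) collects the S outputs from its rack and feeds T.]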
Policy vs. Mechanism

Policy:
• Application-level
• Most complex in C++ code
• Invoked with upcalls
• Need good default implementations
• DryadLINQ provides a comprehensive set

Mechanism:
• Built-in
  • Scheduling
  • Graph rewriting
  • Fault tolerance
  • Statistics and reporting
•   Introduction
•   Dryad
•   DryadLINQ
•   Building on DryadLINQ
•   Conclusions




                            24
LINQ => DryadLINQ




    Dryad




                    25
LINQ = .Net + Queries

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
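
The query on this slide can be tried locally with LINQ-to-objects. The following is a minimal sketch: the `Record` type and the bodies of `IsLegal` and `Hash` are made up for illustration, since the slides do not define them. The anonymous-type member built from `Hash(c.key)` is given an explicit name, which C# requires for a method-call initializer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the slide only shows that items have .key and .value.
public class Record
{
    public int key;
    public string value;
    public Record(int key, string value) { this.key = key; this.value = value; }
}

public static class LinqDemo
{
    // Assumed predicate: only non-negative keys are "legal".
    public static bool IsLegal(int key) { return key >= 0; }

    // Assumed hash: lowercase hexadecimal rendering of the key.
    public static string Hash(int key) { return key.ToString("x"); }

    // Same shape as the query on the slide, flattened to strings for printing.
    public static List<string> Run(IEnumerable<Record> collection)
    {
        var results = from c in collection
                      where IsLegal(c.key)
                      select new { hash = Hash(c.key), c.value };
        return results.Select(r => r.hash + ":" + r.value).ToList();
    }

    public static void Main()
    {
        var data = new List<Record>
        {
            new Record(255, "a"), new Record(-1, "b"), new Record(16, "c")
        };
        foreach (string line in Run(data)) Console.WriteLine(line); // ff:a then 10:c
    }
}
```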
Collections and Iterators
class Collection<T> : IEnumerable<T>;

public interface IEnumerable<T> {
    IEnumerator<T> GetEnumerator();
}

public interface IEnumerator<T> {
    T Current { get; }
    bool MoveNext();
    void Reset();
}
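
To make the iterator protocol concrete, here is a small sketch of a collection that exposes its items only through an enumerator, together with the loop that `foreach` expands to. The `Countdown` class is a made-up example and not part of DryadLINQ.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// A tiny collection that yields start, start-1, ..., 1.
public class Countdown : IEnumerable<int>
{
    private readonly int start;
    public Countdown(int start) { this.start = start; }

    // The compiler builds the IEnumerator<int> state machine from 'yield return'.
    public IEnumerator<int> GetEnumerator()
    {
        for (int i = start; i > 0; i--) yield return i;
    }

    // IEnumerable<T> also requires the non-generic GetEnumerator.
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class IteratorDemo
{
    // Roughly what 'foreach' does: call MoveNext(), read Current, repeat.
    public static int Sum(IEnumerable<int> items)
    {
        int total = 0;
        using (IEnumerator<int> e = items.GetEnumerator())
        {
            while (e.MoveNext()) total += e.Current;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new Countdown(4))); // 4 + 3 + 2 + 1 = 10
    }
}
```

Because items are pulled one at a time, a consumer never needs the whole collection in memory.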
DryadLINQ Data Model
[Figure: a collection is split into partitions, and each partition holds .Net objects.]
DryadLINQ = LINQ + Dryad
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

[Figure: the query becomes a query plan (a Dryad job). The data in collection is the input, each vertex of the plan runs C# vertex code, and the vertex outputs form results.]
Demo




       30
Example: Histogram
public static IQueryable<Pair> Histogram(
   IQueryable<LineRecord> input, int k)
{
  var words = input.SelectMany(x => x.line.Split(' '));
  var groups = words.GroupBy(x => x);
  var counts = groups.Select(x => new Pair(x.Key, x.Count()));
  var ordered = counts.OrderByDescending(x => x.count);
  var top = ordered.Take(k);
  return top;
}
         “A line of words of wisdom”
         [“A”, “line”, “of”, “words”, “of”, “wisdom”]
         [[“A”], [“line”], [“of”, “of”], [“words”], [“wisdom”]]
         [ {“A”, 1}, {“line”, 1}, {“of”, 2}, {“words”, 1}, {“wisdom”, 1}]
         [{“of”, 2}, {“A”, 1}, {“line”, 1}, {“words”, 1}, {“wisdom”, 1}]
         [{“of”, 2}, {“A”, 1}, {“line”, 1}]
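
The trace above can be reproduced on one machine. The following sketch runs the same operator chain with LINQ-to-objects, using `IEnumerable` instead of `IQueryable` so that it needs no cluster. The `LineRecord` and `Pair` definitions are assumptions, since the slides use these types without showing them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes for the two types used on the slide.
public class LineRecord
{
    public string line;
    public LineRecord(string line) { this.line = line; }
}

public class Pair
{
    public string word;
    public int count;
    public Pair(string word, int count) { this.word = word; this.count = count; }
}

public static class HistogramDemo
{
    // Same five steps as the slide: split, group, count, sort, take.
    public static List<Pair> Histogram(IEnumerable<LineRecord> input, int k)
    {
        var words = input.SelectMany(x => x.line.Split(' '));
        var groups = words.GroupBy(x => x);
        var counts = groups.Select(x => new Pair(x.Key, x.Count()));
        var ordered = counts.OrderByDescending(x => x.count);
        return ordered.Take(k).ToList();
    }

    public static void Main()
    {
        var input = new[] { new LineRecord("A line of words of wisdom") };
        foreach (Pair p in Histogram(input, 3))
            Console.WriteLine(p.word + " " + p.count);
    }
}
```

In LINQ-to-objects, `GroupBy` yields groups in order of first appearance and `OrderByDescending` is a stable sort, which is why ties keep their input order in the local run.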
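Histogram Plan
[Figure: execution plan for the Histogram query. Stages from top to bottom:]
1. SelectMany
2. Sort
3. GroupBy+Select
4. HashDistribute
5. MergeSort
6. GroupBy
7. Select
8. Sort
9. Take
10. MergeSort
11. Take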
Map-Reduce in DryadLINQ

public static IQueryable<S> MapReduce<T,M,K,S>(
  this IQueryable<T> input,
         Expression<Func<T, IEnumerable<M>>> mapper,
         Expression<Func<M,K>> keySelector,
         Expression<Func<IGrouping<K,M>,S>> reducer)
{
  var map = input.SelectMany(mapper);
  var group = map.GroupBy(keySelector);
  var result = group.Select(reducer);
  return result;
}
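
As a usage illustration, the sketch below defines the same three-step operator over `IEnumerable` (so it runs locally with plain delegates) and applies it to word counting. The `WordCount` helper and the choice of `KeyValuePair` as the reducer output are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapReduceDemo
{
    // Local counterpart of the slide's operator: map, group by key, reduce.
    public static IEnumerable<S> MapReduce<T, M, K, S>(
        this IEnumerable<T> input,
        Func<T, IEnumerable<M>> mapper,
        Func<M, K> keySelector,
        Func<IGrouping<K, M>, S> reducer)
    {
        var map = input.SelectMany(mapper);
        var group = map.GroupBy(keySelector);
        return group.Select(reducer);
    }

    // Word count: the mapper splits lines, the key is the word, the reducer counts.
    public static Dictionary<string, int> WordCount(IEnumerable<string> lines)
    {
        return lines
            .MapReduce(
                line => line.Split(' '),
                word => word,
                g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToDictionary(p => p.Key, p => p.Value);
    }

    public static void Main()
    {
        foreach (var p in WordCount(new[] { "a b a", "b a" }))
            Console.WriteLine(p.Key + " " + p.Value);
    }
}
```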
Map-Reduce Plan
[Figure: the MapReduce query drawn as a logical plan (M, G, R, X), as a static plan and as dynamically expanded plans. An inset repeats the S, A, T aggregation tree.]
Stages of the expanded plan:
• map: M (map), Q (sort), G1 (groupby), R (reduce), D (distribute)
• partial aggregation: MS (mergesort), G2 (groupby), R (reduce)
• reduce: MS (mergesort), G2 (groupby), R (reduce), X (consumer)
Distributed Sorting Plan
[Figure: a single node O drawn as a static plan and as dynamically expanded plans. The expanded plans have rows of DS, H, D, M and S vertices.]
Expectation Maximization




                   • 160 lines
                   • 3 iterations shown




                                36
Probabilistic Index Maps
Images




features
                               37
Language Summary


Where
Select
GroupBy
OrderBy
Aggregate
Join
Apply
Materialize
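LINQ System Architecture
[Figure: on the local machine, a .Net program (C#, VB, F#, etc.) sends a query to a LINQ provider and gets objects back. The provider hands the query to an execution engine.]
Execution engines:
• LINQ-to-obj
• PLINQ
• LINQ-to-SQL
• LINQ-to-WS
• DryadLINQ
• Flickr
• Oracle
• LINQ-to-XML
• Your own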
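The DryadLINQ Provider
[Figure: round trip between the client machine and the data center.]
• Client machine (.Net, DryadLINQ): ToCollection -> query expression -> distributed query plan, vertex code and context -> Invoke
• Data center: the Dryad JM drives the Dryad execution, which reads the input tables and writes the output tables
• Back on the client machine: results -> output DryadTable -> .Net objects -> foreach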
Combining Query Providers
      Local machine             Execution engines

                       LINQ
                     Provider        PLINQ
           Query
  .Net                 LINQ
                     Provider
                                  SQL Server
program
(C#, VB,               LINQ
                                  DryadLINQ
F#, etc)             Provider
           Objects     LINQ
                                  LINQ-to-obj
                     Provider


                                                    41
Using PLINQ
              Query

           DryadLINQ




Local query

   PLINQ


                                42
Using LINQ to SQL Server
                          Query

                      DryadLINQ




Query     Query   Query     LINQ to SQL    LINQ to SQL



                                   Query          Query


                                                          43
Using LINQ-to-objects
[Figure: the same query can go to LINQ-to-objects on the local machine for debugging, or to DryadLINQ on the cluster for production.]
•   Introduction
•   Dryad
•   DryadLINQ
•   Building on/for DryadLINQ
    – System monitoring with Artemis
    – Privacy-preserving query language (PINQ)
    – Machine learning
• Conclusions

                                                 45
Artemis: measuring clusters

                                                       Visualization

                                                    Plug-ins    Statistics


            Cluster             Log collection
  Job
           browser/
browser
           manager               DryadLINQ
                                                               DB
             Cluster/Job State API

 Cosmos                HPC                Azure
 Cluster              Cluster             Cluster



                                                                             46
DryadLINQ job browser




                        47
Automated diagnostics




                        48
Job statistics:
schedule and critical path




                             49
Running time distribution




                            50
Performance counters




                       51
CPU Utilization




                  52
Load imbalance:
rack assignment




                  53
PINQ
Queries
(LINQ)



       Privacy-sensitive
Answer     database


                           54
PINQ = Privacy-Preserving LINQ
• “Type-safety” for privacy
• Provides interface to data that looks very
  much like LINQ.
• All access through the interface gives
  differential privacy.
• Analysts write arbitrary C# code against data
  sets, like in LINQ.
• No privacy expertise needed to produce
  analyses.
• Privacy currency is used to limit per-record
  information released.
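
The idea of a privacy currency can be sketched with a counter that charges a budget for every noisy answer. This is not PINQ's implementation, which the slides do not show. It uses the standard Laplace mechanism for counts, and the class name, the seeded random source and the exception on an exhausted budget are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical budgeted counter: every query spends part of a fixed privacy budget.
public class BudgetedCounter
{
    private double remaining;
    private readonly Random rng;

    public BudgetedCounter(double budget, int seed)
    {
        remaining = budget;
        rng = new Random(seed);
    }

    public double Remaining { get { return remaining; } }

    // Returns the count plus Laplace noise of scale 1/epsilon and charges epsilon.
    public double NoisyCount<T>(IEnumerable<T> data, double epsilon)
    {
        if (epsilon <= 0) throw new ArgumentOutOfRangeException("epsilon");
        if (epsilon > remaining) throw new InvalidOperationException("privacy budget exhausted");
        remaining -= epsilon;

        // A Laplace sample is the difference of two exponential samples.
        double scale = 1.0 / epsilon;
        double u1 = 1.0 - rng.NextDouble(); // in (0, 1], so Log is finite
        double u2 = 1.0 - rng.NextDouble();
        double noise = scale * (Math.Log(u1) - Math.Log(u2));
        return data.Count() + noise;
    }
}

public static class PrivacyDemo
{
    public static void Main()
    {
        var counter = new BudgetedCounter(1.0, 42);
        Console.WriteLine(counter.NoisyCount(new[] { 1, 2, 3 }, 0.6));
        Console.WriteLine(counter.Remaining);
    }
}
```

A smaller epsilon gives a noisier answer but spends less of the budget, so an analyst trades accuracy per query against the number of queries allowed.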
Example: search logs mining

// Open sensitive data set with state-of-the-art security
PINQueryable<VisitRecord> visits = OpenSecretData(password);

// Group visits by patient and identify frequent patients.
var patients = visits.GroupBy(x => x.Patient.SSN)
                     .Where(x => x.Count() > 5);

// Map each patient to their post code using their SSN (the group key).
var locations = patients.Join(SSNtoPost, x => x.Key, y => y.SSN,
                              (x,y) => y.PostCode);

// Count post codes containing more than 10 frequent patients.
var activity = locations.GroupBy(x => x)
                        .Where(x => x.Count() > 10);
Visualize(activity); // Who knows what this does???

[Figure: distribution of queries about “Cricket”]
PINQ Download
• Implemented on top of DryadLINQ
• Allows mining very sensitive datasets privately
• Code is available
• http://research.microsoft.com/en-us/projects/PINQ/
• Frank McSherry, Privacy Integrated Queries,
  SIGMOD 2009




                                                       57
Natal Training




                 58
Natal Problem




       • Recognize players from depth map
       • At frame rate
       • Using 15% of one Xbox CPU core

                                        59
Learn from Data


                 Rasterize


                                  Training examples
Motion Capture
                             Machine
(ground truth)               learning


                                    Classifier



                                                      60
Running on Xbox




                  61
Learning from data

                                       Classifier



Training examples   Machine learning

                       DryadLINQ

                         Dryad




                                                    62
Large-Scale Machine Learning
• > 1022 objects
• Sparse, multi-dimensional data structures
• Complex datatypes
      (images, video, matrices, etc.)
• Complex application logic and dataflow
  –   >35000 lines of .Net
  –   140 CPU days
  –   > 105 processes
  –   30 TB data analyzed
  –   140 avg parallelism (235 machines)
  –   300% CPU utilization (4 cores/machine)
                                               63
Highly efficient parallelization




                                    64
•   Introduction
•   Dryad
•   DryadLINQ
•   Building on DryadLINQ
•   Conclusions




                            65
Lessons Learned
• Complete separation of
  storage / execution / language
• Using LINQ +.Net (language integration)
• Static typing
  – No protocol buffers (serialization code)
• Allowing flexible and powerful policies
• Centralized job manager: no replication, no
  consensus, no checkpointing
• Porting (HPC, Cosmos, Azure, SQL Server)
                                                66
Conclusions




  =
“What’s the point if I can’t have it?”

• Dryad+DryadLINQ available for download
   – Academic license
   – Commercial evaluation license
• Runs on Windows HPC platform
• Dryad is in binary form, DryadLINQ in source
• Requires signing a 3-page licensing agreement
• http://connect.microsoft.com/site/sitehome.aspx?SiteID=891


                                                               68
Backup Slides




                69
What does DryadLINQ do?
 public struct Data { …
   public static int Compare(Data left, Data right);
 }

 Data g = new Data();
 var result = table.Where(s => Data.Compare(s, g) < 0);

Generated code:

 // Data serialization
 public static void Read(this DryadBinaryReader reader, out Data obj);
 public static int Write(this DryadBinaryWriter writer, Data obj);

 // Data factory
 public class DryadFactoryType__0 : LinqToDryad.DryadFactory<Data>

 // Channel writer and channel reader
 DryadVertexEnv denv = new DryadVertexEnv(args);
 var dwriter__2 = denv.MakeWriter(FactoryType__0);
 var dreader__3 = denv.MakeReader(FactoryType__0);

 // LINQ code; context serialization replaces g with a lookup in the object store
 var source__4 = DryadLinqVertex.Where(dreader__3,
     s => (Data.Compare(s, ((Data)DryadLinqObjectStore.Get(0))) <
          ((System.Int32)(0))), false);
 dwriter__2.WriteItemSequence(source__4);
Ongoing Dryad/DryadLINQ Research
•   Performance modeling
•   Scheduling and resource allocation
•   Profiling and performance debugging
•   Incremental computation
•   Hardware acceleration
•   High-level programming abstractions
•   Many domain-specific applications

                                          71
Sample applications written using DryadLINQ           Class
Distributed linear algebra                            Numerical
Accelerated Page-Rank computation                     Web graph
Privacy-preserving query language                     Data mining
Expectation maximization for a mixture of Gaussians   Clustering
K-means                                               Clustering
Linear regression                                     Statistics
Probabilistic Index Maps                              Image processing
Principal component analysis                          Data mining
Probabilistic Latent Semantic Indexing                Data mining
Performance analysis and visualization                Debugging
Road network shortest-path preprocessing              Graph
Botnet detection                                      Data mining
Epitome computation                                   Image processing
Neural network training                               Statistics
Parallel machine learning framework infer.net         Machine learning
Distributed query caching                             Optimization
Image indexing                                        Image processing
Web indexing structure                                Web graph
Staging
1. Build
2. Send .exe (JM code and vertex code)
3. Start JM
4. Query cluster resources (cluster services)
5. Generate graph
6. Initialize vertices
7. Serialize vertices
8. Monitor vertex execution
Bibliography
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007

DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey
Symposium on Operating System Design and Implementation (OSDI), San Diego, CA, December 8-10, 2008

SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou
Very Large Databases Conference (VLDB), Auckland, New Zealand, August 23-28 2008

Hunting for problems with Artemis
Gabriela F. Creţu-Ciocârlie, Mihai Budiu, and Moises Goldszmidt
USENIX Workshop on the Analysis of System Logs (WASL), San Diego, CA, December 7, 2008

DryadInc: Reusing work in large-scale computations
Lucian Popa, Mihai Budiu, Yuan Yu, and Michael Isard
Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, CA, June 15, 2009

Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations
Yuan Yu, Pradeep Kumar Gunda, and Michael Isard
ACM Symposium on Operating Systems Principles (SOSP), October 2009

Quincy: Fair Scheduling for Distributed Computing Clusters
Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg
ACM Symposium on Operating Systems Principles (SOSP), October 2009
Incremental Computation
         …                               Outputs

     Distributed
     Computation

                …                        Inputs
         Append-only data
Goal: Reuse (part of) prior computations to:
- Speed up the current job
- Increase cluster throughput
- Reduce energy and costs
Propose Two Approaches

1. Reuse Identical computations from the past
   (like make or memoization)



2. Do only incremental computation on the new data
   and Merge results with the previous ones
   (like patch)
Context
• Implemented for Dryad
   – Dryad Job = Computational DAG
      • Vertex: arbitrary computation + inputs/outputs
      • Edge: data flows

Simple Example:
                              Outputs
Record Count
                                  Add          A


                                Count     C        C
                                Inputs    I1       I2
                           (partitions)
Identical Computation
Record Count

                               First execution
     Outputs                   DAG
         Add          A


       Count     C        C

       Inputs    I1       I2
  (partitions)
Identical Computation
Record Count

                                       Second execution
     Outputs                           DAG
         Add          A


       Count     C        C    C

       Inputs    I1       I2   I3
  (partitions)

                                    New Input
IDE – IDEntical Computation
Record Count

                                      Second execution
     Outputs                          DAG
         Add          A


       Count     C        C    C

       Inputs
  (partitions)
                 I1       I2   I3   Identical subDAG
Identical Computation
Replace identical computational subDAG with
 edge data cached from previous execution
[Figure: IDE-modified DAG. Only Count (C) on the new input I3 still runs. The Count outputs for I1 and I2 are replaced with cached data and fed to Add (A).]
Use DAG fingerprints to determine
if computations are identical
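
A minimal sketch of fingerprint-based reuse for the Record Count example follows. It is not the DryadInc implementation. The fingerprint format (a SHA-256 hash of the vertex program name and its input identifiers), the in-memory cache and the assumption that a partition identifier always names the same immutable data are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class IdeDemo
{
    // Cached vertex outputs keyed by fingerprint (stands in for cached edge data).
    private static readonly Dictionary<string, long> cache = new Dictionary<string, long>();

    // Number of Count vertices that actually ran.
    public static int Executions = 0;

    // Fingerprint of a vertex: hash of its program name and input identifiers.
    public static string Fingerprint(string program, IEnumerable<string> inputIds)
    {
        using (SHA256 sha = SHA256.Create())
        {
            string text = program + "(" + string.Join(",", inputIds) + ")";
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(text)));
        }
    }

    // Count vertex: reuse the cached output when the fingerprint is known.
    public static long CountPartition(string partitionId, string[] records)
    {
        string fp = Fingerprint("Count", new[] { partitionId });
        long cached;
        if (cache.TryGetValue(fp, out cached)) return cached;
        Executions++;
        long result = records.Length;
        cache[fp] = result;
        return result;
    }

    // Add vertex: sums the per-partition counts.
    public static long RecordCount(IDictionary<string, string[]> partitions)
    {
        return partitions.Sum(p => CountPartition(p.Key, p.Value));
    }

    public static void Main()
    {
        var parts = new Dictionary<string, string[]>
        {
            { "I1", new[] { "a", "b" } }, { "I2", new[] { "c", "d", "e" } }
        };
        Console.WriteLine(RecordCount(parts)); // first execution: both Counts run
        parts["I3"] = new[] { "f" };
        Console.WriteLine(RecordCount(parts)); // second execution: only I3 is counted
        Console.WriteLine(Executions);
    }
}
```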
Semantic Knowledge Can Help

Reuse Output


               A


          C        C
          I1       I2
Semantic Knowledge Can Help

Previous Output
                           A   Merge (Add)



                  A              C
                                 I3
            C         C
             I1       I2

                                       Incremental DAG
Mergeable Computation

User-specified
                              A   Merge (Add)



Automatically        A              C
Inferred                            I3
                C        C
                I1       I2
                                            Automatically
                                            Built
Mergeable Computation
                                           Merge Vertex
Save to Cache
                              A
                                                Incremental DAG –
                                                Remove Old Inputs
                     A                A


                C        C        C        C       C
                I1       I2       I1 Empty I2      I3
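
The mergeable approach can be sketched for the same Record Count example: the previous output is saved, only new partitions are counted, and a merge function combines the two. The class below is an illustration under assumptions (append-only partitions identified by name, `Add` as the user-specified merge function) and not the system's actual interface.

```csharp
using System;
using System.Collections.Generic;

public class IncrementalCount
{
    private long cachedTotal = 0;                                    // output saved to cache
    private readonly HashSet<string> seen = new HashSet<string>();  // inputs already counted

    // Number of partitions that were actually scanned across all runs.
    public int PartitionsProcessed { get; private set; }

    // User-specified merge function; for Record Count it is Add.
    public static long Merge(long previous, long delta) { return previous + delta; }

    // Counts only partitions not seen before, then merges with the cached output.
    public long Run(IDictionary<string, string[]> partitions)
    {
        long delta = 0;
        foreach (KeyValuePair<string, string[]> p in partitions)
        {
            if (seen.Add(p.Key))
            {
                delta += p.Value.Length;
                PartitionsProcessed++;
            }
        }
        cachedTotal = Merge(cachedTotal, delta);
        return cachedTotal;
    }
}

public static class MergeDemo
{
    public static void Main()
    {
        var job = new IncrementalCount();
        var parts = new Dictionary<string, string[]>
        {
            { "I1", new[] { "a", "b" } }, { "I2", new[] { "c", "d", "e" } }
        };
        Console.WriteLine(job.Run(parts)); // counts I1 and I2
        parts["I3"] = new[] { "f" };
        Console.WriteLine(job.Run(parts)); // counts only I3 and merges
    }
}
```

Unlike reuse of identical subDAGs, this needs to know that the final aggregation is mergeable, which is the semantic knowledge the slides refer to.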

Weitere ähnliche Inhalte

Was ist angesagt?

SF Ceph Users Jan. 2014
SF Ceph Users Jan. 2014SF Ceph Users Jan. 2014
SF Ceph Users Jan. 2014Kyle Bader
 
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like systemAccelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like systemShuai Yuan
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesKamesh Pemmaraju
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
 
Progressive NOSQL: Cassandra
Progressive NOSQL: CassandraProgressive NOSQL: Cassandra
Progressive NOSQL: CassandraAcunu
 
Kerry osborne hadoop meets exadata
Kerry osborne hadoop meets exadataKerry osborne hadoop meets exadata
Kerry osborne hadoop meets exadataEnkitec
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideKaran Singh
 
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Cluster Computing with Dryad

  • 1. Cluster Computing with DryadLINQ. Mihai Budiu, Microsoft Research, Silicon Valley. Cloudera, February 12, 2010
  • 2. Goal
  • 3. Design Space [figure: one axis runs from Latency to Throughput, the other from Private data center to Internet; regions labeled Shared memory and Data-parallel]
  • 4. Data-Parallel Computation (one column per system)
      Application: SQL | Sawzall | ≈SQL | LINQ, SQL
      Language:    (none) | Sawzall | Pig, Hive | DryadLINQ, Scope
      Execution:   Parallel Databases | Map-Reduce | Hadoop | Dryad (Cosmos, HPC, Azure)
      Storage:     (none) | GFS, BigTable | HDFS, S3 | Cosmos, Azure, SQL Server
  • 5. Software Stack [figure: layered diagram, top to bottom: Applications (analytics, machine learning, graphs, data mining, optimization); SQL, C#, legacy code, PSQL, Scope, .Net, Distributed Data Structures, SSIS, SQL Server, Distributed Shell, DryadLINQ, C++; Dryad; Cosmos FS, Azure XStore, SQL Server, Tidy FS, NTFS; Cosmos, Azure XCompute, Windows HPC; Windows Server]
  • 6. Introduction • Dryad • DryadLINQ • Building on DryadLINQ • Conclusions
  • 7. Dryad • Continuously deployed since 2006 • Running on >> 10^4 machines • Sifting through > 10 PB of data daily • Runs on clusters of > 3000 machines • Handles jobs with > 10^5 processes each • Platform for a rich software ecosystem • Used by >> 100 developers • Written at Microsoft Research, Silicon Valley
  • 8. Dryad = Execution Layer: Job (application) ≈ Pipeline; Dryad ≈ Shell; Cluster ≈ Machine
  • 9. 2-D Piping • Unix Pipes: 1-D: grep | sed | sort | awk | perl • Dryad: 2-D: grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
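The 2-D idea on this slide can be sketched on a single machine: each stage is the same function run once per partition, and the number of partitions is the stage's width (the exponent in grep^1000). This is only an illustration under stated assumptions. `TwoDPipeline`, `RunStage`, and `GrepThenSort` are hypothetical names, not Dryad API, and the sketch keeps the width constant between stages, whereas the slide's pipeline changes width from stage to stage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TwoDPipeline
{
    // Run one stage: apply the same vertex function to every partition in parallel.
    // The number of partitions is the "width" of the stage.
    public static List<List<string>> RunStage(
        List<List<string>> partitions,
        Func<IEnumerable<string>, IEnumerable<string>> vertex)
    {
        Task<List<string>>[] tasks = partitions
            .Select(p => Task.Run(() => vertex(p).ToList()))
            .ToArray();
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToList();
    }

    // Hypothetical two-stage job: a "grep" stage followed by a per-partition "sort" stage.
    public static List<List<string>> GrepThenSort(List<List<string>> partitions, string needle)
    {
        var grepped = RunStage(partitions, lines => lines.Where(l => l.Contains(needle)));
        return RunStage(grepped, lines => lines.OrderBy(l => l, StringComparer.Ordinal));
    }
}
```

Each partition flows through the stages independently here; a real 2-D pipeline would also repartition data between stages of different widths.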
  • 14. Virtualized 2-D Pipelines • 2D DAG • multi-machine • virtualized
  • 15. Dryad Job Structure [figure: input files feed stages of grep, sed, sort, awk, and perl vertices (processes), connected by channels and ending in output files]
  • 16. Channels: finite streams of items • distributed filesystem files (persistent) • SMB/NTFS files (temporary) • TCP pipes (inter-machine) • memory FIFOs (intra-machine)
  • 17. Dryad System Architecture [figure: control plane: the job manager sends the job schedule to the cluster (NS, Sched, PD on each machine); data plane: vertices (V) exchange data over files, TCP, FIFOs, and the network]
  • 19. Policy Managers [figure: inside the job manager, a stage manager for stage R, a stage manager for stage X, and a connection manager for the R-X connection]
  • 20. Dynamic Graph Rewriting [figure: completed vertices X[0], X[1], X[3]; slow vertex X[2]; duplicate vertex X'[2]] • Duplication policy = f(running times, data volumes)
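One possible shape for the duplication policy f(running times, data volumes) is sketched below. The slide does not give the actual function, so everything here is an assumption: `VertexStatus`, `PickDuplicates`, and the `slowdownFactor` threshold are hypothetical. The sketch estimates seconds per input byte from completed vertices and flags a running vertex once it has taken much longer than that estimate predicts for its input size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical snapshot of one vertex in a stage.
public sealed record VertexStatus(string Name, bool Completed, double RunningSeconds, long InputBytes);

public static class DuplicationPolicy
{
    // Return the names of running vertices that look slow relative to completed ones.
    public static List<string> PickDuplicates(
        IReadOnlyList<VertexStatus> stage, double slowdownFactor = 2.0)
    {
        var completed = stage.Where(v => v.Completed && v.InputBytes > 0).ToList();
        if (completed.Count == 0) return new List<string>();

        // Seconds per input byte observed on completed vertices; take the (upper) median.
        var rates = completed
            .Select(v => v.RunningSeconds / v.InputBytes)
            .OrderBy(r => r)
            .ToList();
        double median = rates[rates.Count / 2];

        // A running vertex is a candidate once it exceeds slowdownFactor times
        // the time the median rate predicts for its input volume.
        return stage
            .Where(v => !v.Completed && v.InputBytes > 0)
            .Where(v => v.RunningSeconds > slowdownFactor * median * v.InputBytes)
            .Select(v => v.Name)
            .ToList();
    }
}
```

Normalizing by input volume keeps a vertex with a large input from being flagged only because it has more data to read.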
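  • 21. Cluster network topology [figure: racks, each with a top-of-rack switch, connected through a top-level switch]
  • 22. Dynamic Aggregation [figure: static plan: six source vertices S feed one vertex T; dynamic plan: the sources, tagged with rack numbers #1, #2, #1, #3, #3, #2, feed per-rack aggregators #1A, #2A, #3A, which feed T]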
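The rack-aware rewrite in the figure can be sketched as a grouping step: once it is known which rack each source vertex ran on, one aggregator is planned per rack and the sources are wired to the aggregator of their own rack. `DynamicAggregation` and `PlanAggregators` are hypothetical names for illustration only; they are not part of Dryad.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DynamicAggregation
{
    // rackOfSource[i] is the rack on which source vertex i ran.
    // The result maps each rack to the source indices its aggregator combines.
    public static Dictionary<int, List<int>> PlanAggregators(IReadOnlyList<int> rackOfSource)
    {
        return rackOfSource
            .Select((rack, index) => (rack, index))
            .GroupBy(x => x.rack, x => x.index)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

With the rack assignment from the slide (1, 2, 1, 3, 3, 2), this plans three aggregators with two sources each, so only three partial results cross the top-level switch instead of six.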
  • 23. Policy vs. Mechanism. Policy: • Application-level • Most complex in C++ code • Invoked with upcalls • Need good default implementations • DryadLINQ provides a comprehensive set. Mechanism: • Built-in • Scheduling • Graph rewriting • Fault tolerance • Statistics and reporting
  • 24. Introduction • Dryad • DryadLINQ • Building on DryadLINQ • Conclusions
  • 25. LINQ => DryadLINQ Dryad
  • 26. LINQ = .Net + Queries
      Collection<T> collection;
      bool IsLegal(Key k);
      string Hash(Key k);
      var results = from c in collection
                    where IsLegal(c.key)
                    select new { hash = Hash(c.key), c.value };
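A self-contained version of this query, runnable with LINQ-to-objects, might look like the sketch below. The `Entry` record and the bodies of `IsLegal` and `Hash` are placeholders invented for the example, since the slide only declares their signatures. A named tuple stands in for the anonymous type so the result can be returned from a method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with a key and a value.
public sealed record Entry(string Key, int Value);

public static class LinqExample
{
    // Placeholder predicate and hash; the slide does not define them.
    public static bool IsLegal(string key) => !string.IsNullOrEmpty(key);
    public static string Hash(string key) => key.ToUpperInvariant();

    public static List<(string Hashed, int Value)> Run(IEnumerable<Entry> collection)
    {
        var results = from c in collection
                      where IsLegal(c.Key)
                      select (Hashed: Hash(c.Key), Value: c.Value);
        return results.ToList();
    }
}
```

The query itself is lazy; the `ToList()` call at the end is what enumerates the collection.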
  • 27. Collections and Iterators
      class Collection<T> : IEnumerable<T>;
      public interface IEnumerable<T> {
          IEnumerator<T> GetEnumerator();
      }
      public interface IEnumerator<T> {
          T Current { get; }
          bool MoveNext();
          void Reset();
      }
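To make the two interfaces concrete, the sketch below implements `IEnumerable<int>` with a C# iterator method and then consumes it by calling `GetEnumerator`, `MoveNext`, and `Current` by hand, which is roughly what a `foreach` loop expands to. `Countdown` and `IteratorDemo` are example names only.

```csharp
using System.Collections;
using System.Collections.Generic;

// A tiny collection that yields start, start-1, ..., 1.
public sealed class Countdown : IEnumerable<int>
{
    private readonly int _start;
    public Countdown(int start) { _start = start; }

    // The compiler turns this iterator method into an IEnumerator<int> implementation.
    public IEnumerator<int> GetEnumerator()
    {
        for (int i = _start; i > 0; i--) yield return i;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class IteratorDemo
{
    // Roughly what "foreach (int x in items) sum += x;" does under the hood.
    public static int SumManually(IEnumerable<int> items)
    {
        int sum = 0;
        using (IEnumerator<int> e = items.GetEnumerator())
        {
            while (e.MoveNext()) sum += e.Current;
        }
        return sum;
    }
}
```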
  • 28. DryadLINQ Data Model [figure: a collection is split into partitions, each holding .Net objects]
  • 29. DryadLINQ = LINQ + Dryad
      Collection<T> collection;
      bool IsLegal(Key k);
      string Hash(Key k);
      var results = from c in collection
                    where IsLegal(c.key)
                    select new { hash = Hash(c.key), c.value };
    [figure: the query becomes a query plan (Dryad job); the C# functions become vertex code; the vertices read the data collection and produce results]
  • 30. Demo
  • 31. Example: Histogram
      public static IQueryable<Pair> Histogram(IQueryable<LineRecord> input, int k)
      {
          var words = input.SelectMany(x => x.line.Split(' '));
          var groups = words.GroupBy(x => x);
          var counts = groups.Select(x => new Pair(x.Key, x.Count()));
          var ordered = counts.OrderByDescending(x => x.count);
          var top = ordered.Take(k);
          return top;
      }
    Intermediate values for one input line (k = 3):
      input:   "A line of words of wisdom"
      words:   ["A", "line", "of", "words", "of", "wisdom"]
      groups:  [["A"], ["line"], ["of", "of"], ["words"], ["wisdom"]]
      counts:  [{"A", 1}, {"line", 1}, {"of", 2}, {"words", 1}, {"wisdom", 1}]
      ordered: [{"of", 2}, {"A", 1}, {"line", 1}, {"words", 1}, {"wisdom", 1}]
      top:     [{"of", 2}, {"A", 1}, {"line", 1}]
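The histogram query can be tried locally with LINQ-to-objects by wrapping an in-memory list with `AsQueryable()`. The sketch below is an assumption-laden stand-in: the `LineRecord` and `Pair` records are minimal guesses at the types the slide uses, and the split passes an explicit `char[]` because on recent .NET versions `Split(' ')` can bind to an overload with an optional parameter, which the compiler rejects inside an expression tree.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the types used on the slide.
public sealed record LineRecord(string line);
public sealed record Pair(string word, int count);

public static class HistogramDemo
{
    public static IQueryable<Pair> Histogram(IQueryable<LineRecord> input, int k)
    {
        // Split every line into words.
        var words = input.SelectMany(x => x.line.Split(new[] { ' ' }));
        // Group identical words and count each group.
        var groups = words.GroupBy(x => x);
        var counts = groups.Select(x => new Pair(x.Key, x.Count()));
        // Keep the k most frequent words.
        var ordered = counts.OrderByDescending(x => x.count);
        return ordered.Take(k);
    }

    // Local entry point: run the query over an in-memory list of lines.
    public static List<Pair> Run(IEnumerable<string> lines, int k) =>
        Histogram(lines.Select(l => new LineRecord(l)).AsQueryable(), k).ToList();
}
```

Calling `HistogramDemo.Run(new[] { "A line of words of wisdom" }, 3)` reproduces the final row of intermediate values on the slide, since the LINQ-to-objects sort is stable and ties keep their first-occurrence order.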
  • 32. Histogram Plan: SelectMany -> Sort -> GroupBy+Select -> HashDistribute -> MergeSort -> GroupBy -> Select -> Sort -> Take -> MergeSort -> Take
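The structure of this plan (local partial counts, hash distribution by word, a merge of the partial counts, then a per-bucket top k followed by a final top k) can be imitated in memory. The sketch below is a single-process illustration, not DryadLINQ's generated plan: `HistogramPlan`, `TopK`, and `Bucket` are hypothetical, and hash-based grouping replaces the sort and merge-sort steps named on the slide.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HistogramPlan
{
    public static List<KeyValuePair<string, int>> TopK(
        IEnumerable<IEnumerable<string>> partitions, int buckets, int k)
    {
        // Stage 1 (per input partition): split lines and count words locally.
        var partial = partitions
            .Select(lines => lines
                .SelectMany(l => l.Split(' '))
                .GroupBy(w => w)
                .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
                .ToList())
            .ToList();

        // Hash-distribute: every (word, partialCount) pair goes to the bucket of its word.
        var routed = partial
            .SelectMany(p => p)
            .GroupBy(kv => Bucket(kv.Key, buckets));

        // Stage 2 (per bucket): merge partial counts, then keep the local top k.
        var perBucketTop = routed.Select(b => b
            .GroupBy(kv => kv.Key)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Sum(kv => kv.Value)))
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Take(k));

        // Final stage: merge the per-bucket lists and take the global top k.
        return perBucketTop
            .SelectMany(t => t)
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Take(k)
            .ToList();
    }

    // Deterministic toy hash so that a word always lands in the same bucket.
    private static int Bucket(string word, int buckets) =>
        word.Sum(c => (int)c) % buckets;
}
```

Taking the top k inside each bucket before the final merge is safe because every word lives in exactly one bucket, so any word in the global top k is also in its own bucket's top k.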
  • 33. Map-Reduce in DryadLINQ
      public static IQueryable<S> MapReduce<T,M,K,S>(
          this IQueryable<T> input,
          Expression<Func<T, IEnumerable<M>>> mapper,
          Expression<Func<M,K>> keySelector,
          Expression<Func<IGrouping<K,M>,S>> reducer)
      {
          var map = input.SelectMany(mapper);
          var group = map.GroupBy(keySelector);
          var result = group.Select(reducer);
          return result;
      }
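As a usage sketch, the operator can be exercised locally on an in-memory `IQueryable`. The version below takes its three functions as `Expression<Func<...>>` so that every step stays an `IQueryable` and the declared return type holds; the word-count example and the `MapReduceDemo` class are illustrative additions, not taken from the talk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class MapReduceExtensions
{
    public static IQueryable<S> MapReduce<T, M, K, S>(
        this IQueryable<T> input,
        Expression<Func<T, IEnumerable<M>>> mapper,
        Expression<Func<M, K>> keySelector,
        Expression<Func<IGrouping<K, M>, S>> reducer)
    {
        var map = input.SelectMany(mapper);      // map: one input, many intermediate items
        var group = map.GroupBy(keySelector);    // shuffle: group intermediate items by key
        return group.Select(reducer);            // reduce: one result per group
    }
}

public static class MapReduceDemo
{
    // Word count expressed as map (split), key (the word itself), reduce (count).
    public static List<KeyValuePair<string, int>> WordCount(IEnumerable<string> lines) =>
        lines.AsQueryable()
             .MapReduce(
                 line => line.Split(new[] { ' ' }),
                 word => word,
                 g => new KeyValuePair<string, int>(g.Key, g.Count()))
             .ToList();
}
```

For the lines "a b" and "b c b", `WordCount` yields a count of 1 for "a", 3 for "b", and 1 for "c".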
  • 34. Map-Reduce Plan [figure: static plan: map (M) -> sort (Q) -> groupby (G1) -> reduce (R) -> distribute (D) -> mergesort (MS) -> groupby (G2) -> reduce (R) -> consumer (X); dynamically inserted partial-aggregation stages (mergesort, groupby, reduce); panels labeled static, dynamic, dynamic]
  • 35. Distributed Sorting Plan [figure: vertices labeled DS, H, O, D, M, S; panels labeled static, dynamic, dynamic]
  • 36. Expectation Maximization • 160 lines • 3 iterations shown
  • 39. LINQ System Architecture [figure: on the local machine, a .Net program (C#, VB, F#, etc.) sends a query to a LINQ provider and receives objects; the provider targets an execution engine: LINQ-to-obj, PLINQ, LINQ-to-SQL, LINQ-to-WS, DryadLINQ, Flickr, Oracle, LINQ-to-XML, or your own]
  • 40. The DryadLINQ Provider [figure: on the client machine, a .Net program calls ToCollection; DryadLINQ turns the query expression into a distributed query plan, vertex code, and context; Invoke starts the Dryad job manager (JM) in the data center; Dryad execution reads the input tables and writes the output tables; results come back as a DryadTable of .Net objects consumed by foreach]
  • 41. Combining Query Providers [figure: on the local machine, a .Net program (C#, VB, F#, etc.) sends a query through several LINQ providers and receives objects; execution engines: PLINQ, SQL Server, DryadLINQ, LINQ-to-obj]
  • 42. Using PLINQ [figure: DryadLINQ splits the query into local queries, each run by PLINQ]
  • 43. Using LINQ to SQL Server [figure: DryadLINQ splits the query into subqueries, each passed to LINQ to SQL]
  • 44. Using LINQ-to-objects [figure: the same query runs through LINQ to obj on the local machine for debugging and through DryadLINQ on the cluster in production]
  • 45. Introduction • Dryad • DryadLINQ • Building on/for DryadLINQ – System monitoring with Artemis – Privacy-preserving query language (PINQ) – Machine learning • Conclusions
  • 46. Artemis: measuring clusters [figure: layered architecture: visualization plug-ins; statistics, job browser, cluster browser/manager, log collection; DryadLINQ and DB; Cluster/Job State API; Cosmos, HPC, and Azure clusters]
  • 49. Job statistics: schedule and critical path
  • 54. PINQ [figure: queries (LINQ) pass through PINQ to the privacy-sensitive database, and the answer is returned through PINQ]
  • 55. PINQ = Privacy-Preserving LINQ • "Type-safety" for privacy • Provides an interface to data that looks very much like LINQ. • All access through the interface gives differential privacy. • Analysts write arbitrary C# code against data sets, as in LINQ. • No privacy expertise needed to produce analyses. • Privacy currency is used to limit the per-record information released.
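The "privacy currency" idea can be illustrated with the standard Laplace mechanism for counting queries: each query spends part of a fixed budget epsilon and returns the true count plus Laplace noise of scale 1/epsilon. The sketch below is not PINQ's API. `PrivateCounter` and its members are hypothetical, and a production implementation would need more care (for example around floating-point noise sampling).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class PrivateCounter<T>
{
    private readonly IReadOnlyCollection<T> _records;
    private readonly Random _rng;

    // Remaining privacy budget; every query subtracts its epsilon.
    public double RemainingBudget { get; private set; }

    public PrivateCounter(IReadOnlyCollection<T> records, double budget, Random rng)
    {
        _records = records;
        RemainingBudget = budget;
        _rng = rng;
    }

    // Noisy count: a count has sensitivity 1, so the Laplace scale is 1/epsilon.
    public double NoisyCount(Func<T, bool> predicate, double epsilon)
    {
        if (epsilon <= 0) throw new ArgumentOutOfRangeException(nameof(epsilon));
        if (epsilon > RemainingBudget)
            throw new InvalidOperationException("Privacy budget exhausted.");
        RemainingBudget -= epsilon;
        int trueCount = _records.Count(predicate);
        return trueCount + SampleLaplace(1.0 / epsilon);
    }

    // Inverse-CDF sampling of a Laplace(0, scale) variable.
    private double SampleLaplace(double scale)
    {
        double u;
        do { u = _rng.NextDouble() - 0.5; } while (u == -0.5); // keep u inside (-0.5, 0.5)
        return -scale * Math.Sign(u) * Math.Log(1 - 2 * Math.Abs(u));
    }
}
```

An analyst holding a `PrivateCounter` can ask any counting question, but once the budget is spent further queries are refused, which bounds the total information released about any one record.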
  • 56. Example: search logs mining
      // Open sensitive data set with state-of-the-art security
      PINQueryable<VisitRecord> visits = OpenSecretData(password);
      // Group visits by patient and identify frequent patients.
      var patients = visits.GroupBy(x => x.Patient.SSN)
                           .Where(x => x.Count() > 5);
      // Map each patient to their post code using their SSN.
      var locations = patients.Join(SSNtoPost, x => x.SSN, y => y.SSN,
                                    (x,y) => y.PostCode);
      // Count post codes containing at least 10 frequent patients.
      var activity = locations.GroupBy(x => x)
                              .Where(x => x.Count() > 10);
      Visualize(activity); // Who knows what this does???
    [figure: distribution of queries about "Cricket"]
  • 57. PINQ Download • Implemented on top of DryadLINQ • Allows mining very sensitive datasets privately • Code is available • http://research.microsoft.com/en-us/projects/PINQ/ • Frank McSherry, Privacy Integrated Queries, SIGMOD 2009
  • 59. Natal Problem • Recognize players from depth map • At frame rate • Using 15% of one Xbox CPU core
  • 60. Learn from Data [figure: motion capture (ground truth) -> rasterize -> training examples -> machine learning -> classifier]
  • 62. Learning from data [figure: training examples -> machine learning (on DryadLINQ and Dryad) -> classifier]
  • 63. Large-Scale Machine Learning • > 10^22 objects • Sparse, multi-dimensional data structures • Complex datatypes (images, video, matrices, etc.) • Complex application logic and dataflow – > 35000 lines of .Net – 140 CPU days – > 10^5 processes – 30 TB data analyzed – 140 avg parallelism (235 machines) – 300% CPU utilization (4 cores/machine)
  • 65. Introduction • Dryad • DryadLINQ • Building on DryadLINQ • Conclusions
  • 66. Lessons Learned • Complete separation of storage / execution / language • Using LINQ + .Net (language integration) • Static typing – No protocol buffers (serialization code) • Allowing flexible and powerful policies • Centralized job manager: no replication, no consensus, no checkpointing • Porting (HPC, Cosmos, Azure, SQL Server)
  • 67. Conclusions
  • 68. "What's the point if I can't have it?" • Dryad+DryadLINQ available for download – Academic license – Commercial evaluation license • Runs on Windows HPC platform • Dryad is in binary form, DryadLINQ in source • Requires signing a 3-page licensing agreement • http://connect.microsoft.com/site/sitehome.aspx?SiteID=891
  • 70. What does DryadLINQ do?
        public struct Data { … public static int Compare(Data left, Data right); }
        Data g = new Data();
        var result = table.Where(s => Data.Compare(s, g) < 0);

        // Data serialization
        public static void Read(this DryadBinaryReader reader, out Data obj);
        public static int Write(this DryadBinaryWriter writer, Data obj);
        // Data factory
        public class DryadFactoryType__0 : LinqToDryad.DryadFactory<Data>
        DryadVertexEnv denv = new DryadVertexEnv(args);
        // Channel writer
        var dwriter__2 = denv.MakeWriter(FactoryType__0);
        // Channel reader
        var dreader__3 = denv.MakeReader(FactoryType__0);
        // LINQ code
        // Context serialization
        var source__4 = DryadLinqVertex.Where(dreader__3,
            s => (Data.Compare(s, ((Data)DryadLinqObjectStore.Get(0))) < ((System.Int32)(0))), false);
        dwriter__2.WriteItemSequence(source__4);
  • 71. Ongoing Dryad/DryadLINQ Research • Performance modeling • Scheduling and resource allocation • Profiling and performance debugging • Incremental computation • Hardware acceleration • High-level programming abstractions • Many domain-specific applications 71
  • 72. Sample applications written using DryadLINQ
        Sample application | Class
        Distributed linear algebra | Numerical
        Accelerated Page-Rank computation | Web graph
        Privacy-preserving query language | Data mining
        Expectation maximization for a mixture of Gaussians | Clustering
        K-means | Clustering
        Linear regression | Statistics
        Probabilistic Index Maps | Image processing
        Principal component analysis | Data mining
        Probabilistic Latent Semantic Indexing | Data mining
        Performance analysis and visualization | Debugging
        Road network shortest-path preprocessing | Graph
        Botnet detection | Data mining
        Epitome computation | Image processing
        Neural network training | Statistics
        Parallel machine learning framework infer.net | Machine learning
        Distributed query caching | Optimization
        Image indexing | Image processing
        Web indexing structure | Web graph
  • 73. Staging 1. Build 2. Send .exe 3. Start JM 4. Query cluster resources 5. Generate graph 6. Initialize vertices 7. Serialize vertices 8. Monitor vertex execution [Diagram labels: JM code, vertex code, Cluster services]
  • 74. Bibliography
        – Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007.
        – DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. Symposium on Operating System Design and Implementation (OSDI), San Diego, CA, December 8-10, 2008.
        – SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou. Very Large Databases Conference (VLDB), Auckland, New Zealand, August 23-28, 2008.
        – Hunting for problems with Artemis. Gabriela F. Creţu-Ciocârlie, Mihai Budiu, and Moises Goldszmidt. USENIX Workshop on the Analysis of System Logs (WASL), San Diego, CA, December 7, 2008.
        – DryadInc: Reusing work in large-scale computations. Lucian Popa, Mihai Budiu, Yuan Yu, and Michael Isard. Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, CA, June 15, 2009.
        – Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations. Yuan Yu, Pradeep Kumar Gunda, and Michael Isard. ACM Symposium on Operating Systems Principles (SOSP), October 2009.
        – Quincy: Fair Scheduling for Distributed Computing Clusters. Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg. ACM Symposium on Operating Systems Principles (SOSP), October 2009.
  • 75. Incremental Computation • Goal: reuse (part of) prior computations to: – Speed up the current job – Increase cluster throughput – Reduce energy and costs [Diagram: a distributed computation turns inputs (append-only data) into outputs]
  • 76. Propose Two Approaches 1. Reuse Identical computations from the past (like make or memoization) 2. Do only incremental computation on the new data and Merge results with the previous ones (like patch)
  • 77. Context • Implemented for Dryad – Dryad Job = Computational DAG • Vertex: arbitrary computation + inputs/outputs • Edge: data flows [Diagram: simple example, Record Count. Count vertices (C) read input partitions I1 and I2; an Add vertex (A) combines their results into the outputs.]
  • 78. Identical Computation [Diagram: Record Count, first execution. DAG: Count vertices (C) over input partitions I1 and I2 feed an Add vertex (A), which produces the outputs.]
  • 79. Identical Computation [Diagram: Record Count, second execution. A new input partition I3 is added; the DAG has Count vertices (C) over I1, I2, and I3 feeding Add (A).]
  • 80. IDE – IDEntical Computation [Diagram: Record Count, second execution, with the identical subDAG highlighted among the Count vertices over I1, I2, and I3.]
  • 81. Identical Computation • Replace identical computational subDAG with edge data cached from previous execution [Diagram: IDE-modified DAG. Count (C) runs over I3 only; the identical subDAG is replaced with cached data feeding Add (A).]
  • 82. Identical Computation • Replace identical computational subDAG with edge data cached from previous execution • Use DAG fingerprints to determine if computations are identical [Diagram: IDE-modified DAG, as on slide 81.]
  • 83. Semantic Knowledge Can Help [Diagram: Record Count DAG, Add (A) over Count vertices (C) on I1 and I2; label: Reuse Output.]
  • 84. Semantic Knowledge Can Help [Diagram: Incremental DAG. Count (C) over I3 feeds Add (A); a Merge (Add) vertex combines the result with the previous output computed from I1 and I2.]
  • 85. Mergeable Computation [Diagram: DAG of slide 84 with the labels: User-specified; Merge (Add); Automatically Inferred; Automatically Built.]
  • 86. Mergeable Computation [Diagram labels: Merge Vertex; Save to Cache; Incremental DAG – Remove Old Inputs; inputs I1, I2, I3; Empty.]
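The two approaches from slides 76 to 86 can be sketched on the Record Count example. This is a minimal illustration in Python, not the Dryad implementation: `CachingRunner`, `fingerprint`, `partition_fingerprint`, `record_count`, and `merge_count` are hypothetical names, SHA-256 over the operation name and input fingerprints is an assumed fingerprinting scheme, and caching is done per vertex output rather than on real channel data.

```python
import hashlib


def partition_fingerprint(records):
    """Fingerprint of an input partition's contents (assumed scheme)."""
    h = hashlib.sha256()
    for record in records:
        h.update(repr(record).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def fingerprint(op_name, input_fingerprints):
    """Fingerprint of a subDAG: the operation plus its inputs' fingerprints."""
    h = hashlib.sha256(op_name.encode("utf-8"))
    for fp in input_fingerprints:
        h.update(b"\x00" + fp.encode("utf-8"))
    return h.hexdigest()


class CachingRunner:
    """Runs vertices, reusing cached outputs of identical subDAGs (IDE)."""

    def __init__(self):
        self.cache = {}     # subDAG fingerprint -> cached output
        self.executed = []  # names of vertices that actually ran

    def run_vertex(self, op_name, func, inputs, input_fingerprints):
        fp = fingerprint(op_name, input_fingerprints)
        if fp not in self.cache:
            self.executed.append(op_name)
            self.cache[fp] = func(inputs)
        return self.cache[fp], fp


def record_count(runner, partitions):
    """Record Count DAG: one Count vertex per partition, then one Add vertex."""
    counts, fps = [], []
    for part in partitions:
        count, fp = runner.run_vertex(
            "Count", len, part, [partition_fingerprint(part)])
        counts.append(count)
        fps.append(fp)
    total, _ = runner.run_vertex("Add", sum, counts, fps)
    return total


def merge_count(previous_total, new_partitions):
    """Mergeable computation: count only the new data, then merge with Add."""
    return previous_total + sum(len(part) for part in new_partitions)
```

With partitions `i1`, `i2`, and a later `i3`, calling `record_count(runner, [i1, i2])` and then `record_count(runner, [i1, i2, i3])` on the same runner re-executes only the Count vertex for `i3` and the Add vertex. `merge_count` instead takes the previous total and touches only `i3`, which relies on the user knowing that Add is a valid merge function for this computation.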

Editor's notes

  1. Enable any programmer to write and run applications on small and large computer clusters.
  2. Dryad is optimized for: throughput, data-parallel computation, in a private data-center.
  3. In the same way as the Unix shell does not understand the pipeline running on top, but manages its execution (i.e., killing processes when one exits), Dryad does not understand the job running on top.
  4. Dryad is a generalization of the Unix piping mechanism: instead of uni-dimensional (chain) pipelines, it provides two-dimensional pipelines. The unit is still a process connected by a point-to-point channel, but the processes are replicated.
  5. This is a possible schedule of a Dryad job using 2 machines.
  6. The Unix pipeline is generalized in three ways: it is 2D instead of 1D; it spans multiple machines; and resources are virtualized, so you can run the same large job on many or few machines.
  7. This is the basic Dryad terminology.
  8. Channels are very abstract, enabling a variety of transport mechanisms. The performance and fault-tolerance of these mechanisms vary widely.
  9. The brain of a Dryad job is a centralized Job Manager, which maintains the complete state of the job. The JM controls the processes running on a cluster, but never exchanges data with them. (The data plane is completely separated from the control plane.)
  10. Vertex failures and channel failures are handled differently.
  11. Apparently very slow computations are handled by a stage manager, which duplicates the affected vertices.
  12. Aggregating data with associative operators can be done in a bandwidth-preserving fashion if the intermediate aggregations are placed close to the source data.
  13. DryadLINQ adds a wealth of features on top of plain Dryad.
  14. Language Integrated Query is an extension of .Net which allows one to write declarative computations on collections (green part).
  15. DryadLINQ translates LINQ programs into Dryad computations: C# and LINQ data objects become distributed partitioned files; LINQ queries become distributed Dryad jobs; C# methods become code running on the vertices of a Dryad job.
  16. More complicated, even iterative algorithms, can be implemented.
  17. At the bottom DryadLINQ uses LINQ to run the computation in parallel on multiple cores.
  18. Image from http://r24085.ovh.net/images/Gallery/depthMap-small.jpg
  19. We believe that Dryad and DryadLINQ are a great foundation for cluster computing.
  20. Computation Staging
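Note 12's point about associative operators can be made concrete with a small sketch. It is written in Python under stated assumptions: the rack and machine layout, the functions `aggregate_flat` and `aggregate_tree`, and the "values crossing rack boundaries" cost measure are all hypothetical and are not Dryad's actual placement policy. Every rack is assumed to hold at least one record.

```python
from functools import reduce
import operator


def aggregate_flat(racks, op):
    """Ship every record to a single aggregator.

    Returns (result, number of values sent across rack boundaries).
    """
    records = [record
               for machines in racks.values()
               for partition in machines.values()
               for record in partition]
    return reduce(op, records), len(records)


def aggregate_tree(racks, op):
    """Aggregate per machine, then per rack, then globally.

    Because `op` is associative, partial results can be combined in any
    grouping, so only one value per rack has to leave the rack.
    """
    rack_partials = []
    for machines in racks.values():
        machine_partials = [reduce(op, partition)
                            for partition in machines.values() if partition]
        rack_partials.append(reduce(op, machine_partials))
    return reduce(op, rack_partials), len(rack_partials)


if __name__ == "__main__":
    racks = {
        "rack1": {"m1": [1, 2, 3], "m2": [4, 5]},
        "rack2": {"m3": [6], "m4": [7, 8, 9, 10]},
    }
    print(aggregate_flat(racks, operator.add))  # (55, 10)
    print(aggregate_tree(racks, operator.add))  # (55, 2)
```

Both strategies compute the same result, but the tree sends two partial sums across racks instead of ten raw records, which is the bandwidth saving the note refers to.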