Scaling structured prediction
                Tommi Jaakkola
                    MIT

              in collaboration with
     M. Collins, M. Fromer, T. Hazan, T. Koo,
         O. Meshi, A. Rush, D. Sontag
Structured prediction
• Natural language processing
  - e.g., tagging, morphological segmentation, dependency parsing
• Computer vision
  - e.g., segmentation, stereo reconstruction, object recognition
• Computational biology
  - e.g., molecular structure prediction, pathway reconstruction
• Robotics
  - e.g., imitation learning, inverse kinematics
• Human-computer interaction
  - e.g., interface alignment, example-based design
• etc.
Structured prediction
• The goal is to learn a mapping from input examples (x)
 to complex objects (y)
 - e.g., from sentences (x) to dependency parses (y)

 y=
 x=   *   John   saw   a   movie   yesterday   that   he   liked

[Figure: predicted protein binding to sites OR3, OR2, and mutated OR1 (cI2, RNA polymerase)]
• Predictions are again qualitatively correct
Structured prediction
• The goal is to learn a mapping from input examples (x)
 to complex objects (y)
 - e.g., from pairs of images (x) to disparity maps (y)
• We’d like to learn these functions

 [Figure: left images of the stereo pairs (x) and their disparity maps (y) for the Art, Books, Dolls, and Laundry datasets (Scharstein & Pal ’07)]
Structured prediction
                     • The goal is to learn a mapping from input examples (x)
                        to complex objects (y)
                         - e.g., from pairs of web pages (x) to their alignments (y)

[Figure: a pair of web pages (x) and their semantic alignment (y); excerpt from “Structured-Prediction Algorithm for Example-Based Web Design” (Bricolage) by Ranjitha Kumar, Jerry O. Talton, Salman Ahmad, and Scott R. Klemmer, Stanford University (Kumar et al., ’10)]
Goals and challenges
• Goals
 - use rich classes of output structures
 - exercise fine control of how structures are chosen (scoring)
 - learn models efficiently from data
• Challenges
 - prediction problems are often provably hard
 - most learning algorithms rely on explicit predictions and are
   therefore inefficient with large amounts of data
 - richer structures lead to ambiguity
Structured prediction
• The goal is to learn a mapping from input examples (x)
 to complex objects (y)
 - e.g., from sentences (x) to dependency parses (y)

 y=
 x=   *   John   saw   a   movie    yesterday   that   he   liked


 - in lexicalized dependency parsing, we draw an arc from the
   head word of each phrase to words that modify it
  - the resulting parse is a directed tree. In many languages, the
    tree can be non-projective (crossing arcs)
 - each sentence is mapped to arc scores; the parse is obtained
   as the highest scoring directed tree
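Non-projectivity can be made concrete with a small check. A minimal sketch, assuming a parse is stored as a hypothetical `heads` list where `heads[j]` is the head of word j and index 0 is the root symbol * (so `heads[0]` is unused): two arcs cross when one endpoint of an arc lies strictly inside the span of the other and the other endpoint lies strictly outside it.

```python
from itertools import combinations
from typing import List, Tuple


def arcs_from_heads(heads: List[int]) -> List[Tuple[int, int]]:
    """Convert a head list (heads[0] is unused for the root *) into arcs (head, modifier)."""
    return [(heads[j], j) for j in range(1, len(heads))]


def arcs_cross(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if the two arcs cross when drawn above the sentence."""
    lo_a, hi_a = sorted(a)
    lo_b, hi_b = sorted(b)
    # One endpoint of b strictly inside the span of a, the other strictly outside.
    inside = [lo_a < p < hi_a for p in (lo_b, hi_b)]
    outside = [p < lo_a or p > hi_a for p in (lo_b, hi_b)]
    return (inside[0] and outside[1]) or (inside[1] and outside[0])


def is_projective(heads: List[int]) -> bool:
    """A dependency tree is projective iff no pair of its arcs crosses."""
    arcs = arcs_from_heads(heads)
    return not any(arcs_cross(a, b) for a, b in combinations(arcs, 2))


# * John saw a movie yesterday that he liked  (indices 0..8), illustrative parses:
# attaching "liked" to "movie" crosses the arc saw -> yesterday.
relative_clause_on_movie = [-1, 2, 0, 4, 2, 2, 8, 8, 4]
relative_clause_on_saw = [-1, 2, 0, 4, 2, 2, 8, 8, 2]
```

The parses are illustrative, not taken from a treebank: the reading where "that he liked" modifies "movie" is the non-projective one, because its arc spans "yesterday", which hangs off "saw".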
Structured prediction
• The goal is to learn a mapping from sentences (x) to
    dependency parses (y)

    y=
    x=      *   John   saw       a    movie     yesterday     that    he    liked
     i=0         1     2                                                      n

    $y(i, j) = 1$ if arc $i \to j$ is selected, and zero otherwise

    $$x \;\to\; w \cdot f(x; i, j) = \theta(i, j)$$
    (x: sentence, w: parameters, f(x; i, j): features, θ(i, j): arc scores)

    Highest scoring tree, where θ_T(y) restricts y to directed trees:
    $$y^* = \arg\max_{y} \Big\{ \sum_{i,j} y(i,j)\,\theta(i,j) + \theta_T(y) \Big\}$$
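A minimal sketch of the arc-factored pipeline x → w · f(x; i, j) = θ(i, j) → y*. The feature templates in `arc_features` (head/modifier words, direction, distance bucket, hashed into `DIM` slots) and the random parameters `w` are hypothetical. The argmax is done by exhaustive search over head assignments, which only works for toy sentences; it is a stand-in for a maximum directed spanning tree algorithm, not an efficient implementation.

```python
import itertools
import zlib

import numpy as np

DIM = 64  # hypothetical feature-space size


def arc_features(words, i, j):
    """Hypothetical features f(x; i, j) for the arc i -> j, hashed into DIM slots."""
    f = np.zeros(DIM)
    templates = [
        f"head={words[i]}|mod={words[j]}",
        f"head={words[i]}|dir={'R' if j > i else 'L'}",
        f"mod={words[j]}|dist={min(abs(i - j), 5)}",
    ]
    for t in templates:
        f[zlib.crc32(t.encode()) % DIM] += 1.0  # crc32 is deterministic across runs
    return f


def arc_scores(words, w):
    """theta(i, j) = w . f(x; i, j); row i is the head, column j the modifier."""
    n = len(words)
    theta = np.full((n, n), -np.inf)
    for i in range(n):
        for j in range(1, n):  # the root * (index 0) never receives an arc
            if i != j:
                theta[i, j] = w @ arc_features(words, i, j)
    return theta


def is_tree(heads):
    """heads[j] is the head of word j (heads[0] = -1); True iff every word reaches root 0."""
    for j in range(1, len(heads)):
        seen, k = set(), j
        while k != 0:
            if k in seen:
                return False  # cycle
            seen.add(k)
            k = heads[k]
    return True


def best_tree_bruteforce(theta):
    """argmax over directed trees of sum_{i,j} y(i,j) theta(i,j), by exhaustive search."""
    n = theta.shape[0]
    best, best_heads = -np.inf, None
    choices = [[i for i in range(n) if i != j] for j in range(1, n)]
    for combo in itertools.product(*choices):
        heads = [-1, *combo]
        if not is_tree(heads):
            continue
        score = sum(theta[heads[j], j] for j in range(1, n))
        if score > best:
            best, best_heads = score, heads
    return best_heads, best
```

Usage: `best_tree_bruteforce(arc_scores(["*", "John", "saw", "a", "movie"], w))` returns a head list and its score; the search space grows as roughly n^n, which is why an efficient tree algorithm is needed in practice.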
Structured prediction
 y=
 x=   *   John   saw         a         movie   yesterday   that   he   liked
   i=0     1     2                                                      n

• The complexity of the prediction task depends on how
  we score each candidate tree
• In an arc-factored model (as before) each arc is scored
  separately: $\sum_{i,j} y(i,j)\,\theta(i,j)$
• The highest scoring tree is found as the maximum
  weighted directed spanning tree
  $$y^* = \arg\max_{y} \Big\{ \sum_{i,j} y(i,j)\,\theta(i,j) + \theta_T(y) \Big\}$$
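The maximum weighted directed spanning tree can be found without enumeration by the Chu-Liu-Edmonds algorithm, the classical choice for arc-factored non-projective parsing. A compact recursive sketch (roughly O(n^3), not the faster Tarjan-style variant), assuming scores arrive as a dense matrix `theta[h, m]` with the root * at index 0 and any word allowed to head any other:

```python
import numpy as np


def find_cycle(heads):
    """Return the nodes of one cycle in the head graph (heads[0] = -1 for the root), or None."""
    n = len(heads)
    state = [0] * n  # 0 = unvisited, 1 = on the current walk, 2 = finished
    for start in range(1, n):
        path, k = [], start
        while k != 0 and state[k] == 0:
            state[k] = 1
            path.append(k)
            k = heads[k]
        if k != 0 and state[k] == 1:
            return path[path.index(k):]  # the walk came back to itself: a cycle
        for node in path:
            state[node] = 2
    return None


def chu_liu_edmonds(theta):
    """Heads of the maximum-weight directed spanning tree rooted at 0.

    theta[h, m] scores the arc h -> m; the diagonal and column 0 are ignored.
    """
    scores = np.array(theta, dtype=float)  # copy, so the caller's matrix is untouched
    n = scores.shape[0]
    np.fill_diagonal(scores, -np.inf)
    scores[:, 0] = -np.inf
    # Greedy step: every word takes its best-scoring head.
    heads = [-1] + [int(np.argmax(scores[:, m])) for m in range(1, n)]
    cycle = find_cycle(heads)
    if cycle is None:
        return heads  # the greedy choice is already a tree, hence optimal

    # Contract the cycle into a single node c and rescore arcs into and out of it.
    in_cycle = set(cycle)
    rest = [v for v in range(n) if v not in in_cycle]  # rest[0] is the root 0
    c = len(rest)
    cycle_score = sum(scores[heads[v], v] for v in cycle)
    sub = np.full((c + 1, c + 1), -np.inf)
    enter, leave = {}, {}
    for a, u in enumerate(rest):
        for b, v in enumerate(rest):
            sub[a, b] = scores[u, v]
        # u -> cycle: enter at v and drop the cycle arc that currently enters v.
        v_best = max(cycle, key=lambda v: scores[u, v] - scores[heads[v], v])
        sub[a, c] = cycle_score + scores[u, v_best] - scores[heads[v_best], v_best]
        enter[a] = v_best
        # cycle -> u: use the best cycle node as the head of u.
        h_best = max(cycle, key=lambda h: scores[h, u])
        sub[c, a] = scores[h_best, u]
        leave[a] = h_best

    sub_heads = chu_liu_edmonds(sub)

    # Expand the contracted solution back to the original nodes.
    result = [-1] * n
    for b in range(1, c):
        h = sub_heads[b]
        result[rest[b]] = leave[b] if h == c else rest[h]
    for v in cycle:
        result[v] = heads[v]  # keep the cycle arcs ...
    a = sub_heads[c]
    result[enter[a]] = rest[a]  # ... except the one replaced by the entering arc
    return result
```

The greedy step alone solves the problem whenever it happens to produce a tree; contraction is only needed to break cycles, and each contraction shrinks the graph by at least one node.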
Structured prediction
 y=
 x=   *   John    saw       a   movie    yesterday   that   he   liked
   i=0        1   2                                               n

• The complexity of the prediction task depends on how
  we score each candidate tree
• It is often advantageous to include interactions between
  modifiers (outgoing arcs), known as “sibling scoring”:
  $$\sum_i \theta_i(y|_i), \quad \text{where } y|_i = \{\, y(i,j),\ j \neq i \,\}$$

• Finding the highest scoring tree is now NP-hard
  (McDonald and Satta, 2007)
  $$y^* = \arg\max_{y} \Big\{ \sum_i \theta_i(y|_i) + \theta_T(y) \Big\}$$
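To make the sibling objective concrete, a minimal sketch under an assumed, hypothetical form for θ_i: each head's score over its modifier set y|_i is a sum of pairwise terms `sib[i, j, k]` for modifiers j < k on the same side of i, and θ_T(y) is the arc-factored tree score (−∞ for non-trees). Exhaustive search is used only to show what is being maximized; it does not scale.

```python
import itertools

import numpy as np


def is_tree(heads):
    """heads[m] is the head of word m (heads[0] = -1); True iff there are no cycles."""
    for m in range(1, len(heads)):
        seen, k = set(), m
        while k != 0:
            if k in seen:
                return False
            seen.add(k)
            k = heads[k]
    return True


def sibling_score(i, mods, sib):
    """Hypothetical theta_i(y|_i): terms sib[i, j, k] for modifiers j < k on the same side of i."""
    return sum(
        sib[i, j, k]
        for j, k in itertools.combinations(sorted(mods), 2)
        if (j < i) == (k < i)
    )


def objective(heads, theta, sib):
    """theta_T(y) + sum_i theta_i(y|_i): arc-factored tree score (or -inf) plus sibling scores."""
    if not is_tree(heads):
        return -np.inf
    n = len(heads)
    arcs = sum(theta[heads[m], m] for m in range(1, n))
    return arcs + sum(
        sibling_score(i, [m for m in range(1, n) if heads[m] == i], sib) for i in range(n)
    )


def best_sibling_tree(theta, sib):
    """y* by exhaustive search over head assignments (exponential; toy sizes only)."""
    n = theta.shape[0]
    choices = [[h for h in range(n) if h != m] for m in range(1, n)]
    candidates = ([-1, *c] for c in itertools.product(*choices))
    return max(candidates, key=lambda h: objective(h, theta, sib))
```

Because the sibling terms couple all arcs leaving the same head, the score of an arc is no longer independent of the others, which is what rules out a spanning-tree algorithm here.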
Decomposition
       *     John   saw      a   movie     yesterday   that   he   liked
  i=0          1     2                                               n

  $\theta_T(y)$: directed tree, arc-factored scores

  $\theta_0(y|_0)\ \ \theta_2(y|_2)\ \cdots\ \theta_n(y|_n)$: modifiers (outgoing arcs),
  solved separately for each word

  • We can always turn a hard problem into an easy one by
    solving each “part” separately from the others
  • But the parts are unlikely to agree on a solution ...
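The claim that separately solved parts need not agree can be checked directly. A minimal sketch, assuming the same kind of hypothetical sibling score (pairwise terms `sib[i, j, k]` for same-side modifiers) for each word's part and arc-factored scores for the tree part; both are solved by exhaustive search, independently of each other.

```python
import itertools

import numpy as np


def is_tree(heads):
    """heads[m] is the head of word m (heads[0] = -1); True iff there are no cycles."""
    for m in range(1, len(heads)):
        seen, k = set(), m
        while k != 0:
            if k in seen:
                return False
            seen.add(k)
            k = heads[k]
    return True


def tree_part(theta):
    """Arcs of the best directed tree under arc-factored scores (exhaustive; toy sizes only)."""
    n = theta.shape[0]
    choices = [[h for h in range(n) if h != m] for m in range(1, n)]
    trees = [[-1, *c] for c in itertools.product(*choices) if is_tree([-1, *c])]
    best = max(trees, key=lambda h: sum(theta[h[m], m] for m in range(1, n)))
    return {(best[m], m) for m in range(1, n)}


def modifier_part(i, n, sib):
    """Best modifier set for head i under a hypothetical sibling score, ignoring all other words."""
    options = [j for j in range(1, n) if j != i]
    subsets = [s for r in range(len(options) + 1) for s in itertools.combinations(options, r)]

    def theta_i(mods):
        return sum(sib[i, j, k] for j, k in itertools.combinations(mods, 2) if (j < i) == (k < i))

    return set(max(subsets, key=theta_i))  # max keeps the first of tied subsets


def solve_parts_separately(theta, sib):
    """Solve the tree part and each word's modifier part independently; return both arc sets."""
    n = theta.shape[0]
    tree_arcs = tree_part(theta)
    part_arcs = {(i, j) for i in range(n) for j in modifier_part(i, n, sib)}
    return tree_arcs, part_arcs
```

With all sibling terms negative, every head prefers no modifiers at all, so no word receives a head; with other scores a word can just as easily be claimed by two heads. Either way the union of the local optima is not a tree.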
Dual decomposition
       *     John   saw      a   movie     yesterday   that      he    liked
  i=0          1     2                                                   n

  directed tree, arc-factored scores:
  $$\theta_T(y) + \sum_{i,j} y(i,j)\,\delta(i,j)$$
  modifiers (outgoing arcs), solved separately for each word:
  $$\theta_i(y|_i) - \sum_{j \neq i} y(i,j)\,\delta(i,j)$$
  The multipliers $\delta(i,j)$ shift the effective arc scores in opposite
  directions in the two parts to encourage agreement.

  • We can encourage parts to agree on the maximizing
    arcs via Lagrange multipliers (cf. Guignard, Fisher, ’80s)
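A short derivation makes the role of the multipliers explicit. It is the standard Lagrangian-relaxation argument written with the slide's symbols; the name δ(i, j) for the multipliers and L(δ) for the relaxed objective are notation chosen here. Any setting of the multipliers gives an upper bound on the best tree score, and the bound is tight when the parts agree:

```latex
% Relaxed objective: each part is maximized on its own
L(\delta) = \max_{y} \Big\{ \theta_T(y) + \sum_{i,j} y(i,j)\,\delta(i,j) \Big\}
          + \sum_{i} \max_{y|_i} \Big\{ \theta_i(y|_i) - \sum_{j \neq i} y(i,j)\,\delta(i,j) \Big\}

% Plug the same tree y into every max; the delta terms cancel:
L(\delta) \;\ge\; \theta_T(y) + \sum_{i,j} y(i,j)\,\delta(i,j)
               + \sum_i \theta_i(y|_i) - \sum_i \sum_{j \neq i} y(i,j)\,\delta(i,j)
          \;=\; \theta_T(y) + \sum_i \theta_i(y|_i) \qquad \text{for every } y

% Hence, for every delta,
L(\delta) \;\ge\; \max_y \Big\{ \sum_i \theta_i(y|_i) + \theta_T(y) \Big\},
% with equality when the tree part and the modifier parts return the same arcs.
```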
Dual decomposition algorithm
• An iterative sub-gradient algorithm (Koo et al., 2010)

   *    John     saw    a    movie    yesterday     that      he      liked

       find a directed spanning tree
         $$\hat{y} = \arg\max_{y} \Big\{ \theta_T(y) + \sum_{i,j} y(i,j)\,\delta(i,j) \Big\}$$
       find modifiers of each word
         $$\hat{y}'|_i = \arg\max_{y|_i} \Big\{ \theta_i(y|_i) - \sum_{j \neq i} y(i,j)\,\delta(i,j) \Big\}$$
       update Lagrange multipliers based on disagreement
         $$\delta(i,j) \leftarrow \delta(i,j) + \alpha_k \big( \hat{y}'(i,j) - \hat{y}(i,j) \big)$$


• Thm: The solution is optimal if agreement is reached (no
 updates)
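A minimal sketch of the iteration for a toy sentence, under several stated assumptions: both the tree part and the modifier parts are solved by exhaustive search (rather than a spanning-tree algorithm and per-word dynamic programs), θ_i is a hypothetical sum of pairwise terms `sib[i, j, k]` for same-side modifiers, and the step size α_k = α_0 / (1 + k) is one common choice, not necessarily the schedule used by Koo et al. (2010). The function returns the tree part's last solution, whether agreement was reached, and the values L(δ) seen along the way.

```python
import itertools

import numpy as np


def is_tree(heads):
    """heads[m] is the head of word m (heads[0] = -1); True iff there are no cycles."""
    for m in range(1, len(heads)):
        seen, k = set(), m
        while k != 0:
            if k in seen:
                return False
            seen.add(k)
            k = heads[k]
    return True


def all_trees(n):
    """All head assignments over n nodes (root = 0) that form directed trees."""
    choices = [[h for h in range(n) if h != m] for m in range(1, n)]
    return [[-1, *c] for c in itertools.product(*choices) if is_tree([-1, *c])]


def sibling_score(i, mods, sib):
    """Hypothetical theta_i(y|_i): pairwise terms for modifiers j < k on the same side of i."""
    return sum(sib[i, j, k] for j, k in itertools.combinations(sorted(mods), 2) if (j < i) == (k < i))


def dual_decomposition(theta, sib, max_iter=200, alpha0=1.0):
    """Sub-gradient dual decomposition for max_y theta_T(y) + sum_i theta_i(y|_i) (toy sizes)."""
    n = theta.shape[0]
    trees = all_trees(n)
    delta = np.zeros((n, n))  # Lagrange multipliers delta(i, j)
    duals = []  # L(delta) at each iteration; every value upper-bounds the optimum
    y_hat = trees[0]
    for k in range(max_iter):
        # Tree part: argmax_y theta_T(y) + sum_{i,j} y(i,j) delta(i,j).
        adj = theta + delta
        y_hat = max(trees, key=lambda h: sum(adj[h[m], m] for m in range(1, n)))
        dual = sum(adj[y_hat[m], m] for m in range(1, n))
        y_tree = np.zeros((n, n))
        for m in range(1, n):
            y_tree[y_hat[m], m] = 1.0
        # Modifier parts: argmax_{y|_i} theta_i(y|_i) - sum_{j != i} y(i,j) delta(i,j), per word.
        y_mod = np.zeros((n, n))
        for i in range(n):
            options = [j for j in range(1, n) if j != i]
            subsets = [s for r in range(len(options) + 1) for s in itertools.combinations(options, r)]
            local = [sibling_score(i, s, sib) - sum(delta[i, j] for j in s) for s in subsets]
            best = int(np.argmax(local))
            dual += local[best]
            for j in subsets[best]:
                y_mod[i, j] = 1.0
        duals.append(dual)
        if np.array_equal(y_tree, y_mod):
            return y_hat, True, duals  # agreement: y_hat is optimal
        # Sub-gradient step on the disagreement.
        delta += alpha0 / (1 + k) * (y_mod - y_tree)
    return y_hat, False, duals
```

When `converged` is True the returned tree is optimal for the full sibling objective; when it is False, the smallest recorded dual value minus a candidate tree's score still bounds how far that candidate can be from optimal.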
Dual decomposition in practice
    • The table shows the percentage of test cases where the
      sub-gradient algorithm quickly finds the optimal solution

                          CertS (%)   CertG (%)
                Dan         99.07       98.45
                Dut         98.19       97.93
                Por         99.65       99.31
                Slo         90.55       95.27
                Swe         98.71       98.97
                Tur         98.72       99.04
                Eng¹        98.65       99.18
                Eng²        98.96       99.12
                Dan         98.50       98.50
                Dut         98.00       99.50
Goals and challenges
 • Goals
   - use rich classes of output structures
   - exercise fine control of how structures are chosen (scoring)
   - learn models efficiently from data
 • Challenges
✓ - prediction problems may be provably hard, but we can solve
     practical instances effectively with decomposition methods
   - most learning algorithms rely on explicit predictions and are
     therefore inefficient with large amounts of data
   - richer structures lead to ambiguity
Learning to predict
  • We’d like to estimate the score functions from data
       such that
       $$y^{(i)} \approx \arg\max_{y \in \mathcal{Y}} \big\{ w \cdot f(x^{(i)}, y) \big\}, \qquad i = 1, \dots, n$$
       where $w \cdot f(x, y)$ are the parameterized scores

       - e.g., lexicalized dependency parsing

  $y^{(1)}$: parse of $x^{(1)}$ =  *  John  saw  a  movie
  $y^{(2)}$: parse of $x^{(2)}$ =  *  kids  make  nutritious  snacks
  ...
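Why “most learning algorithms rely on explicit predictions” can be seen in the structured perceptron, named here as one standard example rather than as the method of this talk: every update calls the argmax. A minimal sketch for arc-factored parsing, assuming hypothetical one-hot features over (head word, modifier word) pairs, sparse dictionary weights, and exhaustive decoding for toy sentences:

```python
import itertools
from collections import Counter


def is_tree(heads):
    """heads[m] is the head of word m (heads[0] = -1); True iff every word reaches the root 0."""
    for m in range(1, len(heads)):
        seen, k = set(), m
        while k != 0:
            if k in seen:
                return False
            seen.add(k)
            k = heads[k]
    return True


def features(words, heads):
    """Hypothetical f(x, y): counts of (head word, modifier word) pairs over the arcs of y."""
    return Counter((words[heads[m]], words[m]) for m in range(1, len(words)))


def score(w, feats):
    """w . f(x, y) with sparse dictionaries."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())


def predict(w, words):
    """Explicit prediction: argmax over directed trees of w . f(x, y) (exhaustive; toy sizes only)."""
    n = len(words)
    choices = [[h for h in range(n) if h != m] for m in range(1, n)]
    trees = ([-1, *combo] for combo in itertools.product(*choices))
    return max((h for h in trees if is_tree(h)), key=lambda h: score(w, features(words, h)))


def perceptron_update(w, words, gold):
    """One structured-perceptron step; returns the prediction made before the update."""
    pred = predict(w, words)
    if pred != gold:
        for k, v in features(words, gold).items():
            w[k] = w.get(k, 0.0) + v
        for k, v in features(words, pred).items():
            w[k] = w.get(k, 0.0) - v
    return pred


def train(data, epochs=10):
    """Runs perceptron epochs until a full pass makes no mistakes (or epochs run out)."""
    w = {}
    for _ in range(epochs):
        mistakes = sum(perceptron_update(w, words, gold) != gold for words, gold in data)
        if mistakes == 0:
            break
    return w
```

Each pass over n training sentences costs n full decodes, so the cost of learning is tied directly to the cost of prediction.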
[Figure: time-averaged binding frequencies of cI2, Cro, repressor, and RNA polymerase at operator sites OR3, OR2, OR1 as a function of overall protein concentrations]
  • We can find the equilibrium of the game (binding frequencies)
Learning to predict
  • Prediction is often done by maximizing an MRF
       $$s(y; x) = \sum_f \theta_f(y_f; x)$$
                                                                                                                                                                                                                                                                                                                                                                                           cro
                                                                                                                                                                                                                                                                                                                                                                                                                                       0.4


                                                                                                                                                                                                                             cI
                                                                                                                                                      0.2                                                                                                                                                                                                                                                                              0.2
                                                                                                                                                                                                                                                         R1



                       x                  x the score functions from data        f                                                                                                                                                                                                                                                                                              f
                                                                 0                                   0 2                  0                                     0                      0                                  0



                                           x
                                                                   !2                   0              !2             0     !2            2      0                !2 2            0      !2           2      0              !2   2          0                  2




                         x can of the equilibrium of the game (binding
         ••We’dfind the equilibriumfindthe game (binding frequencies) frequencies)
                                                                 10                 10               1010          10 10                10 10                  10 10           10     10           10     10             10 10           10                  10


                  like •to estimate
                                                                                    f         /f                   frepressor/fRNA            f         /f                     frepressor/f               f         /f                   frepressor/fRNA
                                                                                     repressor RNA                                             repressor RNA                                               repressor RNA
                                                                                 Bindinget OR31998
                                                                                  Arkin in al.                  Bindinget OR31998
                                                                                                                 Arkin in al.
                                                                                                                                            Binding in OR2                     Binding inRNA 2
                                                                                                                                                                                            OR          Binding in OR1                      Binding in OR1
                                                                                                                   Bindinget OR31998
                                                                                                                    Arkin in al.                                                  Binding in OR2                                               Binding in OR1



            such thatFigure•3:bindingoftotheproteinareproteinthemutated(binding frequencies) 1 for increasingmaxs(ym
                                                                                    Bindinget al. 1998
                                                                                      Arkin in
                                                                                (a) OR 3 OR3 1 1                                               Binding in OR2 1                                             Binding in OR1 1




         Figure• function of overall Predictions 3, O again correct 2, and mutated amounts of cI . max of s(y
                                                                                                                                                                            50                                    50                          (b) O 2                                                                                                                                (c) O 1




                                                                                               Binding Frequency (time!average)




                                                                                                                                                                                                                                                                    Binding Frequency (time!average)




                                                                                                                                                                                                                                                                                                                                                                                                                                Binding Frequency (time!average)
                                                             1                                                  (a) OR 31                            R                      (b) OR 2  1                         R                      (c) OR 1
                         We
               Binding Frequency (time!average)




                                                                                                                                                                                  Binding Frequency (time!average)




                                                                                                                                                                                                                                                                                                                             Binding Frequency (time!average)
                                                                                                                                                                  1                                                            1
           We can




                                                                                            Binding Frequency (time!average)




                                                                                                                                                                                                                                                                 Binding Frequency (time!average)




                                                                                                                                                                                                                                                                                                                                                                                                                             Binding Frequency (time!average)
                                                                 1                                                         1                                                             1
                               • We can find thegame (binding frequencies)
                                                    equilibrium of    game                               0#,12(#'                                                                        5&1-6#
            Binding Frequency (time!average)




                                                                                                                                                                               Binding Frequency (time!average)




                                                                                                                                                                                                                                                                                                                          Binding Frequency (time!average)
            • We can find the equilibrium                                   0#,12(#'                                                                   5&1-6#
                                  0.8                                        0#,12(#'
                                                                                               0.8          0#,12(#'0.8                                      0.8
                                                                                                                                                          5&1-6#                    0.8     5&1-6#                        0.8
                                as a function concentrations. and
                                                of overall protein concentrations. for increasing O                             Repressor                                                     Repressor                                                   Repressor
             as a Predictions are againof overall 2, to sites O 3, O 1
                3: Predicted protein a protein sites O binding
                                  as Predicted qualitativelyconcentrations.
                                     0.8
                                        function                      qualitatively correct
                                                                                 X           X y
                                                                                                  0.8
                                                                                             Repressor
                                                                                                             amounts cI .
                                                                                               Repressor 3-142(#
                                                        as a function of overall protein concentrations.
                                                                           3-142(#           RNA!polymerase
                                                                                                                         0.8 RNA!polymerase
                                                                                                                                     R
                                                                                                                                  Repressor      R 7&8.6,
                                                                                                                                                                0.8
                                                                                                                                                           Repressor
                                                                                                                                                             Repressor
                                                                                                                                                           RNA!polymeraseR
                                                                                                                                                                                       0.8
                                                                                                                                                                                  R 7&8.6,      Repressor
                                                                                                                                                                                              RNA!polymerase
                                                                                                                                                                                                                             0.8
                                                                                                                                                                                                                       Repressor
                                                                                                                                                                                                                       RRepressor ⇢
                                                                                                                                                                                                                       RNA!polymerase     2                 Repressor
                                                                                                                                                                                                                                                          RNA!polymerase 2
                                  0.6
                                     0.6
                                                                                  X
                                                                             3-142(#
                                                                                             X y
                                                                                               0.6          3-142(# 0.6
                                                                                               RNA!polymerase
                                                                                                  0.6                    0.6
                                                                                                                                  RNA!polymerase             0.6
                                                                                                                                                          7&8.6,
                                                                                                                                                             RNA!polymerase
                                                                                                                                                                0.6
                                                                                                                                                                                    0.6
                                                                                                                                                                                       0.6
                                                                                                                                                                                            7&8.6,
                                                                                                                                                                                                RNA!polymerase            0.6
                                                                                                                                                                                                                         RNA!polymerase
                                                                                                                                                                                                                             0.6
                                                                                                                                                                                                                                                            RNA!polymerase




[Text from the paper page shown on the slide (truncated in the source): "... became sufficiently high do we find cI2 at the mutated O_R1 as well. Note, however, that cI inhibits transcription at O_R3 prior to occupying O_R1. Thus the binding at the mutated O_R1 could not be observed without in- ..." Acknowledgments: "This work was supported in part by NIH grant GM68762 and by NSF ITR grant 0428715. Luis Pérez-Breva is a ..."]

Learning to predict
                $s(y; x) = \sum_f \theta_f(y_f; x)$
                $\hat{y} = \arg\max_{y \in \mathcal{Y}} s(y; x)$
• We'd like to learn these functions from data $(x^{(i)}, y^{(i)})$, $i = 1, \ldots, n$, such that $y^{(i)} \approx \arg\max_{y \in \mathcal{Y}} s(y; x^{(i)})$
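To make the score and the argmax concrete, here is a minimal sketch under stated assumptions: the factors θ_f are hypothetical score tables for a toy chain (one unary table per position and one shared pairwise table between neighbors), standing in for learned functions of x. Brute-force enumeration of 𝒴 follows the definition directly but grows exponentially with the length of y; for chain-structured factors the same maximizer can be computed exactly by dynamic programming (Viterbi). Function names such as `argmax_viterbi` are illustrative and not taken from the talk.

```python
from itertools import product
import random


def score(y, unary, pairwise):
    """s(y; x) = sum_i theta_i(y_i; x) + sum_i theta(y_i, y_{i+1}) for a chain of factors.
    unary[i][label] and pairwise[a][b] are hypothetical precomputed factor values."""
    s = sum(unary[i][label] for i, label in enumerate(y))
    s += sum(pairwise[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return s


def argmax_brute_force(unary, pairwise):
    """Enumerate every labeling in Y = {0..K-1}^n; cost grows as K**n."""
    n, k = len(unary), len(unary[0])
    return max(product(range(k), repeat=n), key=lambda y: score(y, unary, pairwise))


def argmax_viterbi(unary, pairwise):
    """Exact argmax for a chain by dynamic programming; cost grows as n * K**2."""
    n, k = len(unary), len(unary[0])
    best = list(unary[0])  # best[b]: highest score of a prefix ending in label b
    back = []              # back[i-1][b]: best previous label when position i has label b
    for i in range(1, n):
        ptr, new_best = [], []
        for b in range(k):
            a = max(range(k), key=lambda prev: best[prev] + pairwise[prev][b])
            ptr.append(a)
            new_best.append(best[a] + pairwise[a][b] + unary[i][b])
        back.append(ptr)
        best = new_best
    # Start from the best final label and follow the back-pointers to position 0.
    y = [max(range(k), key=lambda b: best[b])]
    for ptr in reversed(back):
        y.append(ptr[y[-1]])
    return tuple(reversed(y))


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 6, 3  # hypothetical sizes: 6 positions, 3 labels
    unary = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    pairwise = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(k)]
    y_bf = argmax_brute_force(unary, pairwise)
    y_dp = argmax_viterbi(unary, pairwise)
    print(y_bf, y_dp, score(y_bf, unary, pairwise), score(y_dp, unary, pairwise))
```

Comparing the scores of the two results checks that the dynamic program finds an optimal labeling; if two labelings tie, the two methods may break the tie differently.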
                                                                                                                                                                                                                                                                                                                                                                                                                                                                             ff     0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           2
             10                                                                                           10

 terventions.
                         frepressor/fRNA
                      terventions.
                    (a) O 3
                                                    frepressor RNA

                                                (a) OR 3            (b) O 2
                                                                            scores on RafaelO“Fundaci´n Rafael del Pino” Fellow.1
                                                                         frepressor/fRNA
                                                                              “Fundaci´
                                                                                        R
                                                                                                    frepressor/fRNA
                                                                                                       del Pino” Fellow.
                                                                                                 (b) R 2
                                                                                                 50                 o
                                                                                                                         frepressor/fRNA

                                                                                                                    (c) O 150
                                                                                                                                                    frepressor/fRNA

                                                                                                                                               (c) OR                                                                                                    R                                                                                                                                       R
                                                                                (a) OR 3                                                                                      (a) OR 3                                                           (b) OR 2                                                                (b)
                                                                                                                                                                                                                                                                                                                         50                                       OR 2                   (c) OR 150                                                                               (c) OR 1

                                 x          x
                      - e.g., stereo reconstruction
               3: PredictedFigure 3: Predictions are again qualitatively mutated OR 1 for increasing amounts of cI2 .
                            protein Predicted protein binding to sites OR 3, OR 2, and correct
        Figure• Predictions• binding to sites OR 3, OR 2, and mutated OR 1 for increasing amounts of cI2 .
         Figure• Predictionsare again sites OR 3, OR 2, andcorrectOR 1 for increasing amounts of cI2 .
                             proteinare againqualitatively sites OR 3, R 2, and mutated
                3: PredictedFigure• binding to proteinare again mutated OReferences OR 1 for increasing amounts of cI2 .
                                    3: Predictions binding to correct
                                       Predicted qualitatively qualitatively correct
                                                            References

OR 1 as well. Note, OR 1 as well. Note, however, that cI2 inhibits transcrip-
                     however, that cI2 inhibits transcrip-
                              (1)
                       however, that cI2 inhibits transcrip- (2)              y
                                                                              y
  OR 1 as well. Note, OR 1 as well. Note, however, that cI2 inhibits transcrip-     ••We’dlearn these functions functions from
                                                                           ••We’d likeWe’d like to learn these functions from
                                                                                       to learn to learn these from data. P
became su⇥ciently becameDiscussion the mutated cI2 [1] the mutated John Adam and Harley H. McAdams.
  7 Discussion7 do su⇥ciently2 high do we find Acknowledgments Arkin, John Ross, and Harley H. McAdams
                      high      we find cI at                                       at Adam Arkin, Acknowledgments
                                                                                        y
  became su⇥ciently became su⇥ciently2 high do mutated cI2 at the mutated Acknowledgments
                        high do we find cI at the we find Acknowledgments
                                                                                              y             ...
                                                                                                                 [1] Ross,
                                                                             We’d like to like these functions from data.
                                                                                       Stochastic kinetic analysis of kinetic analysis path-
                                                                                                                      Stochastic developmental of developmental path
tion at OR 3 prior to occupying OR 1. to occupyingapproach provideswas supportedway bifurcation in phagepart coliNIH grant52 col
                     tion at OR 3 prior Thus the binding This workbifurcation in phage was supported 52 -infected excherichia
                        We believe3theR 1. to theoretic com- 1.Thusway binding
                                           gameprovides binding This the a com-
                                                                     OR 1.                                     This work -infectedgrantin
                                                                                                                       in part by NIH excherichia by52 GM68762            52 GM68
  We believe 3 prior to theoretic approach occupying OR Thuswork binding This work was supported in part by NIH grant GM6
  tion at OR the game occupying O
                       tion at OR prior Thus the a                                          the was supported in part by NIH grant GM68762
at the mutated OR 1 couldmutatedobserved without in- and by NSF ITR grant 0428715. ITR grant 0428715. a
                     at pelling causal abstraction ofwith re- and cells.with ITR grantcells. Genetics, P´rez-Breva isis August rez-Breva i
                         the of biological1systems biological systemsNSF re- and by NSF Luis 149:1633–1648, Luis P´ 1998.
                               not be OR could not be observed without in- 149:1633–1648, August 1998.
                                                                                                   Genetics,                                    e                      e
  at the causal abstraction not be OArt1 could notBooks observed without in- and 0428715. ITR grant 0428715. a
  pelling mutated OR 1 couldmutatedobserved without in- Art
                       at the                                          be                by Dolls Books                 by NSF MoebiusP´rez-Breva Moebius P´rez-Breva
                                                                                                                                        Luis Laundry
                                                                                                                                                  e               Luis e
terventions.         terventions. is complete with prov- “Fundaci´n nprov-thisdel Pino”ison left image ofdel pair and theFellow. ground-truth disparities
                        source constraints.2. The six datasets used incomplete with Rafael ofdel Shown Fellow.
  source constraints. terventions. Figure
                        The model
                                               R
                                                   The model is Figurepaper. Showno o usedimage paper. pair and the n Rafaeleach Pino”disparities.
                                                                                                               “Fundaci´ Rafael
                                                                                                                  Laundry Dolls                          Reindeer             Reindeer
  terventions.                                                              this “Fundaci´ Rafael each Pino” o
                                                                                 2. The six datasets left in
                                                                                               is the             “Fundaci´     Fellow. ground-truth Fellow.
                                                                                                                              the corresponding del Pino”corresponding
                                                                                  [2] Kenneth J. a [2] and Gerard Debreu. Existence Debreu. Existence o
                                                                                                            Arrow Kenneth J. Arrow and Gerard of

                                                                                   x [Scharstein & Pal 07, Middlebury dataset]data
                                                                                           x [Scharstein & Pal 07, Middlebury
                        ably convergent algorithms for finding (2)
  ably convergent algorithms for finding equilibria on a                              equilibria on
                              (1) scale.                                               an equilibrium for a competitive economy. Econo- economy. Econo
                                                                                                  x                   an equilibrium for a competitive
  genome-wide scale. genome-wide an MPE estimate from have an MPE estimate from it
                                     have
                                                                                   x
                                                                                   x        x
                                                                               References References           ...
                                                                      running graph cuts we use running graph cuts we use it
                                                                                                                      metrica, 22(3):265–290, July 1954.
                                     to compute our expectation in acompute our expectation in22(3):265–290,the em- 1954.
                                                                     to manner similar to the em- a manner similar to July
                                                                                       metrica,
                                                                                 References References
     The results from the small scale distribution.small scale distribution. Training a en-
                           The results from the Training pirical application are lattice-structured model us-
                                     pirical application are a en-      lattice-structured model us-
       Discussion couraging. ing the approach successfully path-based methods of
                     77 successfully path-based methods describedreproduces learn-
                                       Our reproduces known [3] Adam Arkin, a G. Adam Arkin, Lee, Ross, and Harley H. McAda
                                               model                                               known
                                                                     ing the approach described here is thus [3] Z. Bar-Joseph, G. Gerber, T. Lee, N. Rinaldi
                                                                                                                   generalization of
7couraging. Our modelDiscussion described here is thus[1]generalizationdescribedJohn Ross, learn- Harley H.N. Rinaldi,
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               P
                                                                                  a                                            and                   McAdams.
                                                                     Viterbi in [32]. Z. our Arkin, in [32]. For our andT. John
                                                                                              Bar-Joseph, [1] Gerber,
  7 Discussionbehavior ofViterbi
  behavior of the                     the experiments we molecular
                                                                                    y
                             Discussion switch on ing experimentsStochastic B. Gordon Yoo,of B. GordonRoss, Robert, E. H. McAd
                                                                                   [1] of molecular JohnJ.Adam Arkin, John McAdams.
                                                                                         Adam
                                                                                        For
                                                                                        we Yoo,
                                                                                                                   [1] Ross, up- developmental path- Harley Fraenkel
                        level competition and resource dates with a variable Jaakkola, in phage and D. Gi ord. and of
  level competition and resource constraints, withoutconstraints,T. learning rate. R. Young, bifurcation Young,Compu-
                                                                       the               without the
                                                                                      way bifurcation
                                                                                                                             F.
                                                                                                                                        Harley H. F. and
                        switch on ingthe basis of use straightforward gradient-based up- kinetic analysis Robert, E. Fraenkel,developmental pa
                                                                      the basis J. use straightforward gradient-based
                                                                                     y                                Stochastic kinetic analysis of
                                                                                         Stochastic kinetic T. Jaakkola,developmental path-Gi ord. Compu
                                                                                                                      analysis of kinetic analysis D. developmental p
                                                                                                                        Stochastic R. excherichia coli
                                                                                                                      way -infected excherichia coli
                                                                                                                                                                                            yy                                                                                                                                                                                                                                                                                                                  P
We believe the game theoreticassume protein-protein interactions way bifurcation in tational -infectedinof phage -infected excherichia
                                                                                                                      phage discovery phage
                                     dates with a variable learning rate.
                        need to   approach provides aapproach provides a com-
                     Wetheoreticthe game theoretic a com-
                          believe approach provides                com-                   betweendiscovery of way bifurcation in1998. modules and regulatory
                                                                                                          cI2 149:1633–1648, August gene                      -infected excherichia
  need to assume game believe the game theoretic approach provides a com-
  We believe the protein-protein interactions between cI2
                       We of biological systems with re-                               tational
                                                                                      cells. Genetics,                   gene modules and regulatory
                                                                                                                      cells. Genetics, 149:1633–1648, August 1998.
pelling causal abstraction causalcI2 and RNA-polymerase. systems the con- Laundrycells. Genetics, Biotechnology, 21(11):1337–1342
                     pelling and abstraction of theBooks ArtEvencells. Genetics, 149:1633–1648, August 1998.
  dimers and cI2abstraction of biologicalEven inofwith Datasets networks. re-
                       RNA-polymerase. systems biological systems withNature Biotechnology, 21(11):1337–1342,Moebius
                        dimers
  pelling causal and pelling causalDatasets
                                     4. complete with biological
                                          abstraction
                                             Art                      con-
                                                                     4. re-                in with
                                                                                            Dolls Books               networks. Moebius 149:1633–1648, August 1998.
                                                                                                            re- Laundry Dolls Nature Laundry Reindeer Moebius                    Reind
source constraints. sourcemodel is Figure 2. The sub-system, however,with quan-this paper. pair andis the left image of ground-truth disparities.
                       The of this well-known model isincomplete datasets leftBooks of each Shown the corresponding each pair and the corresponding ground-truth dispar
                        text constraints. Art The six withprov-this paper. Shownfew the J. in
                                                                      Books                   Dolls
                                                  The six datasets used Figure 2. The six is the prov-
                                                                                 Art                                          Dolls      Moebius Laundry  Reindeer                 Re
  text of constraints. sourcemodel is Figure 2. The model isinorderpaper. 2003.significant in this[2] 2003. is the J. Arroweach pairGerard Debreu. Existenc
                         The constraints.
                                                                                                         image
  source this well-known sub-system,complete few quan-complete withused ArroweachKenneth left image of ground-truth disparities.
                                             however, datasets prov-this[2] The six datasets left image ofpaper.training the corresponding and and the corresponding ground-truth disp
                                                                                      Kenneth prov-                 and Gerard Debreu. Existence of
                                                                                            [Scharstein & Pal 07,&Middlebury dataset] da
                                                                                                     [Scharstein Middlebury dataset]
                                                                     used Figure 2. obtain a is
                                                                                        Shown                          pair and data


                                                                                             [Scharstein & Pal 07,& Pal 07, Middlebury da
ably convergent algorithms for finding equilibria are In amount of about data J.aArrow and GerardJ. Arrow and Gerard Debreu. Existen
                     ablyresultsexperimentalequilibriaon availablean approaches, we have [2] competitive economy. Econo-
                            convergent order results for a a [2] Kenneth
                                         In algorithms significant                  to training
                                                                      finding equilibria bind-         on
                                                                                                        used amount of Shown
                        titative for findingto obtain a forbind- learning equilibrium forcreated 30 new Debreu. Existence of
  titative experimental
  ably convergent algorithms are available aboutforonstereo created 30 newon a                                       a Kenneth
                       ably convergent MPE estimate from running graph equilibriait
genome-wide scale. genome-wide scale.
  ing. Proper validation           to scale.
  relies on estimatingthe small scale
   The results from the game parameters
                                              algorithms
                                     for stereo learning approaches, we have cuts we use
                                        validation and usestereo MPE with ground-truth
                                   have an
                                             version from available
                                      to compute our
                                                                       of               cuts
                                                                        manner similarthe the em-
                                                                                           to
                                                                    to compute our expectation em-
                                                                      en-                        the
                                                                                                      [Scharstein Pal 07, Middlebury
                                                                        finding an equilibrium for ancompetitive economy. Econo- economy. Eco
                                                                                                    use
                                                                                                      in
                                                                                                                            equilibrium for a competitive
                                                                                                                       aan use
                        ing.and usestereoour modelground-truth andatasetsestimateweG. 22(3):265–290,weequilibrium for a competitive economy. von
                              Properhave an MPE with thereforeourgraphusing an auto-it disparities usingB.G. Berg, Robert B. Winter, and Peter H. Ec
                                                                                   model therefore graph OttoJuly 1954.and Peter H. von
                                                                                                        running [4] cuts
                                                                    have disparities Ottofrom Berg, Robert an auto-it
                                       of datasets estimate from arunning [4] estimate from                                     Winter,
  genome-wide scale. genome-widecompute our expectation in have an MPE metrica, running graph cuts we use22(3):265–290, July 1954.
                                                                                                                      metrica, it
                                                      of game parameterssimilar structured-lighting technique ofJuly 22(3):265–290, mechanisms of protein
                                                         the are in a manner of metrica, 22(3):265–290, em- 1954.
                        relies on estimating theexpectation mated version fromtoavailable
                                     matedapplication structured-lighting technique of [2]. a manner similar to the [2].Di usion- driven July 1954.
                                                                                                                      Hippel.
                                                                                                                        metrica,
Learning to predict
• We’d like to estimate the score functions from data
 such that
   $$ y^{(i)} \approx \arg\max_{y \in \mathcal{Y}} \underbrace{w \cdot f(x^{(i)}, y)}_{\text{parameterized scores}}, \qquad i = 1, \dots, n $$
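To make the objective concrete, here is a minimal sketch of the prediction step. It assumes a chain-structured output (one label per token) and a hypothetical feature map `f` that counts emission (token, label) and transition (label, label) pairs; the slides do not specify `f`. The argmax is computed by brute-force enumeration over all K^L labelings (K labels, L tokens), which is feasible only for tiny inputs and illustrates why the "max" can be the hard part.

```python
from collections import Counter
from itertools import product


def features(x, y):
    """Hypothetical joint feature map f(x, y) for a label chain, as a sparse Counter.

    Emission features ("emit", token, label) and transition features
    ("trans", previous_label, label) are counted over the sequence.
    """
    f = Counter()
    for j, (token, label) in enumerate(zip(x, y)):
        f[("emit", token, label)] += 1
        if j > 0:
            f[("trans", y[j - 1], label)] += 1
    return f


def score(w, x, y):
    """Linear score w . f(x, y); weights missing from w count as zero."""
    return sum(w.get(k, 0.0) * v for k, v in features(x, y).items())


def predict(w, x, labels):
    """argmax_y w . f(x, y) by enumerating all len(labels) ** len(x) labelings."""
    return max(product(labels, repeat=len(x)), key=lambda y: score(w, x, y))
```

With hypothetical weights `w = {("emit", "John", "N"): 1.0, ("emit", "saw", "V"): 1.0, ("emit", "Mary", "N"): 1.0, ("trans", "N", "V"): 0.5}`, `predict(w, ["John", "saw", "Mary"], ("N", "V"))` returns `("N", "V", "N")` with score 3.5.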

• The prediction problem can be challenging. Can we
 learn the parameters more easily?

• Thm: (Sontag et al.) If “max” is hard, then learning is
 hard as well
• Each training example introduces (often) exponentially
 many linear constraints
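   $$ \underbrace{w \cdot f(x^{(i)}, y^{(i)})}_{\text{score of the target structure}} > \underbrace{w \cdot f(x^{(i)}, y)}_{\text{score for an alternative}}, \qquad \forall\, y \in \underbrace{\mathcal{Y} \setminus \{y^{(i)}\}}_{\text{the set of all alternatives}} $$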
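The constraint set for one training example can be enumerated directly on toy inputs. This sketch reuses the same hypothetical chain feature map (emission and transition counts, not specified by the slides) and lists every alternative y != y^(i) whose constraint fails; ties count as violations because the inequality is strict. The helper names `all_constraints` and `violated` are illustrative. Counting the alternatives shows the growth: K^L - 1 constraints for L tokens and K labels.

```python
from collections import Counter
from itertools import product


def features(x, y):
    """Hypothetical chain feature map: emission and transition counts."""
    f = Counter()
    for j, (token, label) in enumerate(zip(x, y)):
        f[("emit", token, label)] += 1
        if j > 0:
            f[("trans", y[j - 1], label)] += 1
    return f


def score(w, x, y):
    """Linear score w . f(x, y); weights missing from w count as zero."""
    return sum(w.get(k, 0.0) * v for k, v in features(x, y).items())


def all_constraints(x, y_true, labels):
    """Yield every alternative y != y_true: len(labels) ** len(x) - 1 of them."""
    y_true = tuple(y_true)
    for y in product(labels, repeat=len(x)):
        if y != y_true:
            yield y


def violated(w, x, y_true, labels):
    """Alternatives y for which w.f(x, y_true) > w.f(x, y) fails (ties included)."""
    target = score(w, x, y_true)
    return [y for y in all_constraints(x, y_true, labels) if score(w, x, y) >= target]
```

With all-zero weights (`w = {}`), every one of the 2**3 - 1 = 7 constraints for a three-token, two-label example is violated; with ten tokens and three labels a single example already contributes 3**10 - 1 = 59048 constraints.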
Learning with pseudo-max
• Each training example now provides a small number of linear constraints for alternatives “around the target”

$$\underbrace{w \cdot f(x^{(i)}, y^{(i)})}_{\text{score of the target structure}} > \underbrace{w \cdot f(x^{(i)}, y)}_{\text{score for an alternative}}, \qquad \forall\, y \in \underbrace{\mathcal{Y}^{(i)}}_{\text{reduced set of alternatives}}$$

  where each alternative may differ from the target in at most one (or a few) coordinates
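For the case where alternatives differ from the target in exactly one coordinate, the reduced set can be built by changing each position of the target to every other label. The sketch below assumes labelings are tuples over {0, ..., K−1}; the function name `pseudo_max_alternatives` is hypothetical. It compares the L(K−1) constraints per example with the K^L − 1 constraints of the full set.

```python
def pseudo_max_alternatives(y_target, K):
    """All labelings that differ from y_target in exactly one coordinate."""
    y_target = tuple(y_target)
    return [y_target[:j] + (label,) + y_target[j + 1:]
            for j in range(len(y_target))
            for label in range(K)
            if label != y_target[j]]


y_target = (0,) * 10  # toy target: L = 10 positions
K = 3                 # labels per position (toy assumption)
reduced = pseudo_max_alternatives(y_target, K)
print(len(reduced))                        # 20 = L * (K - 1)
print(K ** len(y_target) - 1)              # 59048 alternatives in the full set
print(pseudo_max_alternatives((1, 0), 2))  # [(0, 0), (1, 1)]
```

Allowing "a few" coordinates to change would replace the single-position loop with combinations of positions; the set then grows polynomially in L rather than exponentially.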
• Thm: consistency still guaranteed in “restricted” cases
 (cf. pseudo-likelihood)
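The slides do not prescribe an optimizer, so as one possible illustration here is a perceptron-style learner (an assumption, not necessarily the authors' method) that enforces only the pseudo-max constraints, with a margin: whenever w · f(x^(i), y^(i)) − w · f(x^(i), y) falls below the margin for a one-coordinate alternative y, it adds the feature difference to w. The toy data, the feature map and the names `neighbors` and `train_pseudo_max` are hypothetical. Note that no argmax over the full output space is ever computed.

```python
import numpy as np

K = 2  # labels per position (toy assumption)


def features(x, y):
    """Hypothetical joint feature map: per-label input sums + transition counts."""
    D = x.shape[1]
    unary = np.zeros((K, D))
    for j, label in enumerate(y):
        unary[label] += x[j]
    pairwise = np.zeros((K, K))
    for a, b in zip(y[:-1], y[1:]):
        pairwise[a, b] += 1
    return np.concatenate([unary.ravel(), pairwise.ravel()])


def neighbors(y):
    """Pseudo-max alternatives: labelings differing from y in one coordinate."""
    y = tuple(y)
    return [y[:j] + (c,) + y[j + 1:]
            for j in range(len(y)) for c in range(K) if c != y[j]]


def train_pseudo_max(data, epochs=20, margin=1.0):
    """Perceptron-style updates on the pseudo-max constraints only."""
    w = np.zeros(K * data[0][0].shape[1] + K * K)
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            f_target = features(x, y)
            for y_alt in neighbors(y):
                diff = f_target - features(x, y_alt)
                if w @ diff < margin:  # local constraint violated (with margin)
                    w += diff
                    mistakes += 1
        if mistakes == 0:  # every local constraint holds: stop early
            break
    return w


# Toy data: label 1 where the input is positive, 0 otherwise.
data = [(np.array([[1.0], [-1.0], [1.0]]), (1, 0, 1)),
        (np.array([[-2.0], [0.5]]), (0, 1))]
w = train_pseudo_max(data)
print(w.tolist())  # [-1.0, 1.0, -1.0, 0.0, 1.0, 0.0]
```

Each example contributes only L(K−1) feature differences per pass, so the per-example cost is linear in the sequence length rather than exponential.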
Learning with pseudo-max
• When the assumptions are strictly correct
  [Plot: test error (0 to 0.2) vs. training set size (10^1 to 10^3, log scale) for exact, LP-relaxation, and pseudo-max training]

• In practice (multi-label prediction)
  [Plot: test error (0 to 0.4) vs. training set size (10^1 to 10^4, log scale) for exact, LP-relaxation, and pseudo-max training]
Goals and challenges
 • Goals
    - use rich classes of output structures
    - exercise fine control of how structures are chosen (scoring)
    - learn models efficiently from data
 • Challenges
  ✓ - prediction problems may be provably hard, but we can solve
      practical instances effectively with decomposition methods
  ✓ - most learning algorithms rely on explicit predictions and are
      therefore inefficient; much weaker predictions (constraints)
      may suffice for learning
    - richer structures lead to ambiguity
Dealing with ambiguity
• Ambiguity underlies many problems that are otherwise
 well suited for structured prediction
 - e.g., dependency parsing


              * kids make nutritious snacks


 - e.g., pose estimation
Tommi Jaakkola, “Scaling structured prediction”




  • 5. Structured prediction • The goal is to learn a mapping from input examples (x) to complex objects (y) - e.g., from pairs of web pages (x) to their alignments (y) [Figure: semantic alignment (y) between the elements of two web pages (x), from the Bricolage work on structured prediction for example-based web design (Kumar et al., ’10)]
  • 6. Structured prediction • Natural language processing - e.g., tagging, morphology segmentation, dependency parsing • Computer vision - e.g., segmentation, stereo reconstruction, object recognition • Computational biology - e.g., molecular structure prediction, pathway reconstruction • Robotics - e.g., imitation learning, inverse kinematics • Human-computer interaction - e.g., interface alignment, example-based designs • etc.
  • 7. Goals and challenges • Goals - use rich classes of output structures - exercise fine control of how structures are chosen (scoring) - learn models efficiently from data • Challenges - prediction problems are often provably hard - most learning algorithms rely on explicit predictions and are therefore inefficient with large amounts of data - richer structures lead to ambiguity
  • 8. Structured prediction • The goal is to learn a mapping from input examples (x) to complex objects (y) - e.g., from sentences (x) to dependency parses (y): x = * John saw a movie yesterday that he liked, y = its dependency parse (arcs drawn over the sentence) - in lexicalized dependency parsing, we draw an arc from the head word of each phrase to the words that modify it - the resulting parse is a directed tree; in many languages the tree is non-projective (it has crossing arcs) - each sentence is mapped to arc scores, and the parse is obtained as the highest-scoring directed tree
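  • 9. Structured prediction • The goal is to learn a mapping from sentences (x) to dependency parses (y) - x = * John saw a movie yesterday that he liked, with word positions i = 0 (the root *), 1, 2, ..., n - $y(i, j) = 1$ if arc $i \to j$ is selected and zero otherwise [Figure: y as a table of arc indicators, heads *, 1, 2, ..., n against modifiers 1, 2, ..., n]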
  • 10. Structured prediction • The goal is to learn a mapping from sentences (x) to dependency parses (y) - x = * John saw a movie yesterday that he liked, with word positions i = 0 (the root *), 1, 2, ..., n - $y(i, j) = 1$ if arc $i \to j$ is selected and zero otherwise - $x \to w \cdot f(x; i, j) = \theta(i, j)$ (x: sentence, f: features, w: parameters, $\theta$: arc scores)
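A minimal Python sketch of the scoring step $x \to w \cdot f(x; i, j) = \theta(i, j)$, assuming sparse binary features so that $w \cdot f$ is simply the sum of the weights of the active features. The feature templates in `toy_features` (word pair, direction, distance) and the weights in the example are hypothetical, chosen only for illustration; position 0 stands for the root *.

```python
from typing import Callable, Dict, List, Tuple

Arc = Tuple[int, int]  # (head i, modifier j)

def toy_features(tokens: List[str], i: int, j: int) -> List[str]:
    """Hypothetical binary features for arc i -> j: word pair, direction, distance."""
    direction = "R" if i < j else "L"
    return [f"pair={tokens[i]}->{tokens[j]}", f"dir={direction}", f"dist={abs(i - j)}"]

def arc_scores(sentence: List[str], w: Dict[str, float],
               features: Callable[[List[str], int, int], List[str]] = toy_features
               ) -> Dict[Arc, float]:
    """theta(i, j) = w . f(x; i, j) for every candidate arc i -> j with j >= 1 and i != j."""
    tokens = ["*"] + sentence  # position 0 is the root symbol *
    n = len(sentence)
    theta: Dict[Arc, float] = {}
    for i in range(n + 1):
        for j in range(1, n + 1):  # the root never acts as a modifier
            if i != j:
                # with binary features, w . f is the sum of the weights of active features
                theta[(i, j)] = sum(w.get(name, 0.0) for name in features(tokens, i, j))
    return theta

def tree_score(y: Dict[Arc, int], theta: Dict[Arc, float]) -> float:
    """Arc-factored score: sum over (i, j) of y(i, j) * theta(i, j)."""
    return sum(theta[arc] * selected for arc, selected in y.items())

# Hypothetical weights; unseen features get weight 0.
w = {"pair=*->saw": 2.0, "pair=saw->John": 1.5, "pair=saw->movie": 1.5,
     "pair=movie->a": 1.0, "dir=R": 0.1}
sentence = ["John", "saw", "a", "movie"]
theta = arc_scores(sentence, w)
# y(i, j) = 1 for the arcs * -> saw, saw -> John, saw -> movie, movie -> a
y = {(0, 2): 1, (2, 1): 1, (2, 4): 1, (4, 3): 1}
print(tree_score(y, theta))
```

With these made-up weights the parse * → saw, saw → John, saw → movie, movie → a scores about 6.2.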
  • 11. Structured prediction • The goal is to learn a mapping from sentences (x) to dependency parses (y) - x = * John saw a movie yesterday that he liked, with word positions i = 0 (the root *), 1, 2, ..., n - $y(i, j) = 1$ if arc $i \to j$ is selected and zero otherwise - $x \to w \cdot f(x; i, j) = \theta(i, j)$ (x: sentence, f: features, w: parameters, $\theta$: arc scores) - highest-scoring tree: $y^* = \arg\max_y \{ \sum_{i,j} y(i, j)\,\theta(i, j) + \theta_T(y) \}$, where $\theta_T(y)$ is 0 if y is a directed tree and $-\infty$ otherwise
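On a toy sentence the argmax can be checked by exhaustive search. The Python sketch below is an illustration only: it stores y as a head index for each word, treats $\theta_T(y)$ as 0 for a directed tree rooted at * and $-\infty$ otherwise, and enumerates all $n^n$ head assignments, which is feasible for a handful of words at most. The arc scores in the example are made up.

```python
import itertools
from typing import Dict, List, Tuple

Arc = Tuple[int, int]  # (head i, modifier j)

def is_tree(heads: List[int]) -> bool:
    """heads[m] is the head of word m for m = 1..n (heads[0] is a placeholder for the root *).
    The arcs form a directed tree rooted at 0 iff every word reaches 0 without looping."""
    for m in range(1, len(heads)):
        visited, node = set(), m
        while node != 0:
            if node in visited:
                return False  # stuck in a cycle that never reaches the root
            visited.add(node)
            node = heads[node]
    return True

def theta_T(heads: List[int]) -> float:
    """Tree term: 0 for a directed tree, -inf otherwise."""
    return 0.0 if is_tree(heads) else float("-inf")

def argmax_tree(theta: Dict[Arc, float], n: int) -> Tuple[List[int], float]:
    """y* = argmax_y { sum_{i,j} y(i,j) theta(i,j) + theta_T(y) } over all n^n head assignments."""
    best_heads: List[int] = []
    best_score = float("-inf")
    choices = [[h for h in range(n + 1) if h != m] for m in range(1, n + 1)]
    for combo in itertools.product(*choices):
        heads = [0] + list(combo)
        score = theta_T(heads) + sum(theta[(heads[m], m)] for m in range(1, n + 1))
        if score > best_score:
            best_heads, best_score = heads, score
    return best_heads, best_score

# Toy scores for n = 3 words; arcs 1 -> 2 and 2 -> 1 are attractive but form a cycle.
theta = {(i, j): 0.0 for i in range(4) for j in range(1, 4) if i != j}
theta.update({(0, 2): 3.0, (2, 1): 2.0, (2, 3): 2.0, (1, 2): 4.0})
heads, score = argmax_tree(theta, 3)
print(heads, score)
```

Choosing the best head for each word independently would select 1 → 2 and 2 → 1, a cycle; the tree term rules it out, and the search returns heads [0, 2, 0, 2] (* → 2, 2 → 1, 2 → 3) with score 7.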
  • 12. Structured prediction x = * John saw a movie yesterday that he liked (i = 0, 1, 2, ..., n) • The complexity of the prediction task depends on how we score each candidate tree • In an arc-factored model (as before) each arc is scored separately: $\sum_{i,j} y(i, j)\,\theta(i, j)$ • The highest-scoring tree is then found as the maximum weighted directed spanning tree: $y^* = \arg\max_y \{ \sum_{i,j} y(i, j)\,\theta(i, j) + \theta_T(y) \}$
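For arc-factored scores, the maximum weighted directed spanning tree can be found in polynomial time with the Chu-Liu/Edmonds algorithm. The recursive Python sketch below favors readability over speed: it greedily picks the best incoming arc for every word, and if these arcs form a cycle it contracts the cycle into one node, rescores the arcs entering it, solves the smaller problem, and expands the result. Scores are a dict from (head, modifier) to a float, with node 0 as the root *; the function and variable names are our own.

```python
from typing import Dict, List, Optional, Tuple

Arc = Tuple[int, int]  # (head h, modifier m)

def find_cycle(head: Dict[int, int]) -> Optional[List[int]]:
    """Return the nodes of a cycle in the graph m -> head[m], or None if there is none."""
    for start in head:
        position: Dict[int, int] = {}
        path: List[int] = []
        node = start
        while node in head and node not in position:
            position[node] = len(path)
            path.append(node)
            node = head[node]
        if node in position:  # walked back onto the current path
            return path[position[node]:]
    return None

def chu_liu_edmonds(score: Dict[Arc, float], root: int = 0) -> Dict[int, int]:
    """Maximum weighted directed spanning tree rooted at `root`; returns {modifier: head}."""
    score = {(h, m): s for (h, m), s in score.items() if m != root and h != m}
    # 1. Greedily pick the best incoming arc for every non-root node.
    best_in: Dict[int, int] = {}
    for (h, m), s in score.items():
        if m not in best_in or s > score[(best_in[m], m)]:
            best_in[m] = h
    cycle = find_cycle(best_in)
    if cycle is None:
        return best_in  # the greedy choice is already a tree
    # 2. Contract the cycle into a fresh node c and rescore arcs entering it.
    in_cycle = set(cycle)
    c = max(max(arc) for arc in score) + 1
    new_score: Dict[Arc, float] = {}
    origin: Dict[Arc, Arc] = {}  # contracted arc -> original arc it stands for
    for (h, m), s in score.items():
        if h in in_cycle and m in in_cycle:
            continue
        if m in in_cycle:  # entering the cycle: would replace m's cycle arc
            key, s = (h, c), s - score[(best_in[m], m)]
        elif h in in_cycle:  # leaving the cycle
            key = (c, m)
        else:
            key = (h, m)
        if key not in new_score or s > new_score[key]:
            new_score[key], origin[key] = s, (h, m)
    # 3. Solve the smaller problem, then expand c back into the cycle.
    heads: Dict[int, int] = {}
    for m, h in chu_liu_edmonds(new_score, root).items():
        orig_h, orig_m = origin[(h, m)]
        heads[orig_m] = orig_h
    for v in cycle:  # keep every cycle arc except the one that was broken
        heads.setdefault(v, best_in[v])
    return heads

theta = {(i, j): 0.0 for i in range(4) for j in range(1, 4) if i != j}
theta.update({(0, 2): 3.0, (2, 1): 2.0, (2, 3): 2.0, (1, 2): 4.0})
print(sorted(chu_liu_edmonds(theta).items()))
```

On these toy scores greedy head selection creates the cycle 1 ↔ 2; contraction recovers the tree * → 2, 2 → 1, 2 → 3.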
  • 14. Structured prediction x = * John saw a movie yesterday that he liked (i = 0, 1, 2, ..., n) • The complexity of the prediction task depends on how we score each candidate tree • It is often advantageous to include interactions between modifiers (outgoing arcs), known as “sibling scoring”: $\sum_i \theta_i(y|_i)$, where $y|_i = \{ y(i, j) : j \neq i \}$
  • 15. Structured prediction x = * John saw a movie yesterday that he liked (i = 0, 1, 2, ..., n) • The complexity of the prediction task depends on how we score each candidate tree • It is often advantageous to include interactions between modifiers (outgoing arcs), known as “sibling scoring”: $\sum_i \theta_i(y|_i)$, where $y|_i = \{ y(i, j) : j \neq i \}$ • Finding the highest-scoring tree is now NP-hard (McDonald and Satta, 2007): $y^* = \arg\max_y \{ \sum_i \theta_i(y|_i) + \theta_T(y) \}$
  • 16. Decomposition * John saw a movie yesterday that he liked (i = 0, 1, 2, ..., n) - $\theta_T(y)$: directed tree with arc-factored scores - $\theta_0(y|_0), \dots, \theta_2(y|_2), \dots, \theta_n(y|_n)$: modifiers (outgoing arcs), solved separately for each word • We can always turn a hard problem into an easy one by solving each “part” separately from the others • But the parts are unlikely to agree on a solution ...
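The “solve each part separately” idea can be tried on a toy example. In the Python sketch below, every word i (including the root *) picks its own best set of modifiers by enumerating subsets, under a hypothetical score `toy_theta_i` that rewards any head for taking word 1 as a modifier. Since nothing ties the subproblems together, word 1 ends up with three heads and words 2 and 3 with none, so the union of the chosen arcs is not a tree.

```python
import itertools
from typing import Callable, Dict, FrozenSet, List

ModScore = Callable[[int, FrozenSet[int]], float]  # theta_i(y|i): head i, set of its modifiers

def best_modifiers(i: int, n: int, theta_i: ModScore) -> FrozenSet[int]:
    """argmax over subsets M of {1..n} minus {i} of theta_i(i, M), by enumeration
    (2^(n-1) subsets for a word, 2^n for the root)."""
    candidates = [j for j in range(1, n + 1) if j != i]
    best, best_val = frozenset(), float("-inf")
    for r in range(len(candidates) + 1):
        for subset in itertools.combinations(candidates, r):
            val = theta_i(i, frozenset(subset))
            if val > best_val:
                best, best_val = frozenset(subset), val
    return best

def solve_parts_separately(n: int, theta_i: ModScore) -> Dict[int, List[int]]:
    """Solve every word's modifier subproblem on its own; collect the heads each word receives."""
    heads_of: Dict[int, List[int]] = {m: [] for m in range(1, n + 1)}
    for i in range(n + 1):
        for j in sorted(best_modifiers(i, n, theta_i)):
            heads_of[j].append(i)
    return heads_of

def toy_theta_i(i: int, mods: FrozenSet[int]) -> float:
    """Hypothetical sibling-style score: +1 for modifying word 1, -1 for any other modifier,
    and a penalty of 0.5 for each modifier beyond the first."""
    arc_part = sum(1.0 if j == 1 else -1.0 for j in mods)
    return arc_part - 0.5 * max(0, len(mods) - 1)

heads_of = solve_parts_separately(3, toy_theta_i)
print(heads_of)  # a valid tree would give every word exactly one head
```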
  • 17. Dual decomposition * John saw a movie yesterday that he liked (i = 0, 1, 2, ..., n) - directed tree with arc-factored scores plus agreement terms: $\theta_T(y) + \sum_{i,j} y(i, j)\,\delta(i, j)$ - modifiers (outgoing arcs), solved separately for each word: $\theta_i(y|_i) - \sum_{j \neq i} y(i, j)\,\delta(i, j)$ • We can encourage the parts to agree on the maximizing arcs via Lagrange multipliers $\delta(i, j)$ (cf. Guignard, Fisher, ’80s)
  • 18. Dual decomposition algorithm • An iterative sub-gradient algorithm (Koo et al., 2010) for * John saw a movie yesterday that he liked - find a directed spanning tree: $\hat{y} = \arg\max_y \{ \theta_T(y) + \sum_{i,j} y(i, j)\,\delta(i, j) \}$ - find the modifiers of each word: $\hat{y}'|_i = \arg\max_{y|_i} \{ \theta_i(y|_i) - \sum_{j \neq i} y(i, j)\,\delta(i, j) \}$ - update the Lagrange multipliers based on disagreement: $\delta(i, j) \leftarrow \delta(i, j) + \alpha_k \, ( \hat{y}'(i, j) - \hat{y}(i, j) )$
  • 19. Dual decomposition algorithm • An iterative sub-gradient algorithm (Koo et al., 2010) for * John saw a movie yesterday that he liked - find a directed spanning tree: $\hat{y} = \arg\max_y \{ \theta_T(y) + \sum_{i,j} y(i, j)\,\delta(i, j) \}$ - find the modifiers of each word: $\hat{y}'|_i = \arg\max_{y|_i} \{ \theta_i(y|_i) - \sum_{j \neq i} y(i, j)\,\delta(i, j) \}$ - update the Lagrange multipliers based on disagreement: $\delta(i, j) \leftarrow \delta(i, j) + \alpha_k \, ( \hat{y}'(i, j) - \hat{y}(i, j) )$ • Thm: The solution is optimal if agreement is reached (no updates)
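The three steps of the algorithm can be combined into a small Python sketch. Assumptions (ours, not from the talk): both parts are solved by brute force, whereas a practical parser would use a spanning-tree algorithm and dynamic programming; $\theta_T$ includes the arc scores `theta`; the multipliers $\delta(i, j)$ live in the dict `delta`; the step size is $\alpha_k = \alpha_0 / (k + 1)$; and the sibling score `sib` is hypothetical. When the parts agree, the returned arcs maximize $\theta_T(y) + \sum_i \theta_i(y|_i)$, as the theorem states; otherwise the sketch returns the last tree after `max_iter` iterations, flagged as uncertified.

```python
import itertools
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Arc = Tuple[int, int]                              # (head i, modifier j)
ModScore = Callable[[int, FrozenSet[int]], float]  # theta_i(y|i)

def is_tree(heads: List[int]) -> bool:
    """True iff every word m = 1..n reaches the root 0 by following heads[m]."""
    for m in range(1, len(heads)):
        visited, node = set(), m
        while node != 0:
            if node in visited:
                return False
            visited.add(node)
            node = heads[node]
    return True

def tree_part(theta: Dict[Arc, float], delta: Dict[Arc, float], n: int) -> Set[Arc]:
    """y_hat = argmax over directed trees of sum_{i,j} y(i,j) (theta(i,j) + delta(i,j)); brute force."""
    best: List[int] = []
    best_val = float("-inf")
    choices = [[h for h in range(n + 1) if h != m] for m in range(1, n + 1)]
    for combo in itertools.product(*choices):
        heads = [0] + list(combo)
        if is_tree(heads):
            val = sum(theta[(heads[m], m)] + delta[(heads[m], m)] for m in range(1, n + 1))
            if val > best_val:
                best, best_val = heads, val
    return {(best[m], m) for m in range(1, n + 1)}

def modifier_part(theta_i: ModScore, delta: Dict[Arc, float], n: int) -> Set[Arc]:
    """For each word i separately: argmax over modifier sets M of theta_i(M) - sum_{j in M} delta(i,j)."""
    arcs: Set[Arc] = set()
    for i in range(n + 1):
        candidates = [j for j in range(1, n + 1) if j != i]
        best, best_val = frozenset(), float("-inf")
        for r in range(len(candidates) + 1):
            for subset in itertools.combinations(candidates, r):
                mods = frozenset(subset)
                val = theta_i(i, mods) - sum(delta[(i, j)] for j in mods)
                if val > best_val:
                    best, best_val = mods, val
        arcs |= {(i, j) for j in best}
    return arcs

def dual_decomposition(theta, theta_i, n, max_iter=100, alpha0=1.0):
    """Sub-gradient dual decomposition; returns (arcs of the tree, certified_optimal)."""
    delta = {arc: 0.0 for arc in theta}
    y_hat: Set[Arc] = set()
    for k in range(max_iter):
        y_hat = tree_part(theta, delta, n)
        y_prime = modifier_part(theta_i, delta, n)
        if y_hat == y_prime:
            return y_hat, True  # agreement: no update needed, solution is optimal
        alpha_k = alpha0 / (k + 1)
        for arc in delta:
            delta[arc] += alpha_k * (int(arc in y_prime) - int(arc in y_hat))
    return y_hat, False

theta = {(i, j): 0.0 for i in range(4) for j in range(1, 4) if i != j}
theta.update({(0, 2): 3.0, (2, 1): 2.0, (2, 3): 2.0, (1, 2): 4.0})

def sib(i: int, mods: FrozenSet[int]) -> float:
    """Hypothetical sibling score: penalize a head for each modifier beyond the first."""
    return -1.5 * max(0, len(mods) - 1)

arcs, certified = dual_decomposition(theta, sib, 3)
print(sorted(arcs), certified)
```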
  • 20. Dual decomposition in practice • The table shows the percentage of test cases where the sub-gradient algorithm quickly finds the optimal solution:
        Language   CertS   CertG
        Dan        99.07   98.45
        Dut        98.19   97.93
        Por        99.65   99.31
        Slo        90.55   95.27
        Swe        98.71   98.97
        Tur        98.72   99.04
        Eng1       98.65   99.18
        Eng2       98.96   99.12
        Dan        98.50   98.50
        Dut        98.00   99.50
  • 21. Goals and challenges • Goals - use rich classes of output structures - exercise fine control of how structures are chosen (scoring) - learn models efficiently from data • Challenges - ✓ prediction problems may be provably hard, but we can solve practical instances effectively with decomposition methods - most learning algorithms rely on explicit predictions and are therefore inefficient with large amounts of data - richer structures lead to ambiguity
  • 22. Learning to predict • We’d like to estimate the score functions from data such that $y^{(i)} \approx \arg\max_{y \in Y} \{ w \cdot f(x^{(i)}, y) \}, \; i = 1, \dots, n$ (parameterized scores) - e.g., lexicalized dependency parsing: $x^{(1)}$ = * John saw a movie, $x^{(2)}$ = * kids make nutritious snacks, ..., each paired with its parse $y^{(1)}, y^{(2)}, \dots$
  • 23. Learning to predict • We’d like to estimate the score functions from data such that $y^{(i)} \approx \arg\max_{y \in Y} \{ w \cdot f(x^{(i)}, y) \}, \; i = 1, \dots, n$ (parameterized scores) - e.g., stereo reconstruction [Figure: training pairs $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots$ of stereo images and ground-truth disparity maps (Scharstein & Pal ’07, Middlebury dataset)]
  • 24. Learning to predict • We’d like to estimate the score functions from data such that $y^{(i)} \approx \arg\max_{y \in Y} \{ w \cdot f(x^{(i)}, y) \}, \; i = 1, \dots, n$ (parameterized scores) • The prediction problem can be challenging. Can we learn the parameters more easily?
  • 25. Learning to predict • We’d like to estimate the score functions from data such that $y^{(i)} \approx \arg\max_{y \in Y} \{ w \cdot f(x^{(i)}, y) \}, \; i = 1, \dots, n$ (parameterized scores) • The prediction problem can be challenging. Can we learn the parameters more easily? • Thm (Sontag et al.): If the “max” is hard, then learning is hard as well
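Many learning algorithms call the “max” as a subroutine on every training example. As an illustration, the Python sketch below runs a structured perceptron on a hypothetical multi-label task ($y \in \{0, 1\}^K$ with joint features $x_d y_k$ and $y_k y_l$), computing an exact argmax over all $2^K$ labelings for each example. The perceptron, the features, and the data are our choices for the example, not the method from the talk.

```python
import itertools
from typing import List, Sequence, Tuple

Label = Tuple[int, ...]

def features(x: Sequence[float], y: Label) -> List[float]:
    """Hypothetical joint features f(x, y): x_d * y_k for all (k, d), then y_k * y_l for all k < l."""
    unary = [xd * yk for yk in y for xd in x]
    pairwise = [y[k] * y[l] for k in range(len(y)) for l in range(k + 1, len(y))]
    return unary + pairwise

def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))

def predict(w: Sequence[float], x: Sequence[float], num_labels: int) -> Label:
    """Explicit prediction: exact argmax over all 2^K labelings (ties go to the first enumerated)."""
    return max(itertools.product((0, 1), repeat=num_labels),
               key=lambda y: dot(w, features(x, y)))

def structured_perceptron(data, num_labels: int, epochs: int = 20) -> List[float]:
    w = [0.0] * len(features(data[0][0], (0,) * num_labels))
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            y_hat = predict(w, x, num_labels)  # one full argmax per example
            if y_hat != y:
                mistakes += 1
                w = [wi + a - b for wi, a, b in zip(w, features(x, y), features(x, y_hat))]
        if mistakes == 0:
            break
    return w

# Toy data: two labels, inputs (x1, x2, bias); label k is on iff x_k = 1.
data = [((1, 0, 1), (1, 0)), ((0, 1, 1), (0, 1)), ((1, 1, 1), (1, 1)), ((0, 0, 1), (0, 0))]
w = structured_perceptron(data, num_labels=2)
print(w, [predict(w, x, 2) for x, _ in data])
```

On this separable toy data the loop stops in the second epoch with w = [2, 0, 0, 0, 1, 0, 0], which reproduces all four training labelings; with structures where the argmax is hard, the `predict` call is what becomes intractable.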
  • 26. Learning to predict • We’d like to estimate the score functions from data such that $y^{(i)} \approx \arg\max_{y \in Y} \{ w \cdot f(x^{(i)}, y) \}, \; i = 1, \dots, n$ (parameterized scores) • Each training example introduces (often) exponentially many linear constraints: $w \cdot f(x^{(i)}, y^{(i)}) > w \cdot f(x^{(i)}, y), \; \forall y \in Y \setminus \{ y^{(i)} \}$ (left: score of the target structure; right: score for an alternative; $Y \setminus \{ y^{(i)} \}$: the set of all alternatives)
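To see how fast the number of alternatives grows, the Python sketch below counts by brute force the directed trees rooted at * over n words, where each word has exactly one head and the root may have several children. The counts match Cayley's formula $(n+1)^{n-1}$ for labeled trees on n + 1 nodes (each such tree has exactly one orientation toward the root), so under this tree definition one sentence of length n contributes $(n+1)^{n-1} - 1$ constraints.

```python
import itertools
from typing import List

def is_tree(heads: List[int]) -> bool:
    """heads[m] is the head of word m (m = 1..n); True iff every word reaches the root 0."""
    for m in range(1, len(heads)):
        visited, node = set(), m
        while node != 0:
            if node in visited:
                return False
            visited.add(node)
            node = heads[node]
    return True

def count_trees(n: int) -> int:
    """Number of directed trees rooted at 0 over words 1..n, by enumerating all head assignments."""
    choices = [[h for h in range(n + 1) if h != m] for m in range(1, n + 1)]
    return sum(is_tree([0] + list(combo)) for combo in itertools.product(*choices))

for n in range(1, 7):
    alternatives = count_trees(n) - 1  # constraints contributed by one sentence of length n
    print(f"n={n}: {alternatives} constraints, Cayley: {(n + 1) ** (n - 1) - 1}")
```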
  • 28. Learning with pseudo-max • We’d like to estimate the score functions from data such that $y^{(i)} \approx \arg\max_{y \in Y} \{ w \cdot f(x^{(i)}, y) \}, \; i = 1, \dots, n$ (parameterized scores) • Each training example now provides a small number of linear constraints for alternatives “around the target”: $w \cdot f(x^{(i)}, y^{(i)}) > w \cdot f(x^{(i)}, y), \; \forall y \in Y^{(i)}$ (left: score of the target structure; right: score for an alternative; $Y^{(i)}$: a reduced set of alternatives), where each alternative may differ from the target in at most one (or a few) coordinates
  • 29. Learning with pseudo-max • We’d like to estimate the score functions from data such that $y^{(i)} \approx \arg\max_{y \in Y} \{ w \cdot f(x^{(i)}, y) \}, \; i = 1, \dots, n$ (parameterized scores) • Each training example now provides a small number of linear constraints for alternatives “around the target”: $w \cdot f(x^{(i)}, y^{(i)}) > w \cdot f(x^{(i)}, y), \; \forall y \in Y^{(i)}$ (left: score of the target structure; right: score for an alternative; $Y^{(i)}$: a reduced set of alternatives) • Thm: consistency is still guaranteed in “restricted” cases (cf. pseudo-likelihood)
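For a multi-label output $y \in \{0, 1\}^K$, alternatives that differ from the target in a single coordinate give K constraints per example instead of $2^K - 1$. The Python sketch below enforces only these pseudo-max constraints with perceptron-style updates, so it never computes an argmax over Y. The features $x_d y_k$ and $y_k y_l$, the toy data, and the perceptron updates are assumptions made for brevity and are not claimed to be the training procedure covered by the theorem. Note that satisfying these constraints does not, in general, guarantee that the target is the global argmax.

```python
from typing import List, Sequence, Tuple

Label = Tuple[int, ...]

def features(x: Sequence[float], y: Label) -> List[float]:
    """Hypothetical joint features f(x, y): x_d * y_k for all (k, d), then y_k * y_l for all k < l."""
    unary = [xd * yk for yk in y for xd in x]
    pairwise = [y[k] * y[l] for k in range(len(y)) for l in range(k + 1, len(y))]
    return unary + pairwise

def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))

def neighbors(y: Label) -> List[Label]:
    """Pseudo-max alternatives: every labeling that differs from y in exactly one coordinate."""
    return [y[:k] + (1 - y[k],) + y[k + 1:] for k in range(len(y))]

def pseudo_max_perceptron(data, epochs: int = 100) -> List[float]:
    """Perceptron-style updates on violated constraints w.f(x, y) > w.f(x, y') for y' in neighbors(y)."""
    w = [0.0] * len(features(*data[0]))
    for _ in range(epochs):
        violated = 0
        for x, y in data:
            f_target = features(x, y)
            for y_alt in neighbors(y):
                f_alt = features(x, y_alt)
                if dot(w, f_target) <= dot(w, f_alt):  # constraint not strictly satisfied
                    violated += 1
                    w = [wi + a - b for wi, a, b in zip(w, f_target, f_alt)]
        if violated == 0:
            break
    return w

# Toy data: two labels, inputs (x1, x2, bias); label k is on iff x_k = 1.
data = [((1, 0, 1), (1, 0)), ((0, 1, 1), (0, 1)), ((1, 1, 1), (1, 1)), ((0, 0, 1), (0, 0))]
w = pseudo_max_perceptron(data)
print(w)
```

Each example here costs K + 1 feature evaluations per pass, regardless of how hard the full argmax over Y would be.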
  • 30. Learning with pseudo-max • When the assumptions are strictly correct [Plot: test error vs. training-set size (10^1 to 10^3, log scale); curves: exact, LP-relaxation, pseudo-max] • In practice (multi-label prediction) [Plot: test error vs. training-set size (10^1 to 10^4, log scale); curves: exact, LP-relaxation, pseudo-max]
  • 31. Goals and challenges • Goals - use rich classes of output structures - exercise fine control of how structures are chosen (scoring) - learn models efficiently from data • Challenges - ✓ prediction problems may be provably hard, but we can solve practical instances effectively with decomposition methods - ✓ most learning algorithms rely on explicit predictions and are therefore inefficient; much weaker predictions (constraints) may suffice for learning - richer structures lead to ambiguity
  • 32. Dealing with ambiguity • Ambiguity underlies many problems that are otherwise well suited for structured prediction - e.g., dependency parsing * kids make nutritious snacks - e.g., pose estimation