7. Bilingual Word Alignment
Example sentence pair:
x: "What is the anticipated cost of collecting fees under the new proposal?"
y: "En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?" (French translation of x)
[Figure: alignment matrix between the tokenized English and French sentences]
Combinatorial structure
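Treating alignment as a maximum-weight matching can be illustrated on a toy instance. The scores below are invented for illustration, and a brute-force search over permutations stands in for a real matching solver.

```python
import itertools

# Hypothetical alignment scores s[j][k] between 3 English and 3 French words
# (think "what is the" vs. "quel est le"); the numbers are made up.
scores = [
    [2.0, 0.1, 0.0],
    [0.2, 1.5, 0.1],
    [0.0, 0.3, 1.8],
]

def best_matching(s):
    """Brute-force maximum-weight bipartite matching over all permutations."""
    n = len(s)
    best_score, best_perm = float("-inf"), None
    for perm in itertools.permutations(range(n)):
        total = sum(s[j][perm[j]] for j in range(n))
        if total > best_score:
            best_score, best_perm = total, perm
    return best_score, best_perm

score, perm = best_matching(scores)
# with these scores each English word aligns to the same-index French word
```

For real sentence lengths the exhaustive search is replaced by a polynomial matching algorithm; only the scoring changes when weights are learned.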
8. Protein Structure and Disulfide Bridges
Protein: 1IMT
Sequence (cysteines bracketed): AVITGA[C]ERDLQ[C]GKGT[C][C]AVSLWIKSVRV[C]TPVGTSGED[C]HPASHKIPFSGQRMHHT[C]P[C]APNLA[C]VQTSPKKFK[C]LSK
15. Chain Markov Net (aka CRF*) [*Lafferty et al. '01]
[Figure: linear chain of label variables y, each ranging over a-z, each connected to its observation x]
17. Associative Markov Nets
Point features: spin-images, point height
Edge features: length of edge, edge orientation
"Associative" restriction: potentials on each edge (y_j, y_k) reward neighboring nodes taking the same label
20. Disulfide Bonds: Non-bipartite Matching
Sequence: RS[C][C]P[C]YWGG[C]PWGQN[C]YPEG[C]SGPKV (cysteines numbered 1-6)
[Figure: graph over cysteines 1-6; the disulfide bonds form a perfect matching on the graph]
Fariselli & Casadio '01, Baldi et al. '04
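For disulfide connectivity the matching is non-bipartite: any cysteine may pair with any other. A minimal sketch, with hypothetical bonding scores over six cysteines, enumerates all 15 perfect pairings and picks the highest-scoring one.

```python
def perfect_matchings(nodes):
    """Yield every way to pair up an even-sized list of nodes (non-bipartite)."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in perfect_matchings(remaining):
            yield [(first, partner)] + sub

# Hypothetical bonding scores between cysteines 1..6 (numbers invented).
w = {(1, 2): 1.0, (1, 3): 0.2, (1, 4): 0.1, (1, 5): 3.0, (1, 6): 0.3,
     (2, 3): 0.4, (2, 4): 2.5, (2, 5): 0.2, (2, 6): 0.1,
     (3, 4): 0.3, (3, 5): 0.5, (3, 6): 2.0,
     (4, 5): 0.1, (4, 6): 0.4, (5, 6): 0.2}

# A 6-cysteine chain admits 5 * 3 * 1 = 15 perfect pairings.
best = max(perfect_matchings([1, 2, 3, 4, 5, 6]),
           key=lambda m: sum(w[e] for e in m))
```

For larger proteins the enumeration is replaced by a polynomial non-bipartite matching algorithm (e.g. Edmonds' blossom algorithm); the learning problem only changes the edge scores.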
23. Supervised Structured Prediction
Pipeline: Data -> Learning (estimate w) -> Model -> Prediction
Prediction example: weighted matching; generally: combinatorial optimization
Existing learning approaches: likelihood (can be intractable); local margin (ignores structure)
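The learning/prediction split above can be sketched with a structured perceptron, where prediction is an argmax over the combinatorial output space. The toy problem (binary tags over short strings) and all feature names below are invented for illustration; exhaustive search stands in for real combinatorial inference.

```python
import itertools

TAGS = (0, 1)

def features(x, y):
    """Sparse joint feature map: emission (char, tag) and transition (tag, tag) counts."""
    f = {}
    for i, (c, t) in enumerate(zip(x, y)):
        f[("emit", c, t)] = f.get(("emit", c, t), 0) + 1
        if i > 0:
            f[("trans", y[i - 1], t)] = f.get(("trans", y[i - 1], t), 0) + 1
    return f

def score(w, x, y):
    return sum(w.get(k, 0.0) * v for k, v in features(x, y).items())

def predict(w, x):
    """Exhaustive argmax over all tag sequences (fine at toy lengths)."""
    return max(itertools.product(TAGS, repeat=len(x)),
               key=lambda y: score(w, x, y))

def train(data, epochs=5):
    """Structured perceptron: estimate w by correcting each wrong prediction."""
    w = {}
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(w, x)
            if y_hat != tuple(y):
                for k, v in features(x, y).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in features(x, y_hat).items():
                    w[k] = w.get(k, 0.0) - v
    return w

data = [("ab", (0, 1)), ("ba", (1, 0)), ("aa", (0, 0))]
w = train(data)
```

The max-margin approaches discussed in these slides replace the perceptron update with a margin-based QP, but the estimate-w / argmax-prediction structure is the same.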
30. Structured Loss
Hamming loss of candidate sequences against the truth 'brace': bcare -> 2, brore -> 2, broce -> 1, brace -> 0
[Figure: analogous per-part losses for alignments ('What is the' / 'Quel est le', 'It was red') and for matchings]
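The sequence losses on this slide are plain Hamming distances to the true word 'brace', which a few lines reproduce:

```python
def hamming(y_true, y_pred):
    """Position-wise Hamming loss between two equal-length sequences."""
    return sum(a != b for a, b in zip(y_true, y_pred))

# Losses against the true sequence, as on the slide.
losses = [hamming("brace", g) for g in ("bcare", "brore", "broce", "brace")]
# losses == [2, 2, 1, 0]
```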
38. Matching Inference LP
The LP relaxation with degree constraints has integral solutions z: the constraint matrix A is totally unimodular [Nemhauser & Wolsey '88]
[Figure: alignment matrix for the English/French example pair, with a variable z_jk for each word pair (j, k)]
Need Hamming-like loss
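Total unimodularity can be checked directly on a tiny instance. The sketch below builds the degree-constraint (node-edge incidence) matrix of the complete bipartite graph K_{2,3}, standing in for a 2-word by 3-word alignment, and verifies by brute force that every square submatrix has determinant in {-1, 0, 1}, which is what guarantees integral LP vertices.

```python
from itertools import combinations

def det(M):
    """Integer determinant by cofactor expansion (fine for tiny matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

# Incidence matrix A of K_{2,3}: rows are degree constraints
# (2 "English" nodes + 3 "French" nodes), columns are the 6 edges (j, k).
edges = [(j, k) for j in range(2) for k in range(3)]
A = [[1 if e[0] == j else 0 for e in edges] for j in range(2)] + \
    [[1 if e[1] == k else 0 for e in edges] for k in range(3)]

# Collect determinants of all square submatrices, sizes 1 through 5.
dets = {det([[A[r][c] for c in cols] for r in rows])
        for s in range(1, 6)
        for rows in combinations(range(5), s)
        for cols in combinations(range(6), s)}
assert dets <= {-1, 0, 1}
```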
39. Map for Markov Nets
An assignment y maps to a 0/1 indicator vector z: one entry z_j(a) per node j and label a, and one entry z_jk(a, b) per edge (j, k) and label pair (a, b)
[Figure: sparse 0/1 vectors for an example assignment over labels a, b]
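The indicator map can be made concrete with a short sketch (labels and edges below are chosen arbitrarily): exactly one node indicator fires per position and one edge indicator per edge.

```python
def indicator_map(y, labels, edges):
    """Map an assignment y to node indicators z_j(a) and edge indicators z_jk(a, b)."""
    z_node = {(j, a): int(y[j] == a)
              for j in range(len(y)) for a in labels}
    z_edge = {(j, k, a, b): int(y[j] == a and y[k] == b)
              for (j, k) in edges for a in labels for b in labels}
    return z_node, z_edge

labels = ("a", "b")
edges = [(0, 1), (1, 2)]           # a 3-node chain
z_node, z_edge = indicator_map(("a", "b", "a"), labels, edges)
```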
40. Markov Net Inference LP
Has integral solutions z for chains and (hyper)trees; can be fractional for untriangulated networks
Constraints: normalization and agreement [Chekuri et al. '01, Wainwright et al. '02]
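For chains the integral LP optimum coincides with the MAP assignment, which dynamic programming (Viterbi) recovers exactly. A small sketch with made-up node and transition scores:

```python
def viterbi(node_scores, edge_scores):
    """Exact MAP on a chain; for chains this equals the integral LP solution."""
    n, L = len(node_scores), len(node_scores[0])
    best = list(node_scores[0])
    back = []
    for i in range(1, n):
        new, ptr = [], []
        for b in range(L):
            a_best = max(range(L), key=lambda a: best[a] + edge_scores[a][b])
            new.append(best[a_best] + edge_scores[a_best][b] + node_scores[i][b])
            ptr.append(a_best)
        best, back = new, back + [ptr]
    b = max(range(L), key=lambda t: best[t])
    path = [b]
    for ptr in reversed(back):          # follow backpointers
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Tiny 3-node, 2-label chain; transitions reward staying in the same label.
nodes = [[1.0, 0.0], [0.0, 0.2], [0.0, 1.0]]
trans = [[0.5, 0.0], [0.0, 0.5]]
map_path = viterbi(nodes, trans)   # switches label once, at the second node
```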
48. Factored Primal/Dual
By QP duality, the dual inherits structure from the problem-specific inference LP: its variables correspond to a decomposition of the variables of the flat case.
49. The Connection
[Figure: the dual variables form a fractional distribution over candidate sequences (bcare, brore, broce, brace, with losses 2, 2, 1, 0) whose per-edge marginals appear on the lattice]
51. 3D Mapping
Sensors: laser range finder, GPS, IMU
Data provided by: Michael Montemerlo & Sebastian Thrun
Labels: ground, building, tree, shrub
Training: 30 thousand points; testing: 3 million points
58. LAGRbot: Real-time Navigation
LAGRbot: Paul Vernaza & Dan Lee
Range of stereo vision limited to approximately 15 m
59. LAGRbot: Real-time Navigation
160x120 images: real-time prediction/learning (~100 ms)
Current work with Paul Vernaza, Dan Lee
Model      | Error
Local      | 17%
Structured | 8%
61. Word Alignment Results
Data: Hansards (Canadian Parliament)
Features induced on 1 million unsupervised sentences
Trained on 100 sentences (10,000 edges); tested on 350 sentences (35,000 edges)
*Error: weighted combination of precision/recall
Model                                                    | Error*
GIZA/IBM4 [Och & Ney '03]                                | 6.5
+Local learning + matching                               | 5.4
+Our approach [Taskar et al. '05]                        | 4.9
+Our approach + QAP [Lacoste-Julien, Taskar et al. '06]  | 4.5
We also tried our framework on a webpage classification task: given the websites of several computer science departments, we classify pages into five categories. The first model is a linear SVM that classifies each page from its bag of words. The second model, described in our earlier work, is a relational Markov network with an edge between hyperlinked pages. This model captures the strong correlations between the labels of linked pages (for example, students usually point to their advisor's page, while faculty rarely point to other faculty) and achieves a significant gain over SVMs. Inference in this model is intractable, so we used loopy belief propagation. For M^3 nets we likewise had to use the relaxed dual, without the clique-tree constraints; with the same features as the Markov network, this achieves an error of 19%.