Sparse Data Structures for Weighted Bipartite Matching


Jason Riedy

A stab at optimizing the inner loop for auction-based, sparse bipartite matching.


Sparse Data Structures for Weighted Bipartite Matching
E. Jason Riedy
Dr. James Demmel
(... and thanks to the BeBOP group)

SIAM Workshop on Combinatorial Scientific Computing 2004

Use Sparse Matrix Optimizations...

Take a fixed, simple algorithm: the auction algorithm for matchings.
Repeated iterations over a sparse graph.
What’s expensive, and is there anything we can do about it?
Take an idea from optimizing sparse matrix-vector products.
A little speed-up in some cases, but there are more ideas available...

Where’s the Time Going?

Auction algorithm: an iterative, greedy algorithm for bipartite matching:
1. An unmatched row i finds a “most profitable” column j:
π(i) = max_j [ b(i, j) − p(j) ]
2. Row i places a bid for column j. The bid price is raised until j is no longer the best choice. (Min. increment µ)
3. The highest bid gets the matching (i, j).
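The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `auction_match`, `rows`, and `mu`, the zero starting prices, and the one-bidder-at-a-time ordering are all assumptions made here. It also assumes a perfect matching of rows to columns exists; without one, the loop may never finish.

```python
from collections import deque

def auction_match(n_cols, rows, mu=0.01):
    """Sketch of a one-bidder-at-a-time auction (assumed variant).

    rows[i] is a list of (column, benefit) pairs for row i.
    Returns (match, price) where match[i] is the column given to row i.
    """
    price = [0.0] * n_cols           # p(j), assumed to start at zero
    owner = [None] * n_cols          # owner[j]: row currently holding column j
    match = [None] * len(rows)       # match[i]: column currently held by row i
    unmatched = deque(range(len(rows)))

    while unmatched:
        i = unmatched.popleft()

        # Step 1: find the most profitable column and the runner-up profit.
        best_j, best, second = None, float("-inf"), float("-inf")
        for j, b in rows[i]:
            value = b - price[j]
            if value > best:
                best_j, best, second = j, value, best
            elif value > second:
                second = value

        # Step 2: raise the price until best_j is no longer strictly best,
        # then add the minimum increment mu.
        if second == float("-inf"):
            bump = mu                # single-candidate row: smallest legal bid
        else:
            bump = (best - second) + mu
        price[best_j] += bump

        # Step 3: with one bidder at a time, this bid is the highest bid,
        # so row i takes the column and any previous owner is unmatched.
        prev = owner[best_j]
        if prev is not None:
            match[prev] = None
            unmatched.append(prev)
        owner[best_j] = i
        match[i] = best_j

    return match, price
```

For example, `auction_match(3, [[(0, 10.0), (1, 6.0)], [(0, 9.0), (1, 2.0)], [(1, 5.0), (2, 1.0)]])` assigns row 0 to column 1, row 1 to column 0, and row 2 to column 2. Every pass through Step 1 scans a whole row, which is why the number of entries examined drives the running time.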

Time linear in entries examined...
Number of entries examined is problem-dependent.

[Plot: time (s) against number of entries examined; the axes run from 0 to 4.5 s and from 0 to 3e+07 entries.]

Expensive Inner Loop!
1.3 GHz Itanium 2
[Histogram: the x-axis is cycles per entry, from 100 to 450; the unlabeled y-axis runs from 0 to 10.]

Verifying...
Using kcachegrind (N. Nethercote and J. Weidendorfer) and valgrind (J. Seward).

And Locating...

No obvious culprits in the instructions...

And Locating...

But considering cache effects!

Auction’s Inner Loop

[Diagram: the Price, Index, and Entry arrays.]

value = entry - price
save largest...
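A minimal sketch of this inner loop, assuming the graph is stored row by row in a compressed sparse row layout. The arrays `col_idx`, `entry`, and `price` stand in for the slide's Index, Entry, and Price; `row_ptr` and the function name `most_profitable` are assumptions introduced here.

```python
def most_profitable(i, row_ptr, col_idx, entry, price):
    """Scan row i and return (column, value) with the largest entry - price."""
    best_j, best = -1, float("-inf")
    for k in range(row_ptr[i], row_ptr[i + 1]):
        j = col_idx[k]                # Index: read sequentially
        value = entry[k] - price[j]   # Entry: sequential; Price: indirect via j
        if value > best:              # save largest...
            best_j, best = j, value
    return best_j, best
```

The reads of `col_idx` and `entry` walk memory in order, while `price[j]` jumps wherever the column index points, which is the access a cache is least able to help with.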

Auction’s Inner Loop

Same accesses as sparse matrix-vector multiplication!
[Diagram: the Price, Index, and Entry arrays.]

y += a(i,j) * x(j)
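For comparison, here is a sketch of a sparse matrix-vector product over the same assumed compressed sparse row arrays (`row_ptr`, `col_idx`, `entry` are names chosen here, not taken from the slides). The vector `x` plays the role that the price array plays in the auction: it is the only array read through the column index.

```python
def spmv(row_ptr, col_idx, entry, x):
    """Compute y = A x for a matrix stored in compressed sparse row form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Same pattern as the auction loop: entry[k] and col_idx[k] are
            # sequential reads, x[col_idx[k]] is an indirect read.
            y[i] += entry[k] * x[col_idx[k]]
    return y
```

Only the arithmetic differs: a multiply and add here, a subtract and compare in the auction.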

Performance Through Blocking?

(Images swiped from Berkeley’s BeBOP group.)

The blocked format stores more entries, yet runs at 1.5× the performance on a Pentium 3!
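One way blocking might carry over to the auction's row scan is sketched below, using 1 × c blocks aligned to multiples of c. Everything here is an assumption for illustration: the function names, the block shape, and in particular the choice to pad missing positions with negative infinity so that a padded slot can never be the most profitable column. The price array is assumed to be padded to a multiple of c.

```python
NEG_INF = float("-inf")

def block_rows(row_ptr, col_idx, entry, c):
    """Convert compressed sparse rows into aligned 1 x c blocks."""
    blk_ptr, blk_col, blk_val = [0], [], []
    for i in range(len(row_ptr) - 1):
        blocks = {}                   # block start column -> c values
        for k in range(row_ptr[i], row_ptr[i + 1]):
            start = (col_idx[k] // c) * c
            vals = blocks.setdefault(start, [NEG_INF] * c)
            vals[col_idx[k] - start] = entry[k]
        for start in sorted(blocks):
            blk_col.append(start)     # one index per block, not per entry
            blk_val.extend(blocks[start])
        blk_ptr.append(len(blk_col))
    return blk_ptr, blk_col, blk_val

def most_profitable_blocked(i, blk_ptr, blk_col, blk_val, price, c):
    """Blocked version of the row scan: returns (column, value)."""
    best_j, best = -1, NEG_INF
    for b in range(blk_ptr[i], blk_ptr[i + 1]):
        start = blk_col[b]
        for t in range(c):            # fixed-length loop over adjacent prices
            value = blk_val[b * c + t] - price[start + t]
            if value > best:
                best_j, best = start + t, value
    return best_j, best
```

In this sketch a row with columns 0, 1, and 3 becomes two blocks holding four values, so more entries are stored and examined, but each block needs one index read and touches adjacent prices.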

Blocking Speeds Some Matches
Finite element matrix from Vavasis (in UF collection):

Observations

A blocked graph data structure may provide additional
performance if:
you iterate over whole rows,
the graph / matrix has runs of columns, and
you’re willing to use an automated tuning system.
Maximizing the runs is a linear arrangement problem: hard, but there may be cheap heuristics. It is only worthwhile if you’re performing many iterations. (For mat-vec, tuning often pays off only after > 50 computations of Ax.)
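A rough way to check the "runs of columns" condition before investing in blocking is to measure the average run length of consecutive column indices per row. The helper below is a hypothetical sketch, not something from the slides; it assumes compressed sparse row arrays with column indices sorted within each row.

```python
def mean_run_length(row_ptr, col_idx):
    """Average length of runs of consecutive column indices over all rows."""
    runs = 0
    for i in range(len(row_ptr) - 1):
        prev = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # A new run starts at the first entry of a row or after a gap.
            if prev is None or col_idx[k] != prev + 1:
                runs += 1
            prev = col_idx[k]
    nnz = row_ptr[-1]
    return nnz / runs if runs else 0.0
```

A value near 1 means almost every entry stands alone, so blocks would be mostly padding; larger values suggest blocking has something to work with.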
