SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Clustering Using
Integer Programming
Katie Kuksenok
Michael Brooks
What are we doing?
• Investigating IP formulations of the clustering
  problem

• Developing a web application for clustering

• Applying clustering to several datasets

• Visualizing resulting clusters
The Clustering Problem
• Input:
  ▫ N entities
  ▫ Each entity has a value for each of M features

• Goal:
  ▫ Partition the entities “optimally”

• Issues with clustering:
  ▫ Definition of “optimal”
  ▫ Different feature types don’t work well together
The Clustering Problem
Exposes high-level properties of data

Important for exploratory data analysis in data
 mining

Finding the optimal clustering is NP-Complete
Grotschel and Wakabayashi IP
• Recall the Grotschel-Wakabayashi formulation
  (clustering wild cats):

• Variables: xij = 1 if i and j are in the same cluster

• Data: wij = disagreements(i,j) – agreements(i,j)

• Minimize: SUM { wij xij : over all i, j }
Formulation: Constraints
• Constraints for reflexivity, symmetry, transitivity
  reduce to the following constraints:
Exploratory Clustering Application
• Currently in development

• User provides data to website
• Web server processes data
  ▫ Produces OPL code
  ▫ Runs the model in LPSolve (another MIP solver)
• Results are returned to the user, with visuals
Preliminary Exploration
• With working parts of application:
• can cluster and visualize datasets

• verified Grotschel/Wakabayashi clustering results
• used clustering to explore senate voting records
Data Sets (Cats)
• Cluster 30 varieties of wild cats (from paper)
• Cats have 14 features
• Reproduced results from paper

• Model has 12180 constraints and 435 variables
• Runs in less than 1 second
• No branching
Data Sets (Senate Voting Records)
• 100 senators
• Senate records for 8 passed bills from this year

• Model has 485100 constraints and 4950 variables
• Runs in 1 minute
• No branching
Data Sets (Senate Voting By State)
• Grouped senate voting by state
• If senators for a state disagreed, ‘Maybe’

• Model has 58800 constraints and 1225 variables
• Runs in 1.5 seconds
• No branching
What now?
• The Application
 ▫ We’d like to put it all together
• Branch and Bound
 ▫ No real-life datasets so far have required branching
 ▫ We have constructed datasets that did
 ▫ Can we determine under what conditions branching
   will be required?
This is a guepard (cheetah in French?):




       …any other questions?
Approaches to Clustering

             • What makes a good clustering
               algorithm?
              ▫ Runtime and space efficiency (when N is
                large)
     Many
algorithms
              ▫ Ability to quickly incorporate new data
              ▫ Ability to deal with many-dimensional
        IP      data (when M is large)
              ▫ Optimality (closeness to)

Weitere ähnliche Inhalte

Andere mochten auch

Helping Computers Find Meaning they Lost in Translation
Helping Computers Find Meaning they Lost in TranslationHelping Computers Find Meaning they Lost in Translation
Helping Computers Find Meaning they Lost in TranslationKaterena Kuksenok
 
Games on Social Networks: Balancing Friendships
Games on Social Networks: Balancing FriendshipsGames on Social Networks: Balancing Friendships
Games on Social Networks: Balancing FriendshipsKaterena Kuksenok
 
Storyboard(printscreen)
Storyboard(printscreen)Storyboard(printscreen)
Storyboard(printscreen)Sam Arblaster
 
Uncertainty in Chronic Illness and Patients' Online Experience
Uncertainty in Chronic Illness and Patients' Online ExperienceUncertainty in Chronic Illness and Patients' Online Experience
Uncertainty in Chronic Illness and Patients' Online ExperienceKaterena Kuksenok
 
Michel Foucault
Michel FoucaultMichel Foucault
Michel Foucaultkhalfyard
 
Brand management Full notes
Brand management Full notesBrand management Full notes
Brand management Full notesversatileBschool
 
Mishel's Uncertainty in Illness Theory
Mishel's Uncertainty in Illness TheoryMishel's Uncertainty in Illness Theory
Mishel's Uncertainty in Illness TheorySujata Mohapatra
 
Patricia Benner (Novice to Expert Theory)
Patricia Benner (Novice to Expert Theory)Patricia Benner (Novice to Expert Theory)
Patricia Benner (Novice to Expert Theory)Mary Ann Adiong
 

Andere mochten auch (8)

Helping Computers Find Meaning they Lost in Translation
Helping Computers Find Meaning they Lost in TranslationHelping Computers Find Meaning they Lost in Translation
Helping Computers Find Meaning they Lost in Translation
 
Games on Social Networks: Balancing Friendships
Games on Social Networks: Balancing FriendshipsGames on Social Networks: Balancing Friendships
Games on Social Networks: Balancing Friendships
 
Storyboard(printscreen)
Storyboard(printscreen)Storyboard(printscreen)
Storyboard(printscreen)
 
Uncertainty in Chronic Illness and Patients' Online Experience
Uncertainty in Chronic Illness and Patients' Online ExperienceUncertainty in Chronic Illness and Patients' Online Experience
Uncertainty in Chronic Illness and Patients' Online Experience
 
Michel Foucault
Michel FoucaultMichel Foucault
Michel Foucault
 
Brand management Full notes
Brand management Full notesBrand management Full notes
Brand management Full notes
 
Mishel's Uncertainty in Illness Theory
Mishel's Uncertainty in Illness TheoryMishel's Uncertainty in Illness Theory
Mishel's Uncertainty in Illness Theory
 
Patricia Benner (Novice to Expert Theory)
Patricia Benner (Novice to Expert Theory)Patricia Benner (Novice to Expert Theory)
Patricia Benner (Novice to Expert Theory)
 

Ähnlich wie IP Clustering as Exploratory Data Analysis

Entity embeddings for categorical data
Entity embeddings for categorical dataEntity embeddings for categorical data
Entity embeddings for categorical dataPaul Skeie
 
BitBootCamp Evening Classes
BitBootCamp Evening ClassesBitBootCamp Evening Classes
BitBootCamp Evening ClassesMenish Gupta
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NETDev Raj Gautam
 
InfoEducatie - Face Recognition Architecture
InfoEducatie - Face Recognition ArchitectureInfoEducatie - Face Recognition Architecture
InfoEducatie - Face Recognition ArchitectureBogdan Bocse
 
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017VisageCloud
 
Lessons from lhc
Lessons from lhcLessons from lhc
Lessons from lhcdrsm79
 
Master guide to become a data scientist
Master guide to become a data scientist Master guide to become a data scientist
Master guide to become a data scientist zekeLabs Technologies
 
Master guide to become a Data Scientist -by zekeLabs
Master guide to become a Data Scientist  -by zekeLabsMaster guide to become a Data Scientist  -by zekeLabs
Master guide to become a Data Scientist -by zekeLabszekeLabs Technologies
 
[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient SimulationDongwonSon1
 
Research network infrastructure engineers
Research network infrastructure engineersResearch network infrastructure engineers
Research network infrastructure engineersJisc
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for BeginnersSanghamitra Deb
 
Machine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data DemystifiedMachine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data DemystifiedOmid Vahdaty
 
Zestimate Lambda Architecture
Zestimate Lambda ArchitectureZestimate Lambda Architecture
Zestimate Lambda ArchitectureSteven Hoelscher
 
SMART International Symposium for Next Generation Infrastructure: The roles o...
SMART International Symposium for Next Generation Infrastructure: The roles o...SMART International Symposium for Next Generation Infrastructure: The roles o...
SMART International Symposium for Next Generation Infrastructure: The roles o...SMART Infrastructure Facility
 
How we integrate Machine Learning Algorithms into our IT Platform at Outfittery
How we integrate Machine Learning Algorithms into our IT Platform at OutfitteryHow we integrate Machine Learning Algorithms into our IT Platform at Outfittery
How we integrate Machine Learning Algorithms into our IT Platform at OutfitteryOUTFITTERY
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellSri Ambati
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!DataWorks Summit
 

Ähnlich wie IP Clustering as Exploratory Data Analysis (20)

Entity embeddings for categorical data
Entity embeddings for categorical dataEntity embeddings for categorical data
Entity embeddings for categorical data
 
BitBootCamp Evening Classes
BitBootCamp Evening ClassesBitBootCamp Evening Classes
BitBootCamp Evening Classes
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
 
InfoEducatie - Face Recognition Architecture
InfoEducatie - Face Recognition ArchitectureInfoEducatie - Face Recognition Architecture
InfoEducatie - Face Recognition Architecture
 
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
 
Lessons from lhc
Lessons from lhcLessons from lhc
Lessons from lhc
 
Master guide to become a data scientist
Master guide to become a data scientist Master guide to become a data scientist
Master guide to become a data scientist
 
Master guide to become a Data Scientist -by zekeLabs
Master guide to become a Data Scientist  -by zekeLabsMaster guide to become a Data Scientist  -by zekeLabs
Master guide to become a Data Scientist -by zekeLabs
 
[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation
 
Research network infrastructure engineers
Research network infrastructure engineersResearch network infrastructure engineers
Research network infrastructure engineers
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
Machine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data DemystifiedMachine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data Demystified
 
Zestimate Lambda Architecture
Zestimate Lambda ArchitectureZestimate Lambda Architecture
Zestimate Lambda Architecture
 
SMART International Symposium for Next Generation Infrastructure: The roles o...
SMART International Symposium for Next Generation Infrastructure: The roles o...SMART International Symposium for Next Generation Infrastructure: The roles o...
SMART International Symposium for Next Generation Infrastructure: The roles o...
 
How we integrate Machine Learning Algorithms into our IT Platform at Outfittery
How we integrate Machine Learning Algorithms into our IT Platform at OutfitteryHow we integrate Machine Learning Algorithms into our IT Platform at Outfittery
How we integrate Machine Learning Algorithms into our IT Platform at Outfittery
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!
 
Big Data
Big DataBig Data
Big Data
 

Kürzlich hochgeladen

psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesShubhangi Sonawane
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 

Kürzlich hochgeladen (20)

psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 

IP Clustering as Exploratory Data Analysis

  • 2. What are we doing? • Investigating IP formulations of the clustering problem • Developing a web application for clustering • Applying clustering to several datasets • Visualizing resulting clusters
  • 3. The Clustering Problem • Input: ▫ N entities ▫ Each entity has a value for each of M features • Goal: ▫ Partition the entities “optimally” • Issues with clustering: ▫ Definition of “optimal” ▫ Different feature types don’t work well together
  • 4. The Clustering Problem Exposes high-level properties of data Important for exploratory data analysis in data mining Finding the optimal clustering is NP-Complete
  • 5. Grotschel and Wakabayashi IP • Recall the Grotschel-Wakabayashi formulation (clustering wild cats): • Variables: xij = 1 if i and j are in the same cluster • Data: wij = disagreements(i,j) – agreements(i,j) • Minimize: SUM { wij xij : over all i, j }
  • 6. Formulation: Constraints • Constraints for reflexivity, symmetry, transitivity reduce to the following constraints:
  • 7. Exploratory Clustering Application • Currently in development • User provides data to website • Web server processes data ▫ Produces OPL code ▫ Runs the model in LPSolve (another MIP solver) • Results are returned to the user, with visuals
  • 8. Preliminary Exploration • With working parts of application: • can cluster and visualize datasets • verified Grotschel/Wakabayashi clustering results • used clustering to explore senate voting records
  • 9. Data Sets (Cats) • Cluster 30 varieties of wild cats (from paper) • Cats have 14 features • Reproduced results from paper • Model has 12180 constraints and 435 variables • Runs in less than 1 second • No branching
  • 10.
  • 11. Data Sets (Senate Voting Records) • 100 senators • Senate records for 8 passed bills from this year • Model has 485100 constraints and 4950 variables • Runs in 1 minute • No branching
  • 12.
  • 13. Data Sets (Senate Voting By State) • Grouped senate voting by state • If senators for a state disagreed, ‘Maybe’ • Model has 58800 constraints and 1225 variables • Runs in 1.5 seconds • No branching
  • 14.
  • 15. What now? • The Application ▫ We’d like to put it all together • Branch and Bound ▫ No real-life datasets so far have required branching ▫ We have constructed datasets that did ▫ Can we determine under what conditions branching will be required?
  • 16. This is a guepard (cheetah in French?): …any other questions?
  • 17.
  • 18. Approaches to Clustering • What makes a good clustering algorithm? ▫ Runtime and space efficiency (when N is large) Many algorithms ▫ Ability to quickly incorporate new data ▫ Ability to deal with many-dimensional IP data (when M is large) ▫ Optimality (closeness to)