Maximizing Correctness with Minimal User Effort
to Learn Data Transformations
Bo Wu and Craig Knoblock
University of Southern California
1
Department of Computer Science
2
Art website Buyer
3
Dimension of artworks
4
Programming by Example
Video is from the official Excel YouTube channel (https://www.youtube.com/watch?v=YPG8PAQQ894)
Too Many Records
5
Overconfident Users
6
Users are often too confident to examine the results thoroughly
Variations
7
Problem
Enable the users of PBE systems to achieve maximal
correctness with minimal effort on large datasets
8
Help users to identify at least one of all incorrect
records in every iteration with minimal effort on
large datasets
Approach Overview
9
Entire dataset:
Raw               Transformed
10“ H x 8” W      10
H: 58 x W:25”     58
12”H x 9”W        12
11”H x 6”         11
…                 …
30 x 46”          30 x 46

Sampled records (after random sampling):
Raw               Transformed
10“ H x 8” W      10
11”H x 6”         11
…                 …
30 x 46”          30 x 46

Verifying records:
Raw               Transformed
11”H x 6”         11
30 x 46”          30 x 46
…                 …

Sorting and color-coding:
Raw               Transformed
30 x 46”          30 x 46
11”H x 6”         11
…                 …
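The random-sampling step of the overview can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how the sample is drawn or how large it is, so the function name `sample_records`, the sample size, and the synthetic dataset are all hypothetical.

```python
import random


def sample_records(records, sample_size, seed=None):
    """Return a uniform random sample (without replacement) of the records.

    Assumption: a plain uniform sample; the slides only say "random sampling".
    """
    rng = random.Random(seed)
    if sample_size >= len(records):
        return list(records)
    return rng.sample(records, sample_size)


# Hypothetical dataset of 1000 dimension strings, standing in for a large table.
dataset = [f'{h}" H x {w}" W' for h in range(10, 60) for w in range(5, 25)]
sampled = sample_records(dataset, sample_size=100, seed=0)
```

Only the sampled records are then passed on to the verification and sorting steps, which keeps the amount the user has to inspect small.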
Learning from users’ feedback
10
Verifying Records
• First, recommend records that cause runtime errors
– Records that cause the program to exit abnormally
• Second, recommend potentially incorrect records
– Learn a binary meta-classifier
11
Ex:
Input: 2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic

Raw               Transformed
11”H x 6”         11
30 x 46”          30 x 46
…                 …
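The first verification step, recommending records whose transformation fails at runtime, might look like the sketch below. The transformation program `extract_height`, the helper `split_by_runtime_error`, and the record 'size unknown' are hypothetical stand-ins; the slides do not show the learned program or how failures are detected.

```python
import re


def extract_height(raw):
    """Toy stand-in for a learned program: take the number that precedes 'H'."""
    match = re.match(r'\s*(\d+)\s*[”"“]?\s*H', raw)
    if match is None:
        # The program cannot handle this format and exits abnormally.
        raise ValueError(f"no height found in {raw!r}")
    return match.group(1)


def split_by_runtime_error(records, program):
    """Run the program on every record; separate failures from results."""
    failed, transformed = [], []
    for raw in records:
        try:
            transformed.append((raw, program(raw)))
        except Exception:
            failed.append(raw)
    return failed, transformed


# The last record is a hypothetical input that the toy program cannot parse.
records = ['10“ H x 8” W', '12”H x 9”W', '11”H x 6”', 'size unknown']
failed, transformed = split_by_runtime_error(records, extract_height)
```

Records in `failed` would be recommended first; the records in `transformed` ran without error and are the ones a classifier would still need to screen.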
Learning the Meta-classifier
12
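[Diagram: three groups of candidate classifiers, based on similarity (cs1, cs2, cs3, cs4, …), program agreement (cp1, cp2, cp3, cp4, …), and format ambiguity (cf1, cf2, cf3, cf4, …). The meta-classifier combines selected classifiers (cs3, cs4, cp2, cf1, …) with weights w1, w2, w3, w4, ….]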
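One way to read this slide is that the meta-classifier is a weighted vote over component classifiers. The sketch below illustrates that reading only; the feature names, thresholds, weights, and the +1/-1 voting convention are all assumptions, and the slides do not state how the weights are learned.

```python
def meta_classify(features, classifiers, weights):
    """Weighted vote: each classifier returns +1 (incorrect) or -1 (correct).

    Returns (is_incorrect, score); a positive score means "likely incorrect".
    """
    score = sum(w * clf(features) for clf, w in zip(classifiers, weights))
    return score > 0, score


# Hypothetical component classifiers, one per family named on the slide.
def cs(features):  # similarity: far from the user's examples looks suspicious
    return 1 if features["distance_to_examples"] > 0.5 else -1


def cp(features):  # program agreement: candidate programs disagree on the output
    return 1 if features["program_agreement"] < 0.8 else -1


def cf(features):  # format ambiguity: the output format is ambiguous
    return 1 if features["format_ambiguous"] else -1


classifiers = [cs, cp, cf]
weights = [0.5, 1.0, 0.75]  # illustrative values for w1, w2, w3

suspicious = {"distance_to_examples": 0.9, "program_agreement": 0.4, "format_ambiguous": False}
typical = {"distance_to_examples": 0.1, "program_agreement": 1.0, "format_ambiguous": False}

label_suspicious, score_suspicious = meta_classify(suspicious, classifiers, weights)
label_typical, score_typical = meta_classify(typical, classifiers, weights)
```

The score, not just the binary label, is worth keeping, because it can later be used to order the records shown to the user.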
Evaluation
• The recommendation contains incorrect
records
13
Evaluation
• The recommendation can place incorrect
records on top
14
User study
15
Experiment setup:
• 5 scenarios with 4000 records per scenario
• 10 graduate students divided into two groups
Summary and Future Work
• Summary
– Sample records
– Identify incorrect/questionable records
– Allow user to refine the recommendation
– Color-code the results
• Future work
– Show histograms of the data
– Translate the program to readable natural text
16
17
Questions?
Data and system available at
https://github.com/areshand/Web-Karma
Types of Classifiers
• Classifier based on distance
• Classifier based on agreement of programs
• Classifier based on format ambiguity
18
Learning from various past results
19
…
Raw                                                   Transformed
26" H x 24" W x 12.5                                  26
Framed at 21.75" H x 24.25” W                         21
12" H x 9"                                            12
…
Raw                                                   Transformed
Ravage 2099#24 (November, 1994)                       November, 1994
Gambit III#1 (September, 1997)                        September, 1997
(comic) Spidey Super Stories#12/2 (September, 1975)   comic
…
Figure labels: Examples, Incorrect records, Correct records
Sorting Records
20
Checking transformed records: runtime errors?
Yes: rank records using #failed_subprograms
No: rank records using meta-classifier output

Record                                                   #failed_subprograms
2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic          3
1998 Honda Civic 12k miles s. Auto. - $3800 (Arcadia)    2
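The ranking rule on this slide can be sketched as below. This reflects one reading of the flowchart: when any record caused runtime errors, those records are ranked by their number of failed subprograms, and otherwise all records are ranked by the meta-classifier output. The field names, the third record, and the scores are hypothetical.

```python
def rank_for_review(records):
    """Order records for the user to check, most suspicious first.

    Each record is a dict with 'raw', 'failed_subprograms', and 'score'
    (the meta-classifier output; higher means more likely incorrect).
    """
    failing = [r for r in records if r["failed_subprograms"] > 0]
    if failing:
        # Runtime errors exist: rank by how many subprograms failed.
        return sorted(failing, key=lambda r: r["failed_subprograms"], reverse=True)
    # No runtime errors: fall back to the meta-classifier output.
    return sorted(records, key=lambda r: r["score"], reverse=True)


records = [
    {"raw": "1998 Honda Civic 12k miles s. Auto. - $3800 (Arcadia)",
     "failed_subprograms": 2, "score": 0.0},
    {"raw": "2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic",
     "failed_subprograms": 3, "score": 0.0},
    # Hypothetical record that ran without errors.
    {"raw": "2005 Toyota Corolla - $4200 (Pasadena)",
     "failed_subprograms": 0, "score": 0.3},
]
ranked = rank_for_review(records)
```

With the two records from the table, the Mitsubishi entry (3 failed subprograms) is placed ahead of the Honda entry (2 failed subprograms).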

Editor's Notes

  1. Ashley wants to buy a painting for the space over her sofa. She has strict space limits. Ex: the painting should be about 60’’ wide and 40’’ high.
  2. Ashley got a spreadsheet of artworks on sale. The size information is a long list of entries, each holding the height, width and sometimes depth in a single field. She has to split them into three columns and remove extra text such as “H:”, “in.”, etc. She can then filter the artworks by each dimension. The dataset has so many records that she would have to write a program to solve the problem. Problem: learning this skill has a long learning curve, and the time would be better spent decorating her house.
  3. Programming by example no longer requires users to write code.
  4. The list can have thousands of records. It is really hard to notice records in the middle that are transformed incorrectly.
  5. According to previous research, users often believe that they have carefully examined all the records. They stop checking the results when there is still a large percentage of incorrect records in the dataset.
  6. To identify the Cannot rely on single rule or
  7. Random sampling addresses the too-many-records problem. Verifying records can capture incorrect records in various scenarios. Sorting and color-coding address the overconfident-user problem. The system can also learn from the user's interactions in the current iteration to refine the recommendation.
  8. Learn from the user's feedback to refine the recommendation.
  9. First, describe correctness. Second, iteration time. Third, total time. Explain why certain scenarios have longer total time. Why, in s5 and s3, does beta have twice the iteration time of our approach? Why does the iteration time in beta vary much more than in our approach?
  10. Summary vs Conclusion