Minimax Rates
            for
Homology Inference

        Don Sheehy

        Joint work with
    Sivaraman Balakrishnan,
      Alessandro Rinaldo,
        Aarti Singh, and
       Larry Wasserman
Something like a joke.

   What is topological inference?

It’s when you infer the topology of a
   space given only a finite subset.
We add geometric and statistical hypotheses to
make the problem well-posed.

   Geometric Assumption:
   The underlying space is a smooth manifold M.

   Statistical Assumption:
   The points are drawn i.i.d. from a distribution derived from M.
Input: n points sampled i.i.d. from a distribution supported on a
d-manifold M in D dimensions, possibly with noise.


Output: an estimate of the homology of M.


Upper bound: What is the worst case complexity
(the probability of giving a wrong answer)?


Lower Bound: What is the worst case complexity of
the best possible algorithm?


                              The Goal:
                  Matching Bounds
                          (asymptotically)
Minimax risk is the error probability of the
best estimator on the hardest examples.

Minimax Risk:
    R_n = inf_Ĥ sup_{Q ∈ Q} Qⁿ(Ĥ ≠ H(M))

Here the infimum runs over all estimators Ĥ (the best estimator),
the supremum over all Q ∈ Q (the hardest distribution), Qⁿ is the
product distribution of the n samples, and H(M) is the true homology.

Sample Complexity:
    n(ε) = min{n : R_n ≤ ε}
We assume manifolds without boundary of
bounded volume and reach.

      Let M be the set of compact d-dimensional
   Riemannian manifolds without boundary such that
      1 M ⊂ ball_D(0, 1)
      2 vol(M) ≤ c_d
      3 The reach of M is at least τ.

      Let P be the set of probability distributions
   supported on some M ∈ M with densities bounded
   from below by a constant a.
We consider 4 different noise models.

Noiseless:
    Q = P

Clutter:
    Q = (1 − γ)U + γP,  P ∈ P,
    where U is uniform on ball(0, 1).

Tubular:
    Let Q_{M,σ} be uniform on M^σ.
    Q = {Q_{M,σ} : M ∈ M}

Additive:
    Q = {P ∗ Φ : P ∈ P},
    where Φ is Gaussian with σ ≪ τ,
    or Φ has Fourier transform bounded away from 0
    and τ is fixed.
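The four models are easy to make concrete in low dimension. As an illustration only (not from the talk), here is a minimal standard-library Python sketch sampling a circle of radius 0.5 (a 1-manifold in D = 2) under each model; the function names, the choice of manifold, and the parameter values (γ = 0.8, σ = 0.05) are all hypothetical, and the tubular sampler only approximates the uniform distribution on the tube.

```python
import math
import random

def sample_circle(n, r=0.5):
    """Noiseless model: n i.i.d. points on a circle of radius r
    (a 1-manifold M embedded in D = 2 dimensions)."""
    return [(r * math.cos(t), r * math.sin(t))
            for t in (random.uniform(0, 2 * math.pi) for _ in range(n))]

def sample_clutter(n, gamma=0.8):
    """Clutter model Q = (1 - gamma) U + gamma P: each point comes from
    the manifold with probability gamma, else uniform on the unit ball."""
    pts = []
    for _ in range(n):
        if random.random() < gamma:
            pts += sample_circle(1)
        else:
            while True:  # rejection sampling: uniform on ball(0, 1)
                x, y = random.uniform(-1, 1), random.uniform(-1, 1)
                if x * x + y * y <= 1:
                    pts.append((x, y))
                    break
    return pts

def sample_tubular(n, r=0.5, sigma=0.05):
    """Tubular model: points in the sigma-tube around the circle
    (uniform angle and uniform radial offset -- an approximation)."""
    pts = []
    for _ in range(n):
        t = random.uniform(0, 2 * math.pi)
        u = random.uniform(-sigma, sigma)  # radial offset within the tube
        pts.append(((r + u) * math.cos(t), (r + u) * math.sin(t)))
    return pts

def sample_additive(n, r=0.5, sigma=0.02):
    """Additive model Q = P * Phi: a point on M plus isotropic Gaussian noise."""
    return [(x + random.gauss(0, sigma), y + random.gauss(0, sigma))
            for (x, y) in sample_circle(n, r)]
```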
Le Cam’s Lemma is a powerful tool for
proving minimax lower bounds.

   Lemma. Let Q be a set of distributions. Let θ(Q) take values
   in a metric space (X, ρ) for Q ∈ Q. For any Q1, Q2 ∈ Q,

       inf_θ̂ sup_{Q ∈ Q} E_{Qⁿ} ρ(θ̂, θ(Q)) ≥ (1/8) ρ(θ(Q1), θ(Q2)) (1 − TV(Q1, Q2))^{2n}

   For homology, use the trivial metric:
       ρ(x, y) = 0 if x = y, and 1 if x ≠ y.

   With this metric the lemma bounds the minimax risk directly:

       R_n = inf_Ĥ sup_{Q ∈ Q} Qⁿ(Ĥ ≠ H(M)) ≥ (1/8)(1 − TV(Q1, Q2))^{2n}
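To see the quantitative content of the trivial-metric bound, it can be evaluated numerically. A small standard-library Python sketch (illustration only; the helper names and the example distributions are hypothetical) for distributions on a finite set, where the TV distance is half the ℓ¹ distance:

```python
def total_variation(q1, q2):
    """TV distance between two distributions on a finite set,
    given as dicts mapping outcome -> probability."""
    support = set(q1) | set(q2)
    return 0.5 * sum(abs(q1.get(x, 0.0) - q2.get(x, 0.0)) for x in support)

def le_cam_bound(q1, q2, n):
    """Le Cam lower bound (1/8)(1 - TV(Q1, Q2))^(2n) on the minimax
    risk under the trivial metric, for n i.i.d. samples."""
    return 0.125 * (1.0 - total_variation(q1, q2)) ** (2 * n)

# Two nearby distributions: the bound decays only at rate (1 - TV)^(2n),
# so n must scale like 1/TV before the risk can be small.
q1 = {"a": 0.5, "b": 0.5}
q2 = {"a": 0.55, "b": 0.45}
```

The lower-bound argument below makes the two distributions uniform on nearby manifolds, so that TV is of order aτ^d.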
The lower bound requires two manifolds that are
geometrically close but topologically distinct.

      B = ball_d(0, 1 − τ)          A = B \ ball_d(0, 2τ)

      M1 = ∂(B^τ)                   M2 = ∂(A^τ)

   (X^τ denotes the τ-offset of X.)

   [Figure: M1 and M2 superimposed, showing the overlap.]
It suffices to bound the total variation distance.

Total Variation Distance:
    TV(Q1, Q2) = sup_A |Q1(A) − Q2(A)|
               ≤ a · max{vol(M1 \ M2), vol(M2 \ M1)}
               ≤ C_d a τ^d

Minimax Risk:
    R_n ≥ (1/8)(1 − TV(Q1, Q2))^{2n} ≥ (1/8)(1 − C_d a τ^d)^{2n} ≥ (1/8) e^{−2 C_d a τ^d n}

Sampling Rate:
    n(ε) ≥ (1/τ)^d log(1/ε)   (up to constants depending on d and a)
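The sampling rate follows by solving the risk bound for n. This short derivation is not on the slides, but it uses only the inequality above:

```latex
% Requiring R_n \le \varepsilon against the lower bound on R_n:
\frac{1}{8}\, e^{-2 C_d a \tau^d n} \le \varepsilon
\;\iff\;
n \;\ge\; \frac{1}{2 C_d a \tau^d}\,\log\frac{1}{8\varepsilon},
\qquad\text{hence}\qquad
n(\varepsilon) \;=\; \Omega\!\left(\Bigl(\tfrac{1}{\tau}\Bigr)^{d}\log\tfrac{1}{\varepsilon}\right).
```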
The upper bound uses a union of balls to estimate
the homology of M.

  0 Denoise the data.
  1 Take a union of balls.
  2 Compute the homology of the resulting Čech complex.

        To prove: The density is bounded from below near M
        and from above far from M.
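Full Čech homology needs a computational-topology library, but steps 1–2 can be sketched for zeroth homology alone: β₀ of a union of balls of radius r equals the number of connected components of the graph joining centers at distance ≤ 2r (two balls overlap exactly when their centers are within 2r). A standard-library Python sketch, a simplified stand-in rather than the talk's estimator (the denoising step 0 is omitted):

```python
import math

def betti_zero(points, r):
    """beta_0 of the union of balls of radius r around the points:
    count connected components of the graph joining centers at
    distance <= 2r, using union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 2 * r:
                union(i, j)
    return len({find(i) for i in range(len(points))})
```

For example, two well-separated clusters give β₀ = 2 at small r and merge to β₀ = 1 once 2r exceeds the gap, which is why the choice of radius matters for the estimator.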
Many fundamental problems are still open.

       1 Is the reach the right parameter?
       2 What about manifolds with boundary?
       3 Homotopy equivalence?
       4 How to choose parameters?
       5 Are there efficient algorithms?
Thank you.

Weitere ähnliche Inhalte

Was ist angesagt? (7)

Lecture5 xing
Lecture5 xingLecture5 xing
Lecture5 xing
 
Bachelor thesis of do dai chi
Bachelor thesis of do dai chiBachelor thesis of do dai chi
Bachelor thesis of do dai chi
 
Lesson19 Maximum And Minimum Values 034 Slides
Lesson19   Maximum And Minimum Values 034 SlidesLesson19   Maximum And Minimum Values 034 Slides
Lesson19 Maximum And Minimum Values 034 Slides
 
SIAM CSE 2017 talk
SIAM CSE 2017 talkSIAM CSE 2017 talk
SIAM CSE 2017 talk
 
Georgia Tech 2017 March Talk
Georgia Tech 2017 March TalkGeorgia Tech 2017 March Talk
Georgia Tech 2017 March Talk
 
Tulane March 2017 Talk
Tulane March 2017 TalkTulane March 2017 Talk
Tulane March 2017 Talk
 
MCQMC 2016 Tutorial
MCQMC 2016 TutorialMCQMC 2016 Tutorial
MCQMC 2016 Tutorial
 

Ähnlich wie Minimax Rates for Homology Inference

Ning_Mei.ASSIGN01
Ning_Mei.ASSIGN01Ning_Mei.ASSIGN01
Ning_Mei.ASSIGN01
宁 梅
 
Spatial Point Processes and Their Applications in Epidemiology
Spatial Point Processes and Their Applications in EpidemiologySpatial Point Processes and Their Applications in Epidemiology
Spatial Point Processes and Their Applications in Epidemiology
Lilac Liu Xu
 

Ähnlich wie Minimax Rates for Homology Inference (15)

Richard Everitt's slides
Richard Everitt's slidesRichard Everitt's slides
Richard Everitt's slides
 
Chapter-4 combined.pptx
Chapter-4 combined.pptxChapter-4 combined.pptx
Chapter-4 combined.pptx
 
probability assignment help (2)
probability assignment help (2)probability assignment help (2)
probability assignment help (2)
 
Lecture: Monte Carlo Methods
Lecture: Monte Carlo MethodsLecture: Monte Carlo Methods
Lecture: Monte Carlo Methods
 
Ning_Mei.ASSIGN01
Ning_Mei.ASSIGN01Ning_Mei.ASSIGN01
Ning_Mei.ASSIGN01
 
RSS Read Paper by Mark Girolami
RSS Read Paper by Mark GirolamiRSS Read Paper by Mark Girolami
RSS Read Paper by Mark Girolami
 
Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
M16302
M16302M16302
M16302
 
Spatial Point Processes and Their Applications in Epidemiology
Spatial Point Processes and Their Applications in EpidemiologySpatial Point Processes and Their Applications in Epidemiology
Spatial Point Processes and Their Applications in Epidemiology
 
Probability and basic statistics with R
Probability and basic statistics with RProbability and basic statistics with R
Probability and basic statistics with R
 
mcp-bandits.pptx
mcp-bandits.pptxmcp-bandits.pptx
mcp-bandits.pptx
 
RossellaMarrano_vagueness
RossellaMarrano_vaguenessRossellaMarrano_vagueness
RossellaMarrano_vagueness
 
Robustness under Independent Contamination Model
Robustness under Independent Contamination ModelRobustness under Independent Contamination Model
Robustness under Independent Contamination Model
 
Evaluating hypothesis
Evaluating  hypothesisEvaluating  hypothesis
Evaluating hypothesis
 

Mehr von Don Sheehy

Persistent Homology and Nested Dissection
Persistent Homology and Nested DissectionPersistent Homology and Nested Dissection
Persistent Homology and Nested Dissection
Don Sheehy
 

Mehr von Don Sheehy (20)

Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on Sampling
 
Characterizing the Distortion of Some Simple Euclidean Embeddings
Characterizing the Distortion of Some Simple Euclidean EmbeddingsCharacterizing the Distortion of Some Simple Euclidean Embeddings
Characterizing the Distortion of Some Simple Euclidean Embeddings
 
Sensors and Samples: A Homological Approach
Sensors and Samples:  A Homological ApproachSensors and Samples:  A Homological Approach
Sensors and Samples: A Homological Approach
 
Persistent Homology and Nested Dissection
Persistent Homology and Nested DissectionPersistent Homology and Nested Dissection
Persistent Homology and Nested Dissection
 
The Persistent Homology of Distance Functions under Random Projection
The Persistent Homology of Distance Functions under Random ProjectionThe Persistent Homology of Distance Functions under Random Projection
The Persistent Homology of Distance Functions under Random Projection
 
Geometric and Topological Data Analysis
Geometric and Topological Data AnalysisGeometric and Topological Data Analysis
Geometric and Topological Data Analysis
 
Geometric Separators and the Parabolic Lift
Geometric Separators and the Parabolic LiftGeometric Separators and the Parabolic Lift
Geometric Separators and the Parabolic Lift
 
A New Approach to Output-Sensitive Voronoi Diagrams and Delaunay Triangulations
A New Approach to Output-Sensitive Voronoi Diagrams and Delaunay TriangulationsA New Approach to Output-Sensitive Voronoi Diagrams and Delaunay Triangulations
A New Approach to Output-Sensitive Voronoi Diagrams and Delaunay Triangulations
 
Optimal Meshing
Optimal MeshingOptimal Meshing
Optimal Meshing
 
Output-Sensitive Voronoi Diagrams and Delaunay Triangulations
Output-Sensitive Voronoi Diagrams and Delaunay Triangulations Output-Sensitive Voronoi Diagrams and Delaunay Triangulations
Output-Sensitive Voronoi Diagrams and Delaunay Triangulations
 
Mesh Generation and Topological Data Analysis
Mesh Generation and Topological Data AnalysisMesh Generation and Topological Data Analysis
Mesh Generation and Topological Data Analysis
 
SOCG: Linear-Size Approximations to the Vietoris-Rips Filtration
SOCG: Linear-Size Approximations to the Vietoris-Rips FiltrationSOCG: Linear-Size Approximations to the Vietoris-Rips Filtration
SOCG: Linear-Size Approximations to the Vietoris-Rips Filtration
 
Linear-Size Approximations to the Vietoris-Rips Filtration - Presented at Uni...
Linear-Size Approximations to the Vietoris-Rips Filtration - Presented at Uni...Linear-Size Approximations to the Vietoris-Rips Filtration - Presented at Uni...
Linear-Size Approximations to the Vietoris-Rips Filtration - Presented at Uni...
 
A Multicover Nerve for Geometric Inference
A Multicover Nerve for Geometric InferenceA Multicover Nerve for Geometric Inference
A Multicover Nerve for Geometric Inference
 
ATMCS: Linear-Size Approximations to the Vietoris-Rips Filtration
ATMCS: Linear-Size Approximations to the Vietoris-Rips FiltrationATMCS: Linear-Size Approximations to the Vietoris-Rips Filtration
ATMCS: Linear-Size Approximations to the Vietoris-Rips Filtration
 
New Bounds on the Size of Optimal Meshes
New Bounds on the Size of Optimal MeshesNew Bounds on the Size of Optimal Meshes
New Bounds on the Size of Optimal Meshes
 
Flips in Computational Geometry
Flips in Computational GeometryFlips in Computational Geometry
Flips in Computational Geometry
 
Beating the Spread: Time-Optimal Point Meshing
Beating the Spread: Time-Optimal Point MeshingBeating the Spread: Time-Optimal Point Meshing
Beating the Spread: Time-Optimal Point Meshing
 
Ball Packings and Fat Voronoi Diagrams
Ball Packings and Fat Voronoi DiagramsBall Packings and Fat Voronoi Diagrams
Ball Packings and Fat Voronoi Diagrams
 
Learning with Nets and Meshes
Learning with Nets and MeshesLearning with Nets and Meshes
Learning with Nets and Meshes
 

Minimax Rates for Homology Inference

  • 1. Minimax Rates for Homology Inference Don Sheehy Joint work with Sivaraman Balakrishan, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman
  • 3. Something like a joke. What is topological inference?
  • 4. Something like a joke. What is topological inference? It’s when you infer the topology of a space given only a finite subset.
  • 5. Something like a joke. What is topological inference? It’s when you infer the topology of a space given only a finite subset.
  • 6. We add geometric and statistical hypotheses to make the problem well-posed. Geometric Assumption: The underlying space is a smooth manifold M. Statistical Assumption: The points are drawn i.i.d. from a distribution derived from M.
  • 7. We add geometric and statistical hypotheses to make the problem well-posed. Geometric Assumption: The underlying space is a smooth manifold M. Statistical Assumption: The points are drawn i.i.d. from a distribution derived from M.
  • 8. We add geometric and statistical hypotheses to make the problem well-posed. Geometric Assumption: The underlying space is a smooth manifold M. Statistical Assumption: The points are drawn i.i.d. from a distribution derived from M.
  • 9. We add geometric and statistical hypotheses to make the problem well-posed. Geometric Assumption: The underlying space is a smooth manifold M. Statistical Assumption: The points are drawn i.i.d. from a distribution derived from M.
  • 10.
  • 11. Input: n points from a d-manifold M in D-dimensions.
  • 12. Input: n points from a d-manifold M in D-dimensions. Output: The homology of M.
  • 13. Input: n points from a d-manifold M in D-dimensions. Output: The homology of M. Upper bound: What is the worst case complexity?
  • 14. Input: n points from a d-manifold M in D-dimensions. Output: The homology of M. Upper bound: What is the worst case complexity? Lower Bound: What is the worst case complexity of the best possible algorithm?
  • 15. sam pled i.i.d. Input: n points from a d-manifold M in D-dimensions. Output: The homology of M. Upper bound: What is the worst case complexity? Lower Bound: What is the worst case complexity of the best possible algorithm?
  • 16. ion pled i.i.d. distribut d on sam supporte Input: n points from a d-manifold M in D-dimensions. Output: The homology of M. Upper bound: What is the worst case complexity? Lower Bound: What is the worst case complexity of the best possible algorithm?
  • 17. ion pled i.i.d. distribut d on sam supporte Input: n points from a d-manifold M in D-dimensions. e with nois Output: The homology of M. Upper bound: What is the worst case complexity? Lower Bound: What is the worst case complexity of the best possible algorithm?
  • 18. ion pled i.i.d. distribut d on sam supporte Input: n points from a d-manifold M in D-dimensions. e estimate of with nois an Output: The homology of M. Upper bound: What is the worst case complexity? Lower Bound: What is the worst case complexity of the best possible algorithm?
  • 19. ion pled i.i.d. distribut d on sam supporte Input: n points from a d-manifold M in D-dimensions. e estimate of with nois an Output: The homology of M. Upper bound: What is the worst case complexity? ing probabil ity of giv r a wro ng answe Lower Bound: What is the worst case complexity of the best possible algorithm?
  • 20. ion pled i.i.d. distribut d on sam supporte Input: n points from a d-manifold M in D-dimensions. e estimate of with nois an Output: The homology of M. Upper bound: What is the worst case complexity? ing probabil ity of giv r a wro ng answe Lower Bound: What is the worst case complexity of the best possible algorithm? The Goal: Matching Bounds (asymptotically)
  • 21. Minimax risk is the error probability of the best estimator on the hardest examples.
  • 22. Minimax risk is the error probability of the best estimator on the hardest examples. Minimax Risk: Rn = inf sup n ˆ Q (H = H(M )) ˆ H Q∈Q
  • 23. Minimax risk is the error probability of the best estimator on the hardest examples. Minimax Risk: Rn = inf sup n ˆ Q (H = H(M )) ˆ H Q∈Q the best r estimato
  • 24. Minimax risk is the error probability of the best estimator on the hardest examples. Minimax Risk: Rn = inf sup n ˆ Q (H = H(M )) ˆ H Q∈Q the best r t t he hardes n estimato d istributio
  • 25. Minimax risk is the error probability of the best estimator on the hardest examples. Minimax Risk: Rn = inf sup n ˆ Q (H = H(M )) ˆ H Q∈Q product ion the best r t t he hardes n distribut estimato d istributio
  • 26. Minimax risk is the error probability of the best estimator on the hardest examples. Minimax Risk: Rn = inf sup n ˆ Q (H = H(M )) ˆ H Q∈Q product ion the true the best r t he hardes n distribut estimato t homology d istributio
  • 27. Minimax risk is the error probability of the best estimator on the hardest examples. Minimax Risk: Rn = inf sup n ˆ Q (H = H(M )) ˆ H Q∈Q product ion the true the best r t he hardes n distribut estimato t homology d istributio Sample Complexity: n( ) = min{n : Rn ≤ }
  • 28. We assume manifolds without boundary of bounded volume and reach.
  • 29. We assume manifolds without boundary of bounded volume and reach. Let M be the set of compact d-dimensional Riemannian manifolds without boundary such that
  • 30. We assume manifolds without boundary of bounded volume and reach. Let M be the set of compact d-dimensional Riemannian manifolds without boundary such that 1 M ⊂ ballD (0, 1)
  • 31. We assume manifolds without boundary of bounded volume and reach. Let M be the set of compact d-dimensional Riemannian manifolds without boundary such that 1 M ⊂ ballD (0, 1) 2 vol(M ) ≤ cd
  • 32. We assume manifolds without boundary of bounded volume and reach. Let M be the set of compact d-dimensional Riemannian manifolds without boundary such that 1 M ⊂ ballD (0, 1) 2 vol(M ) ≤ cd 3 The reach of M is at most τ .
  • 33. We assume manifolds without boundary of bounded volume and reach. Let M be the set of compact d-dimensional Riemannian manifolds without boundary such that 1 M ⊂ ballD (0, 1) 2 vol(M ) ≤ cd 3 The reach of M is at most τ . Let P be the set of probability distributions supported over M ∈ M with densities bounded from below by a constant a.
  • 34. We consider 4 different noise models. Noiseless Clutter Tubular Additive
  • 35. We consider 4 different noise models. Noiseless Clutter Q=P Tubular Additive
  • 36. We consider 4 different noise models. Noiseless Clutter Q = (1 − γ)U + γP Q=P P ∈P U is uniform on ball(0, 1) Tubular Additive
  • 37. We consider 4 different noise models. Noiseless Clutter Q = (1 − γ)U + γP Q=P P ∈P U is uniform on ball(0, 1) Tubular Additive Let QM,σ be uniform on M σ . Q = {QM,σ : M ∈ M}
  • 38. We consider 4 different noise models. Noiseless Clutter Q = (1 − γ)U + γP Q=P P ∈P U is uniform on ball(0, 1) Tubular Additive Let QM,σ be uniform on M σ . Q = {P Φ : P ∈ P} Φ is Gaussian Q = {QM,σ : M ∈ M} with σ τ or Φ has Fourier transform bounded away from 0 and τ is fixed.
• 39–43. Le Cam’s Lemma is a powerful tool for proving minimax lower bounds. Lemma. Let Q be a set of distributions. Let θ(Q) take values in a metric space (X, ρ) for Q ∈ Q. For any Q₁, Q₂ ∈ Q, inf_θ̂ sup_{Q ∈ Q} E_{Qⁿ} ρ(θ̂, θ(Q)) ≥ (1/8) ρ(θ(Q₁), θ(Q₂)) (1 − TV(Q₁, Q₂))^{2n}. For homology, use the trivial metric: ρ(x, y) = 0 if x = y, and 1 if x ≠ y. Then Rₙ = inf_Ĥ sup_{Q ∈ Q} Qⁿ(Ĥ ≠ H(M)) ≥ (1/8)(1 − TV(Q₁, Q₂))^{2n}.
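With the trivial metric, the lemma collapses to a one-line bound on the risk. A minimal numeric sketch (the TV values below are illustrative, not from the slides):

```python
def le_cam_lower_bound(tv, n):
    """Le Cam lower bound with the trivial metric:
    R_n >= (1/8) * (1 - TV(Q1, Q2))**(2 * n)."""
    assert 0.0 <= tv <= 1.0
    return 0.125 * (1.0 - tv) ** (2 * n)

# The closer the two distributions (smaller TV), the slower the bound decays:
# every estimator errs with at least this probability.
print(le_cam_lower_bound(0.01, 10))   # ~0.102
print(le_cam_lower_bound(0.01, 100))  # ~0.017
```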
• 44–47. The lower bound requires two manifolds that are geometrically close but topologically distinct. Let B = ball_d(0, 1 − τ) and A = B \ ball_d(0, 2τ). Take M₁ = ∂(B^τ) and M₂ = ∂(A^τ), where B^τ denotes the τ-offset of B. The two manifolds coincide on their overlap and differ only near the removed ball.
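In the planar case the construction is concrete: with B a disk of radius 1 − τ and A = B \ ball(0, 2τ), the τ-offsets have boundaries M₁ (a single circle of radius 1) and M₂ (circles of radii 1 and τ). A hedged sampling sketch of the two (function names and the length-proportional sampling are my own, for illustration only):

```python
import math
import random

def sample_M1():
    """M1 = boundary of the tau-offset of the disk B: one circle of radius 1."""
    t = random.uniform(0.0, 2.0 * math.pi)
    return (math.cos(t), math.sin(t))

def sample_M2(tau=0.1):
    """M2 = boundary of the tau-offset of the annulus A: circles of radii 1 and tau.
    Each circle is chosen with probability proportional to its length."""
    t = random.uniform(0.0, 2.0 * math.pi)
    r = 1.0 if random.random() < 1.0 / (1.0 + tau) else tau
    return (r * math.cos(t), r * math.sin(t))
```

For small τ the two samples are nearly indistinguishable: the extra inner circle of M₂ carries only a τ/(1 + τ) fraction of the mass, which is exactly the small volume-difference term that drives the total variation bound on the next slides.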
• 48–51. It suffices to bound the total variation distance. Total Variation Distance: TV(Q₁, Q₂) = sup_A |Q₁(A) − Q₂(A)| ≤ a · max{vol(M₁ \ M₂), vol(M₂ \ M₁)} ≤ C_d a τ^d. Minimax Risk: Rₙ ≥ (1/8)(1 − TV(Q₁, Q₂))^{2n} ≥ (1/8)(1 − C_d a τ^d)^{2n} ≥ (1/8) e^{−2 C_d a τ^d n}. Sampling Rate: n(ε) ≥ (1/τ)^d log(1/ε).
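The two bounds on this slide are easy to evaluate. A sketch (C_d is a dimension-dependent constant; setting it to 1 here is a placeholder, and constants in the sampling rate are suppressed as in the slide):

```python
import math

def risk_lower_bound(a, tau, n, d, C_d=1.0):
    """Minimax risk lower bound from the slide:
    R_n >= (1/8) * exp(-2 * C_d * a * tau**d * n)."""
    return 0.125 * math.exp(-2.0 * C_d * a * tau ** d * n)

def sample_complexity(eps, tau, d):
    """Sampling rate from the slide, up to constants:
    n(eps) >= (1/tau)**d * log(1/eps)."""
    return (1.0 / tau) ** d * math.log(1.0 / eps)
```

The shape of the rate is the point: the number of samples grows like τ^{−d}, exponentially in the intrinsic dimension d as the reach shrinks, but only logarithmically in the target error 1/ε.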
• 52–63. The upper bound uses a union of balls to estimate the homology of M. 0 Denoise the data. 1 Take a union of balls. 2 Compute the homology of the resulting Čech complex. To prove: the density is bounded from below near M and from above far from M.
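A minimal sketch of the easiest piece of step 2: β₀, the rank of H₀, can be read off the 1-skeleton of the Čech complex, since two balls of radius r intersect exactly when their centers are within 2r. A union-find over those pairs counts the connected components of the union of balls. (Higher homology groups need the full complex; this sketch stops at H₀.)

```python
import math

def betti0(points, r):
    """Number of connected components of the union of balls(p, r):
    union-find over pairs of centers at distance <= 2r."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 2.0 * r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})
```

For example, four points in two far-apart pairs give two components at a small radius and merge into one once 2r exceeds the gap; choosing that radius well is exactly the parameter-selection question raised on the closing slide.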
• 64–69. Many fundamental problems are still open. 1 Is the reach the right parameter? 2 What about manifolds with boundary? 3 Homotopy equivalence? 4 How to choose parameters? 5 Are there efficient algorithms?