Converting High Dimensional Problems to Low Dimensional Ones
General Paradigm
                 Reduce and Conquer

• Large Problem → Small Problem

   – Break array into two parts

   – Consider odd and even elements

   – Sample edges in a graph to obtain a smaller graph

   – Represent a graph by a collection of trees

   – Take number modulo small prime

   – Multiply matrix by a random vector

   – Project high dimensional point sets into fewer dimensions
The Problem


• Given n points in D dimensional space

• Project them into d << D dimensions
   – So the (Euclidean) distance between every pair of points is
     (almost) preserved

• How does d compare to n?
Application


• Hierarchical Clustering

• Say ten thousand samples each over a few million
  SNPs

• Few million → Few Hundreds/Thousands? And Fast?
First Attempt


• Can we make d=n-1?

  – X axis through 2 of the points

  – Y axis so 3rd point is in the XY
    plane

  – Z axis so 4th point is in the XYZ
    3d space

  – And so on
First Attempt


• Time taken

  – Each new axis has to be made
    orthogonal to all previous axes

  – O(n^2 D)

  – Too slow
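
The axis-by-axis construction of the first attempt is essentially Gram-Schmidt orthogonalization applied to the vectors p_i - p_0. A minimal sketch in pure Python, assuming the points are given as lists of floats; the function name `exact_embedding` and the tolerance `tol` are illustrative and not from the slides:

```python
import math

def exact_embedding(points, tol=1e-12):
    """Embed n points from D dims into at most n-1 dims, keeping all distances.

    Hypothetical helper: Gram-Schmidt on the vectors p_i - p_0.
    Each of the n points is made orthogonal to up to n-1 axes of
    length D, which gives the O(n^2 D) running time.
    """
    origin = points[0]
    axes = []      # orthonormal axes found so far, each of length D
    coords = []    # coordinates of each point with respect to those axes
    for p in points:
        v = [a - b for a, b in zip(p, origin)]
        c = []
        for axis in axes:
            dot = sum(x * y for x, y in zip(v, axis))
            c.append(dot)
            v = [x - dot * y for x, y in zip(v, axis)]   # remove that component
        norm = math.sqrt(sum(x * x for x in v))
        if norm > tol:                                   # point needs a new axis
            axes.append([x / norm for x in v])
            c.append(norm)
        coords.append(c)
    k = len(axes)
    return [c + [0.0] * (k - len(c)) for c in coords]    # pad to k dimensions
```

The distances are preserved exactly, but the nested loop over points and axes is what makes this approach too slow for large n and D.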
Second Attempt
          Use Random Projections


• Take d random vectors r1..rd

• For every point p, take the d dimensional point
      • [ p.r1 p.r2 .. p.rd ] * scaling-factor

• Do these d-dim points preserve inter-point
  distances approximately? How large should d be?
Random Projections
              Further Simplification


• Take any vector p in D dimensions

• Suppose we show
   – [ p.r1 p.r2 .. p.rd ] * scaling-factor has length ~ |p|
   – Failure prob < 1/n^3

• Then the prob that even one of the n^2 difference-vector
  lengths is not preserved is < n^2/n^3 = 1/n
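
The step from a single vector to all pairs of points is a union bound. It applies because the projection is linear, so the projection of p - q is the difference of the projections of p and q. Written out, with F_{ij} denoting the event that the distance between points i and j is not preserved (a symbol introduced here only for illustration):

```latex
\Pr\Big[\bigcup_{i<j} F_{ij}\Big]
  \;\le\; \sum_{i<j} \Pr[F_{ij}]
  \;<\; \binom{n}{2}\cdot\frac{1}{n^{3}}
  \;<\; \frac{n^{2}}{n^{3}}
  \;=\; \frac{1}{n}.
```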
Random Projections
        What is a random vector?



• No directional bias
Normal Distributions

• Pr of being between x and x+dx

       For N(0,1), ~ e^{-x^2/2} dx
Generating Random Vectors without
           Directional Bias
• Take D numbers (X1...XD), each N(0,1), independently

• Distribution of each number X
   – Pr of being between a..a+da ~ e^{-a^2/2} da

• Pr X1 in a1..a1+da1, X2 in a2..a2+da2, ..., XD in aD..aD+daD
   – ~ e^{-a1^2/2} e^{-a2^2/2} … e^{-aD^2/2}   da1 da2 … daD
   – = e^{-(a1^2 + a2^2 + … + aD^2)/2}         da1 da2 … daD
   – = e^{-l^2/2}                              da1 da2 … daD

   So no dependence on direction, only on length l = sqrt(a1^2 + … + aD^2)!
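
The absence of directional bias can be checked empirically in two dimensions. The sketch below is an illustration, not part of the slides: it draws Gaussian vectors and counts how many fall into each of 8 equal-angle sectors. The function name, sample size and seed are arbitrary choices.

```python
import math
import random

def sector_counts(num_samples=80000, sectors=8, seed=0):
    """Count how many 2-D N(0,1) vectors fall in each equal-angle sector."""
    rng = random.Random(seed)
    counts = [0] * sectors
    for _ in range(num_samples):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        angle = math.atan2(y, x) % (2 * math.pi)          # direction in [0, 2*pi)
        index = int(angle / (2 * math.pi) * sectors)
        counts[min(index, sectors - 1)] += 1              # guard the upper edge
    return counts
```

If the direction is uniformly distributed, every sector should receive roughly num_samples / sectors vectors, up to sampling noise.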
The Algorithm

• Take d random vectors r1..rd
   – Each ri = [Xi1 Xi2 … XiD] where the X’s are chosen from
     N(0,1) independently


• For every point p, take the d dimensional point
      • [ p.r1 p.r2 .. p.rd ] * sqrt(1/d)

• Time: n*d*D
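
The algorithm above can be sketched in a few lines of pure Python. This is a minimal illustration, assuming the points are lists of floats; the name `random_projection` and the optional `seed` parameter are assumptions made here for reproducibility, not part of the slides.

```python
import math
import random

def random_projection(points, d, seed=None):
    """Project each D-dim point onto d Gaussian random vectors, scaled by sqrt(1/d)."""
    rng = random.Random(seed)
    D = len(points[0])
    # d random vectors r_1..r_d, each with D independent N(0,1) entries
    R = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]
    scale = math.sqrt(1.0 / d)
    # n * d dot products of length D each, hence n*d*D time
    return [[scale * sum(x * r for x, r in zip(p, row)) for row in R]
            for p in points]
```

The same d random vectors are used for every point, so the map is linear and the projection of a difference p - q equals the difference of the projections.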
Simplifying Further

• Take any vector p in D dimensions

• We need to show that
    • [ p.r1 p.r2 .. p.rd ] * sqrt(1/d) has length ~ |p|
    • Failure prob < 1/n^3

• We can assume p to be 1 0 0 0 0 0 …
   – because random vectors have no directional bias
   – Then [ p.r1 p.r2 .. p.rd ] * sqrt(1/d) = [X11 X21 … Xd1] * sqrt(1/d)
Analysis

• We need to show that
       • [X1 X2 … Xd] * sqrt(1/d) has length ~ 1
       • Failure prob < 1/n^3

• Or (X1^2+…+Xd^2)/d ~ 1, failure prob < 1/n^3


• Or (X1^2+…+Xd^2) ~ d, failure prob < 1/n^3


• Note each Xi^2 has mean 1 and s.d. sqrt(2)
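
The mean 1 and s.d. sqrt(2) belong to the squared terms X_i^2 (each X_i itself is N(0,1), with mean 0 and s.d. 1). They follow from the standard Gaussian moments E[X^2] = 1 and E[X^4] = 3:

```latex
\mathbb{E}[X_i^2] = 1, \qquad
\operatorname{Var}[X_i^2] = \mathbb{E}[X_i^4] - \big(\mathbb{E}[X_i^2]\big)^2 = 3 - 1 = 2, \qquad
\operatorname{sd}[X_i^2] = \sqrt{2}.
```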
Law of Large Numbers (Central Limit Theorem)

• Y1..Yd independent, each with any (decent) distribution with mean
  1 and s.d. sqrt(2)

• Then Y1+…+Yd tends to a Normal distribution with
  mean d and s.d. sqrt(2d) (for large d)

• Pr (Y1+…+Yd not in (1-∆)d .. (1+∆)d) <
      • e^{-(∆d)^2 / (2·2d)} = e^{-∆^2 d/4}
• Choose d = 12 ln n / ∆^2; the bound becomes e^{-3 ln n} = 1/n^3, as needed
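
To get a feel for the numbers, the choice of d can be evaluated directly. A small sketch follows; the function names `target_dimension` and `failure_bound` are illustrative, and the constant 12 is the one used in the slides:

```python
import math

def target_dimension(n, delta):
    """Smallest integer d with d >= 12 ln(n) / delta^2."""
    return math.ceil(12.0 * math.log(n) / delta ** 2)

def failure_bound(d, delta):
    """Per-vector failure bound e^{-delta^2 d / 4} from the normal approximation."""
    return math.exp(-delta ** 2 * d / 4.0)

print(target_dimension(10_000, 0.5))   # 443
```

For ten thousand samples and ∆ = 0.5 this gives a few hundred dimensions, regardless of whether D is thousands or millions.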
Conclusion


• n points in D dimensions

  – can be projected to 12 ln n / ∆^2 dimensions

  – all distances stretch only by a factor of (1 ± ∆)

  – with prob > 1 - 1/n
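
The conclusion can be checked on synthetic data. The sketch below is an illustration under assumed sizes (n = 20, D = 500, uniformly random points, fixed seed), none of which come from the slides. It projects the points to d = 12 ln n / ∆^2 dimensions and reports the worst relative change in any pairwise distance.

```python
import itertools
import math
import random

def max_distortion(n=20, D=500, delta=0.5, seed=1):
    """Return (d, worst |projected_dist / original_dist - 1|) over all pairs."""
    rng = random.Random(seed)
    d = math.ceil(12.0 * math.log(n) / delta ** 2)
    # synthetic input: n points with coordinates uniform in [-1, 1]
    points = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(n)]
    # d Gaussian random vectors and the sqrt(1/d) scaling
    R = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]
    scale = math.sqrt(1.0 / d)
    proj = [[scale * sum(x * r for x, r in zip(p, row)) for row in R]
            for p in points]
    worst = 0.0
    for i, j in itertools.combinations(range(n), 2):
        ratio = math.dist(proj[i], proj[j]) / math.dist(points[i], points[j])
        worst = max(worst, abs(ratio - 1.0))
    return d, worst
```

With these small sizes d is not dramatically smaller than D; the saving becomes substantial when D is in the millions, since d depends only on n and ∆.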
