Sequence mining algorithm



           Monica Dăgădiţă
                        ISI
 Introduction to sequence mining
 Why sequence mining?
 Sequence mining algorithms
 SPADE
    Motivation
    Definitions and examples
    Algorithm
    Implementation




                     Data Mining   11/8/2011   2
 Aim - finding statistically relevant patterns
 between data examples where the values are
 delivered in a sequence

 Originally introduced for market basket
 analysis - customer behaviour predictions

 Two types of sequence mining:
     string mining – biology (gene/protein sequences)
     itemset mining - marketing and CRM applications

 Discovering   patterns:
    Bookstore: 70% of the people who buy Jane
     Austen’s “Pride and Prejudice” also buy “Emma”
     within a month
    Website: finding sequences of most frequently
     accessed pages

 Usage:
    Promotions
    Shelf placement
    Restructure the website
    Recommender systems

 Apriori
 GSP  (Generalized Sequential Pattern)
 FreeSpan (Frequent pattern-projected
  Sequential pattern mining)
 PrefixSpan (Prefix-projected Sequential
  pattern mining)
 SPADE (Sequential PAttern Discovery using
  Equivalence classes)




 Problems   of existing solutions
    Repeated database scans
    Complex internal data structures


 Key   features of SPADE:
    Fixed number of database scans
    Vertical id-list database format
    Decomposition of search space into smaller
     pieces – processed independently




 Items: set of m distinct items
   I = {i1, i2, …, im}
 Event (itemset): non-empty, unordered collection of items
   (i1 i2 … ik)
 Sequence: ordered list of events
   <e1 -> e2 -> … -> en>
 k-sequence: sequence with k items in total
   (B->AC) is a 3-sequence



 Subsequence: given two sequences α = <a1 -> a2 -> … -> an>
 and β = <b1 -> b2 -> … -> bm>, α is called a subsequence of
 β, denoted α ⊆ β, if there exist integers 1 ≤ j1 < j2
 < … < jn ≤ m such that a1 ⊆ bj1, a2 ⊆ bj2, …, an ⊆ bjn

  Examples:
  1. (B->AC) is a subsequence of (AB->E->ACD)
  2. (AB->E) is not a subsequence of (ABE)
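
The containment test above can be sketched in Python. This is a minimal illustration and not part of the original slides: events are modelled as frozensets of single-character items, and `is_subsequence` and `seq` are hypothetical helper names.

```python
from typing import FrozenSet, List, Sequence

Event = FrozenSet[str]


def is_subsequence(alpha: Sequence[Event], beta: Sequence[Event]) -> bool:
    """Return True if alpha is a subsequence of beta (greedy left-to-right match)."""
    j = 0
    for a in alpha:
        # Skip events of beta until one contains every item of a.
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # the next event of alpha must match a strictly later event
    return True


def seq(*events: str) -> List[Event]:
    """Hypothetical helper: seq("AB", "E") builds the sequence AB -> E."""
    return [frozenset(e) for e in events]


print(is_subsequence(seq("B", "AC"), seq("AB", "E", "ACD")))  # True
print(is_subsequence(seq("AB", "E"), seq("ABE")))             # False
```

The second call fails because (ABE) is a single event, so there is no later event left to hold E.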




Id-lists of the frequent items (1-sequences)
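
A rough Python sketch of how such id-lists could be built from a horizontal database. The database slide is not reproduced here, so `DB` below is an assumed toy database, chosen to be consistent with the id-lists for D->B, D->BF and D->BF->A shown on the join slides. The names `build_id_lists` and `support`, and the threshold of 2, are illustrative assumptions.

```python
from collections import defaultdict

# Assumed toy database: one row per event, as (sid, eid, items).
DB = [
    (1, 10, "CD"), (1, 15, "ABC"), (1, 20, "ABF"), (1, 25, "ACDF"),
    (2, 15, "ABF"), (2, 20, "E"),
    (3, 10, "ABF"),
    (4, 10, "DGH"), (4, 20, "BF"), (4, 25, "AGH"),
]


def build_id_lists(db):
    """Vertical format: map each item to the list of (sid, eid) pairs containing it."""
    id_lists = defaultdict(list)
    for sid, eid, items in db:
        for item in items:
            id_lists[item].append((sid, eid))
    return dict(id_lists)


def support(id_list):
    """Support = number of distinct input sequences (sids) in the id-list."""
    return len({sid for sid, _ in id_list})


id_lists = build_id_lists(DB)
min_sup = 2  # assumed threshold
frequent = {item: lst for item, lst in id_lists.items() if support(lst) >= min_sup}

print(sorted(frequent))   # ['A', 'B', 'D', 'F']
print(id_lists["D"])      # [(1, 10), (1, 25), (4, 10)]
```

Once the id-lists exist, the support of an item can be read off its own list without rescanning the database.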




 D->BF->A
    Step 1: D->B




    Step 2: D->BF




 D->BF->A
    Step 3 : D->BF->A




 Not space-efficient
    Solution: keep only 2 columns, (SID, EID), in each sequence's id-list
    EID: event id of the sequence's last item


 D->BF->A (space-efficient id-list joins)

    D->B              D->BF             D->BF->A
    SID    EID        SID    EID        SID    EID
    1      15         1      20         1      25
    1      20         4      20         4      25
    4      20
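
The three id-lists above can be reproduced with two kinds of join. The sketch below is an illustration under assumptions: the item id-lists in `ID_LISTS` come from an assumed toy database (not from the slides), the names `temporal_join` and `equality_join` are hypothetical, and the last step joins against the plain id-list of A for brevity.

```python
# Assumed id-lists of the single items, as (sid, eid) pairs.
ID_LISTS = {
    "A": [(1, 15), (1, 20), (1, 25), (2, 15), (3, 10), (4, 25)],
    "B": [(1, 15), (1, 20), (2, 15), (3, 10), (4, 20)],
    "D": [(1, 10), (1, 25), (4, 10)],
    "F": [(1, 20), (1, 25), (2, 15), (3, 10), (4, 20)],
}


def temporal_join(prefix_list, next_list):
    """Id-list of 'prefix -> next': entries of next_list that occur strictly
    after some entry of prefix_list within the same sid."""
    first = {}
    for sid, eid in prefix_list:
        first[sid] = min(eid, first.get(sid, eid))
    return [(sid, eid) for sid, eid in next_list if sid in first and eid > first[sid]]


def equality_join(list_a, list_b):
    """Id-list of two patterns ending in the same event: shared (sid, eid) pairs."""
    return sorted(set(list_a) & set(list_b))


d_b = temporal_join(ID_LISTS["D"], ID_LISTS["B"])   # Step 1: D->B
d_f = temporal_join(ID_LISTS["D"], ID_LISTS["F"])
d_bf = equality_join(d_b, d_f)                      # Step 2: D->BF
d_bf_a = temporal_join(d_bf, ID_LISTS["A"])         # Step 3: D->BF->A

print(d_b)     # [(1, 15), (1, 20), (4, 20)]
print(d_bf)    # [(1, 20), (4, 20)]
print(d_bf_a)  # [(1, 25), (4, 25)]
```

Only the (SID, EID) of the last item is kept at each step, which is what makes the two-column format sufficient.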


 Complete lattice representation




 Decomposing the lattice => smaller pieces
 that can be solved independently

 Equivalence classes
 Two sequences are in the same class (Θk) if they
  share a common prefix of length k
 Example
   k=1 : Θ1 -> {[A],[B],[D],[F]}
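
Grouping by prefix can be sketched as follows. This is illustrative only: a sequence is written as a tuple of event strings, the input list `F2` is an assumed set of 2-sequences, and `prefix` and `equivalence_classes` are hypothetical names.

```python
from collections import defaultdict


def prefix(sequence, k):
    """First k items of a sequence, keeping the event boundaries."""
    out, remaining = [], k
    for event in sequence:
        if remaining == 0:
            break
        taken = event[:remaining]
        out.append(taken)
        remaining -= len(taken)
    return tuple(out)


def equivalence_classes(sequences, k):
    """Group sequences that share the same prefix of length k."""
    classes = defaultdict(list)
    for s in sequences:
        classes[prefix(s, k)].append(s)
    return dict(classes)


# Assumed 2-sequences: ("AB",) is the event AB, ("D", "B") is D->B.
F2 = [("AB",), ("AF",), ("B", "A"), ("BF",),
      ("D", "A"), ("D", "B"), ("D", "F"), ("F", "A")]

classes = equivalence_classes(F2, 1)
for p, members in classes.items():
    print("[" + "->".join(p) + "]", ["->".join(m) for m in members])
```

Each class can then be processed on its own, since its members share the prefix and never need id-lists from another class.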




 SPADE(min_sup, D)
  // min_sup: minimum support
  // D: initial dataset
  F1 <- {frequent items or 1-sequences}
  F2 <- {frequent 2-sequences}
  E <- {equivalence classes [X] of Θ1}
  for all [X] in E
    Enumerate_frequent_seq([X], min_sup)




   Enumerate_frequent_seq(S, min_sup)
      for all Ai in S
          Ti <- {}
          for all Aj in S, with j ≥ i
              R <- Ai v Aj   // join
              if R satisfies min_sup
                   Ti <- Ti U {R}
          end
          if depth-first search
              Enumerate_frequent_seq(Ti, min_sup)
      end
      if breadth-first search
          for all non-empty Ti
              Enumerate_frequent_seq(Ti, min_sup)
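
The enumeration can be sketched in Python as below. This is a simplified, depth-first illustration rather than a faithful implementation: it starts directly from the class of single items instead of computing F2 separately, it loops over all atoms of a class and keeps only the joins that extend Ai, and `DB` is an assumed toy database. All function names are hypothetical.

```python
from collections import defaultdict

# Assumed toy database of (sid, eid, items) rows; not taken from the slides.
DB = [
    (1, 10, "CD"), (1, 15, "ABC"), (1, 20, "ABF"), (1, 25, "ACDF"),
    (2, 15, "ABF"), (2, 20, "E"),
    (3, 10, "ABF"),
    (4, 10, "DGH"), (4, 20, "BF"), (4, 25, "AGH"),
]


def support(id_list):
    """Number of distinct sids in an id-list."""
    return len({sid for sid, _ in id_list})


def temporal_join(la, lb):
    """Entries of lb that occur strictly after some entry of la in the same sid."""
    first = {}
    for sid, eid in la:
        first[sid] = min(eid, first.get(sid, eid))
    return [(sid, eid) for sid, eid in lb if sid in first and eid > first[sid]]


def equality_join(la, lb):
    """Entries present in both id-lists (same sid and same eid)."""
    return sorted(set(la) & set(lb))


def is_event_atom(pattern):
    """True if the last item was added to an existing event, e.g. ('D', 'BF')."""
    return len(pattern[-1]) > 1


def enumerate_frequent_seq(atoms, min_sup, found):
    """Depth-first enumeration of one class; atoms maps pattern -> id-list."""
    for pa, la in atoms.items():
        found[pa] = support(la)
        x, children = pa[-1][-1], {}
        for pb, lb in atoms.items():
            y, candidates = pb[-1][-1], []
            # Same-event extension: both atoms of the same kind, items kept sorted.
            if is_event_atom(pa) == is_event_atom(pb) and y > x:
                candidates.append((pa[:-1] + (pa[-1] + y,), equality_join(la, lb)))
            # New-event extension: the joined atom must be a sequence atom.
            if not is_event_atom(pb):
                candidates.append((pa + (y,), temporal_join(la, lb)))
            for pattern, id_list in candidates:
                if support(id_list) >= min_sup:
                    children[pattern] = id_list
        enumerate_frequent_seq(children, min_sup, found)


def spade(db, min_sup):
    """Build item id-lists, keep the frequent ones, then enumerate."""
    id_lists = defaultdict(list)
    for sid, eid, items in db:
        for item in items:
            id_lists[(item,)].append((sid, eid))
    atoms = {p: l for p, l in sorted(id_lists.items()) if support(l) >= min_sup}
    found = {}
    enumerate_frequent_seq(atoms, min_sup, found)
    return found


found = spade(DB, min_sup=2)
for pattern, sup in sorted(found.items(), key=lambda kv: (sum(map(len, kv[0])), kv[0])):
    print("->".join(pattern), sup)
```

With the assumed database and min_sup = 2, the sketch reports D->BF->A with support 2, which agrees with the id-list shown for that sequence. A breadth-first variant would collect the non-empty child classes of a level first and recurse on them afterwards.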


 The R Project for Statistical Computing
    Language and environment similar to S, which
     was developed at Bell Laboratories (formerly
     AT&T, now Lucent Technologies) by John
     Chambers and colleagues

    Can be considered a different implementation of S

    arulesSequences package




