Sequence Mining Automata: a New Technique for Mining Frequent Sequences Under Regular Expressions
Roberto Trasarti, Francesco Bonchi, Bart Goethals
Problem Definition (1): Given a database of sequences D, the support of a sequence S ∈ Σ∗ is the number of sequences in D that are supersequences of S: sup(S) = |{T ∈ D | S ⊑ T}|. Given a regular expression R, a sequence S is valid if it can be generated by R.
[Figure: example database of three sequences with minimum support 3 and RE: A*BC*; one valid subsequence is shown with support 3 and another with support 2.]
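The two definitions above can be illustrated with a minimal Python sketch. The function names and the three-sequence database are hypothetical and are not taken from the slides; validity is checked with Python's `re.fullmatch`, assuming each item is a single character.

```python
import re

def is_subsequence(s, t):
    """True if s can be obtained from t by deleting zero or more items (S ⊑ T)."""
    remaining = iter(t)
    # `item in remaining` advances the iterator, so matches must occur in order.
    return all(item in remaining for item in s)

def support(s, database):
    """sup(S): number of sequences in the database that are supersequences of S."""
    return sum(1 for t in database if is_subsequence(s, t))

def is_valid(s, regex):
    """A sequence is valid if the whole sequence is generated by the regular expression."""
    return re.fullmatch(regex, s) is not None

# Hypothetical database, chosen only for illustration.
database = ["AABBC", "CBAAB", "AAB"]

print(support("AB", database))    # 3
print(support("ABC", database))   # 1
print(is_valid("AB", "A*BC*"))    # True
print(is_valid("BA", "A*BC*"))    # False
```

With a minimum support of 3, "AB" would be both frequent and valid in this toy database, while "ABC" would be valid but infrequent.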
Previous approaches and our contribution: Previous approaches [1,2,3] focus on the search space of the problem, exploiting in different ways the power of the regular expression R to prune unpromising patterns. Our solution instead focuses on the input dataset and the given regular expression: while reading the input database we produce, for each sequence in the database, all and only the valid patterns it contains.
[1] H. Albert-Lorincz and J.-F. Boulicaut. Mining frequent sequential patterns under regular expressions: a highly adaptive strategy for pushing constraints. In Proc. of SDM’03.
[2] M. N. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential pattern mining with regular expression constraints. In Proc. of VLDB’99.
[3] J. Pei, J. Han, and W. Wang. Mining sequential patterns with constraints in large databases. In Proc. of CIKM’02.
Sequence Mining Automata (1): Our Sequence Mining Automaton (SMA) is a specialized kind of Petri net. It can be constructed from a DFA by transforming each edge of the DFA into a transition with two arcs, one from its input place and one to its output place. Moreover, it has the following peculiarities:
• Transitions do not consume tokens
• Parallel execution
• External signal
The initial marking consists of only the token representing the empty sequence ε in the starting place.
Example RE: A*B(B|C)D*E
Sequence Mining Automata (2): Each transition applies a process that is activated only if the external signal is equal to the label of the edge. This process produces a new set of tokens in the destination place.
Example RE: A*B(B|C)D*E
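The mechanism described on these two slides can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the DFA encoding `TRANSITIONS`, the place numbering, and the function name `run_sma` are hypothetical, and each token is assumed to be a string that the firing transition extends with the current signal.

```python
from collections import defaultdict

# Hypothetical DFA encoding for A*B(B|C)D*E:
# (input place, edge label) -> output place.
TRANSITIONS = {
    (0, "A"): 0,
    (0, "B"): 1,
    (1, "B"): 2,
    (1, "C"): 2,
    (2, "D"): 2,
    (2, "E"): 3,
}
START, FINAL = 0, {3}

def run_sma(sequence, transitions=TRANSITIONS, start=START, final=FINAL):
    """Feed `sequence` item by item as the external signal and return
    the tokens collected in the final places."""
    marking = defaultdict(set)
    marking[start].add("")  # initial marking: only the empty sequence ε
    for signal in sequence:
        # Parallel execution: every transition reads the marking as it was
        # before this signal, so new tokens are buffered in `produced`.
        produced = defaultdict(set)
        for (src, label), dst in transitions.items():
            if label == signal:  # activated only by a matching signal
                produced[dst] |= {token + signal for token in marking[src]}
        # Tokens are not consumed: the old marking is kept, new tokens are added.
        for place, tokens in produced.items():
            marking[place] |= tokens
    return set().union(*(marking[place] for place in final))

print(sorted(run_sma("ABCE")))  # ['ABCE', 'BCE']
```

Buffering the new tokens before merging them is one simple way to model parallel firing. Because tokens stay in their source place, each place accumulates every subsequence of the input read so far that leads the DFA to that place.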
Sequence Mining Automata (3, Example): Given R ≡ A*B(B|C)D*E and S ≡ ACDBFAEBCFDE
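As a sanity check on this example, the set of valid subsequences of S can be enumerated by brute force. Note that this is not the SMA technique: it is a naive reference that tests every subsequence against R with `re.fullmatch`, and the function name is hypothetical. It only shows which patterns a correct run on this S should yield.

```python
import re
from itertools import combinations

def valid_subsequences_bruteforce(sequence, regex):
    """Enumerate every subsequence of `sequence` and keep those generated by `regex`."""
    found = set()
    for length in range(len(sequence) + 1):
        for positions in combinations(range(len(sequence)), length):
            candidate = "".join(sequence[i] for i in positions)
            if re.fullmatch(regex, candidate):
                found.add(candidate)
    return found

result = valid_subsequences_bruteforce("ACDBFAEBCFDE", "A*B(B|C)D*E")
for pattern in sorted(result, key=lambda p: (len(p), p)):
    print(pattern)
```

For this S the enumeration visits 2^12 = 4096 candidate subsequences and keeps 10 distinct valid patterns, from "BBE" and "BCE" up to "AABCDE". The exponential cost of this reference approach is the reason it is only usable on short examples.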
One-Pass Solution (SMA-1P) and Full-Cut (SMA-FC): SMA-1P simply runs the SMA on each transaction and, at the end, computes the support of each extracted sequence and filters by the support threshold. The support threshold is not used during generation: we compute all the sequences in the dataset that are valid w.r.t. the RE.
SMA-FC: Given an SMA, a valid set of cuts is a partition p1, ..., pn of the places of the SMA such that there is no path from a place in pj to a place in pi if j > i. For each cut we apply SMA-1P on the whole DB. At the end of the i-th scan we obtain intermediate information about frequent patterns, which can be used in subsequent scans to remove infrequent tokens.
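The one-pass variant (SMA-1P) described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the helper names, the DFA encoding for A*BC*, and the three-sequence database are hypothetical. The support threshold is applied only after all sequences have been processed, as the slide states.

```python
from collections import Counter, defaultdict

def valid_subsequences(sequence, transitions, start, final):
    """Run the SMA on one sequence; tokens are strings and are never consumed."""
    marking = defaultdict(set)
    marking[start].add("")  # empty sequence in the starting place
    for signal in sequence:
        produced = defaultdict(set)  # buffer so all transitions see the same marking
        for (src, label), dst in transitions.items():
            if label == signal:
                produced[dst] |= {token + signal for token in marking[src]}
        for place, tokens in produced.items():
            marking[place] |= tokens
    return set().union(*(marking[place] for place in final))

def sma_one_pass(database, transitions, start, final, min_support):
    """Extract valid patterns per sequence, then filter by support at the end."""
    counts = Counter()
    for sequence in database:
        # A set is counted element by element, so each pattern adds at most 1
        # per database sequence, which matches the definition of support.
        counts.update(valid_subsequences(sequence, transitions, start, final))
    return {pattern: n for pattern, n in counts.items() if n >= min_support}

# Hypothetical DFA for A*BC*: (input place, label) -> output place.
ABC_TRANSITIONS = {(0, "A"): 0, (0, "B"): 1, (1, "C"): 1}
DATABASE = ["AABBC", "CBAAB", "AAB"]  # illustrative data only

frequent = sma_one_pass(DATABASE, ABC_TRANSITIONS, start=0, final={1}, min_support=3)
print(frequent)
```

In this toy run the patterns "B", "AB" and "AAB" each reach support 3, while "BC", "ABC" and "AABC" are generated from the first sequence only and are discarded by the final filter. SMA-FC would instead use the intermediate frequency information to remove such infrequent tokens between scans.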
Experiments (Synthetic Data): (D=dataset size, N=number of items, C=average length)
Experiments (Mobility data): From San Jose to San Francisco and back – via CA-101 (west-bound of the bay), i.e., passing through San Mateo (cell H9 of our map); or via I-880 (east-bound of the bay), i.e., passing through Hayward (cell J8 of our map).
Conclusions: We have introduced “Sequence Mining Automata”, a new mechanism for mining frequent sequences under regular expressions. Around this basic mechanism we built a family of algorithms embedding different techniques. The efficiency of our proposal is supported by a thorough empirical evaluation. The SMA is a very simple and fundamental mechanism, opening the door to many possible extensions.
