Statistical Characteristics of
Modified Stochastic Algorithm

Vilnius University
Institute of Mathematics and Informatics

Loreta Savulioniene
Structure
• Data mining
• Steps of the Apriori algorithm
• Association rules
• Modified stochastic algorithm for mining frequent subsequences
• Computer Modeling

Introduction (1)

Knowledge discovery consists of several steps:
• Data selection;
• Data preparation for analysis;
• Application of algorithms to discover knowledge;
• Presentation of new knowledge.

Introduction (2)
• Data mining is the research and analysis of large amounts of
data using automated or semi-automated methods in order
to find important relations in the data and to discover models
and association rules.
• Data mining is also defined as a method of acquiring,
tracking and discovering new meanings in data.

Introduction (3)
All algorithms used for frequent sequence mining can be
classified into two groups:
• Exact algorithms;
• Approximate algorithms.

Apriori algorithm
• Frequent one-element itemsets are found in the first step of
the Apriori algorithm.
• Each subsequent step of the algorithm consists of two parts:
• generating candidate (potentially frequent) itemsets;
• determining which candidate itemsets are frequent.
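
The two-part step described above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function name `apriori`, the toy `baskets`, and the reading of `min_supp` as an absolute transaction count are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_supp):
    """Return {itemset: count} for itemsets found in at least min_supp transactions."""
    transactions = [frozenset(t) for t in transactions]
    # First step: frequent one-element itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_supp}
    result = dict(frequent)
    k = 2
    while frequent:
        # Part 1: generate size-k candidates by joining frequent (k-1)-itemsets,
        # keeping only those whose (k-1)-subsets are all frequent.
        prev = set(frequent)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in prev for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Part 2: count each candidate and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_supp}
        result.update(frequent)
        k += 1
    return result

# Toy data (hypothetical): each string is one transaction, each letter one item.
baskets = ["ABC", "AC", "ABT", "BC", "ABC"]
frequent_itemsets = apriori(baskets, min_supp=3)
```

The loop stops as soon as a level yields no frequent itemsets, because no larger itemset can then be frequent.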

Association rules (1)
Let I = {i1, i2, …, in} be a set of items.
Let D be a database of transactions, where each transaction T
consists of a set of items such that T ⊆ I.
Given an itemset X ⊆ I, a transaction T contains X if and only if
X ⊆ T.
Definition 1. An association rule is an implication of the
form X ⇒ Y, where X ⊆ I, Y ⊆ I and X ∩ Y = ∅.
Definition 2. The association rule X ⇒ Y holds in D with
confidence conf if the probability that a transaction in D which
contains X also contains Y is conf.
Association rules (2)
Definition 3. The association rule X ⇒ Y has support supp in
D if the probability that a transaction in D contains both X and
Y is supp.
Definition 4. The confidence conf of the association rule X ⇒ Y is
the value:
\[ \mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} \]  (1)
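
As an illustration of Definitions 2 to 4, support and confidence can be estimated from a list of transactions. The sketch below reads confidence as the conditional probability of Definition 2, estimated as the ratio of two supports; the function names and the toy `baskets` are assumptions made for the example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t)) / len(transactions)

def confidence(x, y, transactions):
    """Estimate conf(X => Y) as supp(X u Y) / supp(X)."""
    return support(frozenset(x) | frozenset(y), transactions) / support(x, transactions)

# Toy data (hypothetical): each string is one transaction, each letter one item.
baskets = ["ABC", "AC", "ABT", "BC", "ABC"]
supp_ab = support("AB", baskets)          # 3 of the 5 baskets contain both A and B
conf_a_b = confidence("A", "B", baskets)  # 3 of the 4 baskets with A also contain B
```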

Association rules (3)

The discovery of association rules consists of two steps:
1. Discovering the frequent itemsets.
2. Generating association rules from the identified
frequent itemsets.
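
The second step can be sketched as follows, assuming the frequent itemsets and their counts are already available. The function `generate_rules` and the threshold `min_conf` are hypothetical names introduced for this example; the slides do not mention a confidence threshold.

```python
from itertools import combinations

def generate_rules(frequent, min_conf):
    """Return (X, Y, conf) triples for rules X => Y built from frequent itemsets.

    `frequent` maps frozenset -> support count and must contain every
    non-empty subset of each itemset (as Apriori's output does).
    """
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                x = frozenset(x)
                conf = count / frequent[x]   # count(X u Y) / count(X)
                if conf >= min_conf:
                    rules.append((x, itemset - x, conf))
    return rules

# Toy frequent itemsets with their counts (hypothetical values).
frequent = {
    frozenset("A"): 4, frozenset("B"): 4, frozenset("C"): 4,
    frozenset("AB"): 3, frozenset("AC"): 3, frozenset("BC"): 3,
}
rules = generate_rules(frequent, min_conf=0.7)
```

Each two-item itemset yields two candidate rules here, one per choice of antecedent.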

Modified stochastic algorithm for mining
frequent subsequences (1)
• Consider a database D of length M.
• The algorithm analyses randomly selected subsets of random
length l, each containing at least one frequent element as
determined by the Apriori algorithm.
• Assume that the length of an analysed subset follows a
geometric distribution with parameter q, and that the gap
between two adjacent subsets also follows a geometric
distribution with parameter p.
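
One possible reading of this sampling scheme is sketched below. It is not the author's code: treating the database as a single sequence, walking through it left to right, and the names `sample_geometric` and `sample_subsets` are all assumptions. The geometric variable is taken on {0, 1, 2, ...} with P(k) = (1 − q)·q^k, which has mean q/(1 − q).

```python
import math
import random

def sample_geometric(q, rng):
    """Draw k in {0, 1, 2, ...} with P(k) = (1 - q) * q**k, for 0 < q < 1."""
    u = 1.0 - rng.random()                 # uniform on (0, 1]
    return int(math.log(u) / math.log(q))

def sample_subsets(sequence, frequent_items, q, p, rng):
    """Walk through `sequence`, alternating geometric gaps and geometric-length windows."""
    subsets, pos = [], 0
    while pos < len(sequence):
        pos += sample_geometric(p, rng)    # gap before the next subset
        length = sample_geometric(q, rng)  # random subset length l
        window = sequence[pos:pos + length]
        # Keep only non-empty subsets with at least one frequent element.
        if window and any(item in frequent_items for item in window):
            subsets.append(window)
        pos += max(length, 1)              # always advance, so the loop terminates
    return subsets

rng = random.Random(0)
database = "ABCDEFGHIJKLMPRSTUV" * 50      # toy sequence, not the author's data
subsets = sample_subsets(database, frequent_items={"A", "C", "E"}, q=0.8, p=0.5, rng=rng)
```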

Modified stochastic algorithm for mining
frequent subsequences (2)
The average length of an analysed subset is:
l = q/(1 − q)   (2),
and the average length of the gap between adjacent subsets is:
t = p/(1 − p)   (3).
Let us randomly choose N subsets (samples) of various lengths
from database D. The frequencies ci of subsets of the
corresponding length are calculated using formula (4):
ci = Ni/N, where i = 1, 2, …, n   (4).
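
A quick numerical check of formula (2) and a toy evaluation of formula (4) might look like this. The sketch assumes the geometric variable takes values 0, 1, 2, ... with P(k) = (1 − q)·q^k, and it interprets Ni as the number of times the i-th distinct subset occurs among the N samples; both are assumptions, since the slides do not spell them out.

```python
import math
import random
from collections import Counter

def sample_geometric(q, rng):
    """Draw k in {0, 1, 2, ...} with P(k) = (1 - q) * q**k, for 0 < q < 1."""
    u = 1.0 - rng.random()                 # uniform on (0, 1]
    return int(math.log(u) / math.log(q))

rng = random.Random(1)
q = 0.8
lengths = [sample_geometric(q, rng) for _ in range(100_000)]
empirical_mean = sum(lengths) / len(lengths)
theoretical_mean = q / (1 - q)             # formula (2): about 4 for q = 0.8

# Formula (4): relative frequency c_i = N_i / N of each distinct sampled subset.
samples = ["AB", "AC", "AB", "ABT", "AB", "AC"]   # N = 6 toy samples
N = len(samples)
frequencies = {s: n_i / N for s, n_i in Counter(samples).items()}
```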
Statistical Characteristics of Modified
Stochastic Algorithm (1)
Consider two independent subset samples of sizes n1 and n2.
The first sample contains k1 elements and the second k2 elements with the attribute value of interest.
The hypotheses are:
H0: p1 = p2   (5)
H1: p1 ≠ p2   (6)

Statistical Characteristics of Modified
Stochastic Algorithm (2)
Criterion Statistic u
The criterion statistic u is computed according to formula (7):

\[ u = \frac{d_1 - d_2}{\sqrt{\frac{k_1 + k_2}{n_1 + n_2} \cdot \left(1 - \frac{k_1 + k_2}{n_1 + n_2}\right) \cdot \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]  (7)

where d1 = k1/n1 and d2 = k2/n2 are taken to be the sample proportions. Denoting d = (k1 + k2)/(n1 + n2), the formula becomes (8):

\[ u = \frac{d_1 - d_2}{\sqrt{d \cdot (1 - d) \cdot \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]  (8).
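
The statistic u can be evaluated with a few lines of Python. The sketch assumes the standard two-proportion form with a square root in the denominator and d1 = k1/n1, d2 = k2/n2 as sample proportions; the function name and the example counts are made up for illustration.

```python
import math

def u_statistic(k1, n1, k2, n2):
    """Two-proportion criterion statistic u with a pooled proportion d."""
    d1, d2 = k1 / n1, k2 / n2          # sample proportions (assumed meaning of d1, d2)
    d = (k1 + k2) / (n1 + n2)          # pooled proportion
    return (d1 - d2) / math.sqrt(d * (1 - d) * (1 / n1 + 1 / n2))

# Hypothetical counts: 45 of 100 in the first sample, 30 of 100 in the second.
u = u_statistic(k1=45, n1=100, k2=30, n2=100)
```

Swapping the two samples only changes the sign of u, which is why the two-sided test works with its absolute value.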

Statistical Characteristics of Modified
Stochastic Algorithm (3)
Criterion Statistic z
The criterion statistic z is computed according to formula (9):

\[ z = \left(2 \arcsin\sqrt{d_1} - 2 \arcsin\sqrt{d_2}\right) \cdot \sqrt{\frac{n_1 \cdot n_2}{n_1 + n_2}} \]  (9).
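
A short sketch of the arcsine-based statistic z, assuming the usual form in which the arcsine is applied to the square root of each sample proportion and d1 = k1/n1, d2 = k2/n2. The function name and the example counts are hypothetical.

```python
import math

def z_statistic(k1, n1, k2, n2):
    """Criterion statistic z based on the arcsine transformation of proportions."""
    d1, d2 = k1 / n1, k2 / n2              # sample proportions (assumed meaning of d1, d2)
    phi1 = 2 * math.asin(math.sqrt(d1))
    phi2 = 2 * math.asin(math.sqrt(d2))
    return (phi1 - phi2) * math.sqrt(n1 * n2 / (n1 + n2))

# Hypothetical counts: 45 of 100 in the first sample, 30 of 100 in the second.
z = z_statistic(45, 100, 30, 100)
```

For these counts z comes out close to the pooled-proportion statistic u computed from the same data, as expected for moderately large samples.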

Statistical Characteristics of Modified
Stochastic Algorithm (4)
Assumption Evaluation
Once the criterion statistic has been computed, the hypothesis is
evaluated through its P value. For the two-sided alternative
(H1: p1 ≠ p2), the P value corresponding to the obtained value u
is calculated as follows (10):
P = 2·(1 − NORMSDIST(ABS(u)))

(10)
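
The spreadsheet expression above has a direct Python counterpart. In this sketch `NormalDist().cdf` from the standard library plays the role of NORMSDIST (the standard normal cumulative distribution function); the function name `two_sided_p_value` is an assumption made for the example.

```python
from statistics import NormalDist

def two_sided_p_value(u):
    """P = 2 * (1 - Phi(|u|)), where Phi is the standard normal CDF."""
    return 2 * (1 - NormalDist().cdf(abs(u)))

p_value = two_sided_p_value(1.96)   # close to the conventional 0.05 threshold
```

H0 would be rejected at significance level α when the P value falls below α.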

Computer Modeling(1)
Transaction number   Item title   Quantity
...                  ...          ...
1001                 I            1
1001                 J            1
1001                 T            1
...                  ...          ...
1002                 A            2
1002                 C            2
...                  ...          ...

Computer Modeling(2)
ABCDEFGHIJKLMPRSTUV
ACEGIKM
ABTUV
..............................
ABCDEF
CDEFGHIJKLMPRST
............
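
The slides do not say how the transaction table on the previous slide is turned into item strings like those above. One plausible conversion, sketched under the assumptions that quantities are ignored and items are written in alphabetical order, is:

```python
from collections import defaultdict

# (transaction number, item title, quantity), taken from the visible table rows.
rows = [
    (1001, "I", 1), (1001, "J", 1), (1001, "T", 1),
    (1002, "A", 2), (1002, "C", 2),
]

# Group the item titles by transaction number, ignoring the quantity.
items_by_transaction = defaultdict(set)
for transaction, item, _quantity in rows:
    items_by_transaction[transaction].add(item)

# One string per transaction, items sorted alphabetically.
lines = ["".join(sorted(items)) for _, items in sorted(items_by_transaction.items())]
```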

Computer Modeling(3)
The file is processed by the modified stochastic algorithm
with the minimum support in the range 50 ≤ min_supp ≤ 600.
The average processing time of the algorithm is 2 min 20 s.

Computer Modeling(4)

Computer Modeling(5)

Computer Modeling(6)

Conclusion
• The modified stochastic algorithm is based on the analysis
of randomly chosen subsets that include at least one
frequent element, as determined by the Apriori algorithm.
• The algorithm is applied to the market basket
problem.
• The most frequent market basket consists of 6 items.

Thank you!
Questions?

Savulionienė, Loreta ; Sakalauskas, Leonidas „Modifikuoto stochastinio dažnų posekių paieškos algoritmo tikimybinės charakteristikos“ (VU MII)
