A Genetic Algorithm for Dynamic
   Modelling and Prediction of
  Activity in Document Streams
                Lourdes Araujo,JJ Merelo
           lurdes@lsi.uned.es, jj@merelo.net


         Dpto. Lenguajes y Sistemas Informáticos
      Universidad Nacional de Educación a Distancia
     Dpto. Arquitectura y Tecnología de Computadores
                 Universidad de Granada
                           Spain


                  A Genetic Algorithm for Dynamic Modelling and Prediction of Activity in Document Streams– p.1/24
Why
 • Document metadata, such as arrival time, helps organize document streams.
 • Temporal information helps make sense of document streams such as e-mails and news items.
 • Its study combines content analysis and time series modelling.

Showing interest
  • Hypothesis: explosions in interest match points in time where arrival intensity increases sharply.
  • In general, arrival time is quite irregular.

  [Figure: number of document arrivals (Y axis) against time (X axis)]

Regularizing irregularity
  • A cost function is introduced that reflects how difficult it is to move from one state to another.
  • Intervals of similar frequency should be grouped in a single state, so changes of state are penalized. But we shouldn't overdo it.

Kleinberg’s model
  • The document stream is modeled as an infinite-state automaton, A, which emits messages with different frequencies.
  • Each state has a frequency assigned.
  • Bursts are indicated by transitions from a lower to a higher state.
  • Frequency changes are controlled by assigning costs to state changes, avoiding small explosions and making identification of real explosions easier.

Infinite state automaton model
  • The time sequence is generated from an exponential distribution.
     • The time interval x between messages i and i + 1 follows the exponential density $f(x) = \alpha e^{-\alpha x}$, for $\alpha > 0$.
     • The expected value of the interval is $\alpha^{-1}$.

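As a rough illustration of the gap model on this slide, the sketch below draws inter-arrival intervals from the exponential density with rate α and compares their sample mean with the expected value 1/α. The function name `sample_gaps`, the rate 2.0, the sample size and the seed are illustrative assumptions, not values from the slides.

```python
import random


def sample_gaps(alpha, n, seed=0):
    """Draw n inter-arrival gaps from the density f(x) = alpha * exp(-alpha * x)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability only
    return [rng.expovariate(alpha) for _ in range(n)]


gaps = sample_gaps(alpha=2.0, n=100_000)
mean_gap = sum(gaps) / len(gaps)
print(f"sample mean gap = {mean_gap:.3f}, expected 1/alpha = {1 / 2.0:.3f}")
```

A higher rate α gives shorter gaps on average, which is what lets a state of the automaton stand for a level of activity.
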
First things first: two-state model
  • Basic model: a two-state probabilistic automaton A with states q0 (low emission rate) and q1 (high emission rate).

  [Figure: two-state automaton with states q0 and q1]

  • For n + 1 messages (n intervals), a Bayesian procedure gives the conditional probability of a state sequence $q = (q_{i_1}, \dots, q_{i_n})$, which leads to the cost

      $$c(q|x) = b \ln\left(\frac{1-p}{p}\right) + \left(\sum_{t=1}^{n} -\ln f_{i_t}(x_t)\right)$$

    where b is the number of state transitions and p the probability of a state change. The first term favours a low number of transitions; the second favours states that fit the sequence.

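The two-state cost can be evaluated directly, which makes the trade-off between its two terms easier to see. The following is a minimal sketch, assuming exponential densities for both states and counting transitions only inside the given sequence; the names `two_state_cost`, `flat`, `bursty` and all numeric values are illustrative assumptions.

```python
import math


def exp_density(alpha, x):
    """Exponential density f(x) = alpha * exp(-alpha * x)."""
    return alpha * math.exp(-alpha * x)


def two_state_cost(states, gaps, alphas, p):
    """Cost c(q|x) = b * ln((1 - p) / p) + sum_t -ln f_{i_t}(x_t).

    states: list of 0/1 state indices, one per gap
    gaps:   list of positive inter-arrival gaps
    alphas: (alpha_0, alpha_1), emission rates of q0 (low) and q1 (high)
    p:      probability of changing state, 0 < p < 1
    """
    if len(states) != len(gaps):
        raise ValueError("one state per gap is required")
    # b counts the state transitions along the sequence (assumption: the
    # start state is not counted separately in this sketch).
    b = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    transition_cost = b * math.log((1 - p) / p)
    fit_cost = sum(-math.log(exp_density(alphas[s], x)) for s, x in zip(states, gaps))
    return transition_cost + fit_cost


gaps = [5.0, 4.0, 0.2, 0.1, 0.3, 6.0]  # three short gaps in the middle
flat = two_state_cost([0, 0, 0, 0, 0, 0], gaps, alphas=(0.2, 5.0), p=0.1)
bursty = two_state_cost([0, 0, 1, 1, 1, 0], gaps, alphas=(0.2, 5.0), p=0.1)
print(f"flat cost = {flat:.4f}, bursty cost = {bursty:.4f}")
```

With these made-up numbers, labelling the short gaps as a burst pays for two transitions but fits the data better, so its total cost is lower than that of the all-low labelling.
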
To the infinite and beyond
  • Given a sequence of intervals $x = (x_1, x_2, \dots, x_n)$, a state sequence $q = (q_{i_1}, \dots, q_{i_n})$ must be found that minimizes

      $$c(q|x) = \sum_{t=0}^{n-1} \tau(i_t, i_{t+1}) + \sum_{t=1}^{n} -\ln f_{i_t}(x_t)$$

  • f is related to the resolution of the discrete rates within the continuous emission rates, and τ to the ease of changing state.

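The general cost can be sketched with the transition cost τ passed in as a function. The slides do not spell out τ or the state rates; the `make_tau` helper below charges only upward moves, in the form used in Kleinberg's burst model, and is an assumption here, as are the start state, the rates and the sample gaps.

```python
import math


def stream_cost(states, gaps, rates, tau, start_state=0):
    """c(q|x) = sum_{t=0}^{n-1} tau(i_t, i_{t+1}) + sum_{t=1}^{n} -ln f_{i_t}(x_t).

    states: state indices i_1..i_n, one per gap
    gaps:   inter-arrival gaps x_1..x_n
    rates:  rates[i] is the emission rate alpha_i of state i
    tau:    callable tau(i, j), the cost of moving from state i to state j
    start_state: i_0, assumed to be the lowest state (an assumption of this sketch)
    """
    if len(states) != len(gaps):
        raise ValueError("one state per gap is required")
    path = [start_state] + list(states)
    change_cost = sum(tau(i, j) for i, j in zip(path, path[1:]))
    # -ln(alpha * exp(-alpha * x)) = alpha * x - ln(alpha)
    fit_cost = sum(rates[i] * x - math.log(rates[i]) for i, x in zip(states, gaps))
    return change_cost + fit_cost


def make_tau(gamma, n):
    """Hypothetical tau: climbing costs (j - i) * gamma * ln(n), descending is free."""
    def tau(i, j):
        return (j - i) * gamma * math.log(n) if j > i else 0.0
    return tau


gaps = [5.0, 4.0, 0.2, 0.1, 0.3, 6.0]
rates = [0.2, 1.0, 5.0]  # illustrative rates for three states
tau = make_tau(gamma=1.0, n=len(gaps))
flat_cost = stream_cost([0] * 6, gaps, rates, tau)
burst_cost = stream_cost([0, 0, 2, 2, 2, 0], gaps, rates, tau)
print(f"flat = {flat_cost:.4f}, burst = {burst_cost:.4f}")
```

Keeping τ as a parameter means the same function serves as a fitness function whatever penalty shape is chosen.
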
Infinite is a bit too much
  • $A^{*}_{s,\gamma}$, which minimizes $c(q|x)$, is restricted to $A^{k}_{s,\gamma}$ with k states.
  • We will use an evolutionary algorithm to find $A^{k}_{s,\gamma}$.
  • Finally!

Individual representation
  • An individual is a sequence of n integers, $1 < q_{i_j} < E$, each representing an automaton state, paired with the id i of the last document in the corresponding subsequence.
  • Document i arrives at time $0 \le t_i \le T$ (intervals $x_i = t_i - t_{i-1}$).

       Timeline:    t_1, t_2, ···, t_n
       Chromosome:  | q_{t_1}, t_{k_1} | q_{t_{k_1}+1}, t_{k_2} | ··· | q_{t_f}, t_n |

  • Fitness function = cost function.
  • Initial population: documents chosen at random split the document stream into intervals, which are given random states.

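One possible encoding of this representation is a list of (state, last document) pairs. The sketch below builds random individuals that way; the names `random_individual` and `expand`, the zero-based state numbering and the parameter values are assumptions made for illustration.

```python
import random


def random_individual(n_docs, n_states, n_cuts, rng):
    """Build a chromosome as a list of genes (state, last_doc).

    Each gene covers the documents after the previous gene's last_doc up to
    and including its own last_doc; the final gene always ends at n_docs - 1.
    """
    # Documents chosen at random split the stream into intervals.
    cuts = sorted(rng.sample(range(n_docs - 1), n_cuts)) + [n_docs - 1]
    # Each interval receives a random state (numbered from 0 in this sketch).
    return [(rng.randrange(n_states), last_doc) for last_doc in cuts]


def expand(individual):
    """Return the per-document state sequence encoded by a chromosome."""
    states, first = [], 0
    for state, last_doc in individual:
        states.extend([state] * (last_doc - first + 1))
        first = last_doc + 1
    return states


rng = random.Random(42)  # fixed seed for repeatability (assumption)
population = [random_individual(n_docs=20, n_states=5, n_cuts=3, rng=rng) for _ in range(4)]
for individual in population:
    print(individual)
```

The `expand` helper turns a chromosome back into one state per document, which is the form a cost function needs to score it.
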
Crossover
  [Figure: one-point crossover of two parents at a crossover point (c.p.) placed at document t.
   Parent 1 has genes g11 ··· g1i ··· g1f1; gene g1i holds state q1i and covers documents (t − n1, ···, t, ···, t + m1).
   Parent 2 has genes g21 ··· g2j ··· g2f2; gene g2j holds state q2j and covers documents (t − n2, ···, t, ···, t + m2).
   Child 1: g11 ··· g1i−1 (ending at t − n1 − 1), an undetermined gene marked "?" at the c.p., then g2j+1 (starting at t + m2 + 1) ··· g2f2.
   Child 2: g21 ··· g2j−1 (ending at t − n2 − 1), an undetermined gene marked "?" at the c.p., then g1i+1 (starting at t + m1 + 1) ··· g1f1.]

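The crossover in the figure can be sketched on chromosomes stored as lists of (state, last document) genes. The slide leaves the gene at the crossover point undetermined ("?"); here it is assumed to take the state of one of the two cut genes at random. That choice, the function names and the sample parents are assumptions, not the authors' stated method.

```python
import bisect
import random


def gene_index(individual, doc):
    """Index of the gene whose interval contains document `doc`."""
    last_docs = [last for _, last in individual]
    return bisect.bisect_left(last_docs, doc)


def crossover(parent1, parent2, doc, rng):
    """One-point crossover at document `doc` for chromosomes of (state, last_doc) genes.

    The child keeps parent1's genes that end before the cut, parent2's genes
    that start after its own cut gene, and one bridging gene in between.
    """
    i = gene_index(parent1, doc)
    j = gene_index(parent2, doc)
    # Assumption: the bridging gene ('?' on the slide) takes the state of one
    # of the two cut genes, chosen at random.
    bridge_state = rng.choice([parent1[i][0], parent2[j][0]])
    bridge = (bridge_state, parent2[j][1])
    return parent1[:i] + [bridge] + parent2[j + 1:]


rng = random.Random(7)
p1 = [(0, 4), (3, 11), (1, 19)]
p2 = [(2, 7), (4, 9), (0, 15), (1, 19)]
child1 = crossover(p1, p2, doc=8, rng=rng)
child2 = crossover(p2, p1, doc=8, rng=rng)
print(child1)
print(child2)
```

Because the bridging gene always ends where the second parent's cut gene ends, each child still covers every document exactly once.
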
Mutation
  • Several mutation operators:
     • Increment state by one.
     • Merge two genes, state taken randomly.
     • Split a gene in two: one with the original state, another ±1.

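The three operators can be sketched on chromosomes stored as lists of (state, last document) genes. Details the slide leaves open are filled with assumptions: states are clamped to the valid range, a merged gene takes the state of one of the two merged genes, and the split point is uniform. All function names are hypothetical.

```python
import random


def mutate_increment(individual, rng, n_states):
    """Increment the state of one randomly chosen gene by one (capped at the top state)."""
    g = rng.randrange(len(individual))
    state, last_doc = individual[g]
    new_state = min(state + 1, n_states - 1)  # capping is an assumption
    return individual[:g] + [(new_state, last_doc)] + individual[g + 1:]


def mutate_merge(individual, rng):
    """Merge two adjacent genes; the merged state is picked at random from the pair."""
    if len(individual) < 2:
        return list(individual)
    g = rng.randrange(len(individual) - 1)
    state = rng.choice([individual[g][0], individual[g + 1][0]])  # assumption
    merged = (state, individual[g + 1][1])
    return individual[:g] + [merged] + individual[g + 2:]


def mutate_split(individual, rng, n_states):
    """Split one gene in two: one half keeps the state, the other moves by +/-1."""
    # Only genes that cover at least two documents can be split.
    firsts = [0] + [last + 1 for _, last in individual[:-1]]
    candidates = [g for g, (_, last) in enumerate(individual) if last > firsts[g]]
    if not candidates:
        return list(individual)
    g = rng.choice(candidates)
    state, last_doc = individual[g]
    cut = rng.randrange(firsts[g], last_doc)  # last document of the first half
    shifted = min(max(state + rng.choice([-1, 1]), 0), n_states - 1)  # clamped (assumption)
    halves = [(state, cut), (shifted, last_doc)]
    if rng.random() < 0.5:  # which half keeps the original state is random
        halves = [(shifted, cut), (state, last_doc)]
    return individual[:g] + halves + individual[g + 1:]


rng = random.Random(1)
ind = [(0, 4), (3, 11), (1, 19)]
incremented = mutate_increment(ind, rng, n_states=5)
merged = mutate_merge(ind, rng)
split = mutate_split(ind, rng, n_states=5)
print(incremented, merged, split, sep="\n")
```

Each operator returns a new list, so the parent chromosome is left untouched and can stay in the population.
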
Effect of crossover
  [Figure: Generation N. (100 to 500) against crossover rate % (10 to 50) for streams a, b and c]

Effect of mutation
  [Figure: Generation N. (0 to 500) against mutation rate % (0 to 30) for streams a, b and c]

Effect of population size
  [Figure: Generation N. (0 to 500) against population size (100 to 500) for streams a, b and c]

Effect of number of generations
  [Figure: cost function (2e+05 to 9e+05) against Generation N. (0 to 500) for streams a, b and c]

Time results
                     Viterbi                  Evo. Alg.
 State n.    Ex. time    Cost       Ex. time    Cost (Av. cost, Std. dev.)
    15       2319.36     277402     1678.61     277712 (279385.6, 980.11)
    20       3117.28     277306     2182.12     277528 (278980.4, 1114.91)
    25       3835.37     277260     2033.81     277270 (279472.6, 1116.03)

  [Figure: Time comparison, time (s.) against number of states (15, 20, 25), for the evolutionary algorithm and Viterbi]

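For reference, the Viterbi baseline in the table amounts to dynamic programming over the k states, which costs on the order of n·k² operations for n gaps. The sketch below is a generic minimum-cost Viterbi pass, not the implementation that was timed; the transition cost `tau`, the rates and the gaps are illustrative assumptions.

```python
import math


def viterbi_min_cost(gaps, rates, tau, start_state=0):
    """Return (cost, states) minimising sum tau(i_t, i_{t+1}) + sum -ln f_{i_t}(x_t)."""
    k = len(rates)
    # best[j]: cheapest cost of any path that ends in state j after the gaps seen so far
    best = [0.0 if j == start_state else math.inf for j in range(k)]
    back = []  # back[t][j]: predecessor of state j at step t
    for x in gaps:
        new_best, pointers = [], []
        for j in range(k):
            emit = rates[j] * x - math.log(rates[j])  # -ln f_j(x)
            prev = min(range(k), key=lambda i: best[i] + tau(i, j))
            new_best.append(best[prev] + tau(prev, j) + emit)
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path backwards from the best final state.
    state = min(range(k), key=lambda j: best[j])
    cost = best[state]
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()
    return cost, states


gaps = [5.0, 4.0, 0.2, 0.1, 0.3, 6.0]
rates = [0.2, 1.0, 5.0]  # illustrative rates for three states


def tau(i, j):
    """Hypothetical transition cost: only upward moves are charged."""
    return (j - i) * math.log(len(gaps)) if j > i else 0.0


best_cost, best_states = viterbi_min_cost(gaps, rates, tau)
print(best_cost, best_states)
```

The evolutionary algorithm searches the same cost surface without filling the full n × k table, which is consistent with the shorter execution times reported above at a slightly higher cost.
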
Predicting the state of new arrivals
  • Main point of this work: to predict whether buzz is going up or down.
  • Several possible approaches: using the Viterbi algorithm over the whole sequence, and reusing evolutionary algorithms.
  • Easy approach for a single state: assume the current trend continues.

Local approximation: results
 Previous substream         A. T.   Old s.   New s.   Trend
 ··· 38 38 39 41 49 49       52       12       0        ↓
 ··· 41 49 49 52 68 69       69        3       4        ↑
 ··· 88 89 90 90 91 92       95        0       0        →

But it breaks down after a while
 date             GA                                       approx.
 0(2004-04-02)    7(0.694669)
 ···              ···
 74(2004-06-15)   14(0.797281)
 75(2004-06-16)   24(0.970706)
 76(2004-06-17)   19(0.87973)
 77(2004-06-18)   19(0.87973)                              19(0.87973)
 78(2004-06-19)   0(0.605263)                              19(0.87973)
 79(2004-06-20)   0(0.605263)                              19(0.87973)



Fast GA for modelling new arrivals
  • Using results of previous fitting.
  • Chromosome extended, and last gene mutation probability higher.

  [Figure: frequency (0.6 to 1) against time (0 to 150), comparing GA fit and approx. fit]

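The seeding idea can be sketched on chromosomes stored as lists of (state, last document) genes: the previously fitted individual is extended to cover the new arrivals, and the last gene is mutated more often than the rest. How the authors extend the chromosome and which probabilities they use is not stated on the slide, so every name and number below is an assumption.

```python
import random


def extend_individual(individual, n_new_docs):
    """Extend a fitted chromosome of (state, last_doc) genes to cover new arrivals.

    Assumption: the new documents start out in a gene of their own that
    inherits the state of the previously last gene.
    """
    state, last_doc = individual[-1]
    return list(individual) + [(state, last_doc + n_new_docs)]


def mutation_probabilities(n_genes, base=0.05, last=0.5):
    """Per-gene mutation probabilities with a higher value for the last gene.

    The values of `base` and `last` are illustrative, not taken from the slides.
    """
    return [base] * (n_genes - 1) + [last]


def seeded_population(fitted, n_new_docs, size, n_states, rng):
    """Seed a population with the extended previous fit, varying only the new last gene."""
    seed = extend_individual(fitted, n_new_docs)
    population = [seed]
    for _ in range(size - 1):
        state, last_doc = seed[-1]
        shifted = min(max(state + rng.choice([-1, 0, 1]), 0), n_states - 1)
        population.append(seed[:-1] + [(shifted, last_doc)])
    return population


rng = random.Random(3)
fitted = [(0, 4), (3, 11), (1, 19)]  # stands in for the result of a previous fit
extended = extend_individual(fitted, n_new_docs=5)
probabilities = mutation_probabilities(len(extended))
population = seeded_population(fitted, n_new_docs=5, size=6, n_states=5, rng=rng)
print(extended, probabilities, sep="\n")
```

Starting from the old fit means the search only has to settle the state of the newest stretch of the stream.
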
Fast GA: Results
 Subst. len.   New subs. len.   T. w/out seed   T. w/ seed
   219900           100            3895.28      141.45 (79.09)
   219000          1000            3895.28      144.75 (81.96)
   210000         10000            3895.28      166.73 (79.32)

 Subst. len.   New subs. len.   T. w/out seed   T. w/ seed
     3032           100            5048.49       54.6
     2632           500            5048.49       92.247
     2132          1000            5048.49      294.97
     1132          2000            5048.49      570.41

Conclusions

 • The presented system dynamically detects changes in the trends of interest in a document stream.
 • An EA makes it possible to deal with very large sequences of documents in a reasonable time.
 • Extending this EA allows fitting a stream that is an extension of a previously fitted substream in a very short time.
 • We plan to study correlations among document streams, to automatically detect the occurrence of new topics composed of multi-word concepts.

The end




 • Thanks for your attention
 • Any questions?

 
Presentación 8º CUSL/6º CUSL granadino
Presentación 8º CUSL/6º CUSL granadinoPresentación 8º CUSL/6º CUSL granadino
Presentación 8º CUSL/6º CUSL granadino
 
El software libre contado a los universitarios
El software libre contado a los universitariosEl software libre contado a los universitarios
El software libre contado a los universitarios
 

Kürzlich hochgeladen

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 

Kürzlich hochgeladen (20)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 

Dynamic modelling of document streams

  • 1. A Genetic Algorithm for Dynamic Modelling and Prediction of Activity in Document Streams
      Lourdes Araujo, JJ Merelo (lurdes@lsi.uned.es, jj@merelo.net)
      Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia
      Dpto. Arquitectura y Tecnología de Computadores, Universidad de Granada
      Spain
      A Genetic Algorithm for Dynamic Modelling and Prediction of Activity in Document Streams– p.1/24
  • 2. Why
      • Document metadata, such as arrival time, helps organize document streams.
      • Temporal information helps make sense of document streams such as e-mails and news items.
      • Its study combines content analysis and time series modelling.
  • 3. Showing interest
      • Hypothesis: explosions in interest match points in time where arrival intensity increases sharply.
      • In general, arrival times are quite irregular.
      [Figure: number of document arrivals (Y) versus time (X).]
  • 4. Regularizing irregularity
      • A cost function is introduced that reflects how difficult it is to hike from one state to another.
      • Intervals of similar frequency should be grouped in a single state, so changes of state are penalized. But we shouldn't overdo it.
  • 5. Kleinberg's model
      • The document stream is modelled as an infinite-state automaton, A, which emits messages at different frequencies.
      • Each state has a frequency assigned.
      • Bursts are indicated by transitions from a lower to a higher state.
      • Frequency changes are controlled by assigning costs to state changes, which avoids small explosions and makes real explosions easier to identify.
  • 6. Infinite state automaton model
      • The time sequence is generated from an exponential distribution.
      • The time interval x between messages i and i + 1 follows the exponential density $f(x) = \alpha e^{-\alpha x}$, for $\alpha > 0$.
      • The expected value of the interval is $\alpha^{-1}$.
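The exponential gap model on this slide can be checked numerically. The following is a minimal sketch, not the authors' code: the helper name `sample_intervals`, the rate value and the seed are illustrative assumptions. It draws inter-arrival gaps with the standard library and compares their mean with $\alpha^{-1}$.

```python
import random

def sample_intervals(alpha, n, seed=0):
    """Draw n inter-arrival gaps from the density f(x) = alpha * exp(-alpha * x)."""
    rng = random.Random(seed)
    return [rng.expovariate(alpha) for _ in range(n)]

alpha = 2.0                               # hypothetical emission rate
gaps = sample_intervals(alpha, 100_000)
mean_gap = sum(gaps) / len(gaps)
print(f"empirical mean gap {mean_gap:.4f}, expected 1/alpha = {1 / alpha:.4f}")
```

A higher `alpha` gives shorter gaps on average, which is how a state with a higher emission rate represents a burst.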
  • 7. First things first: two-state model
      • Basic model: a 2-state probabilistic automaton A with states $q_0$ (low emission rate) and $q_1$ (high).
      • n + 1 messages, n intervals: a Bayes procedure is used to fit a conditional probability of a state sequence $q = (q_{i_1}, \cdots, q_{i_n})$:
        $c(q \mid x) = b \ln\left(\frac{1-p}{p}\right) + \left(\sum_{t=1}^{n} -\ln f_{i_t}(x_t)\right)$
        where b is the number of state transitions. The first term favours a low number of transitions; the second, states that fit the sequence.
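The two-state cost can be evaluated directly for a candidate state sequence. This is a sketch under stated assumptions: the rates, the value of `p` (taken here to be the probability of changing state, as in Kleinberg's burst model) and the sample intervals are made up for illustration, and b counts changes between consecutive states only.

```python
import math

def neg_log_density(alpha, x):
    """-ln f(x) for the exponential density f(x) = alpha * exp(-alpha * x)."""
    return alpha * x - math.log(alpha)

def two_state_cost(states, intervals, rates, p):
    """c(q|x) = b * ln((1 - p) / p) + sum_t -ln f_{i_t}(x_t)."""
    # b: number of state changes between consecutive intervals (assumed convention)
    b = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    fit = sum(neg_log_density(rates[q], x) for q, x in zip(states, intervals))
    return b * math.log((1 - p) / p) + fit

rates = (1.0, 5.0)   # hypothetical emission rates for q0 (low) and q1 (high)
p = 0.1              # hypothetical state-change probability
intervals = [1.0, 0.1, 0.1, 0.1, 0.1, 0.1, 1.0]
flat = two_state_cost([0] * 7, intervals, rates, p)
burst = two_state_cost([0, 1, 1, 1, 1, 1, 0], intervals, rates, p)
print(f"all-low cost {flat:.3f}, burst cost {burst:.3f}")
```

With these numbers the sequence that enters the high state during the short gaps is cheaper, because the better fit outweighs the two transition penalties.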
  • 8. To the infinite and beyond
      • Given a sequence of intervals $x = (x_1, x_2, \cdots, x_n)$, a sequence $q = (q_{i_1}, \cdots, q_{i_n})$ must be found that minimizes
        $c(q \mid x) = \sum_{t=0}^{n-1} \tau(i_t, i_{t+1}) + \sum_{t=1}^{n} -\ln f_{i_t}(x_t)$
      • f is related to the resolution of discrete rates within continuous emission rates, and $\tau$ to the ease of changing state.
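The general cost differs from the two-state one only in that each transition is priced by $\tau$. A minimal sketch follows, with the transition cost passed in as a function; `linear_tau`, the rates and the start state $i_0 = 0$ are illustrative assumptions, not values taken from the slides.

```python
import math

def sequence_cost(states, intervals, rates, tau, start_state=0):
    """c(q|x) = sum_{t=0}^{n-1} tau(i_t, i_{t+1}) + sum_{t=1}^{n} -ln f_{i_t}(x_t)."""
    cost = 0.0
    prev = start_state                      # i_0, assumed to be the lowest state
    for state, x in zip(states, intervals):
        cost += tau(prev, state)            # price of moving from prev to state
        cost += rates[state] * x - math.log(rates[state])  # -ln f_i(x)
        prev = state
    return cost

def linear_tau(i, j, penalty=1.0):
    """Hypothetical transition cost: climbing costs `penalty` per level, descending is free."""
    return penalty * max(0, j - i)

rates = [1.0, 2.0, 4.0]                     # hypothetical per-state emission rates
example = sequence_cost([0, 2], [1.0, 0.25], rates, linear_tau)
print(f"cost of the example sequence: {example:.4f}")
```

Since the slides state that the fitness function equals the cost function, a function of this shape could serve as the fitness of a candidate state sequence.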
  • 11. Infinite is a bit too much
      • The automaton $A^{*}_{s,\gamma}$ that minimizes $c(q \mid x)$ is restricted to $A^{k}_{s,\gamma}$, with k states.
      • We will use an evolutionary algorithm to find $A^{k}_{s,\gamma}$.
      • Finally!
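For reference, the exact minimum over a k-state automaton can be found by dynamic programming, which is the Viterbi baseline the results slide compares against. The sketch below is not the authors' implementation. It assumes Kleinberg's usual parametrization, which the slides do not spell out: rates $\alpha_i = (n/T)\,s^i$, a cost of $\gamma \ln n$ per level climbed, free descents, and a start in state 0. Intervals are assumed to be strictly positive.

```python
import math

def viterbi_states(intervals, k=3, s=2.0, gamma=1.0):
    """Return (minimum cost, state sequence) for a k-state automaton."""
    n, total = len(intervals), sum(intervals)
    rates = [(n / total) * s ** i for i in range(k)]  # assumed: alpha_i = (n/T) * s^i
    climb = gamma * math.log(n)                       # assumed: cost per level climbed

    def tau(i, j):
        return climb * max(0, j - i)                  # moving down is free

    best = [0.0] + [math.inf] * (k - 1)               # the automaton starts in state 0
    back = []
    for x in intervals:
        new_best, pointers = [], []
        for j in range(k):
            i = min(range(k), key=lambda prev: best[prev] + tau(prev, j))
            new_best.append(best[i] + tau(i, j) + rates[j] * x - math.log(rates[j]))
            pointers.append(i)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path backwards from the best final state.
    state = min(range(k), key=lambda j: best[j])
    cost, path = best[state], [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return cost, path[::-1]

intervals = [1.0] * 5 + [0.05] * 10 + [1.0] * 5       # made-up stream with one burst
cost, path = viterbi_states(intervals)
print(path)
```

The running time grows with the number of intervals times the square of the number of states, which is the cost that motivates looking for a cheaper search method on long streams.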
  • 12. Individual representation
      • An integer sequence, $1 < q_{i_j} < E$, representing the automaton state, together with the id i of the last document in the sequence.
      • Document i arrives at $0 \le t_i \le T$ (intervals $x_i = t_i - t_{i-1}$).
      • Documents $t_1, t_2, \cdots, t_n$ are encoded as $|\, q_{t_1}, t_{k_1} \,|\, q_{t_{k_1}+1}, t_{k_2} \,|\, \cdots \,|\, q_{t_f}, t_n \,|$.
      • Fitness function = cost function.
      • Initial population: documents chosen at random split the document stream into intervals, each with a random state.
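One way to read this representation is as a list of genes, each holding a state and the id of the last document it covers. The sketch below follows that reading. The function names, the 0-based state range and the `max_genes` cap are assumptions made for illustration.

```python
import random

def random_individual(n_docs, n_states, max_genes, rng):
    """Split documents 1..n_docs into contiguous genes, each with a random state."""
    n_cuts = rng.randint(0, min(max_genes, n_docs) - 1)
    cuts = sorted(rng.sample(range(1, n_docs), n_cuts))  # last-document ids of inner genes
    ends = cuts + [n_docs]                 # the final gene always ends at the last document
    return [(rng.randrange(n_states), end) for end in ends]

def expand(individual):
    """Per-document state sequence encoded by a chromosome."""
    states, start = [], 0
    for state, end in individual:
        states.extend([state] * (end - start))
        start = end
    return states

rng = random.Random(1)
population = [random_individual(50, 5, 10, rng) for _ in range(20)]
print(population[0])
```

`expand` turns a chromosome back into one state per interval, so the cost function can be applied to it as the fitness.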
  • 13. Crossover
      [Diagram: two parents, $g_{11} \cdots g_{1i} \cdots g_{1f_1}$ and $g_{21} \cdots g_{2j} \cdots g_{2f_2}$, are cut at a crossover point (c.p.) t that falls inside gene $g_{1i}$ (state $q_{1i}$, documents $t - n_1, \cdots, t, \cdots, t + m_1$) and gene $g_{2j}$ (state $q_{2j}$, documents $t - n_2, \cdots, t, \cdots, t + m_2$). One child keeps $g_{11} \cdots g_{1,i-1}$ (up to $t - n_1 - 1$) and $g_{2,j+1} \cdots g_{2f_2}$ (from $t + m_2 + 1$); the other keeps $g_{21} \cdots g_{2,j-1}$ (up to $t - n_2 - 1$) and $g_{1,i+1} \cdots g_{1f_1}$ (from $t + m_1 + 1$). In both children the gene that spans the crossover point is marked "?".]
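One possible reading of the crossover diagram, as a sketch only. Genes are assumed to be `(state, last_document_id)` pairs. The state of the bridging gene is marked "?" on the slide, so drawing it from the two parents' genes at the cut is an assumption rather than the authors' rule.

```python
import random

def crossover(parent1, parent2, cut, rng):
    """Child = parent1 genes ending before `cut`, one bridging gene, parent2 genes after it."""
    head = [g for g in parent1 if g[1] < cut]
    g1 = next(g for g in parent1 if g[1] >= cut)   # parent1 gene containing document `cut`
    g2 = next(g for g in parent2 if g[1] >= cut)   # parent2 gene containing document `cut`
    tail = [g for g in parent2 if g[1] > g2[1]]
    bridge_state = rng.choice([g1[0], g2[0]])      # assumption: "?" is taken from a parent
    return head + [(bridge_state, g2[1])] + tail

rng = random.Random(0)
p1 = [(0, 3), (2, 7), (1, 10)]
p2 = [(1, 5), (0, 8), (3, 10)]
child = crossover(p1, p2, cut=6, rng=rng)
print(child)
```

Swapping the roles of the two parents gives the second child shown in the diagram.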
  • 14. Mutation
      • Several mutation operators:
          • Increment state by one.
          • Merge two genes, with the state taken randomly.
          • Split a gene in two: one keeps the original state, the other takes that state ±1.
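The three operators can be sketched on the same `(state, last_document_id)` gene list. Several details here are assumptions the slide leaves open: states are clamped to the valid range, the merged gene takes one of the two original states, and the first half of a split keeps the original state.

```python
import random

def increment_state(ind, k, n_states):
    """Raise the state of gene k by one, capped at the highest state."""
    state, end = ind[k]
    return ind[:k] + [(min(state + 1, n_states - 1), end)] + ind[k + 1:]

def merge_genes(ind, k, rng):
    """Merge genes k and k+1; the merged gene takes one of the two states at random."""
    state = rng.choice([ind[k][0], ind[k + 1][0]])
    return ind[:k] + [(state, ind[k + 1][1])] + ind[k + 2:]

def split_gene(ind, k, rng, n_states):
    """Split gene k in two: one part keeps the state, the other gets state +/- 1."""
    state, end = ind[k]
    start = ind[k - 1][1] if k > 0 else 0
    if end - start < 2:
        return list(ind)                   # a single-document gene cannot be split
    cut = rng.randint(start + 1, end - 1)
    other = min(max(state + rng.choice([-1, 1]), 0), n_states - 1)
    return ind[:k] + [(state, cut), (other, end)] + ind[k + 1:]

rng = random.Random(3)
ind = [(0, 3), (2, 7), (1, 10)]
print(increment_state(ind, 0, 4), merge_genes(ind, 0, rng), split_gene(ind, 1, rng, 4))
```

Each operator returns a new list, so the parent chromosome is left unchanged.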
  • 15. Effect of crossover
      [Figure: generation number versus crossover rate (%), for streams a, b and c.]
  • 16. Effect of mutation
      [Figure: generation number versus mutation rate (%), for streams a, b and c.]
  • 17. Effect of population size
      [Figure: generation number versus population size, for streams a, b and c.]
  • 18. Effect of number of generations
      [Figure: cost function versus generation number, for streams a, b and c.]
  • 19. Time results
      State n.   Viterbi                     Evo. Alg
                 Ex. time (s)   Cost         Ex. time (s)   Cost     (Av. cost, Std. dev.)
      15         2319.36        277402       1678.61        277712   (279385.6, 980.11)
      20         3117.28        277306       2182.12        277528   (278980.4, 1114.91)
      25         3835.37        277260       2033.81        277270   (279472.6, 1116.03)
      [Figure: Time comparison. Time (s) versus number of states (15, 20, 25), for the evolutionary algorithm and Viterbi.]
  • 20. Predicting the state of new arrivals
      • Main point of this work: to predict whether buzz is going up or down.
      • Several possible approaches: using the Viterbi algorithm over the whole sequence, and reusing evolutionary algorithms.
      • Easy approach for a single state: assume the current trend continues.
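The "easy approach" can be read literally as a one-step extrapolation of the last change of state. The sketch below implements only that reading. The function names and the clamping to the valid state range are assumptions, and the slides do not say that the authors computed it this way.

```python
def predict_next_state(states, n_states):
    """Extrapolate the last observed change of state by one step."""
    if len(states) < 2:
        return states[-1]
    step = states[-1] - states[-2]
    return min(max(states[-1] + step, 0), n_states - 1)

def trend(old_state, new_state):
    """Label the direction of the change between two states."""
    if new_state > old_state:
        return "up"
    if new_state < old_state:
        return "down"
    return "flat"

history = [3, 3, 4]                      # made-up recent states
guess = predict_next_state(history, n_states=20)
print(guess, trend(history[-1], guess))
```

A guess of this kind costs nothing to compute, but it cannot anticipate a reversal of the trend.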
  • 21. Local approximation: results
      Previous substream         A. T.   Old s.   New s.   Trend
      ··· 38 38 39 41 49 49      52      12       0        ↓
      ··· 41 49 49 52 68 69      69      3        4        ↑
      ··· 88 89 90 90 91 92      95      0        0        →
  • 22. But it breaks down after a while
      date              GA              approx.
      0 (2004-04-02)    7 (0.694669)
      ···               ···
      74 (2004-06-15)   14 (0.797281)
      75 (2004-06-16)   24 (0.970706)
      76 (2004-06-17)   19 (0.87973)
      77 (2004-06-18)   19 (0.87973)    19 (0.87973)
      78 (2004-06-19)   0 (0.605263)    19 (0.87973)
      79 (2004-06-20)   0 (0.605263)    19 (0.87973)
  • 23. Fast GA for modelling new arrivals
      • Uses the results of the previous fitting.
      • The chromosome is extended, and the last gene has a higher mutation probability.
      [Figure: frequency versus time, comparing the GA fit with the approx. fit.]
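The seeding idea can be sketched as two small steps: extend a previously fitted chromosome to cover the new documents, then bias mutation towards the last gene. Both function names, the choice to append a new gene that copies the last state, and the weight value are illustrative assumptions.

```python
import random

def extend_individual(ind, n_docs_new):
    """Append a gene covering the new documents, starting from the last fitted state."""
    last_state, _ = ind[-1]
    return ind + [(last_state, n_docs_new)]

def pick_gene_to_mutate(ind, rng, last_gene_weight=5.0):
    """Choose a gene index, with the last gene `last_gene_weight` times more likely."""
    weights = [1.0] * (len(ind) - 1) + [last_gene_weight]
    return rng.choices(range(len(ind)), weights=weights, k=1)[0]

rng = random.Random(7)
fitted = [(0, 3), (2, 7)]                 # made-up result of a previous fit
seeded = extend_individual(fitted, 10)
picks = [pick_gene_to_mutate(seeded, rng) for _ in range(2000)]
print(seeded, picks.count(len(seeded) - 1) / len(picks))
```

Starting from individuals of this kind, the search only has to adjust the part of the chromosome that covers the new arrivals.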
  • 24. Fast GA: Results
      Subst. len.   New subs. len.   T. w/out seed     T. w/ seed
      219900        100              141.45 (79.09)    3895.28
      219000        1000             144.75 (81.96)
      210000        10000            166.73 (79.32)

      Subst. len.   New subs. len.   T. w/out seed     T. w/ seed
      3032          100              54.6
      2632          500              92.247            5048.49
      2132          1000             294.97
      1132          2000             570.41
  • 28. Conclusions
      • The presented system dynamically detects changes in the trends of interest in a document stream.
      • An EA makes it possible to deal with very large sequences of documents in a reasonable time.
      • Extending this EA allows fitting a stream that is an extension of a previously fitted substream in a very short time.
      • We plan to study correlations among document streams, to automatically detect the occurrence of new topics composed of multi-word concepts.
  • 30. The end
      • Thanks for your attention.
      • Any questions?