A Genetic Algorithm for Dynamic Modelling and Prediction of Activity in Document Streams
Lourdes Araujo, JJ Merelo
lurdes@lsi.uned.es, jj@merelo.net

Dpto. Lenguajes y Sistemas Informáticos
Universidad Nacional de Educación a Distancia
Dpto. Arquitectura y Tecnología de Computadores
Universidad de Granada
Spain


                  A Genetic Algorithm for Dynamic Modelling and Prediction of Activity in Document Streams– p.1/24
Why
 • Document metadata, such as arrival time, help organize document streams.
 • Temporal information helps make sense of document streams such as e-mails and news items.
 • Its study combines content analysis and time-series modelling.

Showing interest
 • Hypothesis: explosions of interest coincide with points in time where the arrival intensity increases sharply.
 • In general, arrival times are quite irregular.
[Figure: number of document arrivals (Y axis) against time (X axis)]

Regularizing irregularity
 • A cost function is introduced that reflects how difficult it is to move from one state to another.
 • Intervals of similar frequency should be grouped into a single state, so changes of state are penalized. But we shouldn't overdo it.

Kleinberg’s model
 • The document stream is modelled as an infinite-state automaton, $A$, which emits messages at different frequencies.
 • Each state has an emission frequency assigned to it.
 • Bursts are indicated by transitions from a lower to a higher state.
 • Frequency changes are controlled by assigning costs to state changes; this suppresses small explosions and makes real explosions easier to identify.

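The slides do not say how the state frequencies or the state-change costs are chosen. A minimal sketch, assuming Kleinberg's original choices for $A_{s,\gamma}$: state $i$ emits at rate $\alpha_i = (n/T)\, s^i$, where $n/T$ is the average arrival rate of the whole stream, and moving up from state $i$ to state $j > i$ costs $\tau(i, j) = (j - i)\gamma \ln n$, while moving down is free. The function names are hypothetical.

```python
import math

def state_rates(n: int, total_time: float, s: float, k: int) -> list[float]:
    """Rate of each of the k lowest states: alpha_i = (n / T) * s**i (assumed, after Kleinberg)."""
    base = n / total_time          # average arrival rate of the whole stream
    return [base * s ** i for i in range(k)]

def transition_cost(i: int, j: int, gamma: float, n: int) -> float:
    """tau(i, j): moving up costs (j - i) * gamma * ln n; moving down is free."""
    return (j - i) * gamma * math.log(n) if j > i else 0.0

print(state_rates(n=100, total_time=50.0, s=2.0, k=4))   # [2.0, 4.0, 8.0, 16.0]
print(transition_cost(0, 2, gamma=1.0, n=100))           # 2 * ln(100), about 9.21
```

With $s = 2$ each state doubles the rate of the one below it; a larger $\gamma$ makes higher states more expensive to enter, which is the knob that keeps the model from "overdoing it".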
Infinite-state automaton model
 • The time sequence is generated from an exponential distribution:
    • The time interval $x$ between messages $i$ and $i+1$ follows the exponential density $f(x) = \alpha e^{-\alpha x}$, for $\alpha > 0$.
    • The expected value of the interval is $\alpha^{-1}$.

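A quick numerical check of the two points above, sketched with Python's standard `random.expovariate` (which takes the rate $\alpha$): the sample mean of the intervals approaches $\alpha^{-1}$. The helper names are hypothetical. Note also that $-\ln f(x) = \alpha x - \ln \alpha$, the per-interval term that appears in the cost functions that follow.

```python
import math
import random

def exp_density(x: float, alpha: float) -> float:
    """Exponential density f(x) = alpha * exp(-alpha * x), zero for x < 0."""
    return alpha * math.exp(-alpha * x) if x >= 0 else 0.0

def sample_intervals(alpha: float, n: int, seed: int = 0) -> list[float]:
    """Draw n inter-arrival intervals with rate alpha."""
    rng = random.Random(seed)
    return [rng.expovariate(alpha) for _ in range(n)]

alpha = 2.0
gaps = sample_intervals(alpha, 100_000)
mean = sum(gaps) / len(gaps)
print(f"sample mean {mean:.4f}, expected {1 / alpha}")
```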
First things first: two-state model
 • Basic model: a two-state probabilistic automaton $A$ with states $q_0$ (low emission rate) and $q_1$ (high emission rate).
[Figure: diagram of the two states $q_0$ and $q_1$]
 • With $n+1$ messages there are $n$ intervals. A Bayesian procedure is used to fit the conditional probability of a state sequence $q = (q_{i_1}, \dots, q_{i_n})$, which amounts to minimizing the cost
$$c(q \mid x) = b \ln\left(\frac{1-p}{p}\right) + \sum_{t=1}^{n} \left(-\ln f_{i_t}(x_t)\right)$$
   where $b$ is the number of state transitions and $p$ is the probability of changing state. The first term favours a low number of transitions; the second favours states that fit the sequence.

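The two-state cost can be evaluated directly for any candidate state sequence. A minimal sketch, assuming exponential emissions (so $-\ln f_i(x) = \alpha_i x - \ln \alpha_i$), states numbered 0 ($q_0$) and 1 ($q_1$), and caller-supplied rates; the function name and the example numbers are illustrative only.

```python
import math

def two_state_cost(states: list[int], gaps: list[float],
                   rates: tuple[float, float], p: float) -> float:
    """c(q|x) = b * ln((1 - p) / p) + sum over t of -ln f_{i_t}(x_t)."""
    # b: number of state transitions along the sequence
    b = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    # fit term: -ln(alpha * exp(-alpha * x)) = alpha * x - ln(alpha)
    fit = sum(rates[q] * x - math.log(rates[q]) for q, x in zip(states, gaps))
    return b * math.log((1 - p) / p) + fit

gaps = [1.0, 1.0, 0.1, 0.1]     # two slow arrivals, then two fast ones
rates = (1.0, 10.0)             # q0: low rate, q1: high rate
print(two_state_cost([0, 0, 0, 0], gaps, rates, p=0.1))
print(two_state_cost([0, 0, 1, 1], gaps, rates, p=0.1))
```

Here the sequence that switches to $q_1$ for the fast arrivals costs about 1.59, against 2.2 for staying in $q_0$: the single transition is paid for by the better fit of the second term.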
To the infinite and beyond
 • Given a sequence of intervals $x = (x_1, x_2, \dots, x_n)$, we must find a state sequence $q = (q_{i_1}, \dots, q_{i_n})$ that minimizes
$$c(q \mid x) = \sum_{t=0}^{n-1} \tau(i_t, i_{t+1}) + \sum_{t=1}^{n} \left(-\ln f_{i_t}(x_t)\right)$$
 • $f$ is related to the resolution of the discrete rates within the continuous range of emission rates, and $\tau$ to the ease of changing state.

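The minimization above is a shortest-path problem over (interval, state) pairs and can be solved exactly by dynamic programming, the same kind of computation as the Viterbi algorithm mentioned in the timing results. A minimal sketch, assuming exponential emissions with caller-supplied rates, a caller-supplied $\tau$, and the automaton starting in state 0 (so the $t = 0$ term is $\tau(0, i_1)$). It runs in $O(nk^2)$ time; the function names are hypothetical.

```python
import math
from typing import Callable

def min_cost_states(gaps: list[float], rates: list[float],
                    tau: Callable[[int, int], float]) -> tuple[list[int], float]:
    """Exact minimizer of sum_t tau(i_t, i_{t+1}) + sum_t -ln f_{i_t}(x_t), from state 0."""
    k = len(rates)

    def fit(j: int, x: float) -> float:
        return rates[j] * x - math.log(rates[j])   # -ln f_j(x) for an exponential

    cost = [tau(0, j) + fit(j, gaps[0]) for j in range(k)]  # best cost of a prefix ending in j
    back: list[list[int]] = []                               # back[t][j]: best predecessor of j
    for x in gaps[1:]:
        prev = [min(range(k), key=lambda i: cost[i] + tau(i, j)) for j in range(k)]
        cost = [cost[prev[j]] + tau(prev[j], j) + fit(j, x) for j in range(k)]
        back.append(prev)
    state = min(range(k), key=lambda j: cost[j])
    total = cost[state]
    path = [state]
    for prev in reversed(back):   # follow back-pointers from the last interval to the first
        state = prev[state]
        path.append(state)
    return path[::-1], total

def up_cost(i: int, j: int, gamma: float = 1.0, n: int = 15) -> float:
    """Illustrative Kleinberg-style tau: climbing costs (j - i) * gamma * ln n."""
    return (j - i) * gamma * math.log(n) if j > i else 0.0

gaps = [1.0] * 5 + [0.05] * 5 + [1.0] * 5   # slow, burst, slow
path, total = min_cost_states(gaps, [1.0, 10.0], up_cost)
print(path)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

The burst of short intervals is labelled with the high-rate state because the fit saved over five intervals outweighs the single cost of climbing; the quadratic dependence on $k$ is what makes large state counts expensive for this exact approach.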
Infinite is a bit too much
 • The $A^*_{s,\gamma}$ that minimizes $c(q \mid x)$ is restricted to $A^k_{s,\gamma}$, with $k$ states.
 • We will use an evolutionary algorithm to find $A^k_{s,\gamma}$.
 • Finally!

Individual representation
 • An integer sequence ($1 < q_{ij} < E$) representing, for each interval, the automaton state and the id $i$ of the last document in the interval.
 • Document $i$ arrives at time $0 \le t_i \le T$ (intervals $x_i = t_i - t_{i-1}$).
$$| \, q_{t_1}, t_{k_1} \mid q_{t_{k_1+1}}, t_{k_2} \mid \cdots \mid q_{t_f}, t_n \, |$$
 • Fitness function = cost function.
 • Initial population: randomly chosen documents split the document stream into intervals, which are given random states.

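A minimal sketch of this encoding, assuming genes are stored as `(state, last_document)` pairs over documents numbered from 0, with states numbered from 0 rather than the slide's $1 < q_{ij} < E$ range; `random_chromosome` and `decode` are hypothetical names. Decoding expands the genes into one state per position, which is what the cost function (the fitness) would be evaluated on.

```python
import random

Gene = tuple[int, int]  # (state, index of the last document covered by the gene)

def random_chromosome(n_docs: int, n_states: int, n_cuts: int,
                      rng: random.Random) -> list[Gene]:
    """Cut the stream at n_cuts random documents and give each interval a random state."""
    cuts = sorted(rng.sample(range(n_docs - 1), n_cuts))  # distinct, never the final doc
    return [(rng.randrange(n_states), end) for end in cuts + [n_docs - 1]]

def decode(chrom: list[Gene]) -> list[int]:
    """Expand the genes into one state per document position."""
    states: list[int] = []
    start = 0
    for state, end in chrom:
        states.extend([state] * (end - start + 1))
        start = end + 1
    return states

rng = random.Random(42)
population = [random_chromosome(20, 5, 3, rng) for _ in range(4)]  # initial population
print(population[0], decode(population[0]))
```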
Crossover
 • Parents: $g_{11} \cdots g_{1i} \cdots g_{1f_1}$ and $g_{21} \cdots g_{2j} \cdots g_{2f_2}$, where each gene is a state plus the documents it covers: $g_{11} = q_{11}, (t_1, \dots)$, $g_{1i} = q_{1i}, (t - n_1, \dots, t, \dots, t + m_1)$, $g_{1f_1} = q_{1f_1}, (\dots, t_n)$, and likewise $g_{2j} = q_{2j}, (t - n_2, \dots, t, \dots, t + m_2)$ in the second parent. Genes $g_{1i}$ and $g_{2j}$ contain the crossover point (c.p.) $t$.
 • Offspring 1: $g_{11} \cdots g_{1,i-1}$, covering $(t_1, \dots)$ to $(\dots, t - n_1 - 1)$, then a gap (?), then $g_{2,j+1} \cdots g_{2f_2}$, covering $(t + m_2 + 1, \dots)$ to $(\dots, t_n)$.
 • Offspring 2: $g_{21} \cdots g_{2,j-1}$, covering $(t_1, \dots)$ to $(\dots, t - n_2 - 1)$, then a gap (?), then $g_{1,i+1} \cdots g_{1f_1}$, covering $(t + m_1 + 1, \dots)$ to $(\dots, t_n)$.

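The slide leaves the gap marked "?" unspecified. A minimal sketch of the cut-and-splice step, assuming `(state, last_document)` genes over documents numbered from 0 and, as a purely illustrative choice, filling each gap with a single gene that keeps the state of the parent gene that was cut ($q_{1i}$ for the first offspring, $q_{2j}$ for the second). The function names are hypothetical.

```python
import random

Gene = tuple[int, int]  # (state, index of the last document covered by the gene)

def gene_containing(chrom: list[Gene], t: int) -> int:
    """Index of the gene whose interval contains document t."""
    for idx, (_, end) in enumerate(chrom):
        if t <= end:
            return idx
    raise ValueError("t lies beyond the last document")

def crossover_at(p1: list[Gene], p2: list[Gene], t: int) -> tuple[list[Gene], list[Gene]]:
    i, j = gene_containing(p1, t), gene_containing(p2, t)
    # Gap "?" in offspring 1 runs from the start of g_1i to the end of g_2j;
    # ASSUMPTION: cover it with one gene carrying g_1i's state (symmetrically for offspring 2).
    child1 = p1[:i] + [(p1[i][0], p2[j][1])] + p2[j + 1:]
    child2 = p2[:j] + [(p2[j][0], p1[i][1])] + p1[i + 1:]
    return child1, child2

def crossover(p1: list[Gene], p2: list[Gene], rng: random.Random) -> tuple[list[Gene], list[Gene]]:
    """Pick a random crossover point (c.p.) and recombine."""
    return crossover_at(p1, p2, rng.randrange(p1[-1][1] + 1))

p1 = [(0, 4), (1, 9), (2, 14)]
p2 = [(3, 2), (4, 7), (5, 11), (6, 14)]
print(crossover_at(p1, p2, 6))
# ([(0, 4), (1, 7), (5, 11), (6, 14)], [(3, 2), (4, 9), (2, 14)])
```

Because the filler gene always ends exactly where the spliced-in tail begins, both offspring still cover every document once, so no repair step is needed.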
Mutation
 • Several mutation operators:
    • Increment the state by one.
    • Merge two genes; the state is taken at random.
    • Split a gene in two: one part keeps the original state, the other gets the state ±1.

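A minimal sketch of the three operators on `(state, last_document)` genes. Two details are assumptions not fixed by the slide: the incremented or split-off state is clamped to the valid state range, and the merged gene takes the state of one of the two merged genes, chosen at random. The function names are hypothetical.

```python
import random

Gene = tuple[int, int]  # (state, index of the last document covered by the gene)

def mutate_increment(chrom: list[Gene], idx: int, n_states: int) -> list[Gene]:
    """Increment the state of gene idx by one (capped at the top state)."""
    state, end = chrom[idx]
    return chrom[:idx] + [(min(state + 1, n_states - 1), end)] + chrom[idx + 1:]

def mutate_merge(chrom: list[Gene], idx: int, rng: random.Random) -> list[Gene]:
    """Merge genes idx and idx + 1; the merged gene takes one of their states at random."""
    (s1, _), (s2, end2) = chrom[idx], chrom[idx + 1]
    return chrom[:idx] + [(rng.choice([s1, s2]), end2)] + chrom[idx + 2:]

def mutate_split(chrom: list[Gene], idx: int, n_states: int,
                 rng: random.Random) -> list[Gene]:
    """Split gene idx in two: the first part keeps the state, the second gets
    state +/- 1 (clamped to the valid range)."""
    start = chrom[idx - 1][1] + 1 if idx > 0 else 0
    state, end = chrom[idx]
    if end == start:                      # a single-document gene cannot be split
        return list(chrom)
    cut = rng.randrange(start, end)       # last document of the first part
    other = min(max(state + rng.choice([-1, 1]), 0), n_states - 1)
    return chrom[:idx] + [(state, cut), (other, end)] + chrom[idx + 1:]

chrom = [(0, 4), (2, 9), (1, 14)]
print(mutate_increment(chrom, 1, n_states=5))  # [(0, 4), (3, 9), (1, 14)]
```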
Effect of crossover
[Figure: generation number (y axis) versus crossover rate (x axis, 10 to 50 %), for streams a, b and c]

Effect of mutation
[Figure: generation number (y axis) versus mutation rate (x axis, 0 to 30 %), for streams a, b and c]

Effect of population size
[Figure: generation number (y axis) versus population size (x axis, 100 to 500), for streams a, b and c]

Effect of number of generations
[Figure: cost function (y axis) versus generation number (x axis, 0 to 500), for streams a, b and c]

Time results
 State n. | Viterbi ex. time (s) | Viterbi cost | Evo. alg. ex. time (s) | Evo. alg. cost (av. cost, std. dev.)
 15       | 2319.36              | 277402       | 1678.61                | 277712 (279385.6, 980.11)
 20       | 3117.28              | 277306       | 2182.12                | 277528 (278980.4, 1114.91)
 25       | 3835.37              | 277260       | 2033.81                | 277270 (279472.6, 1116.03)

[Figure: time comparison, execution time (s) versus number of states (15, 20, 25), evolutionary algorithm vs. Viterbi]

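The table can be summarized with a little arithmetic. The sketch below only reuses the numbers shown, treating the first figure in the EA cost column as its best run (the averages being in parentheses), which is our reading of the header; `summarize` is a hypothetical helper.

```python
# (states, Viterbi time, Viterbi cost, EA time, EA cost) copied from the table above
ROWS = [
    (15, 2319.36, 277402, 1678.61, 277712),
    (20, 3117.28, 277306, 2182.12, 277528),
    (25, 3835.37, 277260, 2033.81, 277270),
]

def summarize(rows):
    """Speed-up of the EA over Viterbi, and the EA's cost excess over Viterbi in %."""
    return [(k, v_time / ea_time, 100 * (ea_cost - v_cost) / v_cost)
            for k, v_time, v_cost, ea_time, ea_cost in rows]

for k, speedup, gap in summarize(ROWS):
    print(f"{k} states: {speedup:.2f}x faster, cost {gap:.3f} % above Viterbi")
```

On these three rows the EA is roughly 1.4 to 1.9 times faster, while its cost stays within about 0.11 % of the Viterbi cost.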
Predicting the state of new arrivals
 • Main point of this work: to predict whether buzz is going up or down.
 • Several possible approaches: running the Viterbi algorithm over the whole sequence, or reusing evolutionary algorithms.
 • Easy approach for a single state: assume the current trend continues.

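One way to read "assume the current trend continues" is to freeze the already fitted states and choose only the state of the new interval, greedily minimizing the extra cost it adds. The sketch below is that interpretation, not necessarily the authors' procedure; it assumes exponential emissions, caller-supplied rates and $\tau$, and that higher state numbers mean higher arrival rates. `predict_next_state`, `trend` and `cheap_up` are hypothetical names.

```python
import math
from typing import Callable

def predict_next_state(old_state: int, gap: float, rates: list[float],
                       tau: Callable[[int, int], float]) -> int:
    """Pick the state for the new interval that minimizes tau(old, new) - ln f_new(gap)."""
    def step_cost(j: int) -> float:
        return tau(old_state, j) + rates[j] * gap - math.log(rates[j])
    return min(range(len(rates)), key=step_cost)

def trend(old_state: int, new_state: int) -> str:
    """A higher state means a higher arrival rate, i.e. rising buzz."""
    if new_state > old_state:
        return "up"
    if new_state < old_state:
        return "down"
    return "steady"

def cheap_up(i: int, j: int) -> float:
    """Illustrative tau: climbing costs 0.1 per level, going down is free."""
    return 0.1 * (j - i) if j > i else 0.0

rates = [0.5, 1.0, 2.0, 4.0]
for old, gap in [(3, 5.0), (0, 0.01), (1, 1.0)]:
    new = predict_next_state(old, gap, rates, cheap_up)
    print(old, gap, new, trend(old, new))
```

A long gap after a high state pulls the prediction down, a very short gap pushes it up, and a gap matching the current rate leaves it unchanged, mirroring the three trend arrows in the local-approximation results.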
Local approximation: results
 Previous substream    | A.T. (new arrival time) | Old state | New state | Trend
 ··· 38 38 39 41 49 49 | 52                      | 12        | 0         | ↓
 ··· 41 49 49 52 68 69 | 69                      | 3         | 4         | ↑
 ··· 88 89 90 90 91 92 | 95                      | 0         | 0         | →

But it breaks down after a while
 Date             | GA            | Approx.
 0 (2004-04-02)   | 7 (0.694669)  |
 ···              | ···           |
 74 (2004-06-15)  | 14 (0.797281) |
 75 (2004-06-16)  | 24 (0.970706) |
 76 (2004-06-17)  | 19 (0.87973)  |
 77 (2004-06-18)  | 19 (0.87973)  | 19 (0.87973)
 78 (2004-06-19)  | 0 (0.605263)  | 19 (0.87973)
 79 (2004-06-20)  | 0 (0.605263)  | 19 (0.87973)

Fast GA for modelling new arrivals
 • Use the results of the previous fitting.
 • The chromosome is extended, and the mutation probability of the last gene is higher.
[Figure: frequency (y axis, 0.6 to 1) versus time (x axis, 0 to 150), GA fit vs. approximate fit]

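A minimal sketch of one reading of these two bullets, assuming the `(state, last_document)` genes used for the individual representation: the previously fitted chromosome is stretched so its last gene also covers the new arrivals, and the gene to mutate is drawn with extra weight on the last gene. `extend_seed`, `pick_gene_to_mutate` and the `last_weight` parameter are hypothetical; the slides do not give the actual probabilities.

```python
import random

Gene = tuple[int, int]  # (state, index of the last document covered by the gene)

def extend_seed(chrom: list[Gene], new_n_docs: int) -> list[Gene]:
    """Reuse a fitted chromosome: stretch its last gene over the newly arrived documents."""
    state, end = chrom[-1]
    if new_n_docs - 1 < end:
        raise ValueError("the new stream must extend the old one")
    return chrom[:-1] + [(state, new_n_docs - 1)]

def pick_gene_to_mutate(chrom: list[Gene], last_weight: float,
                        rng: random.Random) -> int:
    """Index of the gene to mutate; the last gene is last_weight times likelier than others."""
    weights = [1.0] * (len(chrom) - 1) + [last_weight]
    return rng.choices(range(len(chrom)), weights=weights, k=1)[0]

seed = extend_seed([(0, 4), (2, 9), (1, 14)], new_n_docs=20)
print(seed)  # [(0, 4), (2, 9), (1, 19)]
rng = random.Random(0)
picks = [pick_gene_to_mutate(seed, last_weight=8.0, rng=rng) for _ in range(10_000)]
print(picks.count(2) / len(picks))  # roughly 8 / 10
```

The extended chromosome would then be used to seed the initial population, so that most of the search effort goes into the tail of the stream.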
Fast GA: Results
 Subst. len. | New subst. len. | Time w/out seed    | Time w/ seed
 219900      | 100             | 3895.28 (all rows) | 141.45 (79.09)
 219000      | 1000            |                    | 144.75 (81.96)
 210000      | 10000           |                    | 166.73 (79.32)

 Subst. len. | New subst. len. | Time w/out seed    | Time w/ seed
 3032        | 100             | 5048.49 (all rows) | 54.6
 2632        | 500             |                    | 92.247
 2132        | 1000            |                    | 294.97
 1132        | 2000            |                    | 570.41

Conclusions
 • The presented system dynamically detects changes in the trends of interest in a document stream.
 • An EA makes it possible to deal with very large sequences of documents in a reasonable time.
 • Extending this EA makes it possible to fit, in a very short time, a stream that extends a previously fitted substream.
 • We plan to study correlations among document streams in order to automatically detect the occurrence of new topics composed of multi-word concepts.

The end
 • Thanks for your attention
 • Any questions?

 
Presentación 8º CUSL/6º CUSL granadino
Presentación 8º CUSL/6º CUSL granadinoPresentación 8º CUSL/6º CUSL granadino
Presentación 8º CUSL/6º CUSL granadino
 
El software libre contado a los universitarios
El software libre contado a los universitariosEl software libre contado a los universitarios
El software libre contado a los universitarios
 

Recently uploaded

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Dynamic modelling of document streams

  • 1. A Genetic Algorithm for Dynamic Modelling and Prediction of Activity in Document Streams. Lourdes Araujo, JJ Merelo (lurdes@lsi.uned.es, jj@merelo.net). Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia; Dpto. Arquitectura y Tecnología de Computadores, Universidad de Granada, Spain.
  • 2. Why • Document metadata, such as arrival time, helps organize document streams. • Temporal information helps make sense of document streams such as e-mails and news items. • Its study combines content analysis and time series modelling.
  • 3. Showing interest • Hypothesis: explosions in interest match points in time where arrival intensity increases sharply. • In general, arrival times are quite irregular. [Figure: number of document arrivals (Y) versus time (X).]
  • 4. Regularizing irregularity • A cost function is introduced that reflects how difficult it is to move from one state to another. • Intervals of similar frequency should be grouped in a single state, so a change of state is penalized. But we shouldn't overdo it.
  • 5. Kleinberg’s model • The document stream is modeled as an infinite-state automaton, A, which emits messages with different frequencies. • Each state has a frequency assigned. • Bursts are indicated by transitions from a lower to a higher state. • Frequency changes are controlled by assigning costs to state changes, which avoids small explosions and makes real explosions easier to identify.
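The rule that bursts show up as moves from a lower to a higher state can be made concrete with a small Python sketch that reads them off a fitted state sequence. `find_bursts` and its (start, end, level) output are hypothetical names; Kleinberg's own formulation nests bursts hierarchically, which this literal, simplified reading does not attempt.

```python
from typing import List, Tuple

def find_bursts(states: List[int]) -> List[Tuple[int, int, int]]:
    """Return (start, end, level) for every maximal run of equal states that
    was entered from a lower state, i.e. every upward transition."""
    bursts: List[Tuple[int, int, int]] = []
    run_start = 0
    for t in range(1, len(states) + 1):
        # A run ends where the state changes or where the sequence ends.
        if t == len(states) or states[t] != states[run_start]:
            # The first run has no predecessor, so it is never counted.
            if run_start > 0 and states[run_start] > states[run_start - 1]:
                bursts.append((run_start, t - 1, states[run_start]))
            run_start = t
    return bursts

# Example: two upward moves (0 -> 1 -> 2), a drop, then another rise.
print(find_bursts([0, 0, 1, 1, 2, 0, 1]))  # [(2, 3, 1), (4, 4, 2), (6, 6, 1)]
```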
  • 6. Infinite-state automaton model • The time sequence is generated from an exponential distribution. • The time interval x between messages i and i + 1 follows the exponential density $f(x) = \alpha e^{-\alpha x}$, for $\alpha > 0$. • The expected value of the interval is $\alpha^{-1}$.
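The generative assumption above can be checked numerically. The sketch below (the names `exp_density` and `sample_arrivals` are illustrative, not from the slides) evaluates $f(x)$ and draws arrival times with Exponential($\alpha$) gaps using Python's `random.expovariate`, which takes the rate as its argument; the average gap should approach $\alpha^{-1}$.

```python
import math
import random
from typing import List

def exp_density(x: float, alpha: float) -> float:
    """f(x) = alpha * exp(-alpha * x): density of the gap x between two messages."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return alpha * math.exp(-alpha * x)

def sample_arrivals(n: int, alpha: float, seed: int = 0) -> List[float]:
    """Arrival times t_1 <= t_2 <= ... whose gaps are i.i.d. Exponential(alpha)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(alpha)  # argument is the rate alpha, so the mean gap is 1/alpha
        times.append(t)
    return times

times = sample_arrivals(20000, alpha=2.0)
print(times[-1] / len(times))  # average gap, close to 1/alpha = 0.5
```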
  • 7. First things first: two-state model • Basic model: a 2-state probabilistic automaton A with states q0 (low emission rate) and q1 (high emission rate). [Figure: two-state diagram with q0 and q1.] • n + 1 messages give n intervals; a Bayes procedure is used to find the state sequence $q = (q_{i_1}, \dots, q_{i_n})$ with the highest conditional probability given the intervals, which amounts to minimizing $c(q \mid x) = b \ln\left(\frac{1-p}{p}\right) + \sum_{t=1}^{n} \left(-\ln f_{i_t}(x_t)\right)$, where b is the number of state transitions and p the probability of a state change. The 1st term favours a low number of transitions; the 2nd favours states that fit the sequence.
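A direct transcription of the two-state cost, as a sketch: the function name and the convention that the automaton starts in q0 (so a first state of q1 counts as one transition) are assumptions, and the rates alpha0 < alpha1 are supplied by the caller rather than derived from the stream.

```python
import math
from typing import Sequence

def two_state_cost(states: Sequence[int], gaps: Sequence[float],
                   alpha0: float, alpha1: float, p: float,
                   start_state: int = 0) -> float:
    """c(q|x) = b*ln((1-p)/p) + sum_t -ln f_{i_t}(x_t) for a two-state automaton.

    states[t] is 0 (low rate alpha0) or 1 (high rate alpha1) for gap x_t;
    p is the probability of switching state between emissions.
    """
    rates = (alpha0, alpha1)
    # b counts transitions, including one out of the (assumed) start state.
    path = [start_state] + list(states)
    b = sum(1 for a, c in zip(path, path[1:]) if a != c)
    # -ln(alpha * exp(-alpha * x)) simplifies to alpha * x - ln(alpha).
    fit = sum(rates[q] * x - math.log(rates[q]) for q, x in zip(states, gaps))
    return b * math.log((1 - p) / p) + fit
```

For $p < 1/2$ the factor $\ln((1-p)/p)$ is positive, so every extra transition adds a fixed penalty, which is how the model discourages flickering between states.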
  • 8. To the infinite and beyond • Given a sequence of intervals $x = (x_1, x_2, \dots, x_n)$, a state sequence $q = (q_{i_1}, \dots, q_{i_n})$ must be found that minimizes $c(q \mid x) = \sum_{t=0}^{n-1} \tau(i_t, i_{t+1}) + \sum_{t=1}^{n} \left(-\ln f_{i_t}(x_t)\right)$. • $f$ is related to the resolution of discrete rates within continuous emission rates, and $\tau$ to the ease of changing state.
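The slides do not give $\tau$ or the state rates explicitly. The sketch below assumes the choices from Kleinberg's original paper, which these slides build on: rates $\alpha_i = (n/T)\,s^i$ and $\tau(i, j) = (j - i)\,\gamma \ln n$ for $j > i$, zero otherwise. Function names and the defaults s = 2, γ = 1 are illustrative only.

```python
import math
from typing import List, Sequence

def kleinberg_rates(gaps: Sequence[float], k: int, s: float) -> List[float]:
    """alpha_i = (n / T) * s**i for i = 0..k-1 (assumed, after Kleinberg)."""
    n, T = len(gaps), sum(gaps)
    return [(n / T) * s ** i for i in range(k)]

def tau(i: int, j: int, gamma: float, n: int) -> float:
    """Transition cost: (j - i) * gamma * ln n when moving up, 0 when moving down."""
    return (j - i) * gamma * math.log(n) if j > i else 0.0

def k_state_cost(states: Sequence[int], gaps: Sequence[float], k: int,
                 s: float = 2.0, gamma: float = 1.0) -> float:
    """c(q|x) = sum_{t=0}^{n-1} tau(i_t, i_{t+1}) + sum_{t=1}^{n} -ln f_{i_t}(x_t)."""
    n = len(gaps)
    rates = kleinberg_rates(gaps, k, s)
    cost, prev = 0.0, 0  # i_0 = 0: the automaton starts in the base state
    for q, x in zip(states, gaps):
        cost += tau(prev, q, gamma, n)              # transition term
        cost += rates[q] * x - math.log(rates[q])  # -ln f_q(x)
        prev = q
    return cost
```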
  • 11. Infinite is a bit too much • The search for the sequence in $A^*_{s,\gamma}$ that minimizes $c(q \mid x)$ is restricted to $A^k_{s,\gamma}$, with k states. • We will use an evolutionary algorithm to find the optimal state sequence in $A^k_{s,\gamma}$. • Finally!
  • 12. Individual representation • A sequence of integers, with $1 < q_{i_j} < E$, representing automaton states and the id i of the last document in each segment. • Document i arrives at $0 \le t_i \le T$ (intervals $x_i = t_i - t_{i-1}$); the arrivals $t_1, t_2, \dots, t_n$ are encoded as $| q_{t_1}, t_{k_1} | q_{t_{k_1+1}}, t_{k_2} | \cdots | q_{t_f}, t_n |$. • Fitness function = cost function. • Initial population: documents chosen at random split the document stream into intervals, each with a random state.
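One possible encoding of the individual described above, as a hypothetical sketch: each gene is a `(state, last_document_index)` pair, states are 0-based here rather than bounded by E, and `decode` expands a chromosome into one state per document so it can be fed to a cost function.

```python
import random
from typing import List, Tuple

# Hypothetical encoding: one gene = (state, index of the last document in its segment).
Gene = Tuple[int, int]

def decode(chrom: List[Gene]) -> List[int]:
    """Expand a chromosome into one state per document, ready for the cost function."""
    states: List[int] = []
    start = 0
    for state, last in chrom:
        states.extend([state] * (last - start + 1))
        start = last + 1
    return states

def random_individual(n: int, n_states: int, n_cuts: int,
                      rng: random.Random) -> List[Gene]:
    """Split documents 0..n-1 at random cut points; give each segment a random state."""
    cuts = sorted(rng.sample(range(n - 1), min(n_cuts, n - 1)))
    lasts = cuts + [n - 1]  # the final segment always ends at the last document
    return [(rng.randrange(n_states), last) for last in lasts]

rng = random.Random(42)
population = [random_individual(100, n_states=5, n_cuts=7, rng=rng) for _ in range(50)]
```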
  • 13. Crossover [Figure: parent 1 has genes $g_{11}, \dots, g_{1i}, \dots, g_{1f_1}$ and parent 2 has genes $g_{21}, \dots, g_{2j}, \dots, g_{2f_2}$; gene $g_{1i} = q_{1i}, (t - n_1, \dots, t, \dots, t + m_1)$ and gene $g_{2j} = q_{2j}, (t - n_2, \dots, t, \dots, t + m_2)$ both contain the crossover point (c.p.) t. Child 1 is $g_{11}, \dots, g_{1,i-1}$, covering $(t_1, \dots, t - n_1 - 1)$, then a gene marked "?", then $g_{2,j+1}, \dots, g_{2f_2}$, covering $(t + m_2 + 1, \dots, t_n)$. Child 2 is $g_{21}, \dots, g_{2,j-1}$, covering $(t_1, \dots, t - n_2 - 1)$, then "?", then $g_{1,i+1}, \dots, g_{1f_1}$, covering $(t + m_1 + 1, \dots, t_n)$.]
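The crossover figure can be read as the following sketch (names hypothetical, same `(state, last_document_index)` encoding): both parents are cut at the gene containing the same document t, and the uncovered stretch between the two cuts becomes the "?" gene. The slide does not say how the "?" state is chosen; drawing it from the two cut genes is an assumption.

```python
import random
from typing import List, Tuple

Gene = Tuple[int, int]  # (state, index of the last document in the segment)

def gene_containing(chrom: List[Gene], t: int) -> int:
    """Position of the gene whose segment contains document t."""
    for g, (_, last) in enumerate(chrom):
        if t <= last:
            return g
    raise ValueError("t lies beyond the end of the stream")

def crossover(p1: List[Gene], p2: List[Gene],
              rng: random.Random) -> Tuple[List[Gene], List[Gene]]:
    t = rng.randrange(p1[-1][1] + 1)  # crossover point: a document index
    i, j = gene_containing(p1, t), gene_containing(p2, t)
    # The "?" gene spans the gap left between the two parents' cut genes.
    # Its state is not given on the slide; here it is drawn from the two cut genes.
    gap1 = (rng.choice((p1[i][0], p2[j][0])), p2[j][1])
    gap2 = (rng.choice((p1[i][0], p2[j][0])), p1[i][1])
    return p1[:i] + [gap1] + p2[j + 1:], p2[:j] + [gap2] + p1[i + 1:]
```

Because every gene before the cut ends before t and the "?" gene ends at or after t, each child still covers the whole stream with strictly increasing segment ends.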
  • 14. Mutation • Several mutation operators: • Increment the state by one. • Merge two genes, with the state taken at random. • Split a gene in two: one with the original state, the other with the state ±1.
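The three mutation operators, sketched on the same hypothetical `(state, last_document_index)` encoding. Capping states at the valid range, keeping the original state on the left part of a split, and merging a gene with its right neighbour are all assumptions the slide leaves open.

```python
import random
from typing import List, Tuple

Gene = Tuple[int, int]  # (state, index of the last document in the segment)

def mutate_increment(chrom: List[Gene], g: int, n_states: int) -> List[Gene]:
    """Raise the state of gene g by one (capped at the highest state)."""
    state, last = chrom[g]
    return chrom[:g] + [(min(state + 1, n_states - 1), last)] + chrom[g + 1:]

def mutate_merge(chrom: List[Gene], g: int, rng: random.Random) -> List[Gene]:
    """Merge genes g and g + 1; the merged gene keeps one of their states at random."""
    (s1, _), (s2, last2) = chrom[g], chrom[g + 1]
    return chrom[:g] + [(rng.choice((s1, s2)), last2)] + chrom[g + 2:]

def mutate_split(chrom: List[Gene], g: int, n_states: int,
                 rng: random.Random) -> List[Gene]:
    """Split gene g: the left part keeps its state, the right part gets state +-1
    (clamped to the valid range)."""
    state, last = chrom[g]
    first = chrom[g - 1][1] + 1 if g > 0 else 0
    if first == last:
        return list(chrom)  # a one-document gene cannot be split
    cut = rng.randrange(first, last)  # left part ends at cut, right part at last
    other = min(max(state + rng.choice((-1, 1)), 0), n_states - 1)
    return chrom[:g] + [(state, cut), (other, last)] + chrom[g + 1:]
```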
  • 15. Effect of crossover [Figure: Generation N. versus crossover rate (%) for streams a, b and c.]
  • 16. Effect of mutation [Figure: Generation N. versus mutation rate (%) for streams a, b and c.]
  • 17. Effect of population size [Figure: Generation N. versus population size for streams a, b and c.]
  • 18. Effect of number of generations [Figure: cost function versus Generation N. for streams a, b and c.]
  • 19. Time results (execution times in seconds)
    State n. | Viterbi ex. time | Viterbi cost | EA ex. time | EA cost (av. cost, std. dev.)
    15 | 2319.36 | 277402 | 1678.61 | 277712 (279385.6, 980.11)
    20 | 3117.28 | 277306 | 2182.12 | 277528 (278980.4, 1114.91)
    25 | 3835.37 | 277260 | 2033.81 | 277270 (279472.6, 1116.03)
    [Figure: time comparison, time (s.) versus number of states (15, 20, 25) for the evolutionary algorithm and Viterbi.]
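For reference, the Viterbi baseline in the table is the standard dynamic program over the k states. A minimal sketch, assuming Kleinberg's rates and transition costs ($\alpha_i = (n/T)\,s^i$; $\tau(i, j) = (j - i)\,\gamma \ln n$ upwards, free downwards); this is not the authors' code, and the parameter defaults are illustrative. Its running time grows as O(n k²), which fits the rise of the Viterbi times with the number of states.

```python
import math
from typing import List, Sequence

def viterbi_states(gaps: Sequence[float], k: int,
                   s: float = 2.0, gamma: float = 1.0) -> List[int]:
    """Exact minimum-cost state sequence for a k-state automaton, O(n * k^2)."""
    n = len(gaps)
    rates = [(n / sum(gaps)) * s ** i for i in range(k)]

    def tau(i: int, j: int) -> float:
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    def emit(j: int, x: float) -> float:
        return rates[j] * x - math.log(rates[j])  # -ln f_j(x)

    # cost[j]: cheapest prefix ending in state j (start state i_0 = 0)
    cost = [tau(0, j) + emit(j, gaps[0]) for j in range(k)]
    back: List[List[int]] = []  # back[t][j]: best predecessor of state j at step t + 1
    for x in gaps[1:]:
        best = [min(range(k), key=lambda i: cost[i] + tau(i, j)) for j in range(k)]
        cost = [cost[best[j]] + tau(best[j], j) + emit(j, x) for j in range(k)]
        back.append(best)
    # Trace the optimal path back from the cheapest final state.
    state = min(range(k), key=lambda j: cost[j])
    path = [state]
    for best in reversed(back):
        state = best[state]
        path.append(state)
    return path[::-1]

# A quiet stretch, a burst of short gaps, then quiet again.
print(viterbi_states([1.0] * 20 + [0.05] * 20 + [1.0] * 20, k=3))
```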
  • 20. Predicting the state of new arrivals • Main point of this work: to predict whether buzz is going up or down. • Several possible approaches: using the Viterbi algorithm over the whole sequence, and reusing evolutionary algorithms. • Easy approach for a single state: assume the current trend continues.
  • 21. Local approximation: results
    Previous substream | A. T. | Old s. | New s. | Trend
    ... 38 38 39 41 49 49 | 52 | 12 | 0 | ↓
    ... 41 49 49 52 68 69 | 69 | 3 | 4 | ↑
    ... 88 89 90 90 91 92 | 95 | 0 | 0 | →
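The slides do not spell out the local approximation. One plausible reading, offered only as a hypothetical sketch, keeps the fitted past fixed and greedily picks the state for the new arrival that minimizes the transition cost plus $-\ln f_j$ of its gap (using Kleinberg-style upward costs as an assumption); `trend` then maps old and new states to the arrows used in the table.

```python
import math
from typing import Sequence

def local_next_state(prev_state: int, gap: float, rates: Sequence[float],
                     gamma: float, n: int) -> int:
    """Greedy one-step choice: the state j minimizing tau(prev, j) - ln f_j(gap)."""
    def step_cost(j: int) -> float:
        up = (j - prev_state) * gamma * math.log(n) if j > prev_state else 0.0
        return up + rates[j] * gap - math.log(rates[j])
    return min(range(len(rates)), key=step_cost)

def trend(old: int, new: int) -> str:
    """Arrow for the change of state: up, down or unchanged."""
    return "↑" if new > old else "↓" if new < old else "→"

rates = [1.0, 2.0, 4.0, 8.0]
new = local_next_state(3, gap=5.0, rates=rates, gamma=1.0, n=100)
print(new, trend(3, new))  # 0 ↓
```

Being greedy, this never revisits earlier states, which is one reason such an approximation can drift from a full refit as new arrivals accumulate.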
  • 22. But it breaks down after a while
    date | GA | approx.
    0 (2004-04-02) | 7 (0.694669) |
    ... | ... |
    74 (2004-06-15) | 14 (0.797281) |
    75 (2004-06-16) | 24 (0.970706) |
    76 (2004-06-17) | 19 (0.87973) |
    77 (2004-06-18) | 19 (0.87973) | 19 (0.87973)
    78 (2004-06-19) | 0 (0.605263) | 19 (0.87973)
    79 (2004-06-20) | 0 (0.605263) | 19 (0.87973)
  • 23. Fast GA for modelling new arrivals • Using the results of the previous fitting: • The chromosome is extended, and the mutation probability of the last gene is higher. [Figure: frequency versus time for the GA fit and the approximate fit.]
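A sketch of the seeding idea, under stated assumptions: the previous best chromosome is extended by stretching its last gene over the new documents (appending a new gene would be an equally valid reading), and mutation then favours the last gene through a hypothetical weight `last_weight`. Both function names are illustrative.

```python
import random
from typing import List, Tuple

Gene = Tuple[int, int]  # (state, index of the last document in the segment)

def extend_chromosome(prev_best: List[Gene], new_n: int) -> List[Gene]:
    """Seed for the longer stream: stretch the last gene over the new documents."""
    state, _ = prev_best[-1]
    return prev_best[:-1] + [(state, new_n - 1)]

def pick_gene_to_mutate(chrom: List[Gene], rng: random.Random,
                        last_weight: float = 10.0) -> int:
    """Gene index to mutate; the last gene is last_weight times likelier than any other."""
    weights = [1.0] * (len(chrom) - 1) + [last_weight]
    return rng.choices(range(len(chrom)), weights=weights, k=1)[0]

seed = extend_chromosome([(0, 9), (2, 19)], new_n=30)
print(seed)  # [(0, 9), (2, 29)]
```

Concentrating mutation on the tail keeps the already fitted prefix mostly intact, so the search effort goes where the new documents are.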
  • 24. Fast GA: results
    Subst. len. | New subst. len. | T. w/out seed | T. w/ seed
    219900 | 100 | 3895.28 | 141.45 (79.09)
    219000 | 1000 | | 144.75 (81.96)
    210000 | 10000 | | 166.73 (79.32)
    Subst. len. | New subst. len. | T. w/out seed | T. w/ seed
    3032 | 100 | | 54.6
    2632 | 500 | 5048.49 | 92.247
    2132 | 1000 | | 294.97
    1132 | 2000 | | 570.41
  • 28. Conclusions • The presented system dynamically detects changes in the trends of interest in a document stream. • An EA makes it possible to deal with very large sequences of documents in a reasonable time. • Extending this EA allows a stream that extends a previously fitted substream to be fitted in a very short time. • We plan to study correlations among document streams, to automatically detect the occurrence of new topics composed of multi-word concepts.
  • 30. The end • Thanks for your attention. • Any questions?