SlideShare ist ein Scribd-Unternehmen logo
1 von 5
Downloaden Sie, um offline zu lesen
Detecting influenza epidemics using search engine query data                                                   1




                                 Detecting influenza epidemics using
                                 search engine query data
                                 Jeremy Ginsberg1, Matthew H. Mohebbi1, Rajan S. Patel1, Lynnette Brammer2,
                                 Mark S. Smolinski1 & Larry Brilliant1
                                 Google Inc. 2Centers for Disease Control and Prevention
                                 1




                                 Epidemics of seasonal influenza are a major public health concern, causing
                                 tens of millions of respiratory illnesses and 250,000 to 500,000 deaths
                                 worldwide each year1. In addition to seasonal influenza, a new strain of
                                 influenza virus against which no prior immunity exists and that demonstrates
                                 human-to-human transmission could result in a pandemic with millions
                                 of fatalities2. Early detection of disease activity, when followed by a rapid
                                 response, can reduce the impact of both seasonal and pandemic influenza3,4.
                                 One way to improve early detection is to monitor health-seeking behavior
                                 in the form of online web search queries, which are submitted by millions
                                 of users around the world each day. Here we present a method of analyzing
                                 large numbers of Google search queries to track influenza-like illness in
                                 a population. Because the relative frequency of certain queries is highly
                                 correlated with the percentage of physician visits in which a patient presents
                                 with influenza-like symptoms, we can accurately estimate the current level of
                                 weekly influenza activity in each region of the United States, with a reporting
                                 lag of about one day. This approach may make it possible to utilize search
                                 queries to detect influenza epidemics in areas with a large population of web
                                 search users.




                                 This paper was originally published in Nature Vol 457, 19 February 2009,
                                 doi:10.1038/nature07634
                                 http://dx.doi.org/10.1038/nature07634
Detecting influenza epidemics using search engine query data                                                                                               2

Traditional surveillance systems, including those employed by           ÎČ1 is the multiplicative coefficient, and Δ is the error term.
the U.S. Centers for Disease Control and Prevention (CDC) and           logit(P) is the natural log of P/(1-P).
the European Influenza Surveillance Scheme (EISS), rely on
both virologic and clinical data, including influenza-like illness      Publicly available historical data from the CDC’s U.S. Influenza
(ILI) physician visits. CDC publishes national and regional data        Sentinel Provider Surveillance Network12 was used to help
from these surveillance systems on a weekly basis, typically            build our models. For each of the nine surveillance regions of
with a 1-2 week reporting lag.                                          the United States, CDC reported the average percentage of
                                                                        all outpatient visits to sentinel providers that were ILI-related
In an attempt to provide faster detection, innovative                   on a weekly basis. No data were provided for weeks outside
surveillance systems have been created to monitor indirect              of the annual influenza season, and we excluded such dates
signals of influenza activity, such as call volume to telephone         from model fitting, though our model was used to generate
triage advice lines5 and over-the-counter drug sales6. About            unvalidated ILI estimates for these weeks.
90 million American adults are believed to search online for
information about specific diseases or medical problems each            We designed an automated method of selecting ILI-related
year7, making web search queries a uniquely valuable source             search queries, requiring no prior knowledge about influenza.
of information about health trends. Previous attempts at using          We measured how effectively our model would fit the CDC
online activity for influenza surveillance have counted search          ILI data in each region if we used only a single query as the
queries submitted to a Swedish medical website8, visitors to            explanatory variable Q. Each of the 50 million candidate
certain pages on a U.S. health website9, and user clicks on a           queries in our database was separately tested in this manner,
search keyword advertisement in Canada10. A set of Yahoo                to identify the search queries which could most accurately
search queries containing the words “flu” or “influenza” were           model the CDC ILI visit percentage in each region. Our
found to correlate with virologic and mortality surveillance            approach rewarded queries which exhibited regional variations
data over multiple years11.                                             similar to the regional variations in CDC ILI data: the chance
Our proposed system builds on these earlier works by utilizing          that a random search query can fit the ILI percentage in all
an automated method of discovering influenza-related search             nine regions is considerably less than the chance that a
queries. By processing hundreds of billions of individual               random search query can fit a single location (Supplementary
searches from five years of Google web search logs, our                 Figure 2).
system generates more comprehensive models for use in
influenza surveillance, with regional and state-level estimates       The automated query selection process produced a list of the
of influenza-like illness (ILI) activity in the United States.        highest scoring search queries, sorted by mean Z-transformed
Widespread global usage of online search engines may enable           correlation across the nine regions. To decide which queries
models to eventually be developed in international settings.          would be included in the ILI-related query fraction Q, we
                                                                      considered different sets of N top scoring queries. We
By aggregating historical logs of online web search queries           measured the performance of these models based on the
submitted between 2003 and 2008, we computed time series              sum of the queries in each set, and picked N such that we
of weekly counts for 50 million of the most common search             obtained the best fit against out-of-sample ILI data across the
queries in the United States. Separate aggregate weekly               nine regions (Figure 1).
counts were kept for every query in each state. No information
about the identity of any user was retained. Each time series           Combining the N=45 highest-scoring queries was found to
was normalized by dividing the count for each query in a                obtain the best fit. These 45 search queries, though selected
particular week by the total number of online search queries
submitted in that location during the week, resulting in a query                  0.95

fraction (Supplementary Figure 1).
                                                                                                                        45 queries
We sought to develop a simple model which estimates the
probability that a random physician visit in a particular region
                                                                     Mean correlation




is related to an influenza-like illness (ILI); this is equivalent                       0. 9
to the percentage of ILI-related physician visits. A single
explanatory variable was used: the probability that a random
search query submitted from the same region is ILI-related, as
determined by an automated method described below. We fit
a linear model using the log-odds of an ILI physician visit and                   0.85
the log-odds of an ILI-related search query:                                                   0   10   20   30   40       50         60   70   80   90   100
                                                                                                                  Number of queries

                                                                       Figure 1: An evaluation of how many top-scoring queries to include in the
                 logit(P) = ÎČ0 + ÎČ1 × logit(Q) + Δ
                                                                       ILI-related query fraction. Maximal performance at estimating out-of-sample
                                                                       points during cross-validation was obtained by summing the top 45 search
where P is the percentage of ILI physician visits, Q is                queries. A steep drop in model performance occurs after adding query 81,
the ILI-related query fraction, ÎČ0 is the intercept,                   which is “oscar nominations”.
Detecting influenza epidemics using search engine query data                                                                                        3

automatically, appeared to be consistently related to influenza-                                                 Top 45 Queries      Next 55 Queries
                                                                      Search Query Topic                         N Weighted          N Weighted
like illnesses. Other search queries in the top 100, not included     Influenza Complication                     11 18.15            5 3.40
in our model, included topics like “high school basketball”           Cold/Flu Remedy                            8 5.05              6 5.03
which tend to coincide with influenza season in the United            General Influenza Symptoms                 5    2.60           1   0.07
                                                                      Term for Influenza                         4    3.74           6 0.30
States (Table 1).                                                     Specific Influenza Symptom                 4    2.54           6 3.74
                                                                      Symptoms of an Influenza Complication      4    2.21           2 0.92
Using this ILI-related query fraction as the explanatory              Antibiotic Medication                      3    6.23           3 3.17
                                                                      General Influenza Remedies                 2    0.18           1   0.32
variable, we fit a final linear model to weekly ILI percentages       Symptoms of a Related Disease              2    1.66           2 0.77
between 2003 and 2007 for all nine regions together, thus             Antiviral Medication                       1    0.39           1   0.74
learning a single, region-independent coefficient. The                Related Disease                            1    6.66           3 3.77
                                                                      Unrelated to Influenza                     0 0.00              19 28.37
model was able to obtain a good fit with CDC-reported ILI                                                        45 49.40            55 50.60
percentages, with a mean correlation of 0.90 (min=0.80,
max=0.96, n=9 regions) (Figure 2).                                    Table 1: Topics found in search queries which were found to be most correlated
                                                                      with CDC ILI data. The top 45 queries were used in our final model; the next
                                                                      55 queries are presented for comparison purposes. The number of queries in
The final model was validated on 42 points per region of
                                                                      each topic is indicated, as well as query volume-weighted counts, reflecting
previously untested data from 2007-2008, which were                   the relative frequency of queries in each topic.
excluded from all prior steps. Estimates generated for these
42 points obtained a mean correlation of 0.97 (min=0.92,
max=0.99, n=9 regions) with the CDC-observed ILI                    This system is not designed to be a replacement for traditional
percentages.                                                        surveillance networks or supplant the need for laboratory-
                                                                    based diagnoses and surveillance. Notable increases in
Throughout the 2007-2008 influenza season, we used                  ILI-related search activity may indicate a need for public
preliminary versions of our model to generate ILI estimates,        health inquiry to identify the pathogen or pathogens involved.
and shared our results each week with the Epidemiology and          Demographic data, often provided by traditional surveillance,
Prevention Branch of Influenza Division at CDC to evaluate          cannot be obtained using search queries.
timeliness and accuracy. Figure 3 illustrates data available
at different points throughout the season. Across the nine           In the event that a pandemic-causing strain of influenza
regions, we were able to consistently estimate the current           emerges, accurate and early detection of ILI percentages may
ILI percentage 1-2 weeks ahead of the publication of reports         enable public health officials to mount a more effective early
by the CDC’s U.S. Influenza Sentinel Provider Surveillance           response. Though we cannot be certain how search engine
Network.                                                             users will behave in such a scenario, affected individuals may
                                                                     submit the same ILI-related search queries used in our model.
Because localized influenza surveillance is particularly useful      Alternatively, panic and concern among healthy individuals
for public health planning, we sought to further validate our        may cause a surge in the ILI-related query fraction and
model against weekly ILI percentages for individual states.          exaggerated estimates of the ongoing ILI percentage.
CDC does not make state-level data publicly available, but we
                                                                                     12
validated our model against state-reported ILI percentages
                                                                                     10
provided by the state of Utah, and obtained a correlation of
                                                                    ILI percentage




                                                                                     8
0.90 across 42 validation points (Supplementary Figure 3).
                                                                                     6

Google web search queries can be used to accurately estimate                         4

influenza-like illness percentages in each of the nine public                        2

health regions of the United States. Because search queries                          0
                                                                                          2004   2005        2006             2007           2008
can be processed quickly, the resulting ILI estimates were            Figure 2: A comparison of model estimates for the Mid-Atlantic Region (black)
consistently 1-2 weeks ahead of CDC ILI surveillance reports.         against CDC-reported ILI percentages (red), including points over which the
The early detection provided by this approach may become an           model was fit and validated. A correlation of 0.85 was obtained over 128
important line of defense against future influenza epidemics          points from this region to which the model was fit, while a correlation of 0.96
                                                                      was obtained over 42 validation points. 95% prediction intervals are  indicated.
in the United States, and perhaps eventually in international
settings.
                                                                    The search queries in our model are not, of course, exclusively
Up-to-date influenza estimates may enable public health             submitted by users who are experiencing influenza-like
officials and health professionals to better respond to             symptoms, and the correlations we observe are only
seasonal epidemics. If a region experiences an early, sharp         meaningful across large populations. Despite strong historical
increase in ILI physician visits, it may be possible to focus       correlations, our system remains susceptible to false alerts
additional resources on that region to identify the etiology of     caused by a sudden increase in ILI-related queries. An unusual
the outbreak, providing extra vaccine capacity or raising local     event, such as a drug recall for a popular cold or flu remedy,
media awareness as necessary.                                       could cause such a false alert.
Detecting influenza epidemics using search engine query data                                                                                                             4
                                                                                    5
Harnessing the collective intelligence of millions of users,
Google web search logs can provide one of the most timely,                         2.5

broad reaching influenza monitoring systems available today.                        0
                                                                                         Week 43   Week 47   Week 51     Week 3       Week 7     Week 11   Week 15   Week 19
While traditional systems require 1-2 weeks to gather and                                                     Data available as of February 4, 2008
process surveillance data, our estimates are current each
day. As with other syndromic surveillance systems, the data                         5

are most useful as a means to spur further investigation and                       2.5
collection of direct measures of disease activity.
                                                                                    0
                                                                                         Week 43   Week 47   Week 51     Week 3      Week 7     Week 11    Week 15   Week 19
                                                                                                               Data available as of March 3, 2008
This system will be used to track the spread of influenza-
like illness throughout the 2008-2009 influenza season                              5
in the United States. Results are freely available online at
                                                                                   2.5
http://www.google.org/flutrends.
                                                                                    0
                                                                                         Week 43   Week 47   Week 51     Week 3      Week 7     Week 11    Week 15   Week 19
                                                                                                              Data available as of March 31, 2008

Methods
                                                                                    5




                                                                  ILI percentage
Privacy. At Google, we recognize that privacy is important.                        2.5

None of the queries in our project’s database can be                                0
                                                                                         Week 43   Week 47   Week 51    Week 3       Week 7     Week 11    Week 15   Week 19
associated with a particular individual. Our project’s database                                                Data available as of May 12, 2008
retains no information about the identity, IP address, or
                                                                    Figure 3: ILI percentages estimated by our model (black) and provided by
specific physical location of any user. Furthermore, any            CDC (red) in the Mid-Atlantic region, showing data available at four points
original web search logs older than 9 months are being              in the 2007-2008 influenza season. During week 5, we detected a sharply
anonymized in accordance with Google’s Privacy Policy               increasing ILI percentage in the Mid-Atlantic region; similarly, on March 3, our
(http://www.google.com/privacypolicy.html).                         model indicated that the peak ILI percentage had been reached during week
                                                                    8, with sharp declines in weeks 9 and 10. Both results were later confirmed by
                                                                    CDC ILI data.
 Search query database. For the purposes of our database, a
 search query is a complete, exact sequence of terms issued by
 a Google search user; we don’t combine linguistic variations,       lags were considered, but ultimately not used in our modeling
 synonyms, cross-language translations, misspellings, or             process.
 subsequences, though we hope to explore these options
 in future work. For example, we tallied the search query            Each candidate search query was evaluated nine times, once
“indications of flu” separately from the search queries “flu         per region, using the search data originating from a particular
 indications” and “indications of the flu”.                          region to explain the ILI percentage in that region. With four
                                                                     cross-validation folds per region, we obtained 36 different
Our database of queries contains 50 million of the most              correlations between the candidate model’s estimates and
common search queries on all possible topics, without pre-           the observed ILI percentages. To combine these into a single
filtering. Billions of queries occurred infrequently and were        measure of the candidate query’s performance, we applied
excluded. Using the internet protocol (IP) address associated        the Fisher Z-transformation13 to each correlation, and took the
with each search query, the general physical location from           mean of the 36 Z-transformed correlations.
which the query originated can often be identified, including
the nearest major city if within the United States.                 Computation and pre-filtering. In total, we fit 450 million
                                                                    different models to test each of the candidate queries. We
Model data. In the query selection process, we fit per-query        used a distributed computing framework14 to efficiently
models using all weeks between September 28, 2003 and               divide the work among hundreds of machines. The amount
March 11, 2007 (inclusive) for which CDC reported a non-zero        of computation required could have been reduced by making
ILI percentage, yielding 128 training points for each region        assumptions about which queries might be correlated with
(each week is one data point). 42 additional weeks of data          ILI. For example, we could have attempted to eliminate
(March 18, 2007 through May 11, 2008) were reserved for final       non-influenza-related queries before fitting any models.
validation. Search query data before 2003 was not available         However, we were concerned that aggressive filtering might
for this project.                                                   accidentally eliminate valuable data. Furthermore, if the
                                                                    highest-scoring queries seemed entirely unrelated to influenza,
Automated query selection process. Using linear regression          it would provide evidence that our query selection approach
with 4-fold cross validation, we fit models to four 96-point        was invalid.
subsets of the 128 points in each region. Each per-query
model was validated by measuring the correlation between            Constructing the ILI-related query fraction. We concluded
the model’s estimates for the 32 held-out points and CDC’s          the query selection process by choosing to keep the
reported regional ILI percentage at those points. Temporal          search queries whose models obtained the highest mean
Detecting influenza epidemics using search engine query data                                                                           5

Z-transformed correlations across regions: these queries were       References
deemed to be “ILI-related”.
                                                                    1. World Health Organization. Influenza fact sheet.
To combine the selected search queries into a single                   http://www.who.int/mediacentre/factsheets/2003/fs211/
aggregate variable, we summed the query fractions on a                 en/ (2003).
regional basis, yielding our estimate of the ILI-related query
fraction Q, in each region. Note that the same set of queries       2. World Health Organization. WHO consultation on priority
was selected for each region.                                          public health interventions before and during an influenza
                                                                       pandemic. http://www.who.int/csr/disease/avian_
Fitting and validating a final model. We fit one final univariate      influenza/consultation/en/ (2004).
model, used for making estimates in any region or state
based on the ILI-related query fraction from that region            3. Ferguson, N. M. et al. Strategies for containing an emerging
or state. We regressed over 1152 points, combining all 128             influenza pandemic in Southeast Asia. Nature 437,
training points used in the query selection process from each          209–214 (2005).
of the nine regions. We validated the accuracy of this final
model by measuring its performance on 42 additional weeks           4. Longini, I. M. et al. Containing pandemic influenza at the
of previously untested data in each region, from the most              source. Science 309, 1083–1087 (2005).
recently available time period (March 18, 2007 through May 11,
2008). These 42 points represent approximately 25% of the           5. Espino, J., Hogan, W. & Wagner, M. Telephone triage:
total data available for the project, the first 75% of which was       A timely data source for surveillance of influenza-like
used for query selection and model fitting.                            diseases. AMIA: Annual Symposium Proceedings
                                                                       215–219 (2003).
State-level model validation. To evaluate the accuracy of
state-level ILI estimates generated using our final model, we       6. Magruder, S. Evaluation of over-the-counter
compared our estimates against weekly ILI percentages                  pharmaceutical sales as a possible early warning indicator
provided by the state of Utah. Because the model was fit using         of human disease. Johns Hopkins University APL Technical
regional data through March 11, 2007, we validated our Utah            Digest 24, 349–353 (2003).
ILI estimates using 42 weeks of previously untested data,
from the most recently available time period (March 18, 2007        7. Fox, S. Online Health Search 2006. Pew Internet &
through May 11, 2008).                                                 American Life Project (2006).

                                                                    8. Hulth, A., Rydevik, G. & Linde, A. Web Queries as a Source
Acknowledgements. We thank Lyn Finelli at the CDC                      for Syndromic Surveillance. PLoS ONE 4(2): e4378.
Influenza Division for her ongoing support and comments                doi:10.1371/journal.pone.0004378 (2009).
on this manuscript. We are grateful to Dr. Robert Rolfs and
Lisa Wyman at the Utah Department of Health and Monica              9. Johnson, H. et al. Analysis of Web access logs for
Patton at the CDC Influenza Division for providing ILI data. We        surveillance of influenza. MEDINFO 1202–1206 (2004).
thank Vikram Sahai for his contributions to data collection
and processing, and Craig Nevill-Manning, Alex Roetter, and         10. Eysenbach, G. Infodemiology: tracking flu-related searches
Kataneh Sarvian from Google for their support and comments              on the web for syndromic surveillance. AMIA: Annual
on this manuscript.                                                     Symposium Proceedings 244–248 (2006).

Author contributions. J.G. and M.H.M. conceived, designed,          11. Polgreen, P. M., Chen, Y., Pennock, D. M. & Forrest, N. D.
and implemented the system. J.G., M.H.M., and R.S.P. analysed           Using internet searches for influenza surveillance. Clinical
the results and wrote the paper. L.B. (CDC) contributed data.           Infectious Diseases 47, 1443–1448 (2008).
All authors edited and commented on the paper.
                                                                    12. http://www.cdc.gov/flu/weekly
Supplementary material. Figures and other supplementary
material is available at http://www.nature.com/nature/journal/      13. David, F. The moments of the z and F distributions.
v457/n7232/suppinfo/nature07634.html                                    Biometrika 36, 394–403 (1949).

                                                                    14. Dean, J. & Ghemawat, S. Mapreduce: Simplified data
                                                                        processing on large clusters. OSDI: Sixth Symposium on
                                                                        Operating System Design and Implementation (2004).

Weitere Àhnliche Inhalte

Was ist angesagt?

EHR- 2016 Eeshika Mitra
EHR- 2016 Eeshika MitraEHR- 2016 Eeshika Mitra
EHR- 2016 Eeshika MitraEeshika Mitra
 
HHC Integration of HIV Testing Within Routine Care
HHC Integration of HIV Testing Within Routine CareHHC Integration of HIV Testing Within Routine Care
HHC Integration of HIV Testing Within Routine CareAquilino Gabor
 
yyoneoka_UtahUsesGIS_2009
yyoneoka_UtahUsesGIS_2009yyoneoka_UtahUsesGIS_2009
yyoneoka_UtahUsesGIS_2009Yukiko Yoneoka
 
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...Jake Chen
 
Kenya Annual Malaria Report_2013_2014_final
Kenya Annual Malaria Report_2013_2014_finalKenya Annual Malaria Report_2013_2014_final
Kenya Annual Malaria Report_2013_2014_finalJames Gathecha Wangai
 
The Role of Information Communication Technology & Geoinformatics in Vector C...
The Role of Information Communication Technology & Geoinformatics in Vector C...The Role of Information Communication Technology & Geoinformatics in Vector C...
The Role of Information Communication Technology & Geoinformatics in Vector C...ANUMBA JOSEPH UCHE
 
Covid 19 surveillance-roadmap_final
Covid 19 surveillance-roadmap_finalCovid 19 surveillance-roadmap_final
Covid 19 surveillance-roadmap_finalMumbaikar Le
 
Extracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual dataExtracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual dataVasileios Lampos
 
Feasibility of an SMS intervention to deliver tuberculosis testing results in...
Feasibility of an SMS intervention to deliver tuberculosis testing results in...Feasibility of an SMS intervention to deliver tuberculosis testing results in...
Feasibility of an SMS intervention to deliver tuberculosis testing results in...SystemOne
 
Integrating GIS and Remote Sensing Technology In Contact Tracing Of Ebola Vir...
Integrating GIS and Remote Sensing Technology In Contact Tracing Of Ebola Vir...Integrating GIS and Remote Sensing Technology In Contact Tracing Of Ebola Vir...
Integrating GIS and Remote Sensing Technology In Contact Tracing Of Ebola Vir...ANUMBA JOSEPH UCHE
 
Rapport de la Covid-19 Task Force
Rapport de la Covid-19 Task ForceRapport de la Covid-19 Task Force
Rapport de la Covid-19 Task ForcePaperjam_redaction
 
Real-time Surveillance and Response for Malaria Elimination
Real-time Surveillance and Response for Malaria EliminationReal-time Surveillance and Response for Malaria Elimination
Real-time Surveillance and Response for Malaria EliminationRTI International
 

Was ist angesagt? (12)

EHR- 2016 Eeshika Mitra
EHR- 2016 Eeshika MitraEHR- 2016 Eeshika Mitra
EHR- 2016 Eeshika Mitra
 
HHC Integration of HIV Testing Within Routine Care
HHC Integration of HIV Testing Within Routine CareHHC Integration of HIV Testing Within Routine Care
HHC Integration of HIV Testing Within Routine Care
 
yyoneoka_UtahUsesGIS_2009
yyoneoka_UtahUsesGIS_2009yyoneoka_UtahUsesGIS_2009
yyoneoka_UtahUsesGIS_2009
 
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
 
Kenya Annual Malaria Report_2013_2014_final
Kenya Annual Malaria Report_2013_2014_finalKenya Annual Malaria Report_2013_2014_final
Kenya Annual Malaria Report_2013_2014_final
 
The Role of Information Communication Technology & Geoinformatics in Vector C...
The Role of Information Communication Technology & Geoinformatics in Vector C...The Role of Information Communication Technology & Geoinformatics in Vector C...
The Role of Information Communication Technology & Geoinformatics in Vector C...
 
Covid 19 surveillance-roadmap_final
Covid 19 surveillance-roadmap_finalCovid 19 surveillance-roadmap_final
Covid 19 surveillance-roadmap_final
 
Extracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual dataExtracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual data
 
Feasibility of an SMS intervention to deliver tuberculosis testing results in...
Feasibility of an SMS intervention to deliver tuberculosis testing results in...Feasibility of an SMS intervention to deliver tuberculosis testing results in...
Feasibility of an SMS intervention to deliver tuberculosis testing results in...
 
Integrating GIS and Remote Sensing Technology In Contact Tracing Of Ebola Vir...
Integrating GIS and Remote Sensing Technology In Contact Tracing Of Ebola Vir...Integrating GIS and Remote Sensing Technology In Contact Tracing Of Ebola Vir...
Integrating GIS and Remote Sensing Technology In Contact Tracing Of Ebola Vir...
 
Rapport de la Covid-19 Task Force
Rapport de la Covid-19 Task ForceRapport de la Covid-19 Task Force
Rapport de la Covid-19 Task Force
 
Real-time Surveillance and Response for Malaria Elimination
Real-time Surveillance and Response for Malaria EliminationReal-time Surveillance and Response for Malaria Elimination
Real-time Surveillance and Response for Malaria Elimination
 

Ähnlich wie Detecting influenza epidemics using search engine query data

Infodemiology, Infoveillance, Twitter- and Google-based Surveillance: The Inf...
Infodemiology, Infoveillance, Twitter- and Google-based Surveillance: The Inf...Infodemiology, Infoveillance, Twitter- and Google-based Surveillance: The Inf...
Infodemiology, Infoveillance, Twitter- and Google-based Surveillance: The Inf...Gunther Eysenbach
 
Eysenbach: Infodemiology and Infoveillance
Eysenbach: Infodemiology and InfoveillanceEysenbach: Infodemiology and Infoveillance
Eysenbach: Infodemiology and InfoveillanceGunther Eysenbach
 
Running head UNIT 8 PROJECT1UNIT 8 PROJECT2Unit 8 Proj.docx
Running head  UNIT 8 PROJECT1UNIT 8 PROJECT2Unit 8 Proj.docxRunning head  UNIT 8 PROJECT1UNIT 8 PROJECT2Unit 8 Proj.docx
Running head UNIT 8 PROJECT1UNIT 8 PROJECT2Unit 8 Proj.docxjoellemurphey
 
Automated Laboratory Reporting of Infectious Diseases in a Climate of Bioterr...
Automated Laboratory Reporting of Infectious Diseases in a Climate of Bioterr...Automated Laboratory Reporting of Infectious Diseases in a Climate of Bioterr...
Automated Laboratory Reporting of Infectious Diseases in a Climate of Bioterr...Lisa Muthukumar
 
Poster presentation: Stigma Index Zambia
Poster presentation: Stigma Index ZambiaPoster presentation: Stigma Index Zambia
Poster presentation: Stigma Index Zambiagnpplus
 
Social media for tracking disease outbreaks–way of the future By.Dr.Mahboob a...
Social media for tracking disease outbreaks–way of the future By.Dr.Mahboob a...Social media for tracking disease outbreaks–way of the future By.Dr.Mahboob a...
Social media for tracking disease outbreaks–way of the future By.Dr.Mahboob a...Healthcare consultant
 
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...Muhammad Habibi
 
Chapter 7 Discussion- Hundreds of hospitals- clinics- and health depar.pdf
Chapter 7 Discussion- Hundreds of hospitals- clinics- and health depar.pdfChapter 7 Discussion- Hundreds of hospitals- clinics- and health depar.pdf
Chapter 7 Discussion- Hundreds of hospitals- clinics- and health depar.pdfaonetelecompune
 
Practicum Presentation
Practicum PresentationPracticum Presentation
Practicum Presentationguest231e1f
 
Google trends correlate
Google trends   correlateGoogle trends   correlate
Google trends correlateBitsytask
 
Are we ready for disruption in Translational Research through Digital Medicine?
Are we ready for disruption in Translational Research through Digital Medicine?Are we ready for disruption in Translational Research through Digital Medicine?
Are we ready for disruption in Translational Research through Digital Medicine?Ashish Atreja, MD, MPH
 
HIV/Aids Surveillance Systems: Are They Implemented Effectively?
HIV/Aids Surveillance Systems: Are They Implemented Effectively? HIV/Aids Surveillance Systems: Are They Implemented Effectively?
HIV/Aids Surveillance Systems: Are They Implemented Effectively? Xiaoming Zeng
 
SURVEILLANCE OF HEALTH EVENT
SURVEILLANCE OF HEALTH EVENTSURVEILLANCE OF HEALTH EVENT
SURVEILLANCE OF HEALTH EVENTAneesa K Ayoob
 
HIV Tracking System in Forsyth County, NC
HIV Tracking System in Forsyth County, NCHIV Tracking System in Forsyth County, NC
HIV Tracking System in Forsyth County, NCwebbmother
 
Central Line-associated Bloodstream Infections.Walden Universi
Central Line-associated Bloodstream Infections.Walden UniversiCentral Line-associated Bloodstream Infections.Walden Universi
Central Line-associated Bloodstream Infections.Walden UniversiMaximaSheffield592
 
NEURO-FUZZY APPROACH FOR DIAGNOSING AND CONTROL OF TUBERCULOSIS
NEURO-FUZZY APPROACH FOR DIAGNOSING AND CONTROL OF TUBERCULOSISNEURO-FUZZY APPROACH FOR DIAGNOSING AND CONTROL OF TUBERCULOSIS
NEURO-FUZZY APPROACH FOR DIAGNOSING AND CONTROL OF TUBERCULOSISijcsitcejournal
 
Neuro-Fuzzy Approach for Diagnosing and Control of Tuberculosi
Neuro-Fuzzy Approach for Diagnosing and Control of TuberculosiNeuro-Fuzzy Approach for Diagnosing and Control of Tuberculosi
Neuro-Fuzzy Approach for Diagnosing and Control of Tuberculosirinzindorjej
 

Ähnlich wie Detecting influenza epidemics using search engine query data (20)

Infodemiology, Infoveillance, Twitter- and Google-based Surveillance: The Inf...
Infodemiology, Infoveillance, Twitter- and Google-based Surveillance: The Inf...Infodemiology, Infoveillance, Twitter- and Google-based Surveillance: The Inf...
Infodemiology, Infoveillance, Twitter- and Google-based Surveillance: The Inf...
 
Health IT Project
Health IT Project Health IT Project
Health IT Project
 
Eysenbach: Infodemiology and Infoveillance
Eysenbach: Infodemiology and InfoveillanceEysenbach: Infodemiology and Infoveillance
Eysenbach: Infodemiology and Infoveillance
 
Using Twitter Data to Predict Flu Outbreak
Using Twitter Data to Predict Flu OutbreakUsing Twitter Data to Predict Flu Outbreak
Using Twitter Data to Predict Flu Outbreak
 
Introduction to-surveillance ŰŻ ۭۧŰȘم Ű§Ù„ŰšÙŠŰ·Ű§Ű±
Introduction to-surveillance ŰŻ ۭۧŰȘم Ű§Ù„ŰšÙŠŰ·Ű§Ű±Introduction to-surveillance ŰŻ ۭۧŰȘم Ű§Ù„ŰšÙŠŰ·Ű§Ű±
Introduction to-surveillance ŰŻ ۭۧŰȘم Ű§Ù„ŰšÙŠŰ·Ű§Ű±
 
Running head UNIT 8 PROJECT1UNIT 8 PROJECT2Unit 8 Proj.docx
Running head  UNIT 8 PROJECT1UNIT 8 PROJECT2Unit 8 Proj.docxRunning head  UNIT 8 PROJECT1UNIT 8 PROJECT2Unit 8 Proj.docx
Running head UNIT 8 PROJECT1UNIT 8 PROJECT2Unit 8 Proj.docx
 
Automated Laboratory Reporting of Infectious Diseases in a Climate of Bioterr...
Automated Laboratory Reporting of Infectious Diseases in a Climate of Bioterr...Automated Laboratory Reporting of Infectious Diseases in a Climate of Bioterr...
Automated Laboratory Reporting of Infectious Diseases in a Climate of Bioterr...
 
Poster presentation: Stigma Index Zambia
Poster presentation: Stigma Index ZambiaPoster presentation: Stigma Index Zambia
Poster presentation: Stigma Index Zambia
 
Social media for tracking disease outbreaks–way of the future By.Dr.Mahboob a...
Social media for tracking disease outbreaks–way of the future By.Dr.Mahboob a...Social media for tracking disease outbreaks–way of the future By.Dr.Mahboob a...
Social media for tracking disease outbreaks–way of the future By.Dr.Mahboob a...
 
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
 
Chapter 7 Discussion- Hundreds of hospitals- clinics- and health depar.pdf
Chapter 7 Discussion- Hundreds of hospitals- clinics- and health depar.pdfChapter 7 Discussion- Hundreds of hospitals- clinics- and health depar.pdf
Chapter 7 Discussion- Hundreds of hospitals- clinics- and health depar.pdf
 
Practicum Presentation
Practicum PresentationPracticum Presentation
Practicum Presentation
 
Google trends correlate
Google trends   correlateGoogle trends   correlate
Google trends correlate
 
Are we ready for disruption in Translational Research through Digital Medicine?
Are we ready for disruption in Translational Research through Digital Medicine?Are we ready for disruption in Translational Research through Digital Medicine?
Are we ready for disruption in Translational Research through Digital Medicine?
 
HIV/Aids Surveillance Systems: Are They Implemented Effectively?
HIV/Aids Surveillance Systems: Are They Implemented Effectively? HIV/Aids Surveillance Systems: Are They Implemented Effectively?
HIV/Aids Surveillance Systems: Are They Implemented Effectively?
 
SURVEILLANCE OF HEALTH EVENT
SURVEILLANCE OF HEALTH EVENTSURVEILLANCE OF HEALTH EVENT
SURVEILLANCE OF HEALTH EVENT
 
HIV Tracking System in Forsyth County, NC
HIV Tracking System in Forsyth County, NCHIV Tracking System in Forsyth County, NC
HIV Tracking System in Forsyth County, NC
 
Central Line-associated Bloodstream Infections.Walden Universi
Central Line-associated Bloodstream Infections.Walden UniversiCentral Line-associated Bloodstream Infections.Walden Universi
Central Line-associated Bloodstream Infections.Walden Universi
 
NEURO-FUZZY APPROACH FOR DIAGNOSING AND CONTROL OF TUBERCULOSIS
NEURO-FUZZY APPROACH FOR DIAGNOSING AND CONTROL OF TUBERCULOSISNEURO-FUZZY APPROACH FOR DIAGNOSING AND CONTROL OF TUBERCULOSIS
NEURO-FUZZY APPROACH FOR DIAGNOSING AND CONTROL OF TUBERCULOSIS
 
Neuro-Fuzzy Approach for Diagnosing and Control of Tuberculosi
Neuro-Fuzzy Approach for Diagnosing and Control of TuberculosiNeuro-Fuzzy Approach for Diagnosing and Control of Tuberculosi
Neuro-Fuzzy Approach for Diagnosing and Control of Tuberculosi
 

Mehr von benj_2

Brevets et innovation contre le cancer -
Brevets et innovation contre le cancer -Brevets et innovation contre le cancer -
Brevets et innovation contre le cancer -benj_2
 
Etude Biomédicaments et Bioproduction 2023 en France et en Europe
Etude Biomédicaments et Bioproduction 2023 en France et en EuropeEtude Biomédicaments et Bioproduction 2023 en France et en Europe
Etude Biomédicaments et Bioproduction 2023 en France et en Europebenj_2
 
Résultats de la seconde vague du baromÚtre de la santé connectée 2024
Résultats de la seconde vague du baromÚtre de la santé connectée 2024Résultats de la seconde vague du baromÚtre de la santé connectée 2024
Résultats de la seconde vague du baromÚtre de la santé connectée 2024benj_2
 
Pour une AutoritĂ© française de l’Intelligence Artificielle (IA)
Pour une AutoritĂ© française de l’Intelligence Artificielle (IA)Pour une AutoritĂ© française de l’Intelligence Artificielle (IA)
Pour une AutoritĂ© française de l’Intelligence Artificielle (IA)benj_2
 
3Ăšme Ă©dition de l’Observatoire du transfert de technologie en SantĂ©
3Ăšme Ă©dition de l’Observatoire du transfert de technologie en SantĂ©3Ăšme Ă©dition de l’Observatoire du transfert de technologie en SantĂ©
3Ăšme Ă©dition de l’Observatoire du transfert de technologie en SantĂ©benj_2
 
Les français et le numérique en santé
Les français et  le numérique en santéLes français et  le numérique en santé
Les français et le numérique en santébenj_2
 
Panorama 2023 de la filiÚre industrielle des dispositifs médicaux en France
Panorama 2023 de la filiÚre industrielle des dispositifs médicaux en FrancePanorama 2023 de la filiÚre industrielle des dispositifs médicaux en France
Panorama 2023 de la filiÚre industrielle des dispositifs médicaux en Francebenj_2
 
Plateforme vaccins LEEM 2024
Plateforme vaccins LEEM 2024Plateforme vaccins LEEM 2024
Plateforme vaccins LEEM 2024benj_2
 
Talents de la e-santé  : 9 laurĂ©ats
Talents de la e-santé  : 9 laurĂ©atsTalents de la e-santé  : 9 laurĂ©ats
Talents de la e-santé  : 9 laurĂ©atsbenj_2
 
Décret n° 2023-1222 du 20 décembre 2023 relatif à la prescription électroniqu...
Décret n° 2023-1222 du 20 décembre 2023 relatif à la prescription électroniqu...Décret n° 2023-1222 du 20 décembre 2023 relatif à la prescription électroniqu...
Décret n° 2023-1222 du 20 décembre 2023 relatif à la prescription électroniqu...benj_2
 
Feuille de route décarbonation de l'industrie pharmaceutique
Feuille de route décarbonation de l'industrie pharmaceutiqueFeuille de route décarbonation de l'industrie pharmaceutique
Feuille de route décarbonation de l'industrie pharmaceutiquebenj_2
 
Programme CaRE
Programme CaREProgramme CaRE
Programme CaREbenj_2
 
Feuille de route santé-environnement de l'HAS
Feuille de route santé-environnement de l'HASFeuille de route santé-environnement de l'HAS
Feuille de route santé-environnement de l'HASbenj_2
 
Rapport du sénat sur l'intelligence économique en 2023
Rapport du sénat sur l'intelligence économique en 2023Rapport du sénat sur l'intelligence économique en 2023
Rapport du sénat sur l'intelligence économique en 2023benj_2
 
Guide euro-PCT procédure PCT devant l'OEB
Guide euro-PCT procédure PCT devant l'OEBGuide euro-PCT procédure PCT devant l'OEB
Guide euro-PCT procédure PCT devant l'OEBbenj_2
 
Resultat Etude E-santé DREES sur l'utilisation des principaux outils numériques
Resultat Etude E-santé DREES sur l'utilisation des principaux outils numériques Resultat Etude E-santé DREES sur l'utilisation des principaux outils numériques
Resultat Etude E-santé DREES sur l'utilisation des principaux outils numériques benj_2
 
INPI : Le palmarÚs 2016 des déposants de brevets en France en 2016
INPI : Le palmarÚs 2016 des déposants de brevets en France en 2016INPI : Le palmarÚs 2016 des déposants de brevets en France en 2016
INPI : Le palmarÚs 2016 des déposants de brevets en France en 2016benj_2
 
Télémédecine et autres prestations médicales électroniques
Télémédecine et autres prestations médicales électroniquesTélémédecine et autres prestations médicales électroniques
Télémédecine et autres prestations médicales électroniquesbenj_2
 
Charte de la visite médicale
Charte de la visite médicaleCharte de la visite médicale
Charte de la visite médicalebenj_2
 
Présentation de France eHealthTech
Présentation de France eHealthTechPrésentation de France eHealthTech
Présentation de France eHealthTechbenj_2
 

Mehr von benj_2 (20)

Brevets et innovation contre le cancer -
Brevets et innovation contre le cancer -Brevets et innovation contre le cancer -
Brevets et innovation contre le cancer -
 
Etude Biomédicaments et Bioproduction 2023 en France et en Europe
Etude Biomédicaments et Bioproduction 2023 en France et en EuropeEtude Biomédicaments et Bioproduction 2023 en France et en Europe
Etude Biomédicaments et Bioproduction 2023 en France et en Europe
 
Résultats de la seconde vague du baromÚtre de la santé connectée 2024
Résultats de la seconde vague du baromÚtre de la santé connectée 2024Résultats de la seconde vague du baromÚtre de la santé connectée 2024
Résultats de la seconde vague du baromÚtre de la santé connectée 2024
 
Pour une AutoritĂ© française de l’Intelligence Artificielle (IA)
Pour une AutoritĂ© française de l’Intelligence Artificielle (IA)Pour une AutoritĂ© française de l’Intelligence Artificielle (IA)
Pour une AutoritĂ© française de l’Intelligence Artificielle (IA)
 
3Ăšme Ă©dition de l’Observatoire du transfert de technologie en SantĂ©
3Ăšme Ă©dition de l’Observatoire du transfert de technologie en SantĂ©3Ăšme Ă©dition de l’Observatoire du transfert de technologie en SantĂ©
3Ăšme Ă©dition de l’Observatoire du transfert de technologie en SantĂ©
 
Les français et le numérique en santé
Les français et  le numérique en santéLes français et  le numérique en santé
Les français et le numérique en santé
 
Panorama 2023 de la filiÚre industrielle des dispositifs médicaux en France
Panorama 2023 de la filiÚre industrielle des dispositifs médicaux en FrancePanorama 2023 de la filiÚre industrielle des dispositifs médicaux en France
Panorama 2023 de la filiÚre industrielle des dispositifs médicaux en France
 
Plateforme vaccins LEEM 2024
Plateforme vaccins LEEM 2024Plateforme vaccins LEEM 2024
Plateforme vaccins LEEM 2024
 
Talents de la e-santé  : 9 laurĂ©ats
Talents de la e-santé  : 9 laurĂ©atsTalents de la e-santé  : 9 laurĂ©ats
Talents de la e-santé  : 9 laurĂ©ats
 
Décret n° 2023-1222 du 20 décembre 2023 relatif à la prescription électroniqu...
Décret n° 2023-1222 du 20 décembre 2023 relatif à la prescription électroniqu...Décret n° 2023-1222 du 20 décembre 2023 relatif à la prescription électroniqu...
Décret n° 2023-1222 du 20 décembre 2023 relatif à la prescription électroniqu...
 
Feuille de route décarbonation de l'industrie pharmaceutique
Feuille de route décarbonation de l'industrie pharmaceutiqueFeuille de route décarbonation de l'industrie pharmaceutique
Feuille de route décarbonation de l'industrie pharmaceutique
 
Programme CaRE
Programme CaREProgramme CaRE
Programme CaRE
 
Feuille de route santé-environnement de l'HAS
Feuille de route santé-environnement de l'HASFeuille de route santé-environnement de l'HAS
Feuille de route santé-environnement de l'HAS
 
Rapport du sénat sur l'intelligence économique en 2023
Rapport du sénat sur l'intelligence économique en 2023Rapport du sénat sur l'intelligence économique en 2023
Rapport du sénat sur l'intelligence économique en 2023
 
Guide euro-PCT procédure PCT devant l'OEB
Guide euro-PCT procédure PCT devant l'OEBGuide euro-PCT procédure PCT devant l'OEB
Guide euro-PCT procédure PCT devant l'OEB
 
Resultat Etude E-santé DREES sur l'utilisation des principaux outils numériques
Resultat Etude E-santé DREES sur l'utilisation des principaux outils numériques Resultat Etude E-santé DREES sur l'utilisation des principaux outils numériques
Resultat Etude E-santé DREES sur l'utilisation des principaux outils numériques
 
INPI : Le palmarÚs 2016 des déposants de brevets en France en 2016
INPI : Le palmarÚs 2016 des déposants de brevets en France en 2016INPI : Le palmarÚs 2016 des déposants de brevets en France en 2016
INPI : Le palmarÚs 2016 des déposants de brevets en France en 2016
 
Télémédecine et autres prestations médicales électroniques
Télémédecine et autres prestations médicales électroniquesTélémédecine et autres prestations médicales électroniques
Télémédecine et autres prestations médicales électroniques
 
Charte de la visite médicale
Charte de la visite médicaleCharte de la visite médicale
Charte de la visite médicale
 
Présentation de France eHealthTech
Présentation de France eHealthTechPrésentation de France eHealthTech
Présentation de France eHealthTech
 

KĂŒrzlich hochgeladen

Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 â‚č5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 â‚č5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 â‚č5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 â‚č5000 To 25K With AC Room 💚😋TANUJA PANDEY
 
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Dipal Arora
 
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...tanya dube
 
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...Dipal Arora
 
💎VVIP Kolkata Call Girls ParganasđŸ©±7001035870đŸ©±Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls ParganasđŸ©±7001035870đŸ©±Independent Girl ( Ac Rooms Avai...💎VVIP Kolkata Call Girls ParganasđŸ©±7001035870đŸ©±Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls ParganasđŸ©±7001035870đŸ©±Independent Girl ( Ac Rooms Avai...Taniya Sharma
 
Call Girls Gwalior Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Gwalior Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Gwalior Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Gwalior Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...chandars293
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...astropune
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Dipal Arora
 
Call Girls Gwalior Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Gwalior Just Call 8617370543 Top Class Call Girl Service AvailableCall Girls Gwalior Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Gwalior Just Call 8617370543 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀ night ...aartirawatdelhi
 
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...chandars293
 
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 â˜Ș 24/7 Call Girls Delhi
Russian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 â˜Ș 24/7 Call Girls DelhiRussian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 â˜Ș 24/7 Call Girls Delhi
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 â˜Ș 24/7 Call Girls DelhiAlinaDevecerski
 
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore EscortsCall Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escortsvidya singh
 

KĂŒrzlich hochgeladen (20)

Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 â‚č5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 â‚č5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 â‚č5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 â‚č5000 To 25K With AC Room 💚😋
 
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
 
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
 
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
 
💎VVIP Kolkata Call Girls ParganasđŸ©±7001035870đŸ©±Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls ParganasđŸ©±7001035870đŸ©±Independent Girl ( Ac Rooms Avai...💎VVIP Kolkata Call Girls ParganasđŸ©±7001035870đŸ©±Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls ParganasđŸ©±7001035870đŸ©±Independent Girl ( Ac Rooms Avai...
 
Call Girls Gwalior Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Gwalior Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Gwalior Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Gwalior Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
 
Call Girls Gwalior Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Gwalior Just Call 8617370543 Top Class Call Girl Service AvailableCall Girls Gwalior Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Gwalior Just Call 8617370543 Top Class Call Girl Service Available
 
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀ night ...
 
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
 
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 â˜Ș 24/7 Call Girls Delhi
Russian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 â˜Ș 24/7 Call Girls DelhiRussian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 â˜Ș 24/7 Call Girls Delhi
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 â˜Ș 24/7 Call Girls Delhi
 
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore EscortsCall Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
 

Detecting influenza epidemics using search engine query data

  • 1. Detecting influenza epidemics using search engine query data 1 Detecting influenza epidemics using search engine query data Jeremy Ginsberg1, Matthew H. Mohebbi1, Rajan S. Patel1, Lynnette Brammer2, Mark S. Smolinski1 & Larry Brilliant1 Google Inc. 2Centers for Disease Control and Prevention 1 Epidemics of seasonal influenza are a major public health concern, causing tens of millions of respiratory illnesses and 250,000 to 500,000 deaths worldwide each year1. In addition to seasonal influenza, a new strain of influenza virus against which no prior immunity exists and that demonstrates human-to-human transmission could result in a pandemic with millions of fatalities2. Early detection of disease activity, when followed by a rapid response, can reduce the impact of both seasonal and pandemic influenza3,4. One way to improve early detection is to monitor health-seeking behavior in the form of online web search queries, which are submitted by millions of users around the world each day. Here we present a method of analyzing large numbers of Google search queries to track influenza-like illness in a population. Because the relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms, we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day. This approach may make it possible to utilize search queries to detect influenza epidemics in areas with a large population of web search users. This paper was originally published in Nature Vol 457, 19 February 2009, doi:10.1038/nature07634 http://dx.doi.org/10.1038/nature07634
  • 2. Detecting influenza epidemics using search engine query data 2 Traditional surveillance systems, including those employed by ÎČ1 is the multiplicative coefficient, and Δ is the error term. the U.S. Centers for Disease Control and Prevention (CDC) and logit(P) is the natural log of P/(1-P). the European Influenza Surveillance Scheme (EISS), rely on both virologic and clinical data, including influenza-like illness Publicly available historical data from the CDC’s U.S. Influenza (ILI) physician visits. CDC publishes national and regional data Sentinel Provider Surveillance Network12 was used to help from these surveillance systems on a weekly basis, typically build our models. For each of the nine surveillance regions of with a 1-2 week reporting lag. the United States, CDC reported the average percentage of all outpatient visits to sentinel providers that were ILI-related In an attempt to provide faster detection, innovative on a weekly basis. No data were provided for weeks outside surveillance systems have been created to monitor indirect of the annual influenza season, and we excluded such dates signals of influenza activity, such as call volume to telephone from model fitting, though our model was used to generate triage advice lines5 and over-the-counter drug sales6. About unvalidated ILI estimates for these weeks. 90 million American adults are believed to search online for information about specific diseases or medical problems each We designed an automated method of selecting ILI-related year7, making web search queries a uniquely valuable source search queries, requiring no prior knowledge about influenza. of information about health trends. Previous attempts at using We measured how effectively our model would fit the CDC online activity for influenza surveillance have counted search ILI data in each region if we used only a single query as the queries submitted to a Swedish medical website8, visitors to explanatory variable Q. Each of the 50 million candidate certain pages on a U.S. health website9, and user clicks on a queries in our database was separately tested in this manner, search keyword advertisement in Canada10. A set of Yahoo to identify the search queries which could most accurately search queries containing the words “flu” or “influenza” were model the CDC ILI visit percentage in each region. Our found to correlate with virologic and mortality surveillance approach rewarded queries which exhibited regional variations data over multiple years11. similar to the regional variations in CDC ILI data: the chance Our proposed system builds on these earlier works by utilizing that a random search query can fit the ILI percentage in all an automated method of discovering influenza-related search nine regions is considerably less than the chance that a queries. By processing hundreds of billions of individual random search query can fit a single location (Supplementary searches from five years of Google web search logs, our Figure 2). system generates more comprehensive models for use in influenza surveillance, with regional and state-level estimates The automated query selection process produced a list of the of influenza-like illness (ILI) activity in the United States. highest scoring search queries, sorted by mean Z-transformed Widespread global usage of online search engines may enable correlation across the nine regions. To decide which queries models to eventually be developed in international settings. would be included in the ILI-related query fraction Q, we considered different sets of N top scoring queries. We By aggregating historical logs of online web search queries measured the performance of these models based on the submitted between 2003 and 2008, we computed time series sum of the queries in each set, and picked N such that we of weekly counts for 50 million of the most common search obtained the best fit against out-of-sample ILI data across the queries in the United States. Separate aggregate weekly nine regions (Figure 1). counts were kept for every query in each state. No information about the identity of any user was retained. Each time series Combining the N=45 highest-scoring queries was found to was normalized by dividing the count for each query in a obtain the best fit. These 45 search queries, though selected particular week by the total number of online search queries submitted in that location during the week, resulting in a query 0.95 fraction (Supplementary Figure 1). 45 queries We sought to develop a simple model which estimates the probability that a random physician visit in a particular region Mean correlation is related to an influenza-like illness (ILI); this is equivalent 0. 9 to the percentage of ILI-related physician visits. A single explanatory variable was used: the probability that a random search query submitted from the same region is ILI-related, as determined by an automated method described below. We fit a linear model using the log-odds of an ILI physician visit and 0.85 the log-odds of an ILI-related search query: 0 10 20 30 40 50 60 70 80 90 100 Number of queries Figure 1: An evaluation of how many top-scoring queries to include in the logit(P) = ÎČ0 + ÎČ1 × logit(Q) + Δ ILI-related query fraction. Maximal performance at estimating out-of-sample points during cross-validation was obtained by summing the top 45 search where P is the percentage of ILI physician visits, Q is queries. A steep drop in model performance occurs after adding query 81, the ILI-related query fraction, ÎČ0 is the intercept, which is “oscar nominations”.
  • 3. Detecting influenza epidemics using search engine query data 3 automatically, appeared to be consistently related to influenza- Top 45 Queries Next 55 Queries Search Query Topic N Weighted N Weighted like illnesses. Other search queries in the top 100, not included Influenza Complication 11 18.15 5 3.40 in our model, included topics like “high school basketball” Cold/Flu Remedy 8 5.05 6 5.03 which tend to coincide with influenza season in the United General Influenza Symptoms 5 2.60 1 0.07 Term for Influenza 4 3.74 6 0.30 States (Table 1). Specific Influenza Symptom 4 2.54 6 3.74 Symptoms of an Influenza Complication 4 2.21 2 0.92 Using this ILI-related query fraction as the explanatory Antibiotic Medication 3 6.23 3 3.17 General Influenza Remedies 2 0.18 1 0.32 variable, we fit a final linear model to weekly ILI percentages Symptoms of a Related Disease 2 1.66 2 0.77 between 2003 and 2007 for all nine regions together, thus Antiviral Medication 1 0.39 1 0.74 learning a single, region-independent coefficient. The Related Disease 1 6.66 3 3.77 Unrelated to Influenza 0 0.00 19 28.37 model was able to obtain a good fit with CDC-reported ILI 45 49.40 55 50.60 percentages, with a mean correlation of 0.90 (min=0.80, max=0.96, n=9 regions) (Figure 2). Table 1: Topics found in search queries which were found to be most correlated with CDC ILI data. The top 45 queries were used in our final model; the next 55 queries are presented for comparison purposes. The number of queries in The final model was validated on 42 points per region of each topic is indicated, as well as query volume-weighted counts, reflecting previously untested data from 2007-2008, which were the relative frequency of queries in each topic. excluded from all prior steps. Estimates generated for these 42 points obtained a mean correlation of 0.97 (min=0.92, max=0.99, n=9 regions) with the CDC-observed ILI This system is not designed to be a replacement for traditional percentages. surveillance networks or supplant the need for laboratory- based diagnoses and surveillance. Notable increases in Throughout the 2007-2008 influenza season, we used ILI-related search activity may indicate a need for public preliminary versions of our model to generate ILI estimates, health inquiry to identify the pathogen or pathogens involved. and shared our results each week with the Epidemiology and Demographic data, often provided by traditional surveillance, Prevention Branch of Influenza Division at CDC to evaluate cannot be obtained using search queries. timeliness and accuracy. Figure 3 illustrates data available at different points throughout the season. Across the nine In the event that a pandemic-causing strain of influenza regions, we were able to consistently estimate the current emerges, accurate and early detection of ILI percentages may ILI percentage 1-2 weeks ahead of the publication of reports enable public health officials to mount a more effective early by the CDC’s U.S. Influenza Sentinel Provider Surveillance response. Though we cannot be certain how search engine Network. users will behave in such a scenario, affected individuals may submit the same ILI-related search queries used in our model. Because localized influenza surveillance is particularly useful Alternatively, panic and concern among healthy individuals for public health planning, we sought to further validate our may cause a surge in the ILI-related query fraction and model against weekly ILI percentages for individual states. exaggerated estimates of the ongoing ILI percentage. CDC does not make state-level data publicly available, but we 12 validated our model against state-reported ILI percentages 10 provided by the state of Utah, and obtained a correlation of ILI percentage 8 0.90 across 42 validation points (Supplementary Figure 3). 6 Google web search queries can be used to accurately estimate 4 influenza-like illness percentages in each of the nine public 2 health regions of the United States. Because search queries 0 2004 2005 2006 2007 2008 can be processed quickly, the resulting ILI estimates were Figure 2: A comparison of model estimates for the Mid-Atlantic Region (black) consistently 1-2 weeks ahead of CDC ILI surveillance reports. against CDC-reported ILI percentages (red), including points over which the The early detection provided by this approach may become an model was fit and validated. A correlation of 0.85 was obtained over 128 important line of defense against future influenza epidemics points from this region to which the model was fit, while a correlation of 0.96 was obtained over 42 validation points. 95% prediction intervals are  indicated. in the United States, and perhaps eventually in international settings. The search queries in our model are not, of course, exclusively Up-to-date influenza estimates may enable public health submitted by users who are experiencing influenza-like officials and health professionals to better respond to symptoms, and the correlations we observe are only seasonal epidemics. If a region experiences an early, sharp meaningful across large populations. Despite strong historical increase in ILI physician visits, it may be possible to focus correlations, our system remains susceptible to false alerts additional resources on that region to identify the etiology of caused by a sudden increase in ILI-related queries. An unusual the outbreak, providing extra vaccine capacity or raising local event, such as a drug recall for a popular cold or flu remedy, media awareness as necessary. could cause such a false alert.
  • 4. Detecting influenza epidemics using search engine query data 4 5 Harnessing the collective intelligence of millions of users, Google web search logs can provide one of the most timely, 2.5 broad reaching influenza monitoring systems available today. 0 Week 43 Week 47 Week 51 Week 3 Week 7 Week 11 Week 15 Week 19 While traditional systems require 1-2 weeks to gather and Data available as of February 4, 2008 process surveillance data, our estimates are current each day. As with other syndromic surveillance systems, the data 5 are most useful as a means to spur further investigation and 2.5 collection of direct measures of disease activity. 0 Week 43 Week 47 Week 51 Week 3 Week 7 Week 11 Week 15 Week 19 Data available as of March 3, 2008 This system will be used to track the spread of influenza- like illness throughout the 2008-2009 influenza season 5 in the United States. Results are freely available online at 2.5 http://www.google.org/flutrends. 0 Week 43 Week 47 Week 51 Week 3 Week 7 Week 11 Week 15 Week 19 Data available as of March 31, 2008 Methods 5 ILI percentage Privacy. At Google, we recognize that privacy is important. 2.5 None of the queries in our project’s database can be 0 Week 43 Week 47 Week 51 Week 3 Week 7 Week 11 Week 15 Week 19 associated with a particular individual. Our project’s database Data available as of May 12, 2008 retains no information about the identity, IP address, or Figure 3: ILI percentages estimated by our model (black) and provided by specific physical location of any user. Furthermore, any CDC (red) in the Mid-Atlantic region, showing data available at four points original web search logs older than 9 months are being in the 2007-2008 influenza season. During week 5, we detected a sharply anonymized in accordance with Google’s Privacy Policy increasing ILI percentage in the Mid-Atlantic region; similarly, on March 3, our (http://www.google.com/privacypolicy.html). model indicated that the peak ILI percentage had been reached during week 8, with sharp declines in weeks 9 and 10. Both results were later confirmed by CDC ILI data. Search query database. For the purposes of our database, a search query is a complete, exact sequence of terms issued by a Google search user; we don’t combine linguistic variations, lags were considered, but ultimately not used in our modeling synonyms, cross-language translations, misspellings, or process. subsequences, though we hope to explore these options in future work. For example, we tallied the search query Each candidate search query was evaluated nine times, once “indications of flu” separately from the search queries “flu per region, using the search data originating from a particular indications” and “indications of the flu”. region to explain the ILI percentage in that region. With four cross-validation folds per region, we obtained 36 different Our database of queries contains 50 million of the most correlations between the candidate model’s estimates and common search queries on all possible topics, without pre- the observed ILI percentages. To combine these into a single filtering. Billions of queries occurred infrequently and were measure of the candidate query’s performance, we applied excluded. Using the internet protocol (IP) address associated the Fisher Z-transformation13 to each correlation, and took the with each search query, the general physical location from mean of the 36 Z-transformed correlations. which the query originated can often be identified, including the nearest major city if within the United States. Computation and pre-filtering. In total, we fit 450 million different models to test each of the candidate queries. We Model data. In the query selection process, we fit per-query used a distributed computing framework14 to efficiently models using all weeks between September 28, 2003 and divide the work among hundreds of machines. The amount March 11, 2007 (inclusive) for which CDC reported a non-zero of computation required could have been reduced by making ILI percentage, yielding 128 training points for each region assumptions about which queries might be correlated with (each week is one data point). 42 additional weeks of data ILI. For example, we could have attempted to eliminate (March 18, 2007 through May 11, 2008) were reserved for final non-influenza-related queries before fitting any models. validation. Search query data before 2003 was not available However, we were concerned that aggressive filtering might for this project. accidentally eliminate valuable data. Furthermore, if the highest-scoring queries seemed entirely unrelated to influenza, Automated query selection process. Using linear regression it would provide evidence that our query selection approach with 4-fold cross validation, we fit models to four 96-point was invalid. subsets of the 128 points in each region. Each per-query model was validated by measuring the correlation between Constructing the ILI-related query fraction. We concluded the model’s estimates for the 32 held-out points and CDC’s the query selection process by choosing to keep the reported regional ILI percentage at those points. Temporal search queries whose models obtained the highest mean
  • 5. Detecting influenza epidemics using search engine query data 5 Z-transformed correlations across regions: these queries were References deemed to be “ILI-related”. 1. World Health Organization. Influenza fact sheet. To combine the selected search queries into a single http://www.who.int/mediacentre/factsheets/2003/fs211/ aggregate variable, we summed the query fractions on a en/ (2003). regional basis, yielding our estimate of the ILI-related query fraction Q, in each region. Note that the same set of queries 2. World Health Organization. WHO consultation on priority was selected for each region. public health interventions before and during an influenza pandemic. http://www.who.int/csr/disease/avian_ Fitting and validating a final model. We fit one final univariate influenza/consultation/en/ (2004). model, used for making estimates in any region or state based on the ILI-related query fraction from that region 3. Ferguson, N. M. et al. Strategies for containing an emerging or state. We regressed over 1152 points, combining all 128 influenza pandemic in Southeast Asia. Nature 437, training points used in the query selection process from each 209–214 (2005). of the nine regions. We validated the accuracy of this final model by measuring its performance on 42 additional weeks 4. Longini, I. M. et al. Containing pandemic influenza at the of previously untested data in each region, from the most source. Science 309, 1083–1087 (2005). recently available time period (March 18, 2007 through May 11, 2008). These 42 points represent approximately 25% of the 5. Espino, J., Hogan, W. & Wagner, M. Telephone triage: total data available for the project, the first 75% of which was A timely data source for surveillance of influenza-like used for query selection and model fitting. diseases. AMIA: Annual Symposium Proceedings 215–219 (2003). State-level model validation. To evaluate the accuracy of state-level ILI estimates generated using our final model, we 6. Magruder, S. Evaluation of over-the-counter compared our estimates against weekly ILI percentages pharmaceutical sales as a possible early warning indicator provided by the state of Utah. Because the model was fit using of human disease. Johns Hopkins University APL Technical regional data through March 11, 2007, we validated our Utah Digest 24, 349–353 (2003). ILI estimates using 42 weeks of previously untested data, from the most recently available time period (March 18, 2007 7. Fox, S. Online Health Search 2006. Pew Internet & through May 11, 2008). American Life Project (2006). 8. Hulth, A., Rydevik, G. & Linde, A. Web Queries as a Source Acknowledgements. We thank Lyn Finelli at the CDC for Syndromic Surveillance. PLoS ONE 4(2): e4378. Influenza Division for her ongoing support and comments doi:10.1371/journal.pone.0004378 (2009). on this manuscript. We are grateful to Dr. Robert Rolfs and Lisa Wyman at the Utah Department of Health and Monica 9. Johnson, H. et al. Analysis of Web access logs for Patton at the CDC Influenza Division for providing ILI data. We surveillance of influenza. MEDINFO 1202–1206 (2004). thank Vikram Sahai for his contributions to data collection and processing, and Craig Nevill-Manning, Alex Roetter, and 10. Eysenbach, G. Infodemiology: tracking flu-related searches Kataneh Sarvian from Google for their support and comments on the web for syndromic surveillance. AMIA: Annual on this manuscript. Symposium Proceedings 244–248 (2006). Author contributions. J.G. and M.H.M. conceived, designed, 11. Polgreen, P. M., Chen, Y., Pennock, D. M. & Forrest, N. D. and implemented the system. J.G., M.H.M., and R.S.P. analysed Using internet searches for influenza surveillance. Clinical the results and wrote the paper. L.B. (CDC) contributed data. Infectious Diseases 47, 1443–1448 (2008). All authors edited and commented on the paper. 12. http://www.cdc.gov/flu/weekly Supplementary material. Figures and other supplementary material is available at http://www.nature.com/nature/journal/ 13. David, F. The moments of the z and F distributions. v457/n7232/suppinfo/nature07634.html Biometrika 36, 394–403 (1949). 14. Dean, J. & Ghemawat, S. Mapreduce: Simplified data processing on large clusters. OSDI: Sixth Symposium on Operating System Design and Implementation (2004).