Environmental air pollution studies fail to consider that air pollution is a spatio-temporal problem.
The volume and complexity of the data have created a need to explore various machine learning models;
however, those models have both advantages and disadvantages when applied to regional air pollution
analysis. Furthermore, most environmental problems are globally distributed. This research addresses
the spatio-temporal problem using a decentralized computational technique named the Online Scalable
SVM Ensemble Learning Method (OSSELM). Evaluation criteria for computational air pollution analysis
include accuracy, real-time analysis and prediction, spatio-temporal analysis, and decentralised
analysis; we assert that these criteria can be improved using the proposed OSSELM. Special
consideration is given to distributed ensembles to resolve the spatio-temporal data collection problem
(i.e., the data are collected from multiple monitoring stations dispersed over a geographical area).
Moreover, the experimental results demonstrate that the proposed OSSELM produced impressive results
compared to an SVM ensemble for air pollution analysis in the Auckland region.
This document presents an overview of air pollution monitoring using remote sensing and GIS technologies. It discusses how satellite remote sensing can provide synoptic views of large areas and monitor multiple pollutants simultaneously. It also describes some common air pollutants and sources. Two case studies are then presented on using these methods to map ambient air pollution zones and monitor air quality in specific regions.
Effects of Wind Direction on VOC Concentrations in Southeast Kansas - Sergio A. Guerra
Twenty-four-hour whole-air samples were collected in evacuated stainless steel canisters and analyzed for volatile organic compounds (VOC) at selected sites in southeast Kansas from March 1999 to October 2000. The purpose was to assess the influence on air quality of four industrial facilities that burn hazardous waste located in the communities of Coffeyville, Chanute, Independence and Fredonia. Fifteen of the VOC analytes were found at concentrations above the detection limit and above levels observed in the blanks. Data were analyzed to investigate whether sampling site and date had a significant effect on VOC concentration. Results indicate that site and/or date were significant factors for many of the VOCs. To further investigate the temporal factor, sampling days were divided into four classifications based on wind direction: predominantly north winds, predominantly south winds, calm/variable winds and
other winds. Results from statistical analyses show that wind direction was a significant factor for benzene, toluene, o-xylene, naphthalene, and carbon tetrachloride. Data from upwind and downwind samples were analyzed for the four cities of interest in the study area, to investigate the effect of the four targeted sources on VOC concentrations. Results from Fredonia showed higher concentrations of toluene, ethyl benzene, styrene, methyl chloride, and trichloroethylene in the upwind samples, although none of the results were statistically significant. Chanute also showed higher concentrations of the same compounds and m,p-xylene in the upwind samples; results were significant at the 0.05 level for toluene, ethylbenzene, and xylene. These results indicate that sources other than those targeted in the sampling network may be contributing to
the VOC levels. Results from Independence showed higher concentrations of ethylbenzene and styrene in the downwind samples; results were statistically significant. These results indicate that the source targeted in the sampling network may be contributing to the VOC levels at those sampling sites.
An Analytical Survey on Prediction of Air Quality Index - ijtsrd
Rapid modernization has given rise to many industries and automobiles, which in turn have become a common cause of environmental problems such as air and water pollution. Air pollution affects our lives directly: it contaminates the air we breathe and causes serious health hazards. It is therefore important to predict the air quality index for the coming days so that the concerned authorities can take prompt action to curb pollution. Air quality readings for different gases can be collected through physical sensors, and these readings can be used to predict the future air quality index. Machine learning acts as a catalyst in this scenario, helping to predict the air quality index accurately for a future instance. Most learning systems need a huge amount of data for training, and it is not always possible to provide it. There is therefore a need to predict the air quality index using considerably less past data. This paper mainly concentrates on analyzing past work on air quality index prediction using machine learning, evaluating its flaws, and estimating possible new approaches to prediction using machine learning. Suraj Kapse | Akshay Kurumkar | Vighnesh Manthapurvar | Prof. Rajesh Tak "An Analytical Survey on Prediction of Air Quality Index" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-6 , October 2019, URL: https://www.ijtsrd.com/papers/ijtsrd28072.pdf Paper URL: https://www.ijtsrd.com/engineering/information-technology/28072/an-analytical-survey-on-prediction-of-air-quality-index/suraj-kapse
Conference on the Environment - GUERRA presentation Nov 19, 2014 - Sergio A. Guerra
This document discusses innovative dispersion modeling practices to achieve reasonable conservatism in regulatory modeling demonstrations. It presents a case study evaluating the Emissions and Meteorological Variability Processor (EMVAP) and approaches to establish background concentrations. The case study models SO2 concentrations from a power plant using 1) constant emissions, 2) variable emissions, and 3) EMVAP. EMVAP provides more realistic concentrations while accounting for emission variability. Using the 50th percentile monitored background concentration when combining with modeled values provides statistical conservatism compared to using high percentile values.
Effects of Wind Direction on Trace Metal Concentration in Southeast Kansas - Sergio A. Guerra
This study investigated the effects of wind direction on trace metal concentrations in particulate matter in southeast Kansas. Air quality monitoring was conducted from March to October 2000 at nine sites. Concentrations of six metals (beryllium, chromium, arsenic, cadmium, barium, and lead) were analyzed. The results showed that wind direction significantly impacted chromium, barium, and lead concentrations, with calm/variable winds associated with higher chromium and lead levels and north winds linked to higher barium. However, upwind/downwind comparisons found no significant differences near targeted industrial sources. Mean levels of chromium, arsenic, and cadmium exceeded EPA risk levels at some sites.
Using low cost particle sensors for characterisation of urban air pollution: ... - IES / IAQM
A presentation from RTCA17, held on 24th-25th October 2017.
The production of low-cost air quality sensors is a fast growing field, which is offering exciting possibilities for the expert and non-expert alike. Their low cost brings the technology into the financial reach of non-professional communities, for example, schools and interested/concerned individuals. For the research and consultant communities, they bring the possibility of high-density spatial mapping. To be useful, these devices need quality assurance and quality checking (QA/QC) to be undertaken under environmental conditions relevant to their location of deployment. In this talk, Francis will discuss findings from field campaigns conducted in Birmingham, UK and Nairobi, Kenya using the Alphasense OPC-N2 optical particle counter for measuring PM10 and PM2.5. Overall, the OPC-N2 devices were found to measure accurately ambient airborne particle mass concentration provided they were correctly calibrated. Future applications and directions will be discussed.
What does the future hold for low cost air pollution sensors? - Dr Pete Edwards - IES / IAQM
- Low-cost air pollution sensors have the potential to revolutionize air pollution measurement by enabling more widespread monitoring. However, several challenges must be addressed including sensor-to-sensor variability, complex interference from other pollutants, and factory calibrations not being applicable to real-world conditions.
- New calibration methods using machine learning algorithms that account for multivariate responses show promise in addressing these challenges. One study used support vector regression and random forest to calibrate NO and NO2 sensors, achieving accurate results with errors below 5 ppb after deployment in urban areas.
- For low-cost sensors to provide reliable data, calibration and evaluation methods must ensure data quality is sufficient for the intended application. Significant progress has been made
Use of Probabilistic Statistical Techniques in AERMOD Modeling Evaluations - Sergio A. Guerra
The advent of the short term National Ambient Air Quality Standards (NAAQS) prompted modelers to reassess the common practices in dispersion modeling analyses. The probabilistic nature of the new short term standards also opens the door to alternative modeling techniques that are based on probability. One of these is the Monte Carlo technique that can be used to account for emission variability in permit modeling.
Currently, it is assumed that a given emission unit is in operation at its maximum capacity every hour of the year. This assumption may be appropriate for facilities that operate at full capacity most of the time. However, in most cases, emission units operate at variable loads that produce variable emissions. Thus, assuming constant maximum emissions is overly conservative for facilities such as power plants that are not in operation all the time and which exhibit high concentrations during very short periods of time.
Another element of conservatism in NAAQS demonstrations relates to combining predicted concentrations from the AMS/EPA Regulatory Model (AERMOD) with observed (monitored) background concentrations. Normally, some of the highest monitored observations are added to the AERMOD results yielding a very conservative combined concentration.
A case study is presented to evaluate the use of alternative probabilistic methods to complement the shortcomings of current dispersion modeling practices. This case study includes the use of the Monte Carlo technique and the use of a reasonable background concentration to combine with the AERMOD predicted concentrations. The use of these methods is in harmony with the probabilistic nature of the NAAQS and can help demonstrate compliance through dispersion modeling analyses, while still being protective of the NAAQS.
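The Monte Carlo idea described above can be illustrated with a minimal sketch. This is not the EMVAP or AERMOD code: the hourly unit-emission concentrations, the load profile, the maximum emission rate, and all function names are hypothetical assumptions, and a plain 98th percentile of hourly values stands in for the actual form of the standard.

```python
import math
import random

def percentile(values, pct):
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]

def monte_carlo_design_values(unit_conc, max_rate, loads, weights, iterations, seed=0):
    """Repeat the year many times, drawing a random load for every hour.

    unit_conc: hourly concentrations for a unit emission rate (a hypothetical
    stand-in for dispersion model output). Returns one 98th-percentile
    concentration per iteration.
    """
    rng = random.Random(seed)
    design_values = []
    for _ in range(iterations):
        hourly_loads = rng.choices(loads, weights=weights, k=len(unit_conc))
        hourly = [c * max_rate * f for c, f in zip(unit_conc, hourly_loads)]
        design_values.append(percentile(hourly, 98))
    return design_values

# Hypothetical inputs: one year of hourly unit concentrations and a load profile
# in which the unit is idle 60% of the time and at full load 10% of the time.
data_rng = random.Random(1)
unit_conc = [data_rng.uniform(0.0, 5.0) for _ in range(8760)]
MAX_RATE = 10.0
LOADS = [0.0, 0.5, 1.0]
WEIGHTS = [0.6, 0.3, 0.1]

# Current practice: maximum emissions assumed for every hour of the year.
constant_case = percentile([c * MAX_RATE for c in unit_conc], 98)
mc_values = monte_carlo_design_values(unit_conc, MAX_RATE, LOADS, WEIGHTS, iterations=100)
median_mc = percentile(mc_values, 50)

print(f"constant maximum emissions: {constant_case:.2f}")
print(f"Monte Carlo median:         {median_mc:.2f}")
```

Because every sampled load is at most the maximum, each Monte Carlo result is bounded above by the constant-emissions result; the spread of the iterations indicates how much conservatism the constant assumption adds for an intermittent source.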
Advanced Modeling Techniques for Permit Modeling - Turning challenges into o... - Sergio A. Guerra
Advanced modeling techniques can be used in AERMOD to refine the model inputs and obtain more accurate results. This presentation covers:
- AERMOD’s Temporal Mismatch Limitation
- Building Downwash Limitations in BPIP/PRIME
- Advanced Modeling Techniques to Overcome these Limitations
Solutions include:
- Equivalent Building Dimensions (EBD)
- Emission Variability Processor (EMVAP)
- Updated ambient ratio method (ARM2)
- Pairing AERMOD values with the 50th percentile background concentrations in cumulative analyses
Pairing AERMOD concentrations with the 50th percentile monitored value - Sergio A. Guerra
This document proposes a new method for combining modeled concentrations from AERMOD with monitored background concentrations.
The current practice of adding the maximum or 98th percentile monitored concentration is overly conservative. Instead, the document suggests using the 50th percentile (median) monitored concentration.
Pairing the 98th percentile modeled concentration with the 50th percentile monitored concentration results in a combined 99th percentile concentration. This provides a more conservative estimate than the form of the short-term air quality standards, while avoiding the mismatch of temporal pairing in AERMOD and the influence of exceptional events.
The proposed method is presented as a simple, protective approach for demonstrating compliance with air quality standards when considering both modeled and monitored background concentrations.
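The 99th percentile figure can be reproduced with simple arithmetic, assuming the modeled and monitored values are statistically independent (an assumption made here for illustration). Under that assumption, the chance that both values are exceeded at the same time is the product of the two exceedance probabilities. The function name below is hypothetical.

```python
def combined_percentile(modeled_pct, monitored_pct):
    """Percentile implied by pairing two independent percentile values.

    Assumes independence: P(both exceeded) = P(modeled exceeded) * P(monitored exceeded).
    """
    p_modeled_exceeded = 1 - modeled_pct / 100
    p_monitored_exceeded = 1 - monitored_pct / 100
    p_both_exceeded = p_modeled_exceeded * p_monitored_exceeded
    return 100 * (1 - p_both_exceeded)

# 98th percentile modeled value paired with the 50th percentile (median) background:
# 0.02 * 0.50 = 0.01, i.e. a combined 99th percentile.
print(round(combined_percentile(98, 50), 6))
```

By the same arithmetic, pairing two 98th percentile values would correspond to roughly the 99.96th percentile, which shows why adding high monitored values to high modeled values compounds conservatism.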
Innovative Dispersion Modeling Practices to Achieve a Reasonable Level of Con... - Sergio A. Guerra
The document discusses innovative modeling practices to achieve reasonable conservatism in AERMOD modeling demonstrations. It presents a case study evaluating three modeling techniques: EMVAP, which assigns random emission rates over iterations; ARM2, which calculates NOx to NO2 conversion based on plume entrapment; and using the 50th percentile monitored background concentration. The case study found lower modeled concentrations using EMVAP and ARM2 compared to current practices, demonstrating these techniques can provide more realistic results while still protecting air quality standards. Pairing the 98th percentile predicted concentration with the 50th percentile monitored background provided a statistically conservative but reasonable level of conservatism.
INNOVATIVE DISPERSION MODELING PRACTICES TO ACHIEVE A REASONABLE LEVEL OF CON... - Sergio A. Guerra
Presentation delivered at the Board meeting of the Upper Midwest Section of the Air and Waste Management Association on September 16, 2014.
Innovative dispersion modeling techniques are presented, including ARM2, EMVAP, and the 50th percentile background concentration. The case study involves peaking engines that are used 250 hours per year. These intermittent sources are required to undergo a modeling evaluation in many states. Current modeling techniques grossly overestimate the emissions from these sporadic sources.
This document summarizes the development and deployment of low-cost sensor networks to monitor urban air quality. It outlines the objective to better understand air pollution in megacities using dense sensor arrays. It then describes current research at MIT using nonparametric regression techniques and particle counter modeling to improve data from low-cost sensors. Finally, it provides examples of sensor networks deployed in Hawaii, Boston, and Delhi to measure pollutants like PM, O3, CO, and VOCs.
Breathe London - Hyperlocal Air Quality Monitoring Network - Jim Mills - IES / IAQM
Google cars are equipped with sensors that measure multiple air pollutants like NO, NO2, O3 every second as they drive through cities, recording over 10,000 measurements per hour. The data is transmitted in real-time to be analyzed. The AQMesh network in London uses over 100 static sensor pods across the city to monitor air quality at a high temporal resolution. Each pod contains multiple sensors that take readings every minute, resulting in over 2.6 billion readings analyzed per year. This big data is used to inform the public and politicians, encourage behavior change, evaluate policy interventions, and replicate air quality monitoring in other cities.
The document discusses the Industrial Source Complex Short Term (ISCST3) dispersion model. ISCST3 was developed by the EPA to assess air quality impacts from industrial point, area, and volume sources using steady-state Gaussian plume algorithms. The document evaluates ISCST3's performance in simulating short-term (1 hour) and long-term (annual) air pollutant concentrations by comparing predicted and measured NO2, SO2, and particulate levels near an industrial area. While predictions of long-term NO2 and particulate concentrations showed good accuracy, short-term and SO2 predictions were less accurate, possibly due to imprecise emission estimations. Overall, the ISCST3 model provides a useful
Presentation includes information related to gently sloping terrain, AERMINUTE, and EPA formula height.
Presented at the 27th Annual Conference on the Environment on November 13, 2012.
Water spray geoengineering to clean air pollution for mitigating severe haze ... - CLEEN_Ltd
CLEEN's MMEA program organised an international seminar on cleaner air - Outdoor and indoor air quality together with Zhejiang University and assistant organizer Insigma group.
This is one of the keynote presentations in the seminar.
More info in www.mmea.fi
The cleantech field is expanding rapidly and Finnish companies are committed to working for a better environment in the fields of energy efficiency, air quality and monitoring. The world-class Cleantech know-how from Finland and the cooperation with Chinese partners and the results were highlighted in the MMEA seminar. Some of the leading Finnish cleantech companies together with Finnish and Chinese research institutions were present at the event. The seminars focused on cooperation between Finland and China concerning indoor and outdoor air quality and solutions to make them better.
Air quality challenges and business opportunities in China: Fusion of environ... - CLIC Innovation Ltd
MMEA (The Measurement, Monitoring and Environmental Efficiency Assessment) research program final seminar presentation by Dr. Ari Karppinen, Finnish Meteorological Institute
Evaluation of procedures to improve solar resource assessments presented WREF... - Gwendalyn Bender
This document evaluates two methods for improving solar resource assessments by combining long-term satellite data with short-term ground observations: 1) Correcting for bias in satellite-derived irradiance time series using local aerosol optical depth data, and 2) Applying a statistical Model Output Statistics correction using on-site observations. The methods are tested at a site in Israel, with the aerosol optical depth correction significantly reducing annual bias in direct and global irradiance estimates. A minimum of 9 months of on-site observations is found to be needed to positively impact the Model Output Statistics correction applied to satellite data.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
AIR DISPERSION MODELING HIGHLIGHTS FROM 2012 ACE - Sergio A. Guerra
Presentation includes some highlights from the dispersion modeling papers presented at the Annual AWMA conference in San Antonio, TX. Topics covered include EMVAP, distance limitations of AERMOD, and two case studies comparing predicted and monitored data.
Presented at the A&WMA UMS Board Meeting on August 21, 2012.
Computer model simulations are widely used in the investigation of complex hydrological systems. In particular, hydrological models are tools that help both to better understand hydrological processes and to predict extreme events such as floods and droughts. Usually, model parameters need to be estimated through calibration, in order to constrain model outputs to observed variables.
Relevant model parameters used for calibration are usually selected based on expert knowledge of the modeller or by using a local one-at-a-time (OAT) sensitivity analysis (SA). However, in case of complex models those approaches may not result in proper identification of the most sensitive parameters for model calibration. In particular local OAT SA methods are only effective for assessing the relative importance of input factors when the model is linear, monotonic, and additive, which is rarely the case for complex environmental models. In contrast Global Sensitivity Analysis (GSA)
is a formal method for statistical evaluation of relevant parameters that contribute significantly to model performance. GSA techniques explore the entire feasible space of each model parameter, and they do not require any assumptions on the model nature (such as linearity or additivity).
In this work we apply the GSA to LISFLOOD, a fully-distributed hydrological model used for flood forecasting at Pan-European scale within the European Flood Awareness System (EFAS). Two case studies are considered, snowmelt- and evapotranspiration-driven catchments, to identify sensitive parameters for both types of hydrological regimes. Results of the GSA will then be used for selecting parameters that need to be estimated during model calibration. Considering the large
number of parameters of a fully-distributed model, a two-step GSA framework is applied. First, we implement the computationally efficient screening method of Morris. This method requires a limited number of simulations and produces a qualitative ranking and selection of important factors. As a second step, we apply the variance-based method of Sobol, only to the subset of factors determined as important during the previous screening. The method of Sobol provides quantitative estimates for first order and total order sensitivity indexes of input factors.
The calibration results after the GSA will be described for both case studies and compared against those obtained by using only prior expert knowledge.
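The screening step can be illustrated with a minimal sketch of elementary effects. It uses a simplified one-at-a-time design around random base points rather than the full Morris trajectory design, and a toy three-parameter function stands in for the hydrological model; the function names, step size, and number of base points are assumptions for illustration and are unrelated to LISFLOOD.

```python
import random

def elementary_effects(model, n_params, n_base_points=20, delta=0.1, seed=0):
    """Return mu* (mean absolute elementary effect) per parameter on [0, 1]^k."""
    rng = random.Random(seed)
    sums = [0.0] * n_params
    for _ in range(n_base_points):
        # Base point chosen so that x_i + delta stays inside the unit cube.
        base = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_params)]
        y0 = model(base)
        for i in range(n_params):
            stepped = list(base)
            stepped[i] += delta  # perturb one parameter at a time
            sums[i] += abs((model(stepped) - y0) / delta)
    return [s / n_base_points for s in sums]

def toy_model(x):
    # Hypothetical stand-in for a simulation model: x[1] has no influence.
    return 2.0 * x[0] + 0.0 * x[1] + x[2] ** 2

mu_star = elementary_effects(toy_model, n_params=3)
# Qualitative ranking: most influential parameter first.
ranking = sorted(range(3), key=lambda i: mu_star[i], reverse=True)
print("mu*:", [round(m, 3) for m in mu_star])
print("ranking:", ranking)
```

Each base point costs one model run plus one run per parameter, which is why this kind of screening is affordable before a variance-based analysis is applied to the parameters that rank highest.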
Determination of the corrosion rate of a mic influenced pipeline using four c... - GeraldoRossoniSisqui
1) Gasunie used data from 4 pig runs over 5 years on a pipeline affected by MIC to determine the corrosion rate. 3 approaches were used: the first calculated a single rate for all defects, the second calculated individual rates, and the third used maximum likelihood estimation.
2) The third approach, which took into account measurement errors and ensured positive rates, estimated an average rate of 0.20 mm/yr with initiation largely at installation.
3) Analyzing shallow and deep defects separately found average rates of 0.23 mm/yr and 0.25 mm/yr respectively, indicating depth does not significantly impact the rate.
This document summarizes a study that evaluated the relationship between carbon dioxide (CO2) levels, window area, and total room volume in office buildings in Jos, Nigeria. Objective data on CO2 levels, temperature, humidity and other factors was collected from four naturally ventilated offices. A linear relationship was found between the ratio of operable window area to total office volume and CO2 levels, with lower ratios corresponding to higher CO2 levels. Using regression analysis, the study determined that a ratio of 0.0229 would be required to achieve an acceptable indoor CO2 level of 509 ppm in similar east-facing, naturally ventilated office buildings in Jos. The average ratio in the offices studied was 0.0023.
Technical Considerations of Adopting AERMOD into Australia and New Zealand - BREEZE Software
To ease the adoption of AERMOD into Australia and New Zealand, several technical aspects that are important to the use of AERMOD are discussed in this presentation. These technical aspects include mixing height calculation techniques, surface parameter sensitivities and considerations, urban option applicability, and selection of terrain.
Directional Analysis and Filtering for Dust Storm detection in NOAA-AVHRR Ima... - Pioneer Natural Resources
This document proposes techniques for detecting dust storms and determining their direction of transport using NOAA-AVHRR satellite imagery. It introduces a two-part approach using image processing algorithms: 1) A visualization technique uses filters, edge detectors and classification to locate dust sources. 2) An automation technique performs power spectrum analysis to detect dust storm direction by analyzing texture orientation in image blocks. The goal is to automatically detect and track dust storms for applications like hazard monitoring.
This study developed a new three-dimensional variational data assimilation (3DVAR) system to assimilate MODIS aerosol optical depth (AOD) observations. The system analyzes the 3D mass concentrations of 14 aerosol species in the WRF-Chem model as part of a one-step minimization procedure. Assimilating AOD observations from MODIS improved forecasts of AOD and surface PM10 concentrations compared to the control run without data assimilation, as shown through comparisons with AERONET and CALIPSO observations of a major dust storm over East Asia in March 2010.
This document describes research using genetic programming (GP) and artificial neural networks (ANN) to develop short-term air quality forecast models for Pune, India. 36 models were developed using daily average meteorological and pollutant concentration data from 2005-2008 to predict concentrations of SOx, NOx, and particulate matter one day in advance. The models were designed to be robust in situations where complete input data is unavailable. Performance of the GP and ANN models was evaluated based on correlation, error, and other statistical measures. The research found that the GP models generally performed better than the ANN models, especially in cases with incomplete data, and had the advantage of generating equation-based forecasts.
Advanced Modeling Techniques for Permit Modeling - Turning challenges into o... - Sergio A. Guerra
Advanced modeling techniques can be used in AERMOD to refine the inputs that are entered in the model to get more accurate results. This presentation covers:
- AERMOD’s Temporal Mismatch Limitation
- Building Downwash Limitations in BPIP/PRIME
- Advanced Modeling Techniques to Overcome these Limitations
Solutions include:
- Equivalent Building Dimensions (EBD)
- Emission Variability Processor (EMVAP)
- Updated ambient ratio method (ARM2)
- Pairing AERMOD values with the 50th percentile background concentrations in cumulative analyses.
Pairing aermod concentrations with the 50th percentile monitored value - Sergio A. Guerra
This document proposes a new method for combining modeled concentrations from AERMOD with monitored background concentrations.
The current practice of adding the maximum or 98th percentile monitored concentration is overly conservative. Instead, the document suggests using the 50th percentile (median) monitored concentration.
Pairing the 98th percentile modeled concentration with the 50th percentile monitored concentration results in a combined 99th percentile concentration. This provides a more conservative estimate than the form of the short-term air quality standards, while avoiding the mismatch of temporal pairing in AERMOD and the influence of exceptional events.
The proposed method is presented as a simple, protective approach for demonstrating compliance with air quality standards when considering both modeled and monitored background concentrations.
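The combined 99th percentile mentioned above can be reproduced with simple probability arithmetic, provided the modeled and monitored values are treated as independent. That independence assumption, and the function name `combined_percentile`, are introduced here for illustration only. A minimal sketch:

```python
def combined_percentile(modeled_pct, monitored_pct):
    """Percentile implied by pairing two percentiles, assuming independent exceedances."""
    # Probability that BOTH values exceed their respective percentile levels.
    p_joint_exceed = (1 - modeled_pct / 100) * (1 - monitored_pct / 100)
    return 100 * (1 - p_joint_exceed)


combined = combined_percentile(98, 50)
print(f"combined percentile: {combined:.1f}")
```

With a 2% chance of exceeding the modeled value and a 50% chance of exceeding the monitored value, the joint exceedance probability is 0.02 x 0.50 = 0.01, which corresponds to the 99th percentile.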
Innovative Dispersion Modeling Practices to Achieve a Reasonable Level of Con... - Sergio A. Guerra
The document discusses innovative modeling practices to achieve reasonable conservatism in AERMOD modeling demonstrations. It presents a case study evaluating three modeling techniques: EMVAP, which assigns random emission rates over iterations; ARM2, which calculates NOx to NO2 conversion based on plume entrapment; and using the 50th percentile monitored background concentration. The case study found lower modeled concentrations using EMVAP and ARM2 compared to current practices, demonstrating these techniques can provide more realistic results while still protecting air quality standards. Pairing the 98th percentile predicted concentration with the 50th percentile monitored background provided a statistically conservative but reasonable level of conservatism.
INNOVATIVE DISPERSION MODELING PRACTICES TO ACHIEVE A REASONABLE LEVEL OF CON... - Sergio A. Guerra
Presentation delivered at the Board meeting for the Upper Midwest section of the Air and Waste Management Association meeting on September 16, 2014.
Innovative dispersion modeling techniques are presented, including ARM2, EMVAP and the 50th percentile background concentration. The case study involves peaking engines that are used 250 hours per year. These intermittent sources are required to undergo a modeling evaluation in many states. Current modeling techniques grossly overestimate the emissions from these sporadic sources.
This document summarizes the development and deployment of low-cost sensor networks to monitor urban air quality. It outlines the objective to better understand air pollution in megacities using dense sensor arrays. It then describes current research at MIT using nonparametric regression techniques and particle counter modeling to improve data from low-cost sensors. Finally, it provides examples of sensor networks deployed in Hawaii, Boston, and Delhi to measure pollutants like PM, O3, CO, and VOCs.
Breathe London - Hyperlocal Air Quality Monitoring Network - Jim Mills - IES / IAQM
Google cars are equipped with sensors that measure multiple air pollutants like NO, NO2, O3 every second as they drive through cities, recording over 10,000 measurements per hour. The data is transmitted in real-time to be analyzed. The AQMesh network in London uses over 100 static sensor pods across the city to monitor air quality at a high temporal resolution. Each pod contains multiple sensors that take readings every minute, resulting in over 2.6 billion readings analyzed per year. This big data is used to inform the public and politicians, encourage behavior change, evaluate policy interventions, and replicate air quality monitoring in other cities.
The document discusses the Industrial Source Complex Short Term (ISCST3) dispersion model. ISCST3 was developed by the EPA to assess air quality impacts from industrial point, area, and volume sources using steady-state Gaussian plume algorithms. The document evaluates ISCST3's performance in simulating short-term (1 hour) and long-term (annual) air pollutant concentrations by comparing predicted and measured NO2, SO2, and particulate levels near an industrial area. While predictions of long-term NO2 and particulate concentrations showed good accuracy, short-term and SO2 predictions were less accurate, possibly due to imprecise emission estimations. Overall, the ISCST3 model provides a useful
Presentation includes information related to gently sloping terrain, AERMINUTE, and EPA formula height.
Presented at the 27th Annual Conference on the Environment on November 13, 2012.
Water spray geoengineering to clean air pollution for mitigating severe haze ... - CLEEN_Ltd
CLEEN's MMEA program organised an international seminar on cleaner air - Outdoor and indoor air quality together with Zhejiang University and assistant organizer Insigma group.
This is one of the keynote presentations in the seminar.
More info at www.mmea.fi
The cleantech field is expanding rapidly and Finnish companies are committed to working for a better environment in the fields of energy efficiency, air quality and monitoring. The world-class Cleantech know-how from Finland and the cooperation with Chinese partners and the results were highlighted in the MMEA seminar. Some of the leading Finnish cleantech companies together with Finnish and Chinese research institutions were present at the event. The seminars focused on cooperation between Finland and China concerning indoor and outdoor air quality and solutions to make them better.
Air quality challenges and business opportunities in China: Fusion of environ... - CLIC Innovation Ltd
MMEA (The Measurement, Monitoring and Environmental Efficiency Assessment) research program final seminar presentation by Dr. Ari Karppinen, Finnish Meteorological Institute
Evaluation of procedures to improve solar resource assessments presented WREF... - Gwendalyn Bender
This document evaluates two methods for improving solar resource assessments by combining long-term satellite data with short-term ground observations: 1) Correcting for bias in satellite-derived irradiance time series using local aerosol optical depth data, and 2) Applying a statistical Model Output Statistics correction using on-site observations. The methods are tested at a site in Israel, with the aerosol optical depth correction significantly reducing annual bias in direct and global irradiance estimates. A minimum of 9 months of on-site observations is found to be needed to positively impact the Model Output Statistics correction applied to satellite data.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
AIR DISPERSION MODELING HIGHLIGHTS FROM 2012 ACE - Sergio A. Guerra
Presentation includes some highlights from the dispersion modeling papers presented at the Annual AWMA conference in San Antonio, TX. Topics covered include: EMVAP, distance limitations of AERMOD, and two case studies comparing predicted and monitoring data.
Presented at the A&WMA UMS Board Meeting on August 21, 2012.
A Smart air pollution detector using SVM Classification - IRJET Journal
This document summarizes a research paper that proposes a smart air pollution detector using an SVM classification model. It begins with an abstract that describes the need to control rising air pollution levels in developing countries like India. It then discusses particulate matter (PM) and its health risks when concentrated. The paper proposes to regularly check PM concentration levels using machine learning techniques. It reviews related work applying models like naive Bayes, SVM and regression to predict air quality. It then describes the existing systems' limitations and proposes a system that classifies PM2.5 levels using logistic regression and forecasts levels using an SVM model for improved accuracy. The paper analyzes the results and concludes machine learning can accurately predict future pollution levels to help people be aware and take action.
A Deep Learning Based Air Quality Prediction - Dereck Downing
The document discusses using deep learning techniques to predict air quality. Specifically, it proposes using a Long Short-Term Memory (LSTM) model to predict hourly air quality index values. The LSTM model is trained on historical air quality and meteorological data. The proposed LSTM model is found to outperform existing models at predicting air quality, as measured by a lower root mean square error (RMSE) value for predictions. The document aims to develop techniques for accurately forecasting air quality to help address increasing air pollution issues.
The document presents information on emerging technologies for air quality monitoring. It discusses various air pollutants like particulate matter, carbon monoxide, nitrogen oxides, and sulfur dioxides. It also describes different air sampling processes and the application of air quality index (AQI) to report daily air quality levels. The document outlines the objectives to analyze air quality data from pollution control boards and use sensors to provide cautionary values to alert people and improve air quality. It discusses literature review on indoor air quality and wireless sensor networks for air monitoring.
The document presents a new method called GA-KELM for predicting air quality index (AQI) values. It introduces extreme learning machines (ELM) and discusses their limitations. It then proposes using a genetic algorithm to optimize the number of hidden nodes, thresholds, and weights in a kernel extreme learning machine (KELM) model in order to improve prediction accuracy. Experimental results on real-world datasets show the GA-KELM method trains faster and more accurately predicts AQI values than other methods like CMAQ, SVM, and DBN-BP.
IRJET- A Comprehensive Review on Air Pollution Detection using Data Mingi... - IRJET Journal
This document provides a comprehensive review of data mining techniques that can be used to predict benzene (C6H6) levels in the air without using sensors. It first discusses several data mining algorithms like decision trees, support vector machines, and random forests. It then reviews literature where researchers have used techniques like neural networks, neuro-fuzzy systems, and ensemble methods to predict various air pollutants. Most studies found that tree-based models like random forests performed best. The review aims to evaluate techniques for predicting benzene, a hazardous air pollutant linked to cancer, to improve public health.
Analysis Of Air Pollutants Affecting The Air Quality Using ARIMA - IRJET Journal
This document discusses analyzing air pollutants affecting air quality using the ARIMA time series model. It begins with an abstract describing the decreasing air quality due to factors like traffic and industry. It then discusses predicting and forecasting the Air Quality Index using time series models like ARIMA. The document reviews literature on previous studies analyzing air pollution data using techniques like neural networks and random forests. It describes preprocessing time series air pollution data to address missing values and assess stationarity before deploying the ARIMA model to make predictions.
IRJET- Analysis and Prediction of Air Quality - IRJET Journal
This document discusses air pollution prediction using machine learning techniques. It begins with an introduction to air pollution, defining it as the introduction of harmful substances into the atmosphere. The six main criteria air pollutants are then described: ozone, particulate matter, carbon monoxide, nitrogen dioxide, sulfur dioxide, and lead. Common machine learning techniques for air pollution prediction are also introduced, including supervised learning algorithms like random forests and support vector machines, and unsupervised learning algorithms like k-means clustering. The document concludes that machine learning provides opportunities for improved air pollution prediction by analyzing historical pollution data.
This document summarizes a time series analysis of air pollution data from Richmond, Virginia conducted using R. The analysis examined particulate matter (PM2.5 and PM10), lead, carbon monoxide, and ozone over 2010-2013. Correlation between pollutants was low. Univariate time series models like ARIMA were fitted to each pollutant and compared to 2013 data. ARIMA predicted PM2.5 and lead levels accurately but not other pollutants. The analysis aimed to apply methods from a Bulgarian air pollution study to a US city.
Air Pollution Prediction via Differential Evolution Strategies with Random Fo... - IRJET Journal
This document discusses using a hybrid machine learning technique combining differential evolution and random forest methods to predict air pollution levels. It analyzes data on various pollutants from two cities in India - Delhi and Patna. The proposed approach is experimentally validated to achieve better performance compared to independent classifiers and multi-label classifiers in terms of accuracy, area under the curve, success index and correlation. Differential evolution is used to initialize population and optimize candidate solutions. Random forest creates an ensemble of decision trees to make predictions. The hybrid method is tested on predicting carbon monoxide, nitrogen dioxide and benzene levels using data from a monitoring station in Delhi.
Towards an accurate Ground-Level Ozone Prediction - IJECEIAES
The motivation of this paper is to find the most accurate technique to predict ground-level ozone at Al Jahra station, Kuwait. Data on meteorological variables (air temperature, relative humidity, solar radiation, wind direction and wind speed) and the concentrations of seven environmental pollutants (SO2, NO2, NO, CO2, CO, NMHC, and CH4) were used to forecast the ozone concentration in the atmosphere. In this report, three methods (PLS regression, support vector machine (SVM), and multiple least-square regression) were used to predict ground-level ozone. We used fifteen parameters to evaluate the performance of the methods. Multiple least-square regression, partial least square regression (PLS regression), and SVM using linear and radial kernels were the best performers, with MAE (mean absolute error) of 9.17 × 10^-3, 9.72 × 10^-3, 9.64 × 10^-3, and 9.12 × 10^-3, respectively. SVM with a polynomial kernel had an MAE of 5.46 × 10^-2. These results show that these methods could be used to predict ground-level ozone concentrations at Al Jahra station in Kuwait.
Smog is triggered by air pollutants and fog. Deep learning techniques are applied to predict smog severity. This paper presents a deep-learning-based predictive model for various air pollutants (NO2, NOx, CO, SO2, O3, PM2.5, PM10) using a metropolitan-area air pollution dataset. The Central Pollution Control Board (CPCB) monitors air, water, waste, etc. through nationwide programs. Through the National Air Quality Monitoring program, the primary and secondary air pollutants are captured and made available online. In this paper, two traditional predictive models along with the deep learning technique Long Short-Term Memory (LSTM) are used for predicting the air pollutants. Before training the models, the missing values and noise in the dataset were imputed using the mean value. Then, the models are built with LR, ARIMA, and LSTM. Finally, the models' performance is measured using Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). LSTM performed better than LR and ARIMA.
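For readers unfamiliar with the two error metrics named above, the following sketch shows how MAE and RMSE are computed. The observed and predicted values are made-up numbers for illustration, not results from the paper.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average magnitude of the prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    """Square root of the mean squared error; penalises large errors more than MAE."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical pollutant concentrations (illustrative only).
observed = [40.0, 55.0, 60.0, 35.0]
predicted = [42.0, 50.0, 63.0, 35.0]

mae_value = mean_absolute_error(observed, predicted)
rmse_value = root_mean_square_error(observed, predicted)
print(f"MAE={mae_value:.2f} RMSE={rmse_value:.2f}")
```

RMSE is never smaller than MAE on the same data, so a large gap between the two hints at a few large errors rather than uniformly moderate ones.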
The document discusses using big data technologies for environmental forecasting and climate prediction at the Barcelona Supercomputing Center (BSC). It outlines three key areas: 1) Developing capabilities for air quality forecasting using data streaming; 2) Implementing simultaneous analytics and high-performance computing for climate predictions; 3) Developing analytics as a service using platforms like the Earth System Grid Federation to provide climate data and services to users. The BSC is working on several projects applying big data, including operational air quality and dust forecasts, high-resolution city-scale air pollution modeling, and decadal climate predictions using workflows and remote data analysis.
PPT.pdf internship demo on machine lerning - Misbanausheen1
The document summarizes an internship project to predict air quality using linear regression. The intern collected air quality data on carbon monoxide and ozone levels from the World Weather Repository. Using Python and scikit-learn, a linear regression model was trained to predict ozone levels based on carbon monoxide levels. The model achieved a prediction score of [number]%. The intern concluded the project demonstrated the potential of machine learning techniques like linear regression for making accurate predictions to benefit various sectors.
APPLYING FIXED BOX MODEL TO PREDICT THE CONCENTRATIONS OF (PM10) IN A PART OF... - IAEME Publication
This document describes applying a fixed box model to predict concentrations of particulate matter (PM10) in part of Al-Kut City, Iraq. The model divides the area into a rectangular box and estimates pollutant concentration based on emission sources, wind speed, and mixing height. Input parameters were calculated based on road, vehicle, construction, and domestic emission inventories. Calculated emission capacities were adjusted based on field surveys. The model was run under two mixing height scenarios and results were similar to portable equipment measurements, indicating the fixed box model is easy to use for evaluating air pollution in Al-Kut City.
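As a rough illustration of the approach summarized above, the textbook steady-state fixed box model estimates concentration as c = b + qL / (uH), where b is the background concentration, q the emission rate per unit area, L the box length along the wind, u the wind speed, and H the mixing height. The sketch below uses this standard form. The input numbers are hypothetical and are not taken from the Al-Kut study, whose exact formulation may differ.

```python
def fixed_box_concentration(q_area, length_m, wind_speed, mixing_height, background=0.0):
    """Steady-state fixed box model: c = b + q * L / (u * H), in g/m^3.

    q_area: emission rate per unit area, g/(s*m^2)
    length_m: box length along the wind direction, m
    wind_speed: mean wind speed, m/s
    mixing_height: mixing height, m
    background: upwind background concentration, g/m^3
    """
    return background + q_area * length_m / (wind_speed * mixing_height)


# Hypothetical inputs, evaluated for two mixing height scenarios (m).
scenarios_ug_m3 = {
    height: fixed_box_concentration(4e-6, 5000.0, 2.0, height) * 1e6  # g/m^3 -> ug/m^3
    for height in (500.0, 1000.0)
}
for height, conc in scenarios_ug_m3.items():
    print(f"H={height:.0f} m -> {conc:.1f} ug/m^3")
```

Doubling the mixing height halves the predicted concentration, which is why the choice of mixing height scenario matters for this kind of model.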
Forecasting Municipal Solid Waste Generation Using a Multiple Linear Regressi... - IRJET Journal
- The document describes developing a multiple linear regression model to forecast municipal solid waste generation based on factors like population, population density, education levels, access to services, and income levels.
- The model was developed using data from various municipalities in Italy. Exploratory data analysis was conducted to determine linear relationships between waste generation and predictors.
- The linear regression model achieved a high R-squared value of 91.81%, indicating a close fit to the data. Various error metrics like MAE, MSE and RMSE were calculated to evaluate model performance.
- The regression model provides a simple yet accurate means of predicting municipal solid waste that requires minimal data and can be generalized to other locations.
IRJET - Prediction of Air Pollutant Concentration using Deep Learning - IRJET Journal
This document describes a study that used artificial neural networks to predict PM2.5 air pollutant concentration levels in Manali, Chennai, India. The study collected data on PM2.5 levels as well as factors like NO2, SO2, CO, relative humidity, and wind speed over a three year period. An artificial neural network model with feed-forward backpropagation was developed using this data, with 6 input nodes (for each pollutant/factor) and 1 output node (for PM2.5 levels). The model was trained on 70% of the data and tested on the remaining 15%, achieving a correlation coefficient between predicted and observed PM2.5 of 0.8. The study demonstrated that artificial neural
Climate Visibility Prediction Using Machine Learning - IRJET Journal
The document describes a research project that aims to develop a machine learning model to predict visibility distance based on climatic indicators. The researchers collected a dataset containing measurements of variables like temperature, humidity, wind speed, and atmospheric pressure along with corresponding visibility distance readings. They plan to preprocess the data, select important features, and train regression algorithms like decision trees, XGBoost, and linear regression to predict visibility. Accurately predicting visibility can benefit various sectors by improving safety and decision making. The researchers hope to contribute to knowledge in this area and provide a useful tool for organizations.
Climate Visibility Prediction Using Machine Learning - IRJET Journal
The document describes a research project that aims to develop a machine learning model to predict visibility distance based on climatic indicators. The researchers collected a dataset containing measurements of variables like temperature, humidity, wind speed, and atmospheric pressure along with corresponding visibility distance readings. They plan to preprocess the data, select important features, and train regression algorithms like decision trees, XGBoost, and linear regression to predict visibility. Accurately predicting visibility can benefit various sectors by improving safety and decision making. The researchers intend to evaluate model performance, tune hyperparameters, and ultimately deploy the optimal model for real-world visibility forecasting.
Similar to ONLINE SCALABLE SVM ENSEMBLE LEARNING METHOD (OSSELM) FOR SPATIO-TEMPORAL AIR POLLUTION ANALYSIS (20)
Batteries: Introduction - Types of batteries - Discharging and charging of a battery - Characteristics of a battery - Battery rating - Various tests on batteries - Primary battery: silver button cell - Secondary battery: Ni-Cd battery - Modern battery: lithium-ion battery - Maintenance of batteries - Choice of batteries for electric vehicle applications.
Fuel Cells: Introduction - Importance and classification of fuel cells - Description, principle, components, and applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell, and direct methanol fuel cell.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
ONLINE SCALABLE SVM ENSEMBLE LEARNING METHOD (OSSELM) FOR SPATIO-TEMPORAL AIR POLLUTION ANALYSIS
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.5/6, November 2017
DOI: 10.5121/ijdkp.2017.7602
Shahid Ali and Simon Dacey
Department of Computing, Unitec Institute of Technology, Auckland, New Zealand
ABSTRACT
Environmental air pollution studies fail to consider that air pollution is a spatio-temporal problem.
The volume and complexity of the data have created the need to explore various machine learning models;
however, those models have advantages and disadvantages when applied to regional air pollution analysis.
Furthermore, most environmental problems are globally distributed problems. This research addresses
the spatio-temporal problem using a decentralised computational technique named the Online Scalable SVM
Ensemble Learning Method (OSSELM). The evaluation criteria for computational air pollution analysis
include accuracy, real-time prediction, spatio-temporal analysis and decentralised analysis; we assert
that these criteria can be improved using the proposed OSSELM. Special consideration is given to
distributed ensembles to resolve the spatio-temporal data collection problem (i.e. the data is collected
from multiple monitoring stations dispersed over a geographical area). The experimental results
demonstrate that the proposed OSSELM produces impressive results compared to an SVM ensemble for air
pollution analysis in the Auckland region.
KEYWORDS
SVM, Ensemble, Spatio-temporal, Aggregation, Scalable
1. INTRODUCTION
Machine learning algorithms must keep pace with the latest developments in environmental and other
applications. Their performance depends on sufficient and reliable data coming from various sources.
With the internet revolution, data is now easily accessible through electronic sources. However,
environmental data series are sometimes short and contain missing values because of equipment
failure, human error, incorrect set-up of monitoring stations, etc. Therefore, present machine
learning models for air pollution prediction do not represent the true picture of air pollution.
Further, existing online models are confronted with the problem of accommodating huge amounts of
spatio-temporal data. In this regard, the scalability of a machine learning model and the
aggregation of the decisions of multiple models play an important role in online spatio-temporal
air pollution prediction. This paper investigates the Online Scalable SVM Ensemble Learning Method
(OSSELM) for online spatio-temporal prediction of nitrogen dioxide (NO2), carbon monoxide (CO) and
ozone (O3) across the monitoring stations of the Auckland region.
Our daily activities release chemicals and particles into the air that we breathe. These air
pollutants can cause hazy days and unpleasant smells and have adverse effects on our health.
Air quality depends on the amount of pollution released into the air by human and natural
activities, the degree of dispersion due to wind and weather, and chemical reactions
among the pollutants [1]. The key pollutants that affect air quality in the Auckland region
include NO2, CO and O3.
In Auckland, New Zealand, these pollutants are measured by the Auckland Regional Council (ARC)
because they are known to endanger human health and well-being. Pollutant concentrations
in New Zealand are measured against national standards or regional air quality targets. On average,
every Aucklander breathes 11,000 litres of air every day. Although New Zealand's air quality is
comparatively good relative to other nations, the country has the worst asthma death rate [2].
Asthmatic people are sensitive to poor air quality, which in Auckland is primarily caused by motor
vehicles and other sources such as domestic fires. Long-term exposure to air pollution increases
cardiovascular and lung diseases, resulting in heavy medical costs to the public [8], besides
damage to other living organisms such as vegetation.
The threat that air pollution poses to public health has led to the development of several machine
learning models to predict future air quality. However, air pollution monitoring is a complex task.
Monitoring stations consist of sensor devices that record each pollutant concentration every minute
throughout the year. Knowledge discovery from such a large amount of data is complex, time
consuming and expensive. Air pollution is a spatio-temporal problem, and spatio-temporal data
always comes in huge volumes. Hence, machine learning practitioners are confronted with the
problems of handling this large amount of data and of meeting computational time constraints.
They must also deal with missing values in the time series; with missing values, a machine learning
model can produce results that are biased towards specific characteristics of the spatio-temporal
prediction task. To address these problems, we propose the Online Scalable SVM Ensemble Learning
Method (OSSELM) for future air pollution prediction. OSSELM is designed to handle large amounts of
spatio-temporal data and to conduct environmental prediction on data with missing values. The
proposed method predicts the air pollution status for the whole region based on CO, NO2 and
ground-level O3 concentrations.
Ensembles of SVM models have been used in a wide range of applications, such as medical,
engineering and environmental studies [3]. Ensembles are considered one of the most powerful and
successful approaches to prediction tasks. Several studies (e.g. [4] and [5]) have been carried out
to understand ensembles and to explain their effectiveness in various applications. An SVM ensemble
works on the divide-and-conquer principle: the decisions of various local models are combined
through an aggregation method to arrive at a final decision. Through this subdivision strategy,
training time is heavily reduced, allowing more models to be trained for the prediction task.
For example, training p classifiers on subsamples of size n/p results in an approximate complexity
of Ω(n²/p). The decrease in complexity helps in dealing with big data sets and
nonlinear kernels.
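The complexity figure above can be checked with a short worked calculation. This is only a sketch under an assumption that is ours rather than the paper's: training one SVM on m samples costs at least c·m² operations, for some hypothetical constant c. Summing that cost over p classifiers, each trained on n/p samples, gives:

```latex
\[
  T_{\text{ensemble}}
  \;=\; \sum_{i=1}^{p} c\left(\frac{n}{p}\right)^{2}
  \;=\; p \cdot \frac{c\,n^{2}}{p^{2}}
  \;=\; \frac{c\,n^{2}}{p},
  \qquad
  T_{\text{single}} \;=\; c\,n^{2},
  \qquad
  \frac{T_{\text{single}}}{T_{\text{ensemble}}} \;=\; p .
\]
```

As an illustration, one year of hourly readings from a single station with no gaps gives n = 24 × 365 = 8760. With an illustrative choice of p = 10, the quadratic term drops from 8760² = 76,737,600 to 7,673,760, a tenfold reduction.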
Air pollution data is often time dependent and spatially distributed. Multiple monitoring stations
currently collect air pollution data at local to global geographical scales. The quality and
quantity of the data collected from these stations depend on the tools used, the design of the
monitoring network, etc. In the Auckland region, New Zealand, CO, NO2 and O3 are monitored
extensively by the Auckland Regional Council (ARC) through 20 monitoring stations, on an hourly
basis throughout the year. However, some of the monitoring stations had missing data, which made
predicting the future CO, NO2 and O3 status of Auckland challenging. The CO, NO2 and O3 data we use
for prediction comes from 20 monitoring stations: fifteen urban stations, four rural stations and
one industrial station. The literature shows that traditional prediction models tend to predict air
pollution for a specific location. In this paper, we propose a new decentralised online scalable
SVM ensemble method that predicts the future air pollution state of the whole region, taking into
account the geographical characteristics of spatio-temporal CO, NO2 and O3 data.
The remainder of this paper is organised as follows. Section 2 discusses previous work on
online air pollution prediction. Section 3 presents the methodology. Section 4 describes the data
set, the experimental setup and the evaluation criteria. Section 5 presents the experimental
results and discussion, and Section 6 concludes the paper.
2. RELATED WORK
In this section, we consider significant works on machine learning and computational methods for
air pollution prediction and analysis. Previous work presents a mixture of methods, such as
neural networks, support vector machines and other approaches to air pollution forecasting.
Early work on computational air pollution prediction dates back to the late 20th century. In the
1980s, a Group Method of Data Handling (GMDH) algorithm was proposed [6]. This algorithm did not
need to divide the available data into training and testing sets in order to determine the
structure of the partial polynomials or the number of intermediate variables. The GMDH algorithm
was applied to the short-term prediction of air pollution concentrations using SO2 time series
data. SO2 concentrations in Tokushima, Japan, were predicted a few hours in advance from wind
velocity and wind direction. The results were compared with a linear regression model and a linear
autoregressive model, and the GMDH algorithm outperformed both.
Because most of a country's population lives in cities, the numerous journeys people make from
one place to another cause air pollution in urban areas. In this regard, a system for
monitoring and forecasting air pollution in urban areas was proposed [7]. The system deployed
low-cost air quality monitoring motes equipped with an array of gaseous and meteorological
sensors. Data from these motes was transferred to an intelligent sensing platform consisting of
several modules, which were responsible for receiving and storing the data and converting it into
useful information for forecasting the pollutants. Three machine learning algorithms were
deployed: support vector machines (SVM), artificial neural networks (ANN) and M5P model trees. The
results showed that multivariate modelling with the M5P algorithm provided the best forecasting
accuracy for SO2 compared to the other algorithms.
Nitrogen oxide emissions from vehicles cause considerable health issues. In this regard, an
accurate online support vector regression (AOSVR) model was proposed for predicting nitrogen
oxide (NOx) emissions [9]. The results showed that AOSVR was more efficient on small samples than
a standard support vector regression model and an artificial neural network. The proposed model
could predict NOx emissions accurately under certain conditions when its parameters were modified.
Its overall efficiency and prediction accuracy were good because the model could update its own
parameters as time and other parameters changed.
Particulate matter (PM) consists of solid and liquid particles that remain in the air for some
time, posing a great threat to human health. This gives researchers an opportunity to study the
causes of PM emission, depending on its concentration in the air at a given time and place.
Hence, a method for PM emission prediction based on the least squares support vector machine
(LS-SVM) algorithm was proposed [10]. The method relies on the principle of phase space
reconstruction, which is derived from the Takens embedding theorem. The data was divided into two
parts, training and testing, and the learning model was obtained by moving a window of width n
along the time axis. Numerical experiments showed that LS-SVM gave better prediction of PM.
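The moving-window construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation from [10]: the function names, the one-step-ahead target and the chronological split are our assumptions, and the readings are made-up numbers.

```python
from typing import List, Sequence, Tuple


def sliding_windows(series: Sequence[float], width: int,
                    horizon: int = 1) -> List[Tuple[List[float], float]]:
    """Turn a time series into (window, target) pairs.

    Each window holds `width` consecutive readings; the target is the
    reading `horizon` steps after the end of that window.
    """
    if width < 1 or horizon < 1:
        raise ValueError("width and horizon must be positive")
    pairs = []
    last_start = len(series) - width - horizon
    for start in range(last_start + 1):
        window = list(series[start:start + width])
        target = series[start + width + horizon - 1]
        pairs.append((window, target))
    return pairs


def split_in_time(pairs: list, train_fraction: float = 0.8) -> Tuple[list, list]:
    """Split chronologically so the test pairs come after the training pairs."""
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]


if __name__ == "__main__":
    readings = [12.0, 15.0, 11.0, 18.0, 21.0, 19.0, 16.0]  # made-up values
    pairs = sliding_windows(readings, width=3)
    train, test = split_in_time(pairs, train_fraction=0.75)
    print(pairs[0])
    print(len(train), len(test))
```

Each window would then serve as the input vector of a regression model such as LS-SVM, with the target as the value to predict.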
The rapid growth of industrial activity causes air pollution, which is a major concern for
public health. An innovative wireless sensor network air pollution monitoring system (WAPMS) was
proposed [11]. The system uses an air quality index to monitor air pollution in Mauritius. To
improve the efficiency of WAPMS, a new data aggregation algorithm named Recursive Converging
Quartiles (RCQ) was implemented. RCQ can eliminate duplicate and invalid readings, which reduces
the amount of data transmitted to the centralised station. To handle any privacy and management
issues, WAPMS was equipped with a hierarchical routing protocol, which puts the motes to sleep
during idle time.
It is very important to know the causes of air pollution in order to avoid further harm to humans
and other living organisms. To identify the sources of air pollution, principal component
analysis (PCA) was deployed [12]. Along with PCA, tree-based ensemble learning models were
constructed to predict urban air quality. PCA identified vehicle emissions and fuel combustion as
the two main sources of air pollution. The generalisation and predictive performance of several
tree-based models, namely a single decision tree, decision tree forest and decision treeboost,
was evaluated and compared with a conventional machine learning approach, SVM. The
misclassification rate was 8.32% for the single decision tree, 4.12% for decision tree forest,
5.62% for decision treeboost and 6.18% for support vector machines. The classification accuracy of
the decision tree forest and decision treeboost ensembles was therefore higher than that of SVM
classification and regression; this was achieved by applying bagging and boosting algorithms to
these tree-based ensemble models.
As mentioned earlier, incomplete or missing data leads to biased results and to inaccurate
prediction performance from machine learning algorithms. To handle missing environmental data, a
spatial data aided incremental support vector regression (SaIncSVR) model for spatio-temporal
PM2.5 prediction was proposed [13]. In this method, spatial data was used to train the temporal
prediction model. PM2.5 data was obtained from 13 monitoring stations in Auckland, New Zealand.
The results of the SaIncSVR model were compared with those of a temporal IncSVR model, and the
SaIncSVR model achieved better prediction statistics.
Furthermore, some authors have focused on forecasting air pollution with machine learning
approaches such as neural networks, support vector machines and kernel-based algorithms. To
reduce the error between the model and the raw data, a mixed approach combining support vector
machines and kernel functions was proposed for forecasting urban air quality [14]. The kernel
functions considered for forecasting PM2.5, SO2 and O3 concentrations were Gaussian, polynomial
and spline. Applying SVM with these kernels resulted in accurate models for pollutant
concentration forecasting.
Data mining techniques have also been deployed for effective air pollution forecasting.
Artificial neural network models based on Feed Forward Neural Networks (FFNN) and the Multilayer
Perceptron (MLP) were applied to urban and industrial air pollution impact areas [15]. The air
pollution patterns obtained showed greater accuracy and a lower error rate with the MLP neural
network model.
Traffic and environmental data is also presented as time series. Because of the dynamic nature of
real-time data, predicting it and improving predictive performance is a great challenge. With this
aim, a new type of ensemble based on the bagging algorithm was proposed to improve predictive
performance on real-time data [16]. Diversity is very important in ensemble creation; in this
work, diversity was created through bagged regression trees. However, the study focused on
creating diversity through bagging and did not address the strategy for aggregating the decisions
of the various ensemble models.
Lastly, for daily air pollution prediction, a method using support vector machines and
wavelet decomposition was proposed [17]. The measured time series was decomposed into a
wavelet representation, and the wavelet coefficients were then predicted. The final daily air
pollution forecast was prepared from these predicted coefficient values. The forecasting approach
applied an SVM-type neural network in regression mode. However, the study was limited to the
Gaussian kernel.
From the literature review, and to the best of our knowledge, the above studies focused only on
predicting air pollutant concentrations and on studying the impact of a single pollutant at a
given time. Spatio-temporal air pollution data is transmitted continuously, streaming
asynchronously from multiple locations, and hence imposes certain challenges: (i) the air
pollution problem is spatio-temporal; (ii) the big data streams flow dynamically; (iii) the data
streams are physically distributed across different places; and (iv) the size of the problem
varies with the size of the region and the number of monitoring stations in that region.
Further, the literature shows that centralised computing has been applied to air pollution
analysis, whereas air pollution data is physically distributed and monitored in a decentralised
way through various monitoring stations. We therefore argue that centralised computing has
certain downsides: (i) centralised data analysis leads to resource challenges; (ii) online
decision making over huge amounts of data is difficult; and (iii) the system is not scalable,
while the problem is.
Hence, to address the above challenges, we propose the Online Scalable SVM Ensemble Learning
Method (OSSELM) for predicting air pollution from CO, NO2 and O3 concentrations for the whole
region. The main novelties of the proposed OSSELM are that: (i) it is capable of conducting
spatio-temporal analysis; (ii) it combines (fuses) the knowledge of distributed monitoring
stations through ensemble learning; (iii) online learning enables it to accommodate huge amounts
of data; and (iv) the scalability of the system enables regional data analysis and knowledge
discovery.
3. PROPOSED ONLINE SCALABLE SVM ENSEMBLE LEARNING METHOD (OSSELM)
The common problems in solving the air pollution problem with centralised computing are that
centralised data analysis leads to resource challenges, that online decision making over huge
amounts of data is difficult and, one of the biggest problems, that centralised computing fails to
accommodate large amounts of data [22]. These challenges led us to develop a distributed SVM
ensemble technique. Key contributions of this technique are its cost-effective computation and
its high processing speed. Computation is carried out by a divide-and-conquer mechanism, which
reduces the communication load and achieves high computing capacity.
The proposed Online Scalable SVM Ensemble Learning Method (OSSELM) provides a solution for
spatio-temporal air pollution analysis. OSSELM can handle large amounts of regional air pollution
data and is scalable enough to merge the knowledge of multiple monitoring stations. Further,
the proposed method can handle online data streams, detect air pollution problems across the
whole region and trigger an alarm for a potential incident at a particular location. Figure 1
illustrates the proposed decentralised solution for spatio-temporal air pollution
analysis.
Figure 1. Illustration of Proposed Decentralised Solution for Spatio-temporal Air Pollution Analysis
Figure 1 shows that air pollution analysis draws on knowledge from multiple monitoring
stations. Each region contains a number of monitoring stations that depends on its size.
An SVM ensemble, itself a collection of numerous SVM ensembles, is constructed from the data of
these multiple monitoring stations; finally, their knowledge is transferred to the area centre for
knowledge aggregation and decision support.
3.1 SYSTEM DESIGN
Spatio-temporal air pollution data is obtained from multiple monitoring stations situated in
various regions of an area. Local SVMs are modelled on the spatial and temporal dimensions. For
future decision making about the air pollution problem, the knowledge of the various SVMs is
integrated. Figure 2 shows the design of OSSELM for spatio-temporal air pollution analysis,
which consists of four steps.
Figure 2. Design of OSSELM for Spatio-temporal Air Pollution Analysis
Step 1: When the region's data is huge, Bag of Little Bootstraps (BLB) is applied to subsample it.
Step 2: The Boosting method is applied to train the SVM ensembles.
Step 3: Multiple individually trained SVM ensembles are combined to build a local decision
model.
Step 4: Multiple local decision models are aggregated to obtain the status of the air pollution
problem for the whole region.
This research follows earlier research; brief details of the above steps can be found in
reference [22].
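To make the data flow of the four steps concrete, the sketch below shows the subsampling of Step 1 and a simple two-level aggregation for Steps 3 and 4. It is an illustration under our own assumptions, not the procedure of [22]: majority voting is assumed as the aggregation rule, the subset size n^gamma with gamma = 0.7 is an illustrative choice, the full BLB resampling of each subset is omitted, and the boosted SVM training of Step 2 is replaced by ready-made class labels. All function names and the example votes are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def blb_subsamples(data: Sequence, n_subsets: int, gamma: float = 0.7,
                   seed: int = 0) -> List[list]:
    """Step 1 (sketch): draw `n_subsets` small subsets of about n**gamma items,
    each sampled without replacement from the full data."""
    rng = random.Random(seed)
    size = max(1, min(len(data), int(round(len(data) ** gamma))))
    return [rng.sample(list(data), size) for _ in range(n_subsets)]


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def regional_status(station_votes: Dict[str, List[str]]) -> str:
    """Steps 3-4 (sketch): vote within each station to get a local decision,
    then vote across the local decisions to get the regional status."""
    local = {name: majority_vote(votes) for name, votes in station_votes.items()}
    return majority_vote(list(local.values()))


if __name__ == "__main__":
    subsets = blb_subsamples(range(100), n_subsets=4)
    print([len(s) for s in subsets])
    # Hypothetical class labels predicted by three ensemble members per station.
    votes = {
        "Penrose": ["Good", "Good", "Acceptable"],
        "Henderson": ["Excellent", "Good", "Good"],
        "Pukekohe": ["Excellent", "Excellent", "Excellent"],
    }
    print(regional_status(votes))
```

Keeping the local and regional votes separate means each station only has to send a single class label to the area centre, which matches the decentralised design described above.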
4. DATA SET AND EXPERIMENTAL SETUP
In this section, the proposed OSSELM is tested on air pollution prediction in the Auckland region
using hourly CO, NO2 and O3 concentrations collected from 20 monitoring stations in Auckland.
Specifically, the experiments aim to: (i) conduct spatio-temporal prediction for the 20
monitoring stations in Auckland; (ii) use ensemble learning to combine the distributed knowledge
of multiple stations; (iii) accommodate huge amounts of data; and (iv) perform air pollution
prediction in real time.
4.1 DATA SET
The data set was obtained from 20 sites comprising the urban, rural and industrial air quality
monitoring network of the Auckland Regional Council (ARC), New Zealand. These stations
constantly monitor the concentrations of air pollutants such as CO, NO2 and O3. CO, NO2 and O3
were recorded on an hourly basis, 24 hours a day, from 1 January 2010 to 31 December 2010. In the
recorded CO, NO2 and O3 data, each station has a different number of samples, depending on the
meteorological parameters wind speed, air temperature, humidity and evaporation rate.
Table 1 lists the air monitoring sites in the Auckland region as of April 2010. Key air
pollutants are monitored at various sites in the region to establish the levels of air
quality. The current ARC air quality monitoring network comprises 12 permanent, 1 industrial
and 7 mobile sites for pollutant monitoring. All the sites are within the Auckland Urban Airshed.
The network extends from Pukekohe in the south to Takapuna in the north, and from Henderson
in the west to Botany Downs in the east, as shown in Table 1. All sites record CO, NO2 and O3.
Based on the 2010 data, the air quality sites include:
Peak traffic and CBD sites: Khyber Pass Road (Newmarket) and Queen Street (Auckland CBD).
Residential: Dominion Road, Takapuna, Manurewa, Hobson Street, Otara, Henderson, Pakuranga,
Dominion Road, Newton, Botany Downs, Glen Eden, Saint Marys Bay and Pukekohe.
Industrial: Penrose (Gavin Street).
Table 1. Air Monitoring Sites in Auckland Region (as of April 2010)
Location                 District Council Area    CO    NO2   O3
Queen Street III         Auckland                 Y     Y     Y
Dominion Road            Auckland                 Y     Y     Y
Takapuna                 North Shore              Y     Y     Y
Manurewa                 Manukau                  Y     Y     Y
Hobson Street            Auckland                 Y     Y     Y
Khyber Pass Road (A)     Auckland                 Y     Y     Y
Henderson                Waitakere                Y     Y     Y
Pakuranga                Manukau                  Y     Y     Y
Queen Street II          Auckland                 Y     Y     Y
Dominion Road II         Auckland                 Y     Y     Y
Newton                   Auckland                 Y     Y     Y
Botany Downs             Manukau                  Y     Y     Y
Penrose IV (A - B)       Auckland                 Y     Y     Y
Musick Point             Auckland                 Y     Y     Y
Penrose IV (C)           Auckland                 Y     Y     Y
Penrose IV (D)           Auckland                 Y     Y     Y
Glen Eden                Waitakere                Y     Y     Y
Saint Mary’s Bay         Auckland                 Y     Y     Y
Pukekohe                 Franklin                 Y     Y     Y
Number of sites monitoring parameters             20    20    20
4.2 EXPERIMENTAL SETUP
Environmental Performance Indicators (EPIs) are a form of air quality index (AQI); AQIs are used
widely around the world. EPIs themselves are no longer widely used, and councils now rarely use
them. However, in 2010 the Auckland council was using EPIs to monitor air pollutant
concentrations. The EPI values differ for each pollutant (CO, NO2 and O3), resulting in different
classes. The data in this study includes five classes for each of CO, NO2 and O3, a total of 15
classes for air pollution monitoring in the Auckland region. For classification purposes, each
pollutant class is represented by the pollutant's name together with its level of intensity, as
shown in Table 2, Table 3 and Table 4 respectively.
Table 2. CO Concentrations and Representations
EPI Class     Concentration    Representation
Excellent     0-1              CO1
Good          1-3.3            CO2
Acceptable    3.3-6.6          CO3
Alert         6.6-10           CO4
Action        >10              CO5
Table 2 shows the CO concentrations and their representations according to the EPI classes. A CO
concentration between 0 and 1 is represented as CO1 and classed as excellent, and so on, whereas
a CO concentration greater than 10 is represented as CO5 and classed as action, meaning that
immediate action is required to stop CO pollution from spreading further.
Table 3. NO2 Concentrations and Representations
EPI Class     Concentration   Representation
Excellent     0-10            NO1
Good          10-33           NO2
Acceptable    33-66           NO3
Alert         66-100          NO4
Action        >100            NO5
Table 3 shows the NO2 concentration ranges and their representations according to EPI class. An
NO2 concentration between 0 and 10 is represented as NO1 and classed as excellent, and so on,
whereas an NO2 concentration greater than 100 is represented as NO5 and classed as action,
meaning immediate action is required to stop further NO2 pollution spreading.
Table 4. O3 Concentrations and Representations
EPI Class     Concentration   Representation
Excellent     0-20            OZ1
Good          20-50           OZ2
Acceptable    50-70           OZ3
Alert         70-100          OZ4
Action        >100            OZ5
Table 4 shows the O3 concentration ranges and their representations according to EPI class. An
O3 concentration between 0 and 20 is represented as OZ1 and classed as excellent, and so on,
whereas an O3 concentration greater than 100 is represented as OZ5 and classed as action,
meaning immediate action is required to stop further O3 pollution spreading.
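The mapping from a concentration to its class label in Tables 2 to 4 can be sketched as a small
lookup. This is only an illustration, not the authors' preprocessing: the function name
`epi_class`, the `EPI_BOUNDS` dictionary and the treatment of values that sit exactly on a
boundary are assumptions, and the Alert lower bound for CO is taken as 6.6 so that the ranges
are contiguous.

```python
from bisect import bisect_left

# Upper bounds of the first four EPI classes, read from Tables 2-4.
# Keys are the label prefixes used in the paper (CO, NO, OZ).
EPI_BOUNDS = {
    "CO": [1.0, 3.3, 6.6, 10.0],
    "NO": [10.0, 33.0, 66.0, 100.0],
    "OZ": [20.0, 50.0, 70.0, 100.0],
}


def epi_class(prefix, concentration):
    """Return the class label (e.g. 'CO3') for one concentration value."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    bounds = EPI_BOUNDS[prefix]
    # Assumption: a value equal to a boundary is assigned to the lower class.
    level = bisect_left(bounds, concentration) + 1
    return f"{prefix}{level}"


print(epi_class("CO", 0.5), epi_class("NO", 50.0), epi_class("OZ", 120.0))
```

With these bounds, any value above the fourth bound falls into level 5 (action) without a
separate branch.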
According to the EPI classes, the CO data for the Auckland region in this study are represented
as follows: Class "CO1": excellent (air quality is considered very good and poses no risk to
people); Class "CO2": good (air quality is considered satisfactory and poses little risk to
people's health); Class "CO3": acceptable (air quality is acceptable, but there is some risk to
people's health); Class "CO4": alert (air quality is not acceptable and there is serious risk to
people's health); Class "CO5": action (air quality is deteriorating and a quick response is
required). The five NO2 classes ("NO1" to "NO5") and the five O3 classes ("OZ1" to "OZ5") carry
the same interpretations at the corresponding levels: excellent, good, acceptable, alert and
action.
According to the 2010 data set, CO concentrations were recorded at six stations (Takapuna,
Khyber Pass Road, Henderson, Pakuranga, Queen Street II and Glen Eden), giving 52560 hourly
observations. NO2 was recorded at 10 monitoring stations (Khyber Pass Road, Musick Point,
Penrose II (B), Takapuna, Henderson, Queen Street II, Glen Eden, Patumahoe, Waiheke and
Helensville), giving 87600 hourly observations. O3 was recorded at four monitoring stations
(Patumahoe, Whangaparaoa, Musick Point and Waiheke), giving 35040 hourly observations.
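The observation counts above are consistent with one reading per station per hour over a full
non-leap year. A quick check, assuming exactly 365 x 24 hourly slots per station (the variable
names are illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 2010 is not a leap year

# Number of stations per pollutant, as listed in the text.
stations = {"CO": 6, "NO2": 10, "O3": 4}

observations = {name: count * HOURS_PER_YEAR for name, count in stations.items()}
total = sum(observations.values())

print(observations)
print(total)
```

The three counts come out as 52560, 87600 and 35040, and their sum is the 175200 observations
used in the later experiments.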
The OSSELM experiments for air pollution prediction on the CO, NO2 and O3 hourly observations
were implemented in Weka 3.6.5. We ran Weka on Windows 7 Enterprise on a system with an Intel
Core i5 processor (3.2 GHz) and 4 GB of 1067 MHz DDR3 RAM.
4.3 MODELLING PERFORMANCE EVALUATION CRITERIA
We studied and compared the OSSELM experiments using the following measures: Mean Absolute
Error (MAE), Root Mean Square Error (RMSE), Correctly Classified Instances (CCI),
Incorrectly Classified Instances (ICI), Percentage Classification Accuracy (PCA) and Kappa
Statistic (KS). Mean Absolute Error indicates how close the predictions are to the observed
outcomes; it is the average of the absolute errors of the predictions. Root Mean Square Error
measures the differences between the values predicted by a model or estimator and the values
observed; it is the square root of the mean of the squared errors. The Kappa Statistic measures
the agreement between predicted and observed categories while correcting for the agreement that
could happen by chance. Its values range from plus one (+1), representing perfect agreement,
through zero (0), representing no agreement above that expected by chance, to negative one (-1),
representing complete disagreement. It is important to mention that the prediction performance
of OSSELM in our experiments is based on the whole data sets of CO, NO2 and O3 collected from
multiple monitoring stations distributed across the Auckland region.
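The measures described above can be sketched in a few lines. This is a minimal illustration
under stated assumptions, not the Weka implementation used in the experiments: the function
names are hypothetical, MAE and RMSE are shown on plain numeric values, and the Kappa Statistic
is computed as Cohen's kappa from the observed and chance agreement.

```python
from collections import Counter
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the mean squared difference."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def classification_summary(actual, predicted):
    """Return CCI, ICI, PCA (percent) and the Kappa Statistic for two label lists."""
    n = len(actual)
    cci = sum(a == p for a, p in zip(actual, predicted))
    p_observed = cci / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Agreement expected by chance, from the marginal label frequencies.
    p_chance = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    # Kappa is undefined when chance agreement is 1; 0.0 is returned here by assumption.
    kappa = 0.0 if p_chance == 1 else (p_observed - p_chance) / (1 - p_chance)
    return {"CCI": cci, "ICI": n - cci, "PCA": 100 * p_observed, "KS": kappa}


summary = classification_summary(
    ["CO1", "CO1", "CO2", "CO2"],
    ["CO1", "CO2", "CO2", "CO2"],
)
print(summary)
```

In this toy example three of four labels match, the chance agreement is 0.5, and the kappa value
is therefore 0.5 even though the accuracy is 75 percent, which is why the two measures are
reported side by side.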
5. RESULTS AND DISCUSSIONS
In this section, we report the air pollution prediction performance of the proposed OSSELM
against the SVM ensemble on missing data, on training data and then on unseen testing data. The
OSSELM performance is discussed briefly below.
5.1 OSSELM PERFORMANCE ON MISSING AND IMPUTED MISSING DATA
Air pollution data often suffer from missing values. Because the data are collected around the
clock from multiple monitoring stations at various locations, the main causes of missing values
include equipment failure, power outages, pollutant concentrations below detection limits, the
direction of the monitoring station, or other unknown reasons [16]. Computational air pollution
studies on data with missing values will not give comprehensive knowledge of the underlying
problem. We argue that any computational method designed for air pollution analysis and
prediction should perform similarly on missing and complete data sets. We therefore tested the
performance of OSSELM on the CO, NO2 and O3 data containing missing values and compared its
results with the SVM ensemble. The comparative statistics of the SVM ensemble and OSSELM
performances on missing data are shown in Table 5 and Table 6 respectively.
Table 5. SVM Ensemble Performance on Missing Data
Table 5 shows the SVM ensemble performance on missing data. With the bagging algorithm, the
percentage of classification accuracy (PCA) was 41.701%, the mean absolute error was 0.092, the
root mean square error was 0.215 and the kappa statistic was 0.252. With the AdaBoostM1
algorithm, the PCA was 79.215%, the mean absolute error was 0.029, the root mean square error
was 0.215 and the kappa statistic was 0.252. Table 6 shows the OSSELM performance on missing
data. With the bagging algorithm, the PCA was 79.215%, the mean absolute error was 0.029, the
root mean square error was 0.169 and the kappa statistic was 0.739. With the AdaBoostM1
algorithm, the PCA was 78.88%, the mean absolute error was 0.093, the root mean square error was
0.190 and the kappa statistic was 0.734.
In ensemble learning (EL), multiple learners are trained to solve the air pollution prediction
problem. An ensemble contains several base learners, which results in much stronger
generalization ability. In the SVM ensemble for spatio-temporal air pollution, bagging and
boosting algorithms are implemented with Single Decision Tree (SDT) and Decision Stump (DS) base
classifiers along with their parameters. SDT is used with bagging for classification of air
pollutants; its features include the ability to exclude insignificant features and to deal with
collinear and unbalanced data [17]. DS is used with boosting for classification of air
pollutants; it produces a decision tree with a single split, and the resulting tree can be
deployed to classify unseen data [10]. The decisions of the SVM ensemble models on the various
data chunks are aggregated by averaging to give the final decision.
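Aggregation by averaging can be sketched as follows. The paper does not give the exact form of
the per-chunk outputs, so this example assumes that each chunk model returns a score per class
(for instance a probability); the function name and the sample scores are hypothetical.

```python
def aggregate_by_averaging(distributions):
    """Average per-class scores from several models and return the top class."""
    classes = set()
    for scores in distributions:
        classes.update(scores)
    averaged = {
        label: sum(scores.get(label, 0.0) for scores in distributions) / len(distributions)
        for label in sorted(classes)
    }
    best = max(averaged, key=averaged.get)
    return best, averaged


# Hypothetical outputs of three models trained on different data chunks.
chunk_outputs = [
    {"CO1": 0.6, "CO2": 0.4},
    {"CO1": 0.2, "CO2": 0.8},
    {"CO1": 0.3, "CO2": 0.7},
]
label, averaged = aggregate_by_averaging(chunk_outputs)
print(label, averaged)
```

Averaging lets a confident minority model influence the result, which is the main difference
from a simple vote count.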
Table 6. OSSELM Performance on Missing Data
In Table 5, the SVM ensemble model with the bagging algorithm and SDT as base classifier gives an
air pollution prediction accuracy of 41.701 percent on missing data. The SVM ensemble produces a
high number of ICI, even more than the CCI. This shows that the SVM ensemble is not suitable for
prediction on missing data, or that its performance is reduced by the missing data.
Furthermore, our previous experiments on the SVM ensemble for spatio-temporal air pollution
prediction indicated that the SVM ensemble with the bagging algorithm is suitable when the
number of observations is not too large; for this experiment the number of observations is
175199. The Kappa Statistic for the SVM ensemble with bagging is far from 1 (i.e. 0.2521), which
indicates that the SVM ensemble with bagging and SDT gives only weak agreement for
classification of air pollutant concentrations. The SVM ensemble based on the AdaBoostM1
algorithm gives better results on missing air pollution data than bagging, with 79.215 percent
of instances correctly classified.
OSSELM uses the bagging and AdaBoostM1 algorithms for spatio-temporal air pollution prediction
with base classifiers such as REP Tree (RT), Random Tree (RandT), NB Tree (NT), J48 and Decision
Stump (DS), along with their parameters. In OSSELM, Random Tree (RandT), REP Tree (RT) and NB
Tree (NT) are used as base classifiers with the bagging algorithm, and the OSSELM performance
based on bagging is recorded. RandT is used extensively for classification: it takes the input
feature vector, classifies it with each tree in the forest and outputs the class label that
receives the majority of votes [18]. RT is an ensemble learning classifier that creates
individual learners consisting of multiple trees and selects the best tree from all the trees
generated. It uses the bagging concept and produces random sets of data to construct decision
trees [19]. NB Tree is considered among the best classification methods for prediction tasks
[20].
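The bagging idea described above (draw random sets of data, fit one base learner per set,
combine by vote) can be sketched without any library. This is not the Weka configuration used in
the experiments: a one-split threshold rule stands in for the tree learners, and the function
names, the toy CO readings, `n_models` and `seed` are all assumptions made for illustration.

```python
import random
from collections import Counter


def most_common_label(labels):
    return Counter(labels).most_common(1)[0][0]


def train_stump(samples):
    """Fit a one-split rule on (value, label) pairs by exhaustive search."""
    fallback = most_common_label([label for _, label in samples])
    best_correct, best_rule = -1, None
    for threshold in sorted({value for value, _ in samples}):
        left = [label for value, label in samples if value <= threshold]
        right = [label for value, label in samples if value > threshold]
        if not right:
            continue  # a split must leave samples on both sides
        left_label, right_label = most_common_label(left), most_common_label(right)
        correct = left.count(left_label) + right.count(right_label)
        if correct > best_correct:
            best_correct, best_rule = correct, (threshold, left_label, right_label)
    if best_rule is None:
        return lambda value: fallback
    threshold, left_label, right_label = best_rule
    return lambda value: left_label if value <= threshold else right_label


def train_bagged_ensemble(samples, n_models=11, seed=7):
    """Train one stump per bootstrap sample and predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]  # sample with replacement
        models.append(train_stump(bootstrap))

    def predict(value):
        return most_common_label([model(value) for model in models])

    return predict


# Toy CO readings labelled with the classes of Table 2.
samples = [(0.2, "CO1"), (0.5, "CO1"), (0.8, "CO1"),
           (1.5, "CO2"), (2.0, "CO2"), (3.0, "CO2")]
predict = train_bagged_ensemble(samples)
print(predict(0.1), predict(5.0))
```

Each bootstrap sample sees a slightly different subset of the readings, so the individual
stumps place their split at different points and the vote smooths out those differences.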
Further, in OSSELM, J48, Decision Stump and NB Tree are used as base classifiers with the
AdaBoostM1 algorithm for air pollution prediction, and the OSSELM performance based on the
AdaBoostM1 algorithm is recorded. J48 is a predictive base classifier: it predicts the target
value of a new sample based on the discrete attribute values of the given sample data. One of
the key benefits of using the J48 classifier is that it cuts down the search time for sorted
elements [21]. The decisions of the models in OSSELM on the various data chunks are aggregated
by majority voting to give the final decision.
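Majority voting over the per-chunk decisions can be sketched as below. The tie-breaking rule
(the first label in input order that reaches the top count) is an assumption; the paper does not
say how ties are resolved.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the largest number of models."""
    counts = Counter(labels)
    top = max(counts.values())
    # Assumption: ties go to the label that appears first in the input order.
    for label in labels:
        if counts[label] == top:
            return label


# Hypothetical decisions from models trained on three data chunks.
print(majority_vote(["OZ2", "OZ3", "OZ2"]))
```

Unlike averaging, voting uses only each model's final label, so every chunk model carries the
same weight regardless of how confident it was.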
Comparing Table 5 and Table 6, OSSELM with the bagging algorithm performed well, with an
accuracy of 79.215 percent compared with 41.701 percent for the SVM ensemble with bagging. The
OSSELM predictions based on the bagging and AdaBoostM1 algorithms are nearly the same on missing
air pollution data, i.e. 79.215 and 78.88 percent respectively. The SVM ensemble based on the
AdaBoostM1 algorithm reached a model accuracy of 79.215 percent, almost the same as the 79.215
and 78.88 percent obtained by OSSELM with bagging and AdaBoostM1. However, the Kappa Statistic
for the SVM ensemble with both bagging and AdaBoostM1 is 0.2521, compared with 0.739 and 0.734
for OSSELM with bagging and AdaBoostM1. The OSSELM values are much closer to 1, which indicates
considerably stronger agreement in the classification of data items with OSSELM. As summarized
in Figure 3, OSSELM noticeably enhanced the accuracy of air pollution prediction on missing data
across the pollutant classes and performed relatively better than the SVM ensemble.
Figure 3. OSSELM vs SVM Ensemble Performance on Missing data
To further validate the OSSELM prediction accuracy and other evaluation measures, we used the
Series Mean Method (SMM) to impute the missing values of the CO, NO2 and O3 concentrations, as
shown in Table 7.
Table 7. Imputation of Missing data with Series Mean Method (SMM)
Table 7 shows the missing values of the air pollutant concentrations that were imputed with the
Series Mean Method (SMM). The percentages of missing values for CO, NO2 and O3 were 2.8 percent,
4.4 percent and 2.1 percent respectively. The observed mean (ObsMean), observed median (ObsMed)
and observed standard deviation are 16.642, 8.627 and 19.007 respectively. After imputation, the
mean, median and standard deviation for CO, NO2 and O3 changed only slightly, indicating that
SMM imputed the pollutant concentration values well. So far we have evaluated the performance of
the SVM ensemble and OSSELM on missing data; in the next sections we assess the prediction
performance of OSSELM and the SVM ensemble on the imputed data of CO, NO2 and O3.
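Series mean imputation replaces every missing reading in a series with the mean of the observed
readings of that series. A minimal sketch, assuming missing values are marked as `None` (the
function names and the sample readings are hypothetical):

```python
def percent_missing(series):
    """Percentage of entries in the series that are missing (None)."""
    return 100 * sum(value is None for value in series) / len(series)


def impute_series_mean(series):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [value for value in series if value is not None]
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if value is None else value for value in series]


readings = [1.0, None, 3.0, 4.0]  # hypothetical hourly readings
print(percent_missing(readings))
print(impute_series_mean(readings))
```

Because every gap receives the same value, the series mean is preserved while the spread tends
to shrink slightly, which is in line with the small changes in summary statistics reported for
Table 7.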
The available data of CO, NO2 and O3 consist of 175200 observations. The observations were
separated into training and testing data using filters in Weka. The training set was set to 70
percent of the available data, i.e. 122640 observations, whilst the test set contained the
remaining 30 percent, i.e. 52560 observations. The proposed OSSELM and the SVM ensemble were
trained and tested on exactly the same data to allow for paired comparisons.
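The 70/30 partition can be reproduced in outline as follows. This is a generic shuffled split
and not the Weka filter used in the experiments; the function name, the `seed` parameter and the
use of integer row indices as stand-ins for observations are assumptions.

```python
import random


def split_train_test(rows, train_percent=70, seed=1):
    """Shuffle the rows reproducibly and cut them into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


train, test = split_train_test(range(175200))
print(len(train), len(test))
```

Fixing the seed means both methods can be given the identical split, which is what makes a
paired comparison possible.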
5.2 OSSELM PERFORMANCE ON TRAINING DATA
The comparative statistics of SVM ensemble and OSSELM on training data are shown in Table 8
and Table 9 respectively.
Table 8. SVM Ensemble Performance on Training Data
Table 8 shows the SVM ensemble performance on training data. With the bagging algorithm, the
percentage of classification accuracy (PCA) was 71.004%, the mean absolute error was 0.275, the
root mean square error was 0.368 and the kappa statistic was 0.495. With the AdaBoostM1
algorithm, the PCA was 70.677%, the mean absolute error was 0.348, the root mean square error
was 0.389 and the kappa statistic was 0.49.
Table 9. OSSELM Performance on Training Data
Table 9 shows the OSSELM performance on training data. With the bagging algorithm, the
percentage of classification accuracy (PCA) was 80.587%, the mean absolute error was 0.129, the
root mean square error was 0.348 and the kappa statistic was 0.688. With the AdaBoostM1
algorithm, the PCA was 80.556%, the mean absolute error was 0.194, the root mean square error
was 0.298 and the kappa statistic was 0.687.
Within the SVM ensemble, the bagging algorithm gave a slightly higher classification accuracy
(71.004 percent) than the AdaBoostM1 algorithm (70.677 percent). The experimental results show
that OSSELM gave the highest classification accuracy overall, with both the bagging and
AdaBoostM1 algorithms, i.e. 80.587 and 80.556 percent respectively.
The number of incorrectly classified instances is high in the SVM ensemble with both bagging and
AdaBoostM1, resulting in lower Kappa Statistic values (0.495 and 0.490 respectively). OSSELM
performs better than the SVM ensemble, with fewer incorrectly classified instances. The Kappa
Statistic for OSSELM with bagging and AdaBoostM1 is closer to 1 (0.688 and 0.687 respectively),
which indicates stronger agreement in the classification of the various classes of CO, NO2 and
O3 concentrations.
The mean absolute error of the SVM ensemble is lower with bagging than with AdaBoostM1, i.e.
0.275 and 0.348 respectively. OSSELM gave the lowest mean absolute errors, 0.129 with bagging
and 0.194 with AdaBoostM1. This indicates that OSSELM with the bagging and AdaBoostM1 algorithms
gave closer predictions of the air pollution observations, and it also gave a lower root mean
square error than the SVM ensemble.
The above performance evaluation results, summarized in Figure 4, indicate that OSSELM
outperforms the SVM ensemble on statistical grounds, giving better prediction results for our
analysis. We next conduct experiments on the unseen testing data and record statistics for both
OSSELM and the SVM ensemble.
Figure 4. OSSELM vs SVM Ensemble Performance on Training data
5.3 OSSELM PERFORMANCE ON TESTING DATA
The testing data of CO, NO2 and O3 consist of 52560 observations, i.e. 30 percent of the 175200
observations. The prediction performance of OSSELM and the SVM ensemble is tested on these
unseen observations. The comparative statistics of OSSELM and the SVM ensemble on testing data
are shown in Table 10 and Table 11.
Table 10. SVM Ensemble Performance on Testing Data
Table 10 shows the SVM ensemble performance on testing data. With the bagging algorithm, the
percentage of classification accuracy (PCA) was 71.362%, the mean absolute error was 0.270, the
root mean square error was 0.366 and the kappa statistic was 0.499. With the AdaBoostM1
algorithm, the PCA was 71.362%, the mean absolute error was 0.270, the root mean square error
was 0.366 and the kappa statistic was 0.499.
Table 11. OSSELM Performance on Testing Data
Table 11 shows the OSSELM performance on testing data. With the bagging algorithm, the
percentage of classification accuracy (PCA) was 80.673%, the mean absolute error was 0.131, the
root mean square error was 0.337 and the kappa statistic was 0.687. With the AdaBoostM1
algorithm, the PCA was 80.453%, the mean absolute error was 0.193, the root mean square error
was 0.298 and the kappa statistic was 0.683.
Table 10 shows that the SVM ensemble accuracy on testing data is the same with the bagging and
AdaBoostM1 algorithms, i.e. 71.362 percent. OSSELM accuracy is consistently higher, at 80.673
and 80.453 percent with bagging and AdaBoostM1 respectively, as shown in Table 11. The Kappa
Statistic for the SVM ensemble is 0.499 with both bagging and AdaBoostM1, which is far from 1,
whereas for OSSELM it is 0.687 and 0.683 respectively, which is closer to 1 and indicates
stronger agreement in the classification of the various classes of CO, NO2 and O3
concentrations. The OSSELM mean absolute errors are the lowest, i.e. 0.131 and 0.193 for bagging
and AdaBoostM1 respectively, in contrast to the SVM ensemble values. Hence, OSSELM with the
bagging and AdaBoostM1 algorithms gave closer predictions of the air pollution observations.
Similarly, the OSSELM root mean square errors are lower than those of the SVM ensemble.
Based on the above statistics, Figure 5 indicates that OSSELM outperforms the SVM ensemble in
classification accuracy and the other statistical performance measures. The proposed OSSELM
successfully predicted the various classes of CO, NO2 and O3.
Figure 5. OSSELM vs SVM Ensemble Performance
6. CONCLUSION
Air pollution data are often collected from multiple monitoring stations located in various
regions, and the amount of data from each monitoring station varies. Predicting air pollution
from data distributed across various regions is a challenging task. The proposed OSSELM has the
potential to merge the knowledge of multiple air pollution monitoring stations distributed
across various regions. Local SVMs are modelled on the spatial and temporal dimensions
respectively, and the knowledge from these SVMs is then aggregated for future air pollution
prediction. The problem of spatio-temporal air pollution prediction based on CO, NO2 and O3 was
addressed with OSSELM and the SVM ensemble method, and their performances were recorded. The
overall results of OSSELM are shown in Table 12 and the overall SVM ensemble results are shown
in Table 13.
Table 12. Overall OSSELM Performance
        Bagging                          AdaBoostM1
        Missing   Training   Testing    Missing   Training   Testing
MAE     0.029     0.129      0.131      0.093     0.194      0.193
RMSE    0.169     0.348      0.337      0.19      0.298      0.298
KS      0.739     0.688      0.687      0.734     0.687      0.683
PCA     79.215    80.587     80.673     78.881    80.556     80.453
Table 13. Overall SVM Ensemble Performance
        Bagging                          AdaBoostM1
        Missing   Training   Testing    Missing   Training   Testing
MAE     0.092     0.275      0.27       0.029     0.348      0.270
RMSE    0.215     0.368      0.366      0.169     0.389      0.366
KS      0.252     0.495      0.499      0.2521    0.49       0.499
PCA     41.701    71.004     71.362     79.215    70.677     71.362
Comparing the overall results of OSSELM in Table 12 with the overall results of the SVM ensemble
in Table 13, it is evident that the proposed OSSELM has a strong capability for classifying the
spatio-temporal air pollution problem on missing, training and testing data from various
stations with the bagging and AdaBoostM1 algorithms.
This addresses a limitation of some proposed models in spatio-temporal domains, where missing
data calls the validity of the model performance into question. The proposed OSSELM performed
similarly with missing and imputed data. The main objectives of the proposed model were
achieved: (i) ensemble learning based classification of the various classes of CO, NO2 and O3
was successfully constructed; (ii) scalable SVM models based on ensemble learning were
developed, and their performances were evaluated statistically and compared with the SVM
ensemble approach as a benchmark; (iii) the proposed OSSELM produced a comprehensive air
pollution prediction for the whole region, which in our case is Auckland. Moreover, most
environmental authorities, e.g. Auckland Regional Council (ARC) and the National Institute of
Water and Atmospheric Research (NIWA), are interested in the air quality status of the whole
region rather than of a specific location. We therefore conclude that the proposed OSSELM
performed well under different conditions, i.e. with and without missing data, and can be used
by environmental authorities as a tool for air quality prediction and further analysis.
ACKNOWLEDGEMENTS
The authors would like to thank Sreenivas Sremath Tirumala for his technical support.
REFERENCES
[1] Xue, Y., Tian, H., Yan, J., Zhou, Z., Wang, J., Nie, L., . . . others (2016). Temporal trends and spatial
variation characteristics of primary air pollutants emissions from coal-fired industrial boilers in
Beijing, China. Environmental Pollution, 213, 717-726.
[2] Soeren Mattke, P. S. J. H. M. L. G. L., Edward Kelley. (2006). Health care quality indicators project:
Initial indicators report, oecd health working papers no. 22. Organisation for Economic Co-operation
and Development (OECD), 1-152.
[3] Wang, Q., & Zhang, L. (2010). Ensemble learning based on multi-task class labels. In Advances in
knowledge discovery and data mining (Vol. 6119, p. 464-475).
[4] Dietterich, T. (2000). Ensemble methods in machine learning. In Multiple classifier systems (Vol.
1857, p. 1-15). Springer Berlin/Heidelberg.
[5] Brown, G. (2009). Ensemble learning.
[4] Tamura, H., & Kondo, T. (1980). Heuristics free group method of data handling algorithm of
generating optimal partial polynomials with application to air pollution prediction. International
Journal of Systems Science, 11(9), 1095-1111.
[5] Shaban, K. B., Kadri, A., & Rezk, E. (2016, April). Urban air pollution monitoring system with
forecasting models. IEEE Sensors Journal, 16(8), 2598-2606.
[6] Perera, F. P. (2017). Multiple threats to child health from fossil fuel combustion: Impacts of air
pollution and climate change. Environmental Health Perspectives, 125(2), 141.
[7] Zhou, J., Ji, Y., Qiao, Z., Si, F., & Xu, Z. (2013, July). Nitrogen oxide emission modeling for boiler
combustion using accurate online support vector regression. In 2013 10th international conference on
fuzzy systems and knowledge discovery (fskd) (p. 989-993).
[8] H. Li, Z., & Yang, J. (2010, July). Pm-25 forecasting use reconstruct phase space ls-svm. In 2010 the
2nd conference on environmental science and information application technology (Vol. 1, p. 143-
146).
[9] Khedo, K. K., Perseedoss, R., Mungur, A., et al. (2010). A wireless sensor network air pollution
monitoring system. arXiv preprint arXiv:1005.1737.
[10] Singh, K. P., Gupta, S., & Rai, P. (2013). Identifying pollution sources and predicting urban air
quality using ensemble learning methods. Atmospheric Environment, 80, 426–437.
[11] Song, L., Pang, S., Longley, I., Olivares, G., & Sarrafzadeh, A. (2014, July). Spatiotemporal pm2.5
prediction by spatial data aided incremental support vector regression. In 2014 international joint
conference on neural networks (ijcnn) (p. 623-630).
[12] Sotomayor-Olmedo, A., Aceves-Fernández, M. A., Gorrostieta-Hurtado, E., Pedraza-Ortega, C.,
Ramos-Arreguín, J. M., & Vargas-Soto, J. E. (2013). Forecast urban air pollution in mexico city by
using support vector machines: a kernel performance approach.
[13] Christy, S., & Khanaa, V. (2016, February). Impacts of ambient air quality data analysis in urban and
industrial area helps in policy making. International Journal on Recent Innovation Trends in
Computing and Communication, 153-157.
[14] Oliveira, M., & Torgo, L. (2014). Ensembles for time series forecasting. In Acml.
[15] Osowski, S., & Garanty, K. (2007). Forecasting of the daily meteorological pollution using
wavelets and support vector machine. Engineering Applications of Artificial Intelligence, 20(6), 745-
755.
[16] Carslaw, D. C., & Ropkins, K. (2012). openair: An R package for air quality data analysis.
Environmental Modelling & Software, 27, 52-61.
[17] Coops, N. C., Waring, R. H., Beier, C., Roy-Jauvin, R., & Wang, T. (2011). Modeling the occurrence
of 15 coniferous tree species throughout the pacific northwest of north america using a hybrid
approach of a generic process-based growth model and decision tree analysis. Applied Vegetation
Science, 14(3), 402-414.
[18] Sewaiwar, P., & Verma, K. K. (2015). Comparative study of various decision tree classification
algorithm using weka.
[19] Kalmegh, S. (2015). Analysis of weka data mining algorithm reptree, simple cart and random tree for
classification of indian news. International Journal of Innovative Science, Engineering and
Technology, 2(2), 438-46.
[20] Vadhanam, B. R. J., Mohan, S., Ramalingam, V., & Sugumaran, V. (2016). Performance comparison
of various decision tree algorithms for classification of advertisement and non advertisement videos.
Indian Journal of Science and Technology, 9(48).
[21] Luan, C., & Dong, G. (2017). Experimental identification of hard data sets for classification and
feature selection methods with insights on method selection. arXiv preprint arXiv:1703.08283.
[22] Ali, S., Tirumala, S. S., & Sarrafzadeh, A. (2014, Dec). Svm aggregation modelling for spatio-
temporal air pollution analysis. In 17th ieee international multi topic conference 2014 (p. 249-254).
AUTHORS
Shahid Ali
Shahid obtained his Master in Computer Sciences from the Auckland University of
Technology, Auckland in 2006. He has industry experience as an IT Analyst and
Business Analyst. He is a part-time lecturer in the Computer Sciences Department at
Unitec. He is in the final year of his doctorate, with a thesis entitled "SVM Aggregation
modeling for spatio-temporal air pollution analysis". He also holds several Microsoft
and Cisco certifications, i.e. MCP, MCSA, MCSE, CCNA, CCNP, CCDP.
http://dmli.info/index.php/member.html
Simon Dacey
Dr. Simon Dacey is a full-time Computer Sciences lecturer at Unitec Institute of
Technology and worked in software development for 17 years on applications as
diverse as a firing control system for anti-submarine warfare and a database
application for the management of wetland resources in Indonesia. He obtained his
MSc in Applied Remote Sensing from the University of Cranfield, UK (1995).
From 1998 to 2002 he worked as a lecturer at UCOL in Palmerston North, New
Zealand. Since January 2002, he has been a full-time lecturer at the Department of
Computing, Unitec Institute of Technology. His research interests include Geographical Information
Systems, Remote Sensing, Digital Image Processing, Geographical Positioning Systems, and Database
Management Systems.
http://www.unitec.ac.nz/about-us/contact-us/staff-directory/simon-dacey