Predicting rainfall using ensemble of Ensembles.∗†
Prolok Sundaresan, Varad Meru, and Prateek Jain‡
University of California, Irvine
{sunderap,vmeru,prateekj}@uci.edu
Abstract
Regression is an approach for modeling the relationship between input data X
and a dependent variable y. In this report, we present our experiments
with multiple approaches, ranging from ensembles of learners to deep
learning networks, on weather modeling data to predict rainfall.
The competition was held on the online data science competition portal
‘Kaggle’. A weighted ensemble of learners gave us a top-10
ranking, with a testing root-mean-squared error of 0.5878.
1 Introduction
The task of this in-class Kaggle competition was to predict the amount of rainfall
at a particular location using satellite data. We wanted to try various regression
algorithms and ensembles in order to experiment and learn. The report is structured
as follows. Section 2 describes the contents of the dataset and the latent
structure found using latent variable analysis and clustering; this was done by
Prolok and Prateek. Section 3 describes the models used in the project in
detail. The Neural Network/Deep Learning section was done by Varad, Random
Forests by Prolok and Prateek, and Gradient Boosting by Prateek and Varad.
Section 4 describes the ensemble-of-ensembles technique we used. The ensemble
sits on top of the different ensembles and learners described in Section 3. The
final ensemble was the work of all three members. Section 5 presents our
learning and conclusion.
2 Understanding The Data
Visualizing the data was difficult since it has 91 dimensions. In order to
look for patterns in the data and visualize it, we applied SVD to reduce the
dimensionality of the features to 2 principal dimensions. Then we applied
k-means clustering with k=5 on the data in 91 dimensions and plotted the
assignments in the 2-dimensional transformed feature space. We saw patterns in
the data; in particular, some points were densely clustered and others were
sparse.
∗The online competition is available at the Kaggle website
https://inclass.kaggle.com/c/how-s-the-weather. The name of the team was skynet.
†This work was done as a part of the project for CS 273: Machine Learning, Fall 2014,
taught by Prof. Alexander Ihler.
‡Prolok Sundaresan: Student# 66008474, Varad Meru: Student# 26648958, Prateek Jain:
Student# 28321844
To visualize it better, we transformed the features into 3-dimensional space
using the first 3 principal components, and saw that the points were clustered
around 3 planes.
Figure 1: Visualizing the data in 3 dimensions
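The projection step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, uses a synthetic matrix in place of the real 91-dimensional data, and the helper name `project_with_svd` is hypothetical.

```python
import numpy as np

def project_with_svd(X, n_components=2):
    """Project the rows of X onto its top principal directions using SVD."""
    X_centered = X - X.mean(axis=0)  # center each feature column
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# Synthetic stand-in for the 91-dimensional feature matrix (not the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 91))

Z2 = project_with_svd(X_demo, n_components=2)  # coordinates for a 2-D scatter plot
Z3 = project_with_svd(X_demo, n_components=3)  # coordinates for a 3-D scatter plot
print(Z2.shape, Z3.shape)
```

Coloring the rows of `Z2` by cluster label would give a plot of the kind the text describes; the clustering itself is left out of this sketch.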
3 Machine Learning Models
3.1 Mixture of Experts
As seen from our visualization in Figure 1, we could identify two highly dense
areas of the feature data on either side of a region of sparsely distributed data.
The intuition behind the mixture of experts approach was that it would be
difficult for a single regressor to fit the dataset, since the distribution
is non-uniform. We therefore decided to split the data into clusters. To cluster
the data, we ran the k-means algorithm several times with k-means++
initialization. The number of clusters was one of the model parameters we
varied.
Since each cluster received only a subset of the points from the original
dataset, the number of data points per cluster was not very large. Our concern
was that any model we chose would overfit the data in its cluster.
Therefore, we used the ensemble method of gradient boosting for each of the
clusters. Since gradient boosting starts with an underfitting model and
then gradually adds complexity, the chances of overfitting would be lower with
this model. We decided to use decision stumps as the regressors for the boosting
algorithm.
Figure 2: Visualizing the principal components of the data. (a) Cluster
assignments of data points. (b) Mixture of experts error.
To evaluate predictions on the validation split and the test data, we
first determine which cluster each data point belongs to. We did this by building
a k-nearest-neighbor classifier on the centers of the 3 clusters created in the
previous step. The classifier predicts the cluster assignment for each test
point, and we then apply the array of boosting regressors corresponding to that
cluster to the data point to get its prediction.
The parameters of the model we varied were the number of clusters and
the number of regressors used for boosting. We found that although the test error
reduced considerably as the number of boosting regressors increased, the
validation error increased after a certain point, as can be seen from
Figure 2(b). We got the minimum validation error with 700 regressors.
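The cluster-then-route idea can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses NumPy only, synthetic two-blob data, and a per-cluster least-squares regressor as a stand-in for the boosted decision stumps used in the report. All function names are hypothetical.

```python
import numpy as np

def nearest_center(X, centers):
    """Index of the closest center for every row of X (1-nearest-neighbor)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means with k-means++ style seeding; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[[rng.integers(len(X))]].astype(float)
    while len(centers) < k:
        # Sample the next seed with probability proportional to squared distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, nearest_center(X, centers)

def fit_experts(X, y, labels, k):
    """One least-squares regressor per cluster (stand-in for boosted stumps)."""
    experts = []
    for j in range(k):
        mask = labels == j
        A = np.c_[X[mask], np.ones(mask.sum())]
        w, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        experts.append(w)
    return experts

def predict_moe(X, centers, experts):
    """Route each point to its nearest cluster center, then apply that expert."""
    labels = nearest_center(X, centers)
    A = np.c_[X, np.ones(len(X))]
    W = np.array(experts)[labels]  # one weight vector per row
    return (A * W).sum(axis=1)

# Synthetic data: two dense blobs whose targets follow different linear rules.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(loc=[0.0, 0.0], size=(100, 2)),
                    rng.normal(loc=[20.0, 0.0], size=(100, 2))])
y_demo = np.where(X_demo[:, 0] < 10.0,
                  2.0 * X_demo[:, 0] + 1.0,
                  -3.0 * X_demo[:, 0] + 5.0)

centers, labels = kmeans(X_demo, k=2)
experts = fit_experts(X_demo, y_demo, labels, k=2)
rmse_moe = np.sqrt(np.mean((predict_moe(X_demo, centers, experts) - y_demo) ** 2))

# Baseline: a single regressor fitted on all of the data.
A_all = np.c_[X_demo, np.ones(len(X_demo))]
w_all, *_ = np.linalg.lstsq(A_all, y_demo, rcond=None)
rmse_single = np.sqrt(np.mean((A_all @ w_all - y_demo) ** 2))
print(rmse_moe, rmse_single)
```

On data like this, where each region follows its own rule, the routed experts fit far better than the single regressor, which is the intuition the section gives for splitting the data first.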
3.2 Neural Networks
We implemented various types of neural networks, ranging from single layer
networks to 3-layer sigmoidal neural networks.
Single Layer Network
Figure 3: Single Layer Architecture.
We built the neural networks using MATLAB’s Neural Network Toolbox
and the PyBrain library for Python. For the MATLAB implementation, we made
several runs with different numbers of neurons in the hidden
layer. The architecture of the neural network can be seen in Figure 3.
Figure 4 shows the train-validation-test plots for the different network
architectures. The dataset was split into 70% (training), 20% (validation), and
10% (testing) sections for the neural network runs. Table 1 shows the
performance of the models learned. The neural networks started
to overfit once the number of neurons was increased beyond 40.
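The 70/20/10 split and the RMSE metric can be sketched as follows. This is an illustrative sketch assuming NumPy and a random shuffle; the report does not say how the MATLAB toolbox partitioned the data, and the function names are hypothetical.

```python
import numpy as np

def split_70_20_10(n_samples, seed=0):
    """Shuffle row indices and cut them into 70% / 20% / 10% parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = n_samples * 7 // 10
    n_val = n_samples * 2 // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

train_idx, val_idx, test_idx = split_70_20_10(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The validation part is what signals overfitting: training error keeps falling as neurons are added, while validation and test error stop improving.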
# of Neurons    Training Error (RMSE)    Testing Error (RMSE)
10              0.5986                   0.61341
20              0.5875                   0.61301
50              0.5852                   0.62889
Table 1: RMSE for different network architectures.
We observed that the network could not learn very accurately, since there
was not much data for it to learn from.
Figure 4: Train-validation-test error plots (a, c, e) and error distribution
histograms (b, d, f) for hidden layers of 10, 20, and 50 neurons, respectively.
Deep Networks
For this project, we tried using deep networks as well. The deep network was
built using PyBrain. We tried different activation functions and architectures
to understand how deep networks would work. The architecture, shown in the
diagram below, had 3 hidden layers: the visible layer contained 91 neurons, the
first hidden layer (tanh) had 91 neurons, the second hidden layer (sigmoid) had
50 neurons, the third hidden layer (sigmoid) had 20 neurons, and the output
layer had 1 linear node. The testing error of 0.83643 was very high compared to
the other approaches. We concluded that the network was learning the training
data well, but was overfitting.
[Network diagram: input layer, hidden layer (hyperbolic tangent), hidden layer
(sigmoid), output layer with nodes y1, y2, y3]
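The layer sizes and activations listed above can be written out as a forward pass. This is a sketch in plain NumPy rather than PyBrain, with randomly initialized weights and no training loop; the names `LAYERS`, `init_params`, and `forward` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def identity(z):
    return z

# (inputs, outputs, activation) for each layer, following the sizes in the text.
LAYERS = [
    (91, 91, np.tanh),   # first hidden layer (tanh)
    (91, 50, sigmoid),   # second hidden layer (sigmoid)
    (50, 20, sigmoid),   # third hidden layer (sigmoid)
    (20, 1, identity),   # single linear output node
]

def init_params(rng, scale=0.1):
    """Small random weights and zero biases for every layer."""
    return [(rng.normal(scale=scale, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out, _ in LAYERS]

def forward(X, params):
    """Propagate a batch of 91-dimensional rows through the network."""
    a = X
    for (W, b), (_, _, act) in zip(params, LAYERS):
        a = act(a @ W + b)
    return a.ravel()

rng = np.random.default_rng(0)
params = init_params(rng)
X_demo = rng.normal(size=(5, 91))  # synthetic inputs, not the competition data
y_hat = forward(X_demo, params)
n_params = sum(W.size + b.size for W, b in params)
print(y_hat.shape, n_params)
```

Counting the weights and biases gives a rough sense of model capacity. A network with this many free parameters can memorize a modest training set, which fits the overfitting the authors report.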
3.3 Gradient Boosting
In parallel, we worked on training the gradient boosting model with varying
parameters to get the best fit for the data. We started with basic decision
stumps, with the number of regressors ranging from 1 to 2000. We also varied the
maximum depth of the decision tree used as the regression model from 3 to 7.
We used alpha = 0.9 for our algorithm. We got the best performance
with 2000 boosters and a depth of 7.
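Gradient boosting with decision stumps under squared loss can be sketched as below. This is a from-scratch illustration in NumPy, not the library the authors used. The learning rate of 0.1 is an assumption for the example and is not the alpha = 0.9 setting mentioned above, whose meaning the report does not spell out. The sketch uses depth-1 stumps only, on synthetic data, and all function names are hypothetical.

```python
import numpy as np

def fit_stump(X, r):
    """Best single-split regression stump for targets r under squared error."""
    best = (np.inf, 0, np.inf, r.mean(), r.mean())
    for f in range(X.shape[1]):
        # Skip the largest value so both sides of the split are non-empty.
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, f, t, lv, rv)
    return best[1:]

def stump_predict(stump, X):
    f, t, lv, rv = stump
    return np.where(X[:, f] <= t, lv, rv)

def gradient_boost(X, y, n_learners=50, learning_rate=0.1):
    """Start from the mean and repeatedly fit a stump to the current residuals."""
    f0 = y.mean()
    pred = np.full(len(y), f0)
    stumps, train_rmse = [], []
    for _ in range(n_learners):
        residual = y - pred  # negative gradient of the squared loss
        stump = fit_stump(X, residual)
        pred = pred + learning_rate * stump_predict(stump, X)
        stumps.append(stump)
        train_rmse.append(float(np.sqrt(np.mean((y - pred) ** 2))))
    return f0, stumps, train_rmse

def boost_predict(f0, stumps, X, learning_rate=0.1):
    return f0 + learning_rate * sum(stump_predict(s, X) for s in stumps)

# Synthetic regression problem (not the competition data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(150, 3))
y_demo = (np.sin(3.0 * X_demo[:, 0]) + 0.5 * X_demo[:, 1]
          + rng.normal(scale=0.1, size=150))

f0, stumps, train_rmse = gradient_boost(X_demo, y_demo)
print(train_rmse[0], train_rmse[-1])
```

The training error in `train_rmse` never increases as learners are added, so choosing the number of boosters needs a held-out validation set, as in the error-versus-learners comparison the authors describe.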
3.4 Random Forests
Several aspects of the Random Forest technique were explored. The fundamental
idea behind Random Forest is to take a model that overfits the data, then
use feature and data bagging to bring down the complexity and fit the data
better. The usual model used in a Random Forest is a high-depth regression
tree. We also tried to explore other models that overfit the data.
Figure 5: Train and test error for Gradient Boosting vs. number of learners
The first option was to consider simple linear regression with feature
transformation. The data $X_1$ was transformed into $X_1$ and $X_1^2$ features,
and linear regression was done on that. Significantly better results were
obtained with this transformation (a test error of 0.4322 compared to 0.4181),
but the results worsened significantly when $X_1^3$ features were added to the
feature list. This was used as the regressor for the Random Forests, but the
results were better for a tree regressor. The major takeaway from this analysis
was the use of $X_2^2$ features in the feature list for tree regression. Several
other regressors, such as a kNN regressor, were also tried, but the tree
regressor came out on top.
Since decision tree regression was significantly better than linear regression
in the Random Forest, we decided to proceed with it, with the $X_2^2$ features
also in place (a total of 182 features). nFeatures was set to 150, and depths of
13, 14, 15, 16, and 17 were tried, of which a maxDepth of 14 gave the best
performance. 150 decision trees were learned, and the optimum results were
obtained with 90 learners.
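The feature expansion and the bagging scheme can be sketched together. This illustration assumes the 182 features are the 91 originals plus their element-wise squares (91 × 2 = 182), which the report implies but does not state outright. It bags least-squares regressors, one of the base learners the authors tried, rather than the depth-14 trees they finally chose. It uses NumPy with synthetic data, and all function names are hypothetical.

```python
import numpy as np

def add_squared_features(X):
    """Append the element-wise square of every feature: d columns -> 2d."""
    return np.hstack([X, X ** 2])

def fit_bagged_linear(X, y, n_learners=25, n_features=150, seed=0):
    """Fit linear models on bootstrap rows and random feature subsets."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = min(n_features, d)
    models = []
    for _ in range(n_learners):
        rows = rng.integers(0, n, size=n)            # data bagging (bootstrap)
        cols = rng.choice(d, size=k, replace=False)  # feature bagging
        A = np.c_[X[rows][:, cols], np.ones(n)]
        w, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        models.append((cols, w))
    return models

def predict_bagged(models, X):
    """Average the predictions of all bagged models."""
    preds = [np.c_[X[:, cols], np.ones(len(X))] @ w for cols, w in models]
    return np.mean(preds, axis=0)

def mse(y, p):
    return float(np.mean((y - p) ** 2))

# Synthetic data whose target depends on a squared feature.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(800, 91))
y_all = X_all[:, 0] ** 2 + 0.5 * X_all[:, 1] + rng.normal(scale=0.1, size=800)
X_tr, X_te, y_tr, y_te = X_all[:600], X_all[600:], y_all[:600], y_all[600:]

X_tr_sq, X_te_sq = add_squared_features(X_tr), add_squared_features(X_te)
mse_linear = mse(y_te, predict_bagged(fit_bagged_linear(X_tr, y_tr), X_te))
mse_squared = mse(y_te, predict_bagged(fit_bagged_linear(X_tr_sq, y_tr), X_te_sq))
print(X_tr_sq.shape[1], mse_linear, mse_squared)
```

The synthetic target is built so that squared features matter, so the gap here says nothing about the size of the gain on the real data. The sketch shows only the mechanics of expanding features and bagging over rows and columns.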
Learner                                  Training Error (MSE)    Testing Error (MSE)
Linear Regressor                         0.4068                  0.4243
Linear Regressor with $X_1^2$ feature    0.3996                  0.4140
Tree Regressor                           0.1951                  0.3822
Table 2: MSE for Random Forests
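Table 1 reports RMSE while Table 2 reports MSE. For a rough side-by-side reading, the two are related by a square root. The conversion below assumes both tables were measured on comparable held-out data, which the report does not state.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}

\sqrt{0.4243} \approx 0.651, \qquad
\sqrt{0.4140} \approx 0.643, \qquad
\sqrt{0.3822} \approx 0.618
```

Under that assumption, the tree regressor's testing MSE of 0.3822 corresponds to an RMSE of about 0.618, in the same range as the single-layer networks in Table 1.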
4 Ensemble of all Learners
In the end, since we had trained many learners separately, some of which were
ensembles themselves, we decided to aggregate the results of the learners
to improve our prediction. We also analyzed the variance between the results
of our learners and obtained an average variance of 0.0204. Since the
variance was noticeable, a weighted average of the results seemed
the best approach. We took the model parameters of the best-performing
model from each category to get a consolidated result. Figure 6 shows
the architecture of our ensemble. Initially, we chose a very simple approach of
assigning the same weight to all models to get a prediction. This gave some
improvement, with an RMSE of 0.5908, but it still performed just below
our best individual prediction model. So, we decided to increase the weight of
our best learner in the ensemble. This improved our combined prediction,
giving an RMSE of 0.5878.
Figure 6: Ensemble of Learners
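The weighted averaging step can be sketched as follows. This is a minimal NumPy illustration with synthetic predictions; the weights are made up for the example, since the report does not give the weights actually used, and the function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def weighted_ensemble(preds, weights):
    """Weighted average of per-model predictions (rows = models)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    return w @ np.asarray(preds, dtype=float)

# Synthetic targets and three imperfect learners; learner 0 is the most accurate.
rng = np.random.default_rng(0)
y_demo = rng.normal(size=500)
preds = np.stack([y_demo + rng.normal(scale=s, size=500) for s in (0.3, 0.6, 0.6)])

# Average per-sample variance across learners, a measure of their disagreement.
spread = float(np.var(preds, axis=0).mean())

equal = weighted_ensemble(preds, [1, 1, 1])    # same weight for every model
boosted = weighted_ensemble(preds, [2, 1, 1])  # extra weight on the best learner
print(spread, rmse(y_demo, equal), rmse(y_demo, boosted))
```

In practice the weights would be picked on a validation set rather than fixed by hand, so that the choice does not leak information from the test data.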
5 Conclusion
This project gave us a glimpse of how machine learning techniques are applied
to real-world problems. We applied a variety of techniques including neural
networks, decision trees, random forests, gradient boosting, k-means clustering,
and PCA. Testing various parameters of the different learner types helped us
identify where each of the models underfit and overfit the data. Finally,
while modifying the parameters of each model helped us reduce the variance in
the models, we used a final weighted ensemble of various learners to reduce the
bias of individual learners.
8

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalabilityWANdisco Plc
 
BDAS Shark study report 03 v1.1
BDAS Shark study report  03 v1.1BDAS Shark study report  03 v1.1
BDAS Shark study report 03 v1.1Stefanie Zhao
 
Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011Milind Bhandarkar
 
Boston hug-2012-07
Boston hug-2012-07Boston hug-2012-07
Boston hug-2012-07Ted Dunning
 
Basics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed StorageBasics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed StorageNilesh Salpe
 
Faster and smaller inverted indices with Treaps Research Paper
Faster and smaller inverted indices with Treaps Research PaperFaster and smaller inverted indices with Treaps Research Paper
Faster and smaller inverted indices with Treaps Research Papersameiralk
 
Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopVictoria López
 
Shark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at ScaleShark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at ScaleDataWorks Summit
 
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...Spark Summit
 
The design and implementation of modern column oriented databases
The design and implementation of modern column oriented databasesThe design and implementation of modern column oriented databases
The design and implementation of modern column oriented databasesTilak Patidar
 
Think Like Spark: Some Spark Concepts and a Use Case
Think Like Spark: Some Spark Concepts and a Use CaseThink Like Spark: Some Spark Concepts and a Use Case
Think Like Spark: Some Spark Concepts and a Use CaseRachel Warren
 
Generalized Linear Models with H2O
Generalized Linear Models with H2O Generalized Linear Models with H2O
Generalized Linear Models with H2O Sri Ambati
 
dmapply: A functional primitive to express distributed machine learning algor...
dmapply: A functional primitive to express distributed machine learning algor...dmapply: A functional primitive to express distributed machine learning algor...
dmapply: A functional primitive to express distributed machine learning algor...Bikash Chandra Karmokar
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide trainingSpark Summit
 
Hot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkHot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkSupriya .
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataJetlore
 

Was ist angesagt? (20)

Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalability
 
BDAS Shark study report 03 v1.1
BDAS Shark study report  03 v1.1BDAS Shark study report  03 v1.1
BDAS Shark study report 03 v1.1
 
Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011
 
Boston hug-2012-07
Boston hug-2012-07Boston hug-2012-07
Boston hug-2012-07
 
Spark and shark
Spark and sharkSpark and shark
Spark and shark
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Basics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed StorageBasics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed Storage
 
Faster and smaller inverted indices with Treaps Research Paper
Faster and smaller inverted indices with Treaps Research PaperFaster and smaller inverted indices with Treaps Research Paper
Faster and smaller inverted indices with Treaps Research Paper
 
Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoop
 
Shark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at ScaleShark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at Scale
 
Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012
 
ACM 2013-02-25
ACM 2013-02-25ACM 2013-02-25
ACM 2013-02-25
 
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
 
The design and implementation of modern column oriented databases
The design and implementation of modern column oriented databasesThe design and implementation of modern column oriented databases
The design and implementation of modern column oriented databases
 
Think Like Spark: Some Spark Concepts and a Use Case
Think Like Spark: Some Spark Concepts and a Use CaseThink Like Spark: Some Spark Concepts and a Use Case
Think Like Spark: Some Spark Concepts and a Use Case
 
Generalized Linear Models with H2O
Generalized Linear Models with H2O Generalized Linear Models with H2O
Generalized Linear Models with H2O
 
dmapply: A functional primitive to express distributed machine learning algor...
dmapply: A functional primitive to express distributed machine learning algor...dmapply: A functional primitive to express distributed machine learning algor...
dmapply: A functional primitive to express distributed machine learning algor...
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide training
 
Hot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkHot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark framework
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
 

Ähnlich wie Predicting rainfall using ensemble of ensembles

Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Pedro Lopes
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentShaleen Kumar Gupta
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...IAEME Publication
 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Armando Vieira
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...cscpconf
 
Hybrid PSO-SA algorithm for training a Neural Network for Classification
Hybrid PSO-SA algorithm for training a Neural Network for ClassificationHybrid PSO-SA algorithm for training a Neural Network for Classification
Hybrid PSO-SA algorithm for training a Neural Network for ClassificationIJCSEA Journal
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...
Classification of Iris Data using Kernel Radial Basis Probabilistic  Neural N...Classification of Iris Data using Kernel Radial Basis Probabilistic  Neural N...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...Scientific Review SR
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Scientific Review
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Predictionsriram30691
 
Comparison of hybrid pso sa algorithm and genetic algorithm for classification
Comparison of hybrid pso sa algorithm and genetic algorithm for classificationComparison of hybrid pso sa algorithm and genetic algorithm for classification
Comparison of hybrid pso sa algorithm and genetic algorithm for classificationAlexander Decker
 
Novel algorithms for Knowledge discovery from neural networks in Classificat...
Novel algorithms for  Knowledge discovery from neural networks in Classificat...Novel algorithms for  Knowledge discovery from neural networks in Classificat...
Novel algorithms for Knowledge discovery from neural networks in Classificat...Dr.(Mrs).Gethsiyal Augasta
 
Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingIRJET Journal
 
Feed forward neural network for sine
Feed forward neural network for sineFeed forward neural network for sine
Feed forward neural network for sineijcsa
 
Human Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataHuman Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataIRJET Journal
 
11.comparison of hybrid pso sa algorithm and genetic algorithm for classifica...
11.comparison of hybrid pso sa algorithm and genetic algorithm for classifica...11.comparison of hybrid pso sa algorithm and genetic algorithm for classifica...
11.comparison of hybrid pso sa algorithm and genetic algorithm for classifica...Alexander Decker
 
casestudy_important.pptx
casestudy_important.pptxcasestudy_important.pptx
casestudy_important.pptxssuser31398b
 

Ähnlich wie Predicting rainfall using ensemble of ensembles (20)

Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate Descent
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
 
Hybrid PSO-SA algorithm for training a Neural Network for Classification
Hybrid PSO-SA algorithm for training a Neural Network for ClassificationHybrid PSO-SA algorithm for training a Neural Network for Classification
Hybrid PSO-SA algorithm for training a Neural Network for Classification
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
 
debatrim_report (1)
debatrim_report (1)debatrim_report (1)
debatrim_report (1)
 
F017533540
F017533540F017533540
F017533540
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...
Classification of Iris Data using Kernel Radial Basis Probabilistic  Neural N...Classification of Iris Data using Kernel Radial Basis Probabilistic  Neural N...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Prediction
 
Comparison of hybrid pso sa algorithm and genetic algorithm for classification
Comparison of hybrid pso sa algorithm and genetic algorithm for classificationComparison of hybrid pso sa algorithm and genetic algorithm for classification
Comparison of hybrid pso sa algorithm and genetic algorithm for classification
 
Novel algorithms for Knowledge discovery from neural networks in Classificat...
Novel algorithms for  Knowledge discovery from neural networks in Classificat...Novel algorithms for  Knowledge discovery from neural networks in Classificat...
Novel algorithms for Knowledge discovery from neural networks in Classificat...
 
Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive Indexing
 
Feed forward neural network for sine
Feed forward neural network for sineFeed forward neural network for sine
Feed forward neural network for sine
 
Human Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataHuman Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerData
 
11.comparison of hybrid pso sa algorithm and genetic algorithm for classifica...
11.comparison of hybrid pso sa algorithm and genetic algorithm for classifica...11.comparison of hybrid pso sa algorithm and genetic algorithm for classifica...
11.comparison of hybrid pso sa algorithm and genetic algorithm for classifica...
 
casestudy_important.pptx
casestudy_important.pptxcasestudy_important.pptx
casestudy_important.pptx
 

Mehr von Varad Meru

Generating Musical Notes and Transcription using Deep Learning
Generating Musical Notes and Transcription using Deep LearningGenerating Musical Notes and Transcription using Deep Learning
Generating Musical Notes and Transcription using Deep LearningVarad Meru
 
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Varad Meru
 
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Varad Meru
 
Kakuro: Solving the Constraint Satisfaction Problem
Kakuro: Solving the Constraint Satisfaction ProblemKakuro: Solving the Constraint Satisfaction Problem
Kakuro: Solving the Constraint Satisfaction ProblemVarad Meru
 
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...Varad Meru
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemVarad Meru
 
Cloud Computing: An Overview
Cloud Computing: An OverviewCloud Computing: An Overview
Cloud Computing: An OverviewVarad Meru
 
Live Wide-Area Migration of Virtual Machines including Local Persistent State.
Live Wide-Area Migration of Virtual Machines including Local Persistent State.Live Wide-Area Migration of Virtual Machines including Local Persistent State.
Live Wide-Area Migration of Virtual Machines including Local Persistent State.Varad Meru
 
Machine Learning and Apache Mahout : An Introduction
Machine Learning and Apache Mahout : An IntroductionMachine Learning and Apache Mahout : An Introduction
Machine Learning and Apache Mahout : An IntroductionVarad Meru
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsVarad Meru
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningVarad Meru
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduceVarad Meru
 
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...Varad Meru
 
Big Data, Hadoop, NoSQL and more ...
Big Data, Hadoop, NoSQL and more ...Big Data, Hadoop, NoSQL and more ...
Big Data, Hadoop, NoSQL and more ...Varad Meru
 
Final Year Project Guidance
Final Year Project GuidanceFinal Year Project Guidance
Final Year Project GuidanceVarad Meru
 
OpenSourceEducation
OpenSourceEducationOpenSourceEducation
OpenSourceEducationVarad Meru
 

Predicting rainfall using ensemble of ensembles

  • 1. Predicting rainfall using ensemble of Ensembles.∗†
    Prolok Sundaresan, Varad Meru, and Prateek Jain‡
    University of California, Irvine
    {sunderap,vmeru,prateekj}@uci.edu

    Abstract
    Regression is an approach for modeling the relationship between data X and the dependent variable y. In this report, we present our experiments with multiple approaches, ranging from ensembles of learners to deep learning networks, on the weather modeling data to predict rainfall. The competition was held on the online data science competition portal ‘Kaggle’. The weighted ensemble of learners gave us a top-10 ranking, with a testing root-mean-squared error (RMSE) of 0.5878.

    1 Introduction
    The task of this in-class Kaggle competition was to predict the amount of rainfall at a particular location using satellite data. We wanted to try various algorithms and ensembles for regression to experiment and learn. The report is structured as follows. Section 2 describes the dataset contents and the latent structure found using latent variable analysis and clustering. This was done by Prolok and Prateek. Section 3 describes the various models used in the project in detail. The Neural Network/Deep Learning section was done by Varad. Random Forests was done by Prolok and Prateek. The work on Gradient Boosting was done by Prateek and Varad. Section 4 describes the ensemble of ensembles technique we used. The ensemble sits on top of the different ensembles and learners built in Section 3. The work on the final ensemble was done by all three members. Section 5 presents our learning and conclusion.

    2 Understanding The Data
    Visualizing the data was a difficult task since the data was in 91 dimensions. In order to look for patterns in the data and visualize it, we applied SVD to reduce the dimensionality of the features to 2 principal dimensions. Then we applied k-means clustering with k=5 on the data with 91 dimensions

    ∗ The online competition is available at the Kaggle website https://inclass.kaggle.com/c/how-s-the-weather. The name of the team was skynet.
    † This work was done as a part of the project for CS 273: Machine Learning, Fall 2014, taught by Prof. Alexander Ihler.
    ‡ Prolok Sundaresan: Student# 66008474, Varad Meru: Student# 26648958, Prateek Jain: Student# 28321844
  • 2. and plotted the assignments in the 2-dimensional transformed feature space. We saw patterns in the data: some points were densely clustered and others were sparse. To visualize it better, we transformed the features into 3-dimensional space, using the first 3 principal components, and saw that the points were clustered around 3 planes.

    Figure 1: Visualizing the data in 3 dimensions

    3 Machine Learning Models

    3.1 Mixture of Experts
    As seen from our visualization in Figure 1, we could identify two highly dense areas of the feature data on either side of a region of sparsely distributed data. The idea behind using the mixture of experts approach was that, intuitively, it would be difficult for a single regressor to fit the dataset, since the distribution is non-uniform. We decided to split the data into clusters. To cluster the data, we ran the k-means algorithm with several k-means++ initializations. The number of clusters was one of the model parameters we varied. Since each cluster received only a subset of the points from the original dataset, the number of data points per cluster was not very large. Our concern was that any model we chose would overfit the data in its cluster. Therefore, we used the ensemble method of gradient boosting for each of the clusters. Since, in gradient boosting, we start with an underfitting model and
  • 3. Figure 2: Visualizing the principal components of the data. (a) Cluster assignments of data points. (b) Mixture of experts error.
  • 4. then gradually add complexity, the chances of overfitting would be lower in this model. We decided to use decision stumps as the regressors for the boosting algorithm.
    To predict on the validation split and the test data, we first determine which cluster each data point belongs to. We did this by building a k-nearest-neighbor classifier on the centers of the 3 clusters created in the previous step. The classifier predicts the cluster assignment for each test point, and we then apply the array of boosting regressors corresponding to that cluster to the data point to get its prediction.
    The parameters of the model we modified were the number of clusters and the number of regressors used for boosting. We found that though the test error reduced considerably on increasing the number of regressors for boosting, the validation error increased after a certain point, as can be seen from Figure 2(b). We got the minimum validation error for 700 regressors.

    3.2 Neural Networks
    We implemented various types of neural networks, ranging from single-layer networks to 3-layer sigmoidal neural networks.

    Single Layer Network
    Figure 3: Single Layer Architecture.
    We built the neural networks using MATLAB's Neural Network Toolbox and the PyBrain library implemented in Python. For the MATLAB implementation, we made several runs with different numbers of neurons in the hidden layer. The architecture of the neural network can be seen in Figure 3. Figure 4 shows the train-test-validation plots for the different network architectures. The dataset was split into 70% (training), 20% (validation) and 10% (testing) sections for the neural network runs. Table 1 shows the performance of the models learned. The neural networks started to overfit as the number of neurons was increased beyond 40.
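    As a rough illustration of the mixture-of-experts procedure from Section 3.1 (cluster the data, boost decision stumps within each cluster, route each new point to its nearest cluster center), here is a minimal plain-Python sketch. It is not the code used in the project: all function names, the toy one-dimensional data, the learning rate, and the number of stumps are assumptions made for illustration, and the k-means step uses simple random seeding rather than k-means++.

```python
import random

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest(centers, x):
    # 1-nearest-neighbour lookup on the cluster centers (the routing step)
    return min(range(len(centers)), key=lambda j: sq_dist(centers[j], x))

def kmeans(X, k, iters=20, seed=0):
    # Lloyd's algorithm with random seeding (assumption: simpler than k-means++)
    centers = [list(p) for p in random.Random(seed).sample(X, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[nearest(centers, x)].append(x)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def fit_stump(X, r):
    # Exhaustive search for the one-split regressor minimizing squared error on r
    mean = sum(r) / len(r)
    best_err, best = float("inf"), (0, float("inf"), mean, mean)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X})[:-1]:
            left = [ri for x, ri in zip(X, r) if x[f] <= t]
            right = [ri for x, ri in zip(X, r) if x[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if err < best_err:
                best_err, best = err, (f, t, lm, rm)
    return best

def stump_predict(stump, x):
    f, t, lm, rm = stump
    return lm if x[f] <= t else rm

def fit_boost(X, y, n_stumps=100, lr=0.1):
    # Gradient boosting for squared loss: each stump is fit to the current residuals
    base = sum(y) / len(y)
    pred, stumps = [base] * len(y), []
    for _ in range(n_stumps):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, resid)
        stumps.append(s)
        pred = [p + lr * stump_predict(s, x) for p, x in zip(pred, X)]
    return base, lr, stumps

def boost_predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)

def fit_moe(X, y, k=3, **boost_args):
    centers = kmeans(X, k)
    experts = []
    for j in range(k):
        idx = [i for i, x in enumerate(X) if nearest(centers, x) == j]
        if idx:
            experts.append(fit_boost([X[i] for i in idx], [y[i] for i in idx], **boost_args))
        else:  # empty cluster: fall back to the global mean
            experts.append((sum(y) / len(y), 0.0, []))
    return centers, experts

def moe_predict(moe, x):
    centers, experts = moe
    return boost_predict(experts[nearest(centers, x)], x)

def rmse(pred, y):
    return (sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)) ** 0.5

# Toy data: three clumps of points, each with its own input-output relation.
rng = random.Random(1)
X, y = [], []
for center, slope in [(0.0, 1.0), (10.0, -2.0), (20.0, 0.5)]:
    for _ in range(30):
        x0 = center + rng.uniform(-1, 1)
        X.append([x0])
        y.append(slope * x0)

moe = fit_moe(X, y, k=3, n_stumps=100, lr=0.1)
moe_rmse = rmse([moe_predict(moe, x) for x in X], y)
baseline_rmse = rmse([sum(y) / len(y)] * len(y), y)  # predict the global mean
print(f"mixture of experts RMSE: {moe_rmse:.4f}, mean baseline RMSE: {baseline_rmse:.4f}")
```

    The sketch evaluates on its own training data only; to reproduce the behaviour described above, one would hold out a validation split and track its error as `n_stumps` grows.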
    # of Neurons    Training Error (RMSE)    Testing Error (RMSE)
    10              0.5986                   0.61341
    20              0.5875                   0.61301
    50              0.5852                   0.62889

    Table 1: RMSE for different network architectures.

    We observed that the learner could not learn very accurately, as there was not much data for the neural network to learn from.
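    The single-layer experiments above can be mimicked on a small scale with the following sketch: a 70/20/10 split, one sigmoid hidden layer with a linear output, stochastic gradient descent on squared error, and RMSE reporting for several hidden-layer sizes. The report used MATLAB and PyBrain; this plain-Python stand-in, its class and function names, the toy two-feature data, and the hyperparameters (learning rate, epochs, hidden sizes) are all assumptions chosen for illustration.

```python
import math
import random

def split_70_20_10(rows, seed=0):
    # Shuffle, then cut into 70% train, 20% validation, 10% test
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train, n_val = len(rows) * 7 // 10, len(rows) * 2 // 10
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class OneHiddenLayerNet:
    """One sigmoid hidden layer followed by a single linear output node."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def hidden(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.W2, self.hidden(x))) + self.b2

    def sgd_step(self, x, y, lr):
        h = self.hidden(x)
        err = sum(w * hj for w, hj in zip(self.W2, h)) + self.b2 - y  # gradient of 0.5*err^2
        for j, hj in enumerate(h):
            grad_h = err * self.W2[j] * hj * (1.0 - hj)  # backprop through the sigmoid
            self.W2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_h
            for i, xi in enumerate(x):
                self.W1[j][i] -= lr * grad_h * xi
        self.b2 -= lr * err

def rmse(net, rows):
    return math.sqrt(sum((net.predict(x) - y) ** 2 for x, y in rows) / len(rows))

def train(net, train_rows, val_rows, epochs=200, lr=0.05, seed=0):
    # Returns the lowest validation RMSE seen, a simple guard against overfitting
    rng = random.Random(seed)
    rows = list(train_rows)
    best_val = rmse(net, val_rows)
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            net.sgd_step(x, y, lr)
        best_val = min(best_val, rmse(net, val_rows))
    return best_val

# Toy regression data (assumption): y = sin(x0) + 0.5 * x1
rng = random.Random(42)
data = []
for _ in range(200):
    x = [rng.uniform(-3, 3), rng.uniform(-1, 1)]
    data.append((x, math.sin(x[0]) + 0.5 * x[1]))

train_rows, val_rows, test_rows = split_70_20_10(data)
results = {}
for n_hidden in (2, 10):
    net = OneHiddenLayerNet(2, n_hidden)
    before = rmse(net, train_rows)
    best_val = train(net, train_rows, val_rows)
    results[n_hidden] = (before, rmse(net, train_rows), best_val, rmse(net, test_rows))
    print(n_hidden, ["%.4f" % v for v in results[n_hidden]])
```

    Comparing the training and validation RMSE across hidden-layer sizes is the same diagnostic the table above relies on: a training error that keeps falling while the held-out error rises suggests overfitting.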
  • 5. Figure 4: Plots of train-validation-test error for number of neurons = [10, 20, 50].
    (a) Train-validation-test error plot for 10-neuron hidden layer
    (b) Error distribution histogram for 10-neuron hidden layer
    (c) Train-validation-test error plot for 20-neuron hidden layer
    (d) Error distribution histogram for 20-neuron hidden layer
    (e) Train-validation-test error plot for 50-neuron hidden layer
    (f) Error distribution histogram for 50-neuron hidden layer
  • 6. Deep Networks
    For this project, we tried using deep networks as well. The deep network was built using PyBrain. We tried different activation functions and architectures to understand how deep networks would work. The architecture, shown in the diagram below, had 3 hidden layers: the visible layer contained 91 neurons, the first hidden layer (tanh) had 91 neurons, the second hidden layer (sigmoid) had 50 neurons, the third hidden layer (sigmoid) had 20 neurons, and the output layer had 1 linear node. The testing error of 0.83643 was very high compared to the other approaches. We concluded that the network was learning the training data well but was overfitting.

    [Figure: deep network architecture, showing the input layer, hidden layer (hyperbolic tangent), hidden layer (sigmoid), and output layer]

    3.3 Gradient Boosting
    In parallel, we worked on training the gradient boosting model with varying parameters to get the best fit for the data. We started with basic decision stumps, with the number of regressors ranging from 1 to 2000. We also varied the maximum depth of the decision tree used as the regression model from 3 to 7. We used alpha 0.9 for our algorithm. We observed the best performance with 2000 boosters and a depth of 7.

    3.4 Random Forests
    Several aspects of the Random Forest technique were explored. The fundamental idea behind Random Forests is to take a model that overfits the data, then use feature and data bagging to bring down the complexity and fit the data better. The usual model used in a Random Forest is a high-depth regression tree. We also tried to explore other models that overfitted the data.
    The first option was to consider simple linear regression with feature transformation. The data from $X_1$ was transformed into $X_1$ and $X_1^2$ features and
  • 7. linear regression was done on that. Significantly better results were obtained with this transformation (a test error of 0.4322 compared to 0.4181), but performance worsened significantly when $X_1^3$ features were added to the feature list. This was used as the regressor for the Random Forests, but the results were better for a tree regressor. The major takeaway from this analysis was the use of $X_2^2$ features in the feature list for tree regression. Several other regressors, such as a kNN regressor, were also tried, but the tree regressor came out on top.
    Since decision tree regression was significantly better than linear regression in the Random Forest, we decided to proceed with it, with the $X_2^2$ features also in place (a total of 182 features). nFeatures was chosen as 150, and depths of 13, 14, 15, 16, and 17 were tried, of which a maxDepth of 14 gave the best performance. 150 decision trees were learned, and the optimum results were obtained for 90 learners.

    Figure 5: Train and test error plot for gradient boosting vs. number of learners

    Learner                                   Training Error (MSE)    Testing Error (MSE)
    Linear Regressor                          0.4068                  0.4243
    Linear Regressor with $X_1^2$ feature     0.3996                  0.4140
    Tree Regressor                            0.1951                  0.3822

    Table 2: MSE for Random Forests

    4 Ensemble of all Learners
    At the end, since we had trained a lot of learners separately, some of which were ensembles themselves, we thought of aggregating the results of the learners to improve our prediction. We also analyzed the variance between the results of our learners, and an average variance of 0.0204 was obtained. Since the
  • 8. variance was noticeable, a weighted average aggregation of the results seemed the best approach. We chose the model parameters of the best performing models from each category to get a consolidated result. Figure 6 shows the architecture of our ensemble.
    Initially, we chose a very simple approach of assigning the same weight to all models to get a prediction. We got some improvement, with an RMSE of 0.5908. We saw that this was performing just below our best individual prediction model, so we decided to bump the weight of our best learner in the ensemble. This improved our aggregated prediction, giving an RMSE of 0.5878.

    Figure 6: Ensemble of Learners

    5 Conclusion
    This project gave us a glimpse of how machine learning techniques are applied to real-world problems. We applied a variety of techniques including neural networks, decision trees, random forests, gradient boosting, k-means clustering, and PCA. Testing out various parameters of the different learner types helped us identify where each of the models under-fitted and over-fitted the data. Finally, while modifying the parameters of each model helped us reduce the variance in the models, we used a final weighted ensemble of various learners to reduce the bias of individual learners.
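    The weighted aggregation described in Section 4 can be sketched as follows. The report does not give the actual weights or the exact definition of the variance between learners, so the weights, the bump factor, the toy predictions, and the use of the population variance across models are all assumptions; the function names are hypothetical.

```python
import math

def rmse(pred, y):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))

def weighted_average(preds, weights):
    # Combine per-model prediction lists using weights normalized to sum to 1
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, preds)) / total
            for i in range(len(preds[0]))]

def mean_variance(preds):
    # Average, over data points, of the (population) variance between the models' predictions
    n_models = len(preds)
    acc = 0.0
    for col in zip(*preds):
        mu = sum(col) / n_models
        acc += sum((c - mu) ** 2 for c in col) / n_models
    return acc / len(preds[0])

def bump_best(weights, scores, factor=2.0):
    # Multiply the weight of the model with the lowest validation RMSE by `factor`
    best = min(range(len(scores)), key=scores.__getitem__)
    return [w * factor if i == best else w for i, w in enumerate(weights)]

# Toy validation targets and per-model predictions (illustrative numbers only)
y_val = [1.0, 2.0, 3.0, 4.0]
preds = [
    [1.1, 2.1, 2.9, 4.1],  # e.g. gradient boosting
    [0.5, 2.5, 3.5, 3.5],  # e.g. random forest
    [1.5, 1.5, 2.5, 4.5],  # e.g. neural network
]

scores = [rmse(p, y_val) for p in preds]
equal = [1.0] * len(preds)
bumped = bump_best(equal, scores)

print("per-model RMSE:", ["%.4f" % s for s in scores])
print("variance between learners: %.4f" % mean_variance(preds))
print("equal weights RMSE: %.4f" % rmse(weighted_average(preds, equal), y_val))
print("bumped weights RMSE: %.4f" % rmse(weighted_average(preds, bumped), y_val))
```

    Choosing the weights on a held-out validation split, rather than on the data the base learners were trained on, keeps the comparison between the equal-weight and bumped-weight ensembles honest.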