A. N. Bharathi et al.; International Journal of Advance Research, Ideas and Innovations in Technology
© 2019, www.IJARIIT.com All Rights Reserved Page | 370
ISSN: 2454-132X
Impact factor: 4.295
(Volume 5, Issue 1)
Available online at: www.ijariit.com
Predicting housing prices using advanced regression techniques
Bharathi A. N.
bharathinandhees1997@gmail.com
KPR Institute of Engineering and
Technology, Coimbatore, Tamil Nadu
Dr. N. Yuvaraj
drnyuvaraj@gmail.com
KPR Institute of Engineering and
Technology, Coimbatore, Tamil Nadu
Dhivya B.
dhivyakrishnan1998@gmail.com
KPR Institute of Engineering and
Technology, Coimbatore, Tamil Nadu
ABSTRACT
House prices increase every year, so there is a need for a
system that predicts future house prices. House price
prediction can help a developer determine the selling price of
a house, and it can help a customer choose the right time to
purchase one. The price of a house is influenced by factors
such as physical condition, concept, location and others, and
prices vary from place to place and between communities.
There are various techniques for predicting house prices, and
one efficient approach is regression. Regression is a reliable
method of identifying which variables have an impact on a
topic of interest. Random forests are very accurate and robust
to over-fitting. Performing a regression makes it possible to
determine with confidence which factors matter the most,
which factors can be ignored and how the factors influence
each other. The main objective is to use an advanced
methodology for prediction.
Keywords— House prices, Regression, Price prediction,
Lasso regression
1. INTRODUCTION
Investment is one of the business activities that most people
are interested in during this era of globalization. Several
assets are commonly used for investment, for example gold,
stocks and property [1]. In determining the price of a home, the
developer must calculate carefully and choose an appropriate
method, as property prices tend to increase continuously and
almost never fall in the long or short term [2].
Predictive analysis is one of several approaches that can be
used to determine the price of a house. The challenge is to get
a result as close as possible to the actual price based on the
model built. The price of a specific house is determined by
location, size, house type, city, country, tax rules, economic
cycle, population movement, interest rate, and many other
factors that affect demand and supply. For local house price
prediction, there are many useful regression algorithms.
Regression analysis is a set of statistical processes for
estimating the relationships among variables. It includes many
techniques for modeling and analyzing several variables when
the focus is on the relationship between a dependent variable
and one or more independent variables (or 'predictors').
Regression analysis, more specifically, helps one understand
how the typical value of the dependent variable changes when
any one of the independent variables is varied, while the other
independent variables are held fixed. One of the main
advantages of regression-based prediction techniques is that
they use research and analysis to predict what is likely to
happen in the next quarter, year or even farther into the future.
For small-business owners, regression-based forecasting can
provide insight into how higher taxes, changes in consumer
spending or shifts in the local economy may affect the business.
Regression and forecasting techniques can lend a scientific
angle to managing small businesses, reducing large amounts of
raw data to actionable information. The training set of the
dataset used here includes 1460 houses (i.e., observations),
each accompanied by 79 attributes (i.e., features, variables, or
predictors) and the sale price of the house. The testing
set includes 1459 houses with the same 79 attributes, but
the sale price is not included, as it is the target
variable. In this paper, the proposed house price prediction is
based on the random forest algorithm.
2. LITERATURE SURVEY
A study [3] examined housing prices in the City of Savannah,
Georgia using the hedonic pricing model. The paper’s data
contain 2,888 single-family houses for the period between 2000
and 2005. It shows that the log price of a house is positively
and significantly correlated with the number of bathrooms,
bedrooms, fireplaces, garage spaces and stories, and with the
total square footage of the house. Additionally, the paper adds
three dummy variables, May, June, and July, to account for
seasonal effects on house prices. If the house is sold in May,
the variable May is set equal to 1, and to 0 otherwise. The
other variables, June and July, are constructed in a similar
fashion. The paper finds that the log sale price is
significantly and positively correlated with May and July,
while June is insignificant. This implies that houses whose
sales close in May or July tend to have a higher price.
The social and economic impact of housing in the Scottish
countryside is examined. Investment in housing finance
impacts the economy directly and indirectly. The employment,
GDP, productivity and many other important factors are
impacted by housing finance investment. The study revealed
that housing is an important indicator for increasing the wealth
of nations. It was then concluded that the Scottish housing
policy objective is to improve the quality standard of housing
as well as to increase investment in the household sector.
In research [8] it is found that, if a significance level of 0.05
is accepted, all 5 variables in the regression model (Floor,
Heating System, Earthquake Zone, Rental Value and Land Value)
have a significant impact on the dependent variable Value. Land
value and rental value have the highest impact on housing
price, followed by existing floor, heating system and
earthquake zone. Although the other variables were not found to
be significant in the study, this can change with the sample
size; if the sample size increases, re-estimating the regression
model is recommended for further studies. The application of
multiple regression analysis to a house data set explains or
models variation in house price and is a good example of the
strategic application of a mathematical tool to aid analysis,
and hence decision making, in property investment.
A study [5] (2010) uses support vector machine (SVM) regression
to forecast housing prices in China between 1993 and 2002 and
in a certain district of Tangshan city between 2000 and 2002.
The paper utilizes a genetic algorithm to tune the
hyper-parameters of the SVM regression model. The error scores
of the SVM regression model for both China and the Tangshan
district are lower than 4%. This indicates that the SVM
regression model performs well in forecasting housing prices in
China. For Singapore’s housing market, a decision tree model is
used (2006) to study the effects of housing characteristics on
prices [6]. The paper concludes that owners of 2-room to 4-room
flats are more concerned with the flats’ basic characteristics,
such as model type and age, than owners of 5-or-more-room
flats. Moreover, owners of executive flats care more about
service characteristics, such as neighbourhood location and
recreational facilities, than about basic housing
characteristics.
In a 2014 study [7], relationships between various home
characteristics and the asking price of a residential property
were analyzed using both simple linear regression and multiple
linear regression, estimated by ordinary least squares. Home
square footage was the explanatory variable in the simple
linear regression, and the multiple linear regression added
land size, number of bedrooms, year of construction, and other
explanatory variables. The multiple linear regression results
demonstrated the bias caused by omitting crucial factors from
the simple linear regression. Home square footage was found to
be the most important factor in determining residential
property price, while garage capacity proved to be the weakest
factor.
Many previous studies find empirical evidence supporting the
significant interrelations between house price and various
economic variables, such as income, interest rates, construction
costs and labor market variables [8][9][10].
3. METHODS AND MATERIALS
There are various kinds of regression techniques available for
making predictions [11]. The techniques are mostly driven by
three metrics (number of independent variables, type of
dependent variable and shape of the regression line), which are
given in figure 1.
Fig. 1: Metrics of regression
The various algorithms used for predicting housing prices are
described below.
3.1. Hedonic Pricing Model
Hedonic price theory assumes that a commodity such as a
house can be viewed as an aggregation of individual
components or attributes [12]. It is frequently used to measure a
property’s price. The hedonic pricing model combines both the
internal characteristics of a house (such as the number of
bedrooms, number of bathrooms, etc.) and its external
characteristics (such as the neighbourhood’s walkability score,
public schools’ scores, etc.) to estimate its value. Hedonic
pricing can be implemented using regression models.
Equation 1 shows the regression model for determining a
price.
$$y = a \cdot x_1 + b \cdot x_2 + \cdots + n \cdot x_i \tag{1}$$
where y is the predicted price and x1, x2, ..., xi are the
attributes of a house, while a, b, ..., n are the regression
coefficients of each variable in the determination of house
prices. While the
hedonic technique is an acceptable method for accommodating
attribute differences of a house price determination model, it is
generally unrealistic to deal with the housing market in any
geographical area as a single unit. Therefore, it seems more
reasonable to introduce geographical information or location
factor into a model that allows shifts in the house price level.
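As an illustration of Equation 1, the following minimal sketch fits the
coefficients by ordinary least squares using only the Python standard
library. The toy data, the attribute choice (bedrooms, bathrooms) and the
function names are hypothetical, and an intercept term is added as an
assumption that Equation 1 does not state.

```python
def fit_hedonic(rows, prices):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    # Prepend a constant 1.0 so the first coefficient acts as an intercept
    # (an assumption; Equation 1 has no explicit intercept).
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, prices))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coeffs, attributes):
    """Apply the fitted model: intercept plus weighted sum of attributes."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], attributes))


# Hypothetical toy data: (bedrooms, bathrooms) per house.
houses = [(2, 1), (3, 1), (3, 2), (4, 2), (4, 3), (5, 3)]
# Prices generated from known coefficients so the fit can be checked.
prices = [50 + 30 * beds + 20 * baths for beds, baths in houses]

coeffs = fit_hedonic(houses, prices)
print(coeffs)                   # intercept, bedroom and bathroom coefficients
print(predict(coeffs, (3, 2)))  # estimated price of a 3-bed, 2-bath house
```

Because the toy prices are generated without noise, the fitted coefficients
should match the generating values up to floating-point error; with real data
they would only be estimates.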
3.2. Artificial Neural Network Model
The use of the neural network model is similar to the process
used in building the hedonic price model. However, the
neural network [13] must first be trained on a set of data. For
a particular input, the model produces an output (the estimated
house price), which is then compared with the actual output
(the actual house price). The accuracy is measured by the total
mean square error, and backpropagation is then used to reduce
the prediction error by adjusting the connection
weights. The performance [14] of the network can be
influenced by the number of hidden layers and the number of
nodes in each hidden layer. A trial-and-error process is
applied to find the optimal artificial neural network model. It
is far more complicated than many other models, such as
decision trees and regression, and its weights are hard to
interpret and understand.
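The training loop described above (forward pass, squared error, weight
adjustment by backpropagation) can be sketched as follows. This is a minimal
illustration with one hidden layer, written with the standard library only;
the network size, learning rate, toy data and all names are assumptions, and
the inputs are assumed to be scaled to the range 0 to 1.

```python
import math
import random


def train_network(samples, targets, hidden=4, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer network; returns (predict function, losses)."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden activations (tanh) followed by a linear output node.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hi for w, hi in zip(w2, h)) + b2

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(samples, targets):
            h, out = forward(x)
            err = out - t              # model output minus actual price
            total += err * err
            # Backpropagation of the gradient of 0.5 * err^2.
            for j in range(hidden):
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
        losses.append(total / len(samples))  # mean square error of the epoch
    return (lambda x: forward(x)[1]), losses


# Hypothetical scaled features (size, rooms) and scaled prices.
samples = [(0.2, 0.2), (0.4, 0.4), (0.5, 0.6), (0.7, 0.6), (0.8, 0.8), (1.0, 1.0)]
targets = [0.6 * s + 0.3 * r for s, r in samples]

predict_price, losses = train_network(samples, targets)
print(losses[0], losses[-1])      # error at the start and at the end of training
print(predict_price((0.5, 0.5)))  # estimated scaled price for a new house
```

Changing `hidden` or `lr` and watching the final loss is a small-scale version
of the trial-and-error search mentioned in the text.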
4. PROPOSED METHODOLOGY
4.1. Dataset and Preprocessing
There are two data sets, namely the training dataset and the
test dataset. Both contain numerous variables, in the form of
features, that describe a house. The training dataset contains
1460 observations for which the sale price of a house is
provided. Based on this data, a prediction model is to be
built. The test dataset contains 1459 observations for which
the sale price has to be predicted. In total, 80 variables
focus on the quality and quantity of many physical attributes
of the property. Most of the variables are exactly the type of
information that a typical
home buyer would want to know about a potential property.
This study is based on the house price data of the Ames
Housing dataset.
Some features of the dataset do not have a linear relationship
with the house price, such as ‘date’, ‘long’ and ‘lat’, which
represent the date the house was sold, the longitude and the
latitude of the house, respectively. These features should
either be removed or modified. First, using ‘date’ (the date the
house was sold) and ‘yr built’ (the year the house was built),
we calculate the age of the building. Using the feature ‘yr
renovated’ (the year the house was renovated), we create a new
binary feature that represents whether the house was renovated
at all. Although zip-code does not have a linear relation with
the price, it could carry useful information about the house
price; hence it is treated as a categorical feature. Next, the
features ‘id’, ‘date’, ‘yr built’, ‘lat’, ‘long’, ‘date yr’ and
‘yr renovated’ are removed.
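The feature modifications described above can be sketched for a single
record as follows. The underscore field names, the date format (a string
starting with the four-digit year) and the convention that a renovation year
of 0 means "never renovated" are assumptions made for illustration.

```python
def derive_features(record):
    """Return a copy of `record` with derived features added and raw ones dropped."""
    out = dict(record)
    sale_year = int(record["date"][:4])   # assumes the date starts with YYYY
    out["age"] = sale_year - record["yr_built"]
    # Assumes 0 encodes "never renovated".
    out["renovated"] = 1 if record["yr_renovated"] > 0 else 0
    # Keep zip-code as a category label rather than a number.
    out["zipcode"] = str(record["zipcode"])
    for name in ("id", "date", "yr_built", "lat", "long", "date_yr", "yr_renovated"):
        out.pop(name, None)
    return out


# Hypothetical record, for illustration only.
record = {
    "id": 1, "date": "20141013T000000", "yr_built": 1955, "yr_renovated": 0,
    "zipcode": 98178, "lat": 47.5, "long": -122.3, "sqft_living": 1180,
}
features = derive_features(record)
print(features)
```

The categorical zip-code would still need to be converted to dummy variables
before it is passed to a linear model.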
4.2. Lasso Regression
In machine learning and statistics, lasso (least absolute
shrinkage and selection operator; also Lasso or LASSO) is
a regression analysis method that performs both variable
selection and regularization in order to enhance the prediction
accuracy and interpretability of the statistical model it
produces.
Lasso is a powerful regression technique. It works by
penalizing the magnitude of the feature coefficients while
minimizing the error between predicted and actual
observations. Lasso is known as an L1 regularization technique.
The algorithm can be implemented with the help of Python’s
scikit-learn library [15]. Lasso attempts to minimize a cost
function, given as Cost(W) = RSS(W) + α (sum of absolute values
of the weights). Here RSS refers to the ‘Residual Sum of
Squares’, meaning the sum of the squared errors between the
predicted and actual values in the training data set, and α is
a coefficient that controls the strength of the penalty. There
are three cases for the value of α.
1. α = 0; same coefficients as simple linear regression
2. α = ∞; all coefficients zero
3. 0 < α < ∞; coefficients between 0 and those of simple linear
regression
The Lasso cost function can be written as
$$\text{Cost}(w) = \sum_{i=1}^{N} \left\{ y_i - \sum_{j=0}^{M} w_j x_{ij} \right\}^2 + \alpha \sum_{j=0}^{M} |w_j|$$
The model can solve many of the challenges that we face with
linear regression and can be a very useful tool for fitting linear
models. It’s a better way to analyze data and capture
relationships in the data and avoid over-fitting.
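To make the effect of α concrete, the following sketch minimizes the Lasso
cost (residual sum of squares plus α times the sum of absolute weights) by
coordinate descent with soft-thresholding. It is a minimal pure-Python
illustration, not the scikit-learn implementation; the toy data and function
names are hypothetical, and all weights are penalized with no separate
intercept.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise sum_i (y_i - sum_j w_j x_ij)^2 + alpha * sum_j |w_j|."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    for _ in range(n_iter):
        for j in range(m):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(w[k] * X[i][k] for k in range(m) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            z = sum(X[i][j] ** 2 for i in range(n))
            # Closed-form minimiser of the cost along coordinate j.
            w[j] = soft_threshold(rho, alpha / 2.0) / z
    return w


# Hypothetical toy data: the target depends only on the first feature.
X = [[1, 1], [2, -1], [3, 1], [4, -1]]
y = [3, 6, 9, 12]

w_ols = lasso_coordinate_descent(X, y, alpha=0.0)     # case 1: no penalty
w_mid = lasso_coordinate_descent(X, y, alpha=10.0)    # case 3: shrunk weights
w_big = lasso_coordinate_descent(X, y, alpha=1000.0)  # very large penalty
print(w_ols, w_mid, w_big)
```

On this toy data the unpenalized fit recovers the least-squares weights, a
moderate α shrinks the first weight and sets the irrelevant one to exactly
zero, and a very large α sets every weight to zero, mirroring the three cases
listed above.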
4.3. House Price Affecting Factors
There are several factors that affect house prices. In research
[16] the factors affecting the house price are divided into
three main groups: physical condition, concept and location.
Physical conditions are properties of a house that can be
observed by human senses, including the size of the house, the
number of bedrooms, the availability of a kitchen and garage,
the availability of a garden, the area of land and buildings,
and the age of the house [17]. The concept is an idea offered
by developers to attract potential buyers, for example a
minimalist home, a healthy and green environment, or an elite
environment. Location is an important factor in shaping the
price of a house, because the location determines the
prevailing land price [18]. In addition, the location
determines the ease of access to public facilities, such as
schools, campuses, hospitals and health centres, as well as
family recreation facilities such as malls and culinary tours,
and it may even offer beautiful scenery [19], [20].
4.4. XGBoost
XGBoost has become a widely used and really popular tool
among Kaggle competitors and Data Scientists in industry, as it
has been battle tested for production on large-scale problems. It
is a highly flexible and versatile tool that can work through
most regression, classification and ranking problems as well as
user-built objective functions. As open-source software, it is
easy to access and it may be used through different platforms
and interfaces. The portability and compatibility of the system
permit its use on Windows, Linux and OS X. It also
supports training on distributed cloud platforms like AWS,
Azure, GCE among others and it is easily connected to large-
scale cloud dataflow systems such as Flink and Spark.
Although it was built and initially used in the Command Line
Interface (CLI) by its creator, it can also be loaded and used in
various languages and interfaces such as Python, C++, R, Julia,
Scala and Java.
XGBoost is an accurate and scalable implementation of
gradient boosting machines. Its name stands for eXtreme
Gradient Boosting; it was developed by Tianqi Chen and now it
is part of a wider collection of open-source libraries developed
by the Distributed Machine Learning Community (DMLC). It
has proven to push the limits of computing power for boosted
tree algorithms, as it was built and developed for the sole
purpose of computational speed and model performance.
Specifically, it was engineered to exploit every bit of memory
and hardware resources for tree boosting algorithms.
The implementation of XGBoost offers several advanced
features for model tuning, computing environments and
algorithm enhancement. It is capable of performing the three
main forms of gradient boosting (Gradient Boosting (GB),
Stochastic GB and Regularized GB), and it is robust
enough to support fine-tuning and the addition of regularization
parameters. According to Tianqi Chen, the latter is what makes
it superior to and different from other libraries. System-wise,
the library’s portability and flexibility allow the use of a
wide variety of computing techniques, such as parallelization
of tree construction across several CPU cores, out-of-core
computing, distributed computing for large models, and cache
optimization to improve hardware usage and efficiency.
The algorithm was developed to reduce computing time
efficiently and to make optimal use of memory resources.
Important features of the implementation include handling of
missing values (sparse aware), a block structure to support
parallelization in tree construction, and the ability to fit
and boost on new data added to a trained model. The prediction
method involves several methodologies and steps.
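The core idea of gradient boosting, on which XGBoost builds, can be sketched
in a few lines: each round fits a small tree to the current residuals and adds
a damped copy of it to the model. The sketch below uses one-split regression
stumps on a single feature and squared error. It is not XGBoost's
implementation (it has no regularization, sparsity handling or parallelism),
and the data and names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on one feature (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on residuals; returns (predict function, base value)."""
    base = sum(ys) / len(ys)            # start from the mean price
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return (lambda x: base + learning_rate * sum(s(x) for s in stumps)), base


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: living area versus price.
areas = [50, 60, 80, 100, 120, 150]
sale_prices = [100, 110, 150, 200, 230, 300]

model, base = gradient_boost(areas, sale_prices)
mse_base = mse(lambda x: base, areas, sale_prices)
mse_boost = mse(model, areas, sale_prices)
print(mse_base, mse_boost)
```

The `learning_rate` plays the role of shrinkage: smaller values need more
rounds but make each individual stump less influential.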
5. WORKING MODEL
Fig. 2: Steps involved for prediction
a) Reading data: At this stage, the data is read. The training
data then needs to be concatenated with the test data. This is
done mainly because of the presence of text variables, which
will later be replaced by dummy variables. If the training and
test sets were treated separately, they could end up with a
different number of dummy variables, which would in turn
damage the prediction.
b) Data Preprocessing: This is the process of transforming raw,
complex data into systematic, understandable knowledge. It
involves finding missing and redundant data in the dataset. The
entire dataset is checked for NA values, and any observation
that contains an NA is deleted. This brings uniformity to the
dataset. Finally, the data has to be split into training and
test data.
c) Data Analysis: Before applying any model to the dataset,
we need to find out its characteristics. Thus, we analyze the
dataset and study the different parameters and the
relationships between them. We can also find the outliers
present in the dataset. Outliers can occur due to some kind of
experimental error, and they need to be excluded from the
dataset.
d) Feature Engineering: Feature (variable or predictor)
engineering is one of the most important steps in model
creation. Often there is valuable information “hidden” in the
predictors that is only revealed when these features are
manipulated in some way. Below are some examples of engineered
features:
• Remodeled (categorical): Yes or No, depending on whether Year
Built differs from Year Remodeled; if the year the house was
remodeled is different from the year it was built, the
remodeling likely increases property value.
• Seasonality (categorical): Month Sold combined with Year
Sold; while more houses were sold during summer months,
this likely varies across years, especially during the time
period in which these houses were sold, which coincides with
the housing crash.
• New House (categorical): Yes or No, depending on whether Year
Sold is equal to Year Built; if a house was sold the same year
it was built, we might expect that it was in high demand and
might have a higher Sale Price.
• Total Area (continuous): Sum of all variables that describe
the area of different sections of a house; there are many
variables that pertain to the square footage of different
aspects of each house, and we might expect that the total
square footage has a strong influence on Sale Price.
e) Modelling: Model selection is the process of combining data
and prior information to select among a group of statistical
models. In building a model, decisions to include or exclude
covariates, as well as uncertainty in how to code the covariates
in the design matrix for any given model, are based both on the
prior hypotheses and the data. Lasso (least absolute shrinkage
and selection operator; also Lasso or LASSO) is a regression
analysis method that performs both variable
selection and regularization in order to enhance the prediction
accuracy and interpretability of the statistical model it
produces.
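Steps a), b) and d) above can be sketched together as a small pipeline. The
column names and toy rows are hypothetical, missing values are represented by
`None`, and only three of the engineered features are shown. The point of the
sketch is that dummy variables are built from the combined rows, so training
and test rows end up with identical columns.

```python
def drop_missing(rows):
    """Step b: delete any observation that contains a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_engineered_features(row):
    """Step d: derive Remodeled, NewHouse and TotalArea from raw columns."""
    out = dict(row)
    out["Remodeled"] = "Yes" if row["YearRemodeled"] != row["YearBuilt"] else "No"
    out["NewHouse"] = "Yes" if row["YearSold"] == row["YearBuilt"] else "No"
    out["TotalArea"] = (row["BasementArea"] + row["FirstFloorArea"]
                        + row["SecondFloorArea"])
    return out


def one_hot_encode(rows, column):
    """Step a: replace a text column by dummy variables built from all rows."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new)
    return encoded


# Hypothetical rows; the second training row has a missing basement area.
train = [
    {"Neighborhood": "A", "YearBuilt": 2000, "YearRemodeled": 2000, "YearSold": 2008,
     "BasementArea": 800, "FirstFloorArea": 900, "SecondFloorArea": 0},
    {"Neighborhood": "B", "YearBuilt": 1990, "YearRemodeled": 2005, "YearSold": 2009,
     "BasementArea": None, "FirstFloorArea": 1000, "SecondFloorArea": 500},
    {"Neighborhood": "B", "YearBuilt": 2007, "YearRemodeled": 2007, "YearSold": 2007,
     "BasementArea": 600, "FirstFloorArea": 700, "SecondFloorArea": 700},
]
test = [
    {"Neighborhood": "C", "YearBuilt": 1980, "YearRemodeled": 1995, "YearSold": 2010,
     "BasementArea": 500, "FirstFloorArea": 800, "SecondFloorArea": 0},
]

n_train = len(drop_missing(train))          # remember where to split again
combined = drop_missing(train + test)       # concatenate, then clean
combined = [add_engineered_features(r) for r in combined]
for col in ("Neighborhood", "Remodeled", "NewHouse"):
    combined = one_hot_encode(combined, col)
train_rows, test_rows = combined[:n_train], combined[n_train:]
print(sorted(train_rows[0]) == sorted(test_rows[0]))  # same columns in both sets
```

Encoding the two sets separately would give the test row a `Neighborhood_C`
column that the training rows lack, which is the mismatch step a) warns about.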
6. CONCLUSION
In this paper, the LASSO regression technique was
implemented to predict the price of a house. The step-by-step
procedure for analyzing the dataset and finding the correlation
between the parameters was described. This makes it possible to
select parameters that are not correlated with each other and
are independent in nature; this feature set was then given as
input to the model. LASSO performs both variable selection and
regularization in order to enhance the prediction accuracy.
7. REFERENCES
[1] R. M. A. van der Schaar, "Analysis of Indonesian Property Market; Overview and Foreign Ownership," Investment Indonesian, 2015.
[2] Y. Feng and K. Jones, "Comparing multilevel modelling and artificial neural networks in house price prediction," 2015 2nd IEEE Int. Conf. Spat. Data Min. Geogr. Knowl. Serv., pp. 108–114, 2015.
[3] Richard J. Cebula, "The Hedonic Pricing Model Applied to the Housing Market of the City of Savannah and Its Savannah Historic Landmark District," The Review of Regional Studies 39.1 (2009), pp. 9–22.
[4] Gang-Zhi Fan, Seow Eng Ong, and Hian Chye Koh, "Determinants of House Price: A Decision Tree Approach," Urban Studies 43.12 (2006).
[5] Gu Jirong, Zhu Mingcang, and Jiang Liuguangyan, "Housing price based on genetic algorithm and support vector machine," Expert Systems with Applications 38 (2011), pp. 3383–3386.
[6] Eric Slone, Haitian Sun, Po-Hsiang Wang, "Market Prices of Houses in Atlanta," 2014, from https://smartech.gatech.edu/bitstream/handle/1853/51632/Market%20Prices%20of%20Houses%20in%20Atlanta.pdf
[7] P. Linneman, "An empirical test of the efficiency of the housing market," Journal of Urban Economics 20 (1986): 140–154.
[8] J. M. Quigley, "Real estate prices and economic cycles," International Real Estate Reviews 2: 1–20, 1999.
[9] K. Tsatsaronis and H. Zhu, "What drives housing price dynamics: Cross-country evidence?" BIS Quarterly Review, March.
[10] Luis Torgo and Joao Gama, "Regression using classification algorithms," Intelligent Data Analysis 1.4 (1997): 275-2.
[11] Ezgi Candas, Seda Bagdatli Kalkan and Tahsin Yomralioglu, "Determining the Factors Affecting Housing Prices," FIG Working Week 2015, Sofia, Bulgaria, 17–21 May 2015.
[12] Muhammad A. Razi and Kuriakose Athappilly, "A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models," Expert Systems with Applications 29.1 (2005): 65–74.
[13] M. M. Lenk, E. M. Worzala and A. Silva, "High-tech Valuation: Should Artificial Neural Networks Bypass The Human Valuer?" Journal of Property Valuation & Investment, 15(1): 8–26, 1997.
[14] Fabian Pedregosa et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research 12.Oct (2011): 2825–2830.
[15] R. A. Rahadi, S. K. Wiryono, D. P. Koesrindartotoor, and I. B. Syamwil, "Factors influencing the price of housing in Indonesia," Int. J. Hous. Mark. Anal., vol. 8, no. 2, pp. 169–188, 2015.
[16] V. Limsombunchai, "House price prediction: Hedonic price model vs. artificial neural network," Am. J. …, 2004.
[17] D. X. Zhu and K. L. Wei, "The Land Prices and Housing Prices Empirical Research Based on Panel Data of 11 Provinces and Municipalities in Eastern China," Int. Conf. Manag. Sci. Eng., no. 2009, pp. 2118–2123, 2013.
[18] S. Kisilevich, D. Keim, and L. Rokach, "A GIS-based decision support system for hotel room rate estimation and temporal price prediction: The hotel brokers’ context," Decis. Support Syst., vol. 54, no. 2, pp. 1119–1133, 2013.
[19] C. Y. Jim and W. Y. Chen, "Value of scenic views: Hedonic assessment of private housing in Hong Kong," Landsc. Urban Plan., vol. 91, no. 4, pp. 226–234, 2009.
An Evaluation of the Impact of Government Assisted Housing Programmes (GAHPs)...
 
Mortgage Default, Property Price and Banks’ Lending Behaviour in Hong Kong SAR.
Mortgage Default, Property Price and Banks’ Lending Behaviour in Hong Kong SAR.Mortgage Default, Property Price and Banks’ Lending Behaviour in Hong Kong SAR.
Mortgage Default, Property Price and Banks’ Lending Behaviour in Hong Kong SAR.
 
The Influencee of Location, Price and Service Quality On A House Purchase Dec...
The Influencee of Location, Price and Service Quality On A House Purchase Dec...The Influencee of Location, Price and Service Quality On A House Purchase Dec...
The Influencee of Location, Price and Service Quality On A House Purchase Dec...
 
Unveiling Patterns in European Airbnb Prices: A Comprehensive Analytical Stud...
Unveiling Patterns in European Airbnb Prices: A Comprehensive Analytical Stud...Unveiling Patterns in European Airbnb Prices: A Comprehensive Analytical Stud...
Unveiling Patterns in European Airbnb Prices: A Comprehensive Analytical Stud...
 
IRJET- House Rent Price Prediction
IRJET- House Rent Price PredictionIRJET- House Rent Price Prediction
IRJET- House Rent Price Prediction
 

Kürzlich hochgeladen

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 

Kürzlich hochgeladen (20)

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 

Predicting_housing_prices_using_advanced.pdf

A. N. Bharathi et al.; International Journal of Advance Research, Ideas and Innovations in Technology
© 2019, www.IJARIIT.com All Rights Reserved Page | 370
ISSN: 2454-132X
Impact factor: 4.295
(Volume 5, Issue 1)
Available online at: www.ijariit.com
Predicting housing prices using advanced regression techniques
Bharathi A. N.
bharathinandhees1997@gmail.com
KPR Institute of Engineering and Technology, Coimbatore, Tamil Nadu
Dr. N. Yuvaraj
drnyuvaraj@gmail.com
KPR Institute of Engineering and Technology, Coimbatore, Tamil Nadu
Dhivya B.
dhivyakrishnan1998@gmail.com
KPR Institute of Engineering and Technology, Coimbatore, Tamil Nadu
ABSTRACT
House prices increase every year, so there is a need for a system that predicts house prices in the future. House price prediction can help a developer determine the selling price of a house. It can also help a customer choose the right time to purchase a house. The price of a house is influenced by several factors, including physical condition, concept and location. House prices vary from place to place and between communities. There are various techniques for predicting house prices, and regression is one of the efficient ones. Regression is a reliable method of identifying which variables have an impact on a topic of interest. Random forests are very accurate and robust to over-fitting. Performing a regression makes it possible to determine which factors matter most, which factors can be ignored and how the factors influence each other. The main objective is to use an advanced methodology for prediction.
Keywords: House prices, Regression, Price prediction, Lasso regression
1. INTRODUCTION
One of the business activities that most people are interested in during this era of globalization is investment. Several objects are often used for investment, for example gold, stocks and property [1].
In determining the price of a home, the developer must calculate carefully and choose an appropriate method, because property prices increase continuously and almost never fall in the long or short term [2]. Prediction analysis is one of several approaches that can be used to determine the price of a house. The challenge is to get a result as close as possible to the actual price from the model built. The price of a specific house is determined by location, size, house type, city, country, tax rules, economic cycle, population movement, interest rates and many other factors that affect demand and supply. For local house price prediction, many useful regression algorithms are available.
Regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed.
One of the main advantages of regression-based prediction techniques is that they use research and analysis to predict what is likely to happen in the next quarter, the next year or even farther into the future. For small-business owners, regression-based forecasting can provide insight into the effect of higher taxes, changes in consumer spending or shifts in the local economy. Regression and forecasting techniques can lend a scientific angle to managing small businesses, reducing large amounts of raw data to actionable information.
The dataset used has a training set of 1460 houses (i.e., observations), each accompanied by 79 attributes (i.e., features, variables, or predictors) and the sale price of the house.
The testing set includes 1459 houses with the same 79 attributes, but the sale price is not included, as it is the target variable. In this paper, the proposed house price prediction is based on the random forest algorithm.
2. LITERATURE SURVEY
A study [3] examined housing prices in the City of Savannah, Georgia, using the hedonic pricing model. Its data covers 2,888 single-family houses for the period between 2000 and 2005. It shows that the log price of a house is positively and significantly correlated with the number of bathrooms, bedrooms, fireplaces, garage spaces and stories, and with the total square footage of the house. Additionally, the paper adds three dummy variables, May, June and July, to account for seasonal effects on house prices. If a house is sold in May, the variable May is set to 1, and to 0 otherwise. The variables June and July are constructed in the same way. The paper finds that the log sale price is significantly and positively correlated with May and July, while June is insignificant. This implies that houses whose sale closes in May or July tend to have a higher price.
The social and economic impact of housing in the Scottish countryside has also been examined. Investment in housing finance affects the economy directly and indirectly: employment, GDP, productivity and many other important factors are affected by housing finance investment. The study revealed that housing is an important indicator for increasing the wealth of nations. It concluded that the objective of Scottish housing policy is to improve the quality standard of housing and to increase investment in the household sector.
In research [8] it is found that, if a significance level of 0.05 is accepted, all five variables in the regression model (Floor, Heating System, Earthquake Zone, Rental Value and Land Value) have a significant impact on the dependent variable Value. Land value and rental value have the highest impact on housing price, followed by existing floor, heating system and earthquake zone. The other variables were not found to be significant in the study, although this can change with the sample size; the regression model is therefore recommended again for further studies with larger samples.
The application of multiple regression analysis to a house data set explains, or models, the variation in house price. It is a good example of the strategic application of a mathematical tool to aid analysis, and hence decision making, in property investment.
The study in [5] (2010) uses support vector machine (SVM) regression to forecast housing prices in China between 1993 and 2002 and in a district of Tangshan city between 2000 and 2002. The paper uses a genetic algorithm to tune the hyper-parameters of the SVM regression model. The error scores of the SVM regression model for both China and the Tangshan district are lower than 4%. This indicates that the SVM regression model performs well in forecasting housing prices in China.
For Singapore's housing market, a decision tree model was used (2006) to study the effects of housing characteristics on prices [6]. The paper concludes that owners of 2-room to 4-room flats are more concerned with a flat's basic characteristics, such as model type and age, than owners of flats with five or more rooms. Moreover, owners of executive flats care more about service characteristics, such as neighbourhood location and recreational facilities, than about basic housing characteristics.
In research from 2014 [7], the relationships between various home characteristics and the asking price of a residential property were analyzed using both a simple linear regression and a multiple linear regression fitted by ordinary least squares. Home square footage was the explanatory variable in the simple linear regression, and the multiple linear regression added land size, number of bedrooms, year of construction and other explanatory variables. The multiple linear regression results demonstrated the bias caused by omitting crucial factors from the simple linear regression. Home square footage was found to be the most important factor in determining residential property price, while garage capacity proved to be the weakest factor.
Many previous studies find empirical evidence of significant interrelations between house prices and various economic variables, such as income, interest rates, construction costs and labor market variables [8][9][10].
3. METHODS AND MATERIALS
Various kinds of regression techniques are available for making predictions [11]. The techniques are mostly driven by three metrics (the number of independent variables, the type of dependent variable and the shape of the regression line), which are shown in Fig. 1. The algorithms used for predicting housing prices are listed below.
Fig. 1: metrics of regression
3.1. Hedonic Pricing Model
Hedonic price theory assumes that a commodity such as a house can be viewed as an aggregation of individual components or attributes [12]. It is frequently used to measure a property's price. The hedonic pricing model combines the internal characteristics of a house (such as the number of bedrooms and the number of bathrooms) with its external characteristics (such as the neighbourhood's walkability score and public schools' scores) to estimate its value. Hedonic pricing can be implemented using regression models. Equation 1 shows the regression model for determining a price.
$$y = a\,x_1 + b\,x_2 + \cdots + n\,x_i \qquad (1)$$
where y is the predicted price and x_1, x_2, ..., x_i are the attributes of a house, while a, b, ..., n are the regression coefficients that weight each variable in the determination of the house price.
While the hedonic technique is an acceptable method for accommodating attribute differences in a house price determination model, it is generally unrealistic to treat the housing market in any geographical area as a single unit. Therefore, it seems more reasonable to introduce geographical information, or a location factor, into a model so that it allows shifts in the house price level.
3.2. Artificial Neural Network Model
The use of the neural network model is similar to the process used in building the hedonic price model. However, the neural network [13] must first be trained on a set of data. For a particular input, the model produces an output (the estimated house price). The model then compares this output with the actual output (the actual house price). The accuracy is measured by the total mean square error, and backpropagation is then used to reduce the prediction errors by adjusting the connection weights. The performance [14] of the network can be influenced by the number of hidden layers and the number of nodes in each hidden layer.
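The training procedure described above (forward pass, mean square error, backpropagation of the error into the connection weights) can be illustrated with a minimal sketch in plain Python. It is not the network used in any of the cited studies: the single hidden layer, the tanh activation, the learning rate, the function name `train_tiny_network` and the scaled toy data are all assumptions made for illustration.

```python
import math
import random

def train_tiny_network(samples, hidden=3, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer network on (features, price) pairs.

    Returns the mean square error recorded in each epoch and a predict function.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Connection weights; the last entry of each weight list is a bias.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
        y = sum(w * v for w, v in zip(w2, h + [1.0]))
        return h, y

    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            h, y = forward(x)
            err = y - target              # model output minus actual price
            total += err * err
            # Backpropagation: push the error back through the output weights,
            # using the derivative of tanh, before any weight is changed.
            grad_h = [err * w2[j] * (1.0 - h[j] ** 2) for j in range(hidden)]
            for j, v in enumerate(h + [1.0]):
                w2[j] -= lr * err * v
            for j in range(hidden):
                for i, v in enumerate(x + [1.0]):
                    w1[j][i] -= lr * grad_h[j] * v
        history.append(total / len(samples))
    return history, lambda x: forward(x)[1]

# Hypothetical data: [scaled area, scaled room count] -> scaled price.
samples = [([0.2, 0.25], 0.22), ([0.4, 0.5], 0.41), ([0.5, 0.5], 0.48),
           ([0.6, 0.75], 0.62), ([0.8, 0.75], 0.75), ([1.0, 1.0], 0.95)]
history, predict = train_tiny_network(samples)
```

Comparing `history[0]` with `history[-1]` shows how far the mean square error fell during training. Changing `hidden` is the simplest way to see the sensitivity to network size mentioned above.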
A trial-and-error process is applied to find the optimal artificial neural network model. The model is far more complicated than many other models, such as decision trees and regression, and its weights are hard to interpret and understand.
4. PROPOSED METHODOLOGY
4.1. Dataset and Preprocessing
There are two data sets, a train dataset and a test dataset. Both contain numerous variables, in the form of features that describe a house. The training dataset contains 1460 observations for which the sale price of the house is provided. A prediction model is to be built from this data. The test dataset contains 1459 observations for which the sale price has to be predicted. In total, 80 variables focus on the quality and quantity of many physical attributes of the property. Most of the variables are exactly the type of information that a typical home buyer would want to know about a potential property. This study is based on the house price data of the Ames Housing dataset.
Some features of the dataset do not have a linear relationship with the house price, such as 'date', 'long' and 'lat', which represent the date the house was sold and the longitude and latitude of the house, respectively. These features should be either removed or modified. First, using 'date' (the date the house was sold) and 'yr built' (the year the house was built), we calculate the age of the building. Using the feature 'yr renovated' (the year the house was renovated), we create a new binary feature that represents whether the house was renovated at all. Although the zip code does not have a linear relation with the price, it may carry useful information about the house price. Hence it is treated as a categorical feature. Next, the features 'id', 'date', 'yr built', 'lat', 'long', 'date yr' and 'yr renovated' are removed.
4.2. Lasso Regression
In machine learning and statistics, lasso (least absolute shrinkage and selection operator; also Lasso or LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. Lasso is a powerful regression technique. It works by penalizing the magnitude of the feature coefficients while minimizing the error between predicted and actual observations. Lasso is known as the L1 regularization technique. The algorithm can be implemented with the help of Python's scikit-learn library [15]. Lasso attempts to minimize the cost function.
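To make the effect of the L1 penalty concrete, the following is a small sketch of lasso fitted by coordinate descent in plain Python. It minimizes the residual sum of squares plus α times the sum of the absolute coefficient values. It is an illustration only, not the scikit-learn implementation: the function names, the toy data and the omission of an intercept term are assumptions.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, alpha, sweeps=200):
    """Minimize sum_i (y_i - sum_j w_j x_ij)^2 + alpha * sum_j |w_j| (no intercept)."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    for _ in range(sweeps):
        for j in range(m):
            rho = 0.0   # correlation of feature j with the residual that ignores w_j
            z = 0.0     # squared norm of feature j
            for i in range(n):
                partial = sum(w[k] * X[i][k] for k in range(m) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            # Setting the derivative of the cost with respect to w_j to zero
            # gives a soft-thresholded least-squares update.
            w[j] = soft_threshold(rho, alpha / 2.0) / z if z > 0 else 0.0
    return w

# Hypothetical data: the first feature drives the target, the second is noise.
X = [[1.0, 0.1], [2.0, -0.2], [3.0, 0.1], [4.0, -0.1]]
y = [2.0, 4.0, 6.0, 8.0]

w_ols = lasso_coordinate_descent(X, y, alpha=0.0)       # plain least squares
w_mid = lasso_coordinate_descent(X, y, alpha=12.0)      # shrunk coefficients
w_big = lasso_coordinate_descent(X, y, alpha=1000.0)    # everything driven to zero
```

With α = 0 the fit matches ordinary least squares, a moderate α shrinks the informative coefficient and keeps the noise coefficient at zero, and a very large α sets every coefficient to zero.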
The cost function is given as
Cost(w) = RSS(w) + α (sum of the absolute values of the weights)
Here RSS refers to the 'Residual Sum of Squares', meaning the sum of the squared errors between the predicted and actual values in the training data set. α is a coefficient that can take various values. There are three cases for the value of α:
1. α = 0: same coefficients as simple linear regression
2. α = ∞: all coefficients are zero
3. 0 < α < ∞: coefficients between 0 and those of simple linear regression
The Lasso cost function can be written as
$$\mathrm{Cost}(w) = \sum_{i=1}^{N} \Big( y_i - \sum_{j=0}^{M} w_j x_{ij} \Big)^2 + \alpha \sum_{j=0}^{M} |w_j|$$
The model can solve many of the challenges faced with linear regression and can be a very useful tool for fitting linear models. It is a better way to analyze data, capture relationships in the data and avoid over-fitting.
4.3. House Price Affecting Factors
Several factors affect house prices. In research [16], the factors affecting the house price are divided into three main groups: physical condition, concept and location. Physical conditions are properties of a house that can be observed by the human senses, including the size of the house, the number of bedrooms, the availability of a kitchen and garage, the availability of a garden, the area of land and buildings, and the age of the house [17]. The concept is an idea offered by developers to attract potential buyers, for example a minimalist home, a healthy and green environment, or an elite environment. Location is an important factor in shaping the price of a house, because the location determines the prevailing land price [18]. In addition, the location determines the ease of access to public facilities, such as schools, campuses, hospitals and health centres, as well as family recreation facilities such as malls and culinary tours, and it may even offer beautiful scenery [19], [20].
4.4. XGBoost
XGBoost has become a widely used and popular tool among Kaggle competitors and data scientists in industry, as it has been battle-tested in production on large-scale problems. It is a highly flexible and versatile tool that can handle most regression, classification and ranking problems as well as user-built objective functions. As open-source software, it is easy to access and can be used through different platforms and interfaces. The portability and compatibility of the system permit its use on Windows, Linux and OS X. It also supports training on distributed cloud platforms such as AWS, Azure and GCE, among others, and it connects easily to large-scale cloud dataflow systems such as Flink and Spark. Although it was built and initially used through the Command Line Interface (CLI) by its creator, it can also be loaded and used in various languages and interfaces such as Python, C++, R, Julia, Scala and Java.
XGBoost is an accurate and scalable implementation of gradient boosting machines. Its name stands for eXtreme Gradient Boosting. It was developed by Tianqi Chen and is now part of a wider collection of open-source libraries developed by the Distributed Machine Learning Community (DMLC). It has proven to push the limits of computing power for boosted tree algorithms, as it was built and developed for the sole purpose of computational speed and model performance. Specifically, it was engineered to exploit every bit of memory and hardware resources for tree boosting algorithms.
The implementation of XGBoost offers several advanced features for model tuning, computing environments and algorithm enhancement. It is capable of performing the three main forms of gradient boosting (Gradient Boosting (GB), Stochastic GB and Regularized GB), and it is robust enough to support fine-tuning and the addition of regularization parameters.
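The basic idea behind the gradient boosting that XGBoost implements can be sketched in plain Python: each round fits a small tree to the residuals of the current model and adds a shrunken copy of it to the ensemble. This sketch is not XGBoost itself. It omits regularization, subsampling and all of the system-level optimizations, and the one-split "stump" learner, the function names and the toy data are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on a single feature.

    Assumes xs contains at least two distinct values.
    """
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return thr, lm, rm

def gradient_boost(xs, ys, rounds=50, learning_rate=0.1):
    """Boost stumps on squared error; returns a predict(x) function."""
    base = sum(ys) / len(ys)          # initial model: the mean price
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lm, rm = fit_stump(xs, residuals)
        stumps.append((thr, lm, rm))
        preds = [p + learning_rate * (lm if x <= thr else rm)
                 for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(lm if x <= thr else rm
                                          for thr, lm, rm in stumps)
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical data: floor area -> price.
areas = [50.0, 60.0, 80.0, 100.0, 120.0, 150.0]
prices = [110.0, 125.0, 170.0, 205.0, 250.0, 310.0]

boosted = gradient_boost(areas, prices)
mean_price = sum(prices) / len(prices)
baseline_mse = mse(lambda x: mean_price, areas, prices)
boosted_mse = mse(boosted, areas, prices)
```

The training error of the boosted model is lower than that of the mean-only baseline. Stochastic and regularized variants extend this loop by subsampling the rows in each round and by penalizing tree complexity, respectively.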
According to Tianqi Chen, the latter is what makes it superior to and different from other libraries. System-wise, the library's portability and flexibility allow the use of a wide variety of computing environments, such as parallelization of tree construction across several CPU cores, out-of-core computing, distributed computing for large models, and cache optimization to improve hardware usage and efficiency. The algorithm was developed to reduce computing time efficiently and to make optimal use of memory resources. Important features of the implementation include the handling of missing values (sparsity awareness), a block structure that supports parallelization in tree construction, and the ability to fit and boost on new data added to a trained model. The prediction method involves several methodologies and steps.
5. WORKING MODEL
Fig. 2: Steps involved for prediction
  • 4. A. N. Bharathi et al.; International Journal of Advance Research, Ideas and Innovations in Technology © 2019, www.IJARIIT.com All Rights Reserved Page | 373 a) Reading data: At this stage, the data is read. The training data is then needed to be concatenated with test data. This is done mainly because of the presence of text variables. These will later be replaced by dummy variables. If training and test set is treated separately, it could end up with a different number of dummy variables for each of them which would in turn damage the prediction. b) Data Preprocessing: It is a process of transforming the raw, complex data into systematic understandable knowledge. It involves the process of finding out missing and redundant data in the dataset. The entire dataset is checked for Na and whichever observation consists of Na will be deleted. Thus, this brings uniformity in the dataset. Finally, the data has to be split into training and test data. c) Data Analysis: Before applying any model to our dataset, we need to find out the characteristics of our dataset. Thus, we need to analyze our dataset and study the different parameters and relationship between these parameters. We can also find out the outliers present in our dataset. Outliers occur due to some kind of experimental errors and they need to be excluded from the dataset. d) Feature Engineering: Feature (variable or predictor) engineering is one of the most important steps in model creation. Often there is valuable information “hidden” in the predictors that are only revealed when manipulating these features in some way. Below are just some examples of the features:  Remodeled (categorical): Yes or No if Year Built is different from Year Remodeled; if the year the house was remodeled is different from the year it was built, the remodeling likely increases property value. 
• Seasonality (categorical): Month Sold combined with Year Sold; while more houses were sold during the summer months, this likely varies across years, especially since the period in which these houses were sold coincides with the housing crash.

• New House (categorical): Yes or No, depending on whether Year Sold is equal to Year Built; if a house was sold in the same year it was built, we might expect that it was in high demand and might have a higher Sale Price.

• Total Area (continuous): Sum of all variables that describe the area of different sections of a house; many variables pertain to the square footage of different parts of each house, and we might expect the total square footage to have a strong influence on Sale Price.

e) Modelling: Model selection is the process of combining data and prior information to select among a group of statistical models. In building a model, decisions to include or exclude covariates, as well as uncertainty about how to code the covariates in the design matrix for any given model, are based on both the prior hypotheses and the data. Lasso (least absolute shrinkage and selection operator; also written LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces.

6. CONCLUSION

In this paper, the LASSO regression technique was implemented to predict the price of a house. The step-by-step procedure for analyzing the dataset and finding the correlation between the parameters was described. This makes it possible to select parameters that are not correlated with each other and are independent in nature; this feature set was then given as input to the model. Lasso performs both variable selection and regularization in order to enhance the prediction accuracy.

7. REFERENCES

[1] R. M. A. van der Schaar, “Analysis of Indonesian Property Market; Overview and Foreign Ownership,” Investment Indonesian. 2015.
[2] Y. Feng and K. Jones, “Comparing multilevel modelling and artificial neural networks in house price prediction,” 2015 2nd IEEE Int. Conf. Spat. Data Min. Geogr. Knowl. Serv., pp. 108–114, 2015.
[3] Richard J. Cebula. “The Hedonic Pricing Model Applied to the Housing Market of the City of Savannah and Its Savannah Historic Landmark District”. In: The Review of Regional Studies 39.1 (2009), pp. 9–22.
[4] Gang-Zhi Fan, Seow Eng Ong, and Hian Chye Koh. “Determinants of House Price: A Decision Tree Approach”. In: Urban Studies 43.12 (2006).
[5] Gu Jirong, Zhu Mingcang, and Jiang Liuguangyan. “Housing price based on genetic algorithm and support vector machine”. In: Expert Systems with Applications 38 (2011), pp. 3383–3386.
[6] Eric Slone, Haitian Sun, Po-Hsiang Wang, (2014), “Market Prices of Houses in Atlanta”, from https://smartech.gatech.edu/bitstream/handle/1853/51632/Market%20Prices%20of%20Houses%20in%20Atlanta.pdf
[7] P. Linneman, “An empirical test of the efficiency of the housing market”. Journal of Urban Economics 20 (1986): 140–154, 1986.
[8] J. M. Quigley, “Real estate prices and economic cycles”. International Real Estate Reviews 2: 1–20. 1999.
[9] K. Tsatsaronis & H. Zhu, “What drives housing price dynamics: Cross-country evidence?” BIS Quarterly Review of March.
[10] Torgo, Luis, and Joao Gama. “Regression using classification algorithms.” Intelligent Data Analysis 1.4 (1997): 275-2.
[11] Ezgi Candas, Seda Bagdatli Kalkan and Tahsin Yomralioglu, (2015), “Determining the Factors Affecting Housing Prices”, FIG Working Week 2015, Sofia, Bulgaria, 17–21 May 2015.
[12] Razi, Muhammad A., and Kuriakose Athappilly. “A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models.” Expert Systems with Applications 29.1 (2005): 65–74.
[13] Lenk M. M., Worzala E. M. and A. Silva, 1997, “High-tech Valuation: Should Artificial Neural Networks Bypass The Human Valuer?”, Journal of Property Valuation & Investment, 15(1): 8–26.
[14] Pedregosa, Fabian, et al. “Scikit-learn: Machine learning in Python.” Journal of Machine Learning Research 12.Oct (2011): 2825–2830.
[15] R. A. Rahadi, S. K. Wiryono, D. P. Koesrindartoto, and I. B. Syamwil, “Factors influencing the price of housing in Indonesia,” Int. J. Hous. Mark. Anal., vol. 8, no. 2, pp. 169–188, 2015.
[16] V. Limsombunchai, “House price prediction: Hedonic price model vs. artificial neural network,” Am. J. …, 2004.
[17] D. X. Zhu and K. L. Wei, “The Land Prices and Housing Prices Empirical Research Based on Panel Data of 11 Provinces and Municipalities in Eastern China,” Int. Conf. Manag. Sci. Eng., no. 2009, pp. 2118–2123, 2013.
[18] S. Kisilevich, D. Keim, and L. Rokach, “A GIS-based decision support system for hotel room rate estimation and temporal price prediction: The hotel brokers’ context,” Decis. Support Syst., vol. 54, no. 2, pp. 1119–1133, 2013.
[19] C. Y. Jim and W. Y. Chen, “Value of scenic views: Hedonic assessment of private housing in Hong Kong,” Landsc. Urban Plan., vol. 91, no. 4, pp. 226–234, 2009.