Giasan.vn: harnessing the power of data for Vietnam real-estate market
Viet-Trung Tran, Hoang-Long Nguyen, Tuan-Anh Nguyen
School of Information and Communication Technology
Hanoi University of Science and Technology
ABSTRACT
According to the Global Real Estate Transparency Index (GRETI 2016),
Vietnam ranked 68th out of 109 countries. This apparent lack of
transparency holds back Vietnam's real-estate market and frustrates
investors trying to value properties. Our objective is to build an
online, data-driven real-estate marketplace that benefits both
buyers and agents. We harness data to provide market trends,
detailed ad-hoc analytics reports, and matching between clients
and potential agents.
SYSTEM DESIGN
giasan.vn is built to scale. The system follows the design principles of a
high-performance big data platform and consists of several layers:
data collection, data pre-processing, data storage, and data processing and
analysis.
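The layering above can be illustrated with a minimal single-process sketch. It is an assumption-laden toy, not the giasan.vn implementation: the function names, the record fields (`id`, `district`, `price_vnd`, `area_m2`) and the sample listings are all hypothetical, and an in-memory dict stands in for the distributed database.

```python
from statistics import median


def collect():
    """Data collection: stand-in for crawlers (hypothetical sample records)."""
    return [
        {"id": "a1", "district": "Cau Giay", "price_vnd": 3.2e9, "area_m2": 80},
        {"id": "a1", "district": "Cau Giay", "price_vnd": 3.2e9, "area_m2": 80},  # duplicate
        {"id": "b7", "district": "Cau Giay", "price_vnd": 2.0e9, "area_m2": 40},
        {"id": "c3", "district": "Ba Dinh", "price_vnd": 6.0e9, "area_m2": 60},
        {"id": "d9", "district": "Ba Dinh", "price_vnd": None, "area_m2": 55},  # no price
    ]


def preprocess(raw):
    """Data pre-processing: drop incomplete records and duplicates (by id)."""
    seen, clean = set(), []
    for rec in raw:
        if rec["price_vnd"] is None or not rec["area_m2"]:
            continue  # filter: price or area missing
        if rec["id"] in seen:
            continue  # deduplication
        seen.add(rec["id"])
        clean.append(rec)
    return clean


def store(records):
    """Data storage: an in-memory dict standing in for a distributed database."""
    return {rec["id"]: rec for rec in records}


def analyze(db):
    """Data processing and analysis: median price per square metre by district."""
    by_district = {}
    for rec in db.values():
        unit_price = rec["price_vnd"] / rec["area_m2"]
        by_district.setdefault(rec["district"], []).append(unit_price)
    return {district: median(prices) for district, prices in by_district.items()}


def run_pipeline():
    """Chain the four layers in order."""
    return analyze(store(preprocess(collect())))
```

Keeping each layer behind its own function means a stage can be swapped for a distributed equivalent without touching the others, which is the property the layered design is meant to provide.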
AUTOMATED PROPERTY EVALUATION MODEL
• Tran, Hung Tien, Hiep Tuan Nguyen, and Viet-Trung Tran. “Large-scale
geographically weighted regression on Spark.” Knowledge and Systems
Engineering (KSE), 2016 Eighth International Conference on. IEEE, 2016.
• First law of geography (Waldo Tobler): “Everything is related to everything
else, but near things are more related than distant things.”
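To make the idea behind geographically weighted regression concrete, here is a minimal single-machine sketch, assuming one predictor (for example floor area) and a Gaussian distance kernel. It is not the Spark implementation from the cited paper; the function names, the bandwidth value and the sample data are hypothetical. Following Tobler's law, each regression point gets its own weighted least-squares fit in which nearby observations count more.

```python
import math


def gaussian_weight(dist, bandwidth):
    """Kernel weight that decays with distance from the regression point."""
    return math.exp(-(dist ** 2) / (2.0 * bandwidth ** 2))


def local_fit(point, samples, bandwidth):
    """Weighted least squares for y = intercept + slope * x at one location.

    samples: iterable of (u, v, x, y), where (u, v) are coordinates.
    """
    pu, pv = point
    sw = sx = sy = sxx = sxy = 0.0
    for u, v, x, y in samples:
        w = gaussian_weight(math.hypot(u - pu, v - pv), bandwidth)
        sw += w
        sx += w * x
        sy += w * y
        sxx += w * x * x
        sxy += w * x * y
    denom = sw * sxx - sx * sx
    if abs(denom) < 1e-12:
        raise ValueError("not enough nearby variation in x to fit a local model")
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw
    return intercept, slope


def gwr(points, samples, bandwidth):
    """One independent local fit per regression point."""
    return [local_fit(p, samples, bandwidth) for p in points]


# Hypothetical data: two neighbourhoods with different price/area relations.
SAMPLES = [
    (0.0, 0.0, 1.0, 12.0), (0.0, 1.0, 2.0, 14.0), (1.0, 0.0, 3.0, 16.0),     # y = 10 + 2x
    (10.0, 10.0, 1.0, 9.0), (10.0, 11.0, 2.0, 13.0), (11.0, 10.0, 3.0, 17.0),  # y = 5 + 4x
]
```

Calling `gwr([(0.0, 0.0), (10.0, 10.0)], SAMPLES, bandwidth=1.0)` recovers a slope close to 2 at the first location and close to 4 at the second, whereas a single global regression would blend the two. Because the local fits do not depend on one another, the loop in `gwr` is the natural unit to parallelise.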
[Figure: running time (sec), 0 to 1200, versus number of regression points (1000, 5000, 10000, 20000, 50000) for Distributed WLR, Parallel WLR, Distributed GWR NE and Distributed GWR GD]
[Figure: system architecture with the components Crawlers, Filters/deduplication, Distributed Database, Big data processing, Natural language understanding, Report, Chatbot and Website]
PRICE HEATMAP
PRICE TREND ANALYSIS
SOME NUMBERS
• 30 M listings
• 50K users/month
• 1 M phone numbers
NLP ANALYSIS
• Vietnamese address normalization
• Named entity recognition
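As an illustration of the Vietnamese address normalization step listed under NLP analysis, the sketch below canonicalizes Unicode, expands a few common administrative abbreviations and tidies separators. The abbreviation table and the function name `normalize_address` are assumptions made for this example; a production normalizer would need a much larger dictionary and a gazetteer of place names.

```python
import re
import unicodedata

# Hypothetical, deliberately small abbreviation table (pattern, expansion).
ABBREVIATIONS = [
    (re.compile(r"\btp(?:\.\s*|\s+)", re.IGNORECASE), "Thành phố "),     # TP. / TP
    (re.compile(r"\bq(?:\.\s*|\s*(?=\d))", re.IGNORECASE), "Quận "),      # Q.1 / Q5
    (re.compile(r"\bp(?:\.\s*|\s*(?=\d))", re.IGNORECASE), "Phường "),    # P.3 / P12
]


def normalize_address(raw):
    """Return a canonical form of a free-text Vietnamese address."""
    # Compose diacritics first (NFC) so that word boundaries in the
    # patterns below are not confused by stand-alone combining marks.
    text = unicodedata.normalize("NFC", raw)
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Expand administrative abbreviations.
    for pattern, expansion in ABBREVIATIONS:
        text = pattern.sub(expansion, text)
    # Use a single ", " between address components.
    text = re.sub(r"\s*,\s*", ", ", text)
    return text
```

For example, `normalize_address("  12 Nguyễn Trãi ,P.3,  Q5 , TP. Hồ Chí Minh ")` yields `12 Nguyễn Trãi, Phường 3, Quận 5, Thành phố Hồ Chí Minh`. Mapping differently written listings onto one canonical string is also what lets a deduplication filter recognise them as the same property.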