CSC Proprietary and Confidential
Prediction of the bike rental demand in Washington, D.C.
Zilong Zhao
Associate Professional: Data Scientist
March 4th, 2016
CSC Proprietary and Confidential, December 14, 2016
Table of Contents
I. Motivation
II. Data Exploration
III. Predictive Analysis
IV. Summary and Outlook
What is a city bikeshare system?
Source: http://regionalbraunschweig.de/fahrradparkhaeuser-fuer-die-loewenstadt/, Photo: Sina Rühland
What is a city bikeshare system?
An automatic bike station powered by solar energy
Source: https://draufabfahren.de/unterwegs/der-perfekte-staedtetrip-mit-call-a-bike-36061
Why is predictive analytics useful for a bike-sharing company?
• Bikes available anytime and anywhere vs. avoiding overcapacity
• Bike positioning: how and when
• Fewer bottlenecks caused by regular bike maintenance
• High availability of bikes increases customer satisfaction
Introduction to a Kaggle project
Forecast use of the bikeshare system in Washington, D.C.
About Kaggle
• A platform for predictive modelling and analytics competitions
• Partnered with NASA, Wikipedia, Deloitte, etc.
• Milestones:
– Gesture recognition for Microsoft Kinect
– Data analysis for the Higgs boson project at CERN, Geneva, Switzerland
– Netflix US$1,000,000 prize for prediction of user ratings for films
Source: https://www.kaggle.com/c/bike-sharing-demand
What does the data set look like?
Data fields:
• Datetime: hourly date + timestamp
• Season: 1 = spring, 2 = summer, 3 = fall, 4 = winter
• Holiday: whether the day is considered a holiday
• Workingday: whether the day is neither a weekend nor a holiday
• Weather:
– 1: Clear, Few clouds, Partly cloudy
– 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
– 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
– 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
• Temp: temperature in Celsius
• Atemp: "feels like" temperature in Celsius
• Humidity: relative humidity
• Windspeed: wind speed
• Casual: number of rentals initiated by non-registered users
• Registered: number of rentals initiated by registered users
• Count: total number of rentals, the prediction target
• Size of the training data: 10886 rows, 12 columns
• 2 years, hourly
Data preparation
• With target: the first 19 days of every month
• Without target: the remaining days
Training data: first 19 days, hourly
Test data: remaining days, hourly
Split of the training data: 85% training set, 15% validation set
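The 85/15 split above can be sketched in a few lines. This is a minimal illustration, not the code used for the talk: the slides do not say whether the split was random or chronological, so the random shuffle, the helper name `split_train_validation`, and the seed are assumptions.

```python
import random


def split_train_validation(rows, validation_fraction=0.15, seed=42):
    """Shuffle the labelled rows and split them into (training, validation) lists.

    Assumption: a random split; the slides only state the 85%/15% proportions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in for the 10886 labelled hourly rows of the training file.
labelled_rows = list(range(10886))
train, validation = split_train_validation(labelled_rows)
print(len(train), len(validation))
```

In a scikit-learn workflow, `sklearn.model_selection.train_test_split` with `test_size=0.15` serves the same purpose.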
Evaluation of prediction
Under Kaggle's evaluation rules, the results are scored by the Root Mean Squared Logarithmic Error (RMSLE), defined as
$$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(\log(y_{\mathrm{pred},i} + 1) - \log(y_{\mathrm{real},i} + 1)\bigr)^{2}}$$
• $n$ is the number of predictions
• $y_{\mathrm{pred}}$ is the predicted count
• $y_{\mathrm{real}}$ is the actual count
• $\log(x)$ is the natural logarithm
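As a hedged sketch, the RMSLE definition translates directly into Python. The function name `rmsle` is chosen here for illustration; `math.log1p(x)` computes the natural logarithm of `x + 1`.

```python
import math


def rmsle(y_pred, y_real):
    """Root Mean Squared Logarithmic Error between two equal-length sequences."""
    if len(y_pred) != len(y_real) or len(y_pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(r)) ** 2  # (log(pred + 1) - log(real + 1))^2
        for p, r in zip(y_pred, y_real)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# A perfect forecast scores 0; larger values mean worse forecasts.
print(rmsle([10, 100, 250], [10, 100, 250]))
print(rmsle([12, 90, 300], [10, 100, 250]))
```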
Why RMSLE?
The data covers a large range of values. Suppose:
Real value    10    10000
Prediction    11    11000
• RMSE: $11000 - 10000 \gg 11 - 10$
• RMSLE: $\log 11000 - \log 10000 = \log\frac{11000}{10000} = \log 11 - \log 10$
Calculation of errors on each data point
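The comparison can be checked numerically. The sketch below follows the slide in leaving out the +1 offset of the full RMSLE definition, so the two logarithmic errors come out equal; with the offset they would differ slightly. The variable names are illustrative.

```python
import math

# (real value, prediction) pairs taken from the slide.
pairs = {"small": (10, 11), "large": (10000, 11000)}

abs_errors = {}
log_errors = {}
for name, (real, pred) in pairs.items():
    abs_errors[name] = pred - real                      # what RMSE penalises
    log_errors[name] = math.log(pred) - math.log(real)  # what RMSLE penalises (offset omitted)

# The absolute errors differ by a factor of 1000,
# while both logarithmic errors equal log(1.1).
print(abs_errors)
print(log_errors)
```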
The weather effect
Weather:
1. Clear, Few clouds, Partly cloudy
2. Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3. Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4. Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
In general, good weather increases bike demand.
[Figure: rental counts by weather category, annotated with the 25% quartile, median, and 75% quartile]
Q: What happened in category 4?
“The overachieving snowfall of January 9, 2012”
Source: Washington Post blog
The Julian dates and times
Calendar date                               Julian date
January 1, 4713 B.C.E., at 12:00 (noon)     0
January 2, 4713 B.C.E., at 12:00 (noon)     1
March 4, 2016 C.E., at 14:00                2457452.0833333334885537624359130859375
• Continuous count of days
• Representation of date/time within a single variable
• Used primarily by astronomers
• Precision: 1 millisecond (0.001 seconds)
The heatmap
The scikit-learn library
• A machine learning library in Python
• Includes various classification, regression and clustering algorithms
• Simple and efficient
• BSD license – open source, commercially usable
How does linear regression work?
x:  10  13  16  20   5
y:  10   6   4   0   ?
y ≈ ax + b
How does linear regression work?
x:  10  13  16  20   5
y:  10   6   4   0  14
y ≈ ax + b
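The value for x = 5 can be reproduced with an ordinary least-squares fit of the four known points. This is a minimal sketch using the closed-form solution for a single feature; the helper name `fit_line` is an assumption, and the talk's own demo code is not part of the transcript.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ≈ a*x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b


# The four known points from the slide.
a, b = fit_line([10, 13, 16, 20], [10, 6, 4, 0])
prediction = a * 5 + b
print(round(a, 3), round(b, 3), round(prediction, 2))  # slope ≈ -0.968, intercept ≈ 19.279, prediction ≈ 14.44
```

The fitted line gives roughly 14.44 at x = 5, which the slide rounds to 14.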
Visualizations of the forecast with linear regression
A decision tree for cycling
• Attributes and their values:
– Weather: Sunny, Cloudy, Rain
– Humidity: High, Normal
– Wind: Strong, Weak
• Target concept cycling: Yes, No
Weather (root node)
– Sunny → Humidity (branch node)
    – High → No (leaf node)
    – Normal → Yes
– Cloudy → Yes
– Rain → Wind
    – Strong → No
    – Weak → Yes
Target   P(X)
Yes      2/3
No       1/3
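A decision tree of this kind is equivalent to nested if/else rules. The sketch below encodes one reading of the flattened diagram (Sunny splits on humidity, Cloudy is always "Yes", Rain splits on wind); that reading and the function name `will_cycle` are assumptions.

```python
def will_cycle(weather, humidity, wind):
    """Walk the cycling decision tree from the root node down to a leaf."""
    if weather == "Cloudy":   # leaf directly below the root
        return "Yes"
    if weather == "Sunny":    # branch node: Humidity
        return "Yes" if humidity == "Normal" else "No"
    if weather == "Rain":     # branch node: Wind
        return "Yes" if wind == "Weak" else "No"
    raise ValueError(f"unknown weather value: {weather!r}")


print(will_cycle("Sunny", "High", "Weak"))      # No
print(will_cycle("Rain", "Normal", "Weak"))     # Yes
print(will_cycle("Cloudy", "High", "Strong"))   # Yes
```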
Advantages and disadvantages of decision trees
• Easy to explain
• Resembles human decision-making
• Graphical interpretation possible
• Handles qualitative predictors naturally
• Problems:
– Predictive accuracy is generally not the best
– Sometimes very non-robust
Solution: aggregate many decision trees, using a method such as random forest
The random forest algorithm
• Collection of unpruned decision trees
• Combination of individual tree decisions
• Improves prediction accuracy
• Encourages diversity among the trees
• Bagging, random decision trees
• Automatic feature selection
• Outputs variable importance
source: http://www.analyticsvidhya.com/blog/2015/09/random-forest-algorithm-multiple-challenges/
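The bagging idea behind a random forest can be illustrated with a deliberately simplified sketch: each tree is a one-split regression tree fitted on a bootstrap sample, and the forecast is the average over all trees. A real random forest grows deep trees and also samples a random subset of features at each split; that part is omitted here. The toy demand values and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single numeric feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the maximum: its right side would be empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # only one distinct x value: no split possible
        return (xs[0], mean(ys), mean(ys))
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Bagging: fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Combine the individual tree decisions by averaging."""
    return mean(predict_stump(stump, x) for stump in forest)


# Hypothetical toy data: low demand at night, high demand during the day.
hours = list(range(24))
demand = [10] * 7 + [200] * 13 + [30] * 4
forest = fit_bagged_stumps(hours, demand)
print(predict_forest(forest, 3), predict_forest(forest, 8))
```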
Visualizations of the forecast with random forest regressor
The feature ranking from random forest
The hourly trend
Three categories of bike demand:
• Peak: hours 7–9 and 16–19
• Average: hours 10–15
• Low: hours 0–6 and 20–24
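These three categories can be turned into a categorical feature. The sketch below is one possible encoding; the function name `hour_category` and the treatment of hour 24 as midnight are assumptions.

```python
def hour_category(hour):
    """Map an hour of the day (0-24) to the demand category from the slide."""
    if not 0 <= hour <= 24:
        raise ValueError("hour must be between 0 and 24")
    hour = hour % 24  # treat 24 as midnight, i.e. hour 0
    if 7 <= hour <= 9 or 16 <= hour <= 19:
        return "peak"
    if 10 <= hour <= 15:
        return "average"
    return "low"  # hours 0-6 and 20-23


print([hour_category(h) for h in (3, 8, 12, 17, 22)])
# ['low', 'peak', 'average', 'peak', 'low']
```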
About TensorFlow
• A Google software library for machine intelligence
• Developed by the Google Brain team, open source since Nov. 9th, 2015
• Runs across platforms: on CPUs or GPUs in servers and desktops, and also on mobile devices
• Currently used for both research and production
The artificial neural network
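An ANN with one hidden layer
[Figure: network diagram with inputs $x_1, x_2, x_3$, hidden-layer nodes $h_1, \dots, h_5$, and outputs $y_1, y_2$; input-to-hidden weights $w_{i1}$ and hidden-to-output weights $v_{i1}$]
$$h_1 = \varphi\Bigl(\sum_{i=1}^{3} w_{i1} x_i\Bigr), \quad \dots$$
$$y_1 = \sum_{i=1}^{5} v_{i1} h_i$$
$\varphi$: activation function, e.g. $\varphi(x) = \tanh x$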
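The forward pass through such a network can be sketched in plain Python. This follows the slide's formulas, which have no bias terms; the function name `forward`, the weight layout (`w[i][j]` connects input i to hidden node j, `v[i][k]` connects hidden node i to output k), and the example weights are assumptions for illustration only.

```python
import math


def forward(x, w, v, phi=math.tanh):
    """Forward pass of an ANN with one hidden layer and a linear output layer."""
    n_hidden = len(w[0])
    n_outputs = len(v[0])
    # h_j = phi(sum_i w[i][j] * x[i])
    hidden = [phi(sum(w[i][j] * x[i] for i in range(len(x)))) for j in range(n_hidden)]
    # y_k = sum_i v[i][k] * h_i
    return [sum(v[i][k] * hidden[i] for i in range(n_hidden)) for k in range(n_outputs)]


x = [1.0, 0.5, -0.5]                    # 3 inputs
w = [[0.1] * 5 for _ in range(3)]       # 3 x 5 input-to-hidden weights
v = [[0.2] * 2 for _ in range(5)]       # 5 x 2 hidden-to-output weights
y = forward(x, w, v)
print(y)
```

With these constant weights every hidden node equals tanh(0.1), so each output is 5 · 0.2 · tanh(0.1) = tanh(0.1).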
Visualizations of the forecast from ANN built with TensorFlow
The machine learning benchmarks on the bike rental data
ML method              RMSLE on validation set   Training time (s)
Linear regression      0.9913                    0.3210065
Random forest          0.3525                    0.692971
ANN with TensorFlow    0.3369                    348.807
The RMSLE score with ANN on Kaggle's test data: 0.40173.
Source: https://www.kaggle.com/c/bike-sharing-demand/leaderboard (last checked on Feb. 25th, 2016)
Outlook
• Combination with time series analysis
• Feature engineering
– Categorization of hours: peak, average, low
• Separate models for registered and casual users
Source: http://brandchannel.com/2015/11/16/google-tensorflow-ai-111615/
Credits
• Kaggle
• Wikipedia, https://en.wikipedia.org
• Python Software Foundation, https://www.python.org/
• Stack Overflow, http://stackoverflow.com
• scikit-learn, http://scikit-learn.org
• TensorFlow, https://www.tensorflow.org/
• Financial Time Series Prediction Using Machine Learning Algorithms, Master Thesis, Leslie Tiong Ching Ow, Aug. 2012
• Date/Time Plotting, IDL Online Help, http://www.physics.nyu.edu/grierlab/idl_html_help/plotting14.html
• Dr. Florian Wilhelm
• The Big Data Analytics Team
Thank You
Zilong Zhao
Big Data & Analytics
BCRM, Wiesbaden
zzhao3@csc.com
Converting Gregorian calendar date to Julian day number
• First, compute the number of years ($y$) and months ($m$) since March 1, 4801 B.C.E.:
• Then, compute:
• Finally, for the full Julian Date with time:
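The formulas for these three steps did not survive in the transcript. The sketch below uses the widely published integer-arithmetic conversion for Gregorian dates, which works with the same $y$, $m$ and epoch named above; it is assumed, not confirmed, to match the slide. The function names are illustrative.

```python
def julian_day_number(year, month, day):
    """Julian day number (at noon) for a Gregorian calendar date."""
    a = (14 - month) // 12   # 1 for January and February, otherwise 0
    y = year + 4800 - a      # years since March 1, 4801 B.C.E.
    m = month + 12 * a - 3   # months since March (March = 0)
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def julian_date(year, month, day, hour=12, minute=0, second=0):
    """Full Julian date: the day number plus the time of day as a fraction."""
    return (julian_day_number(year, month, day)
            + (hour - 12) / 24 + minute / 1440 + second / 86400)


# March 4, 2016 at 14:00, the example from the Julian date slide.
print(julian_date(2016, 3, 4, 14))
```

For March 4, 2016 at 14:00 this yields 2457452 plus 2/24, in line with the value quoted on the Julian date slide.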
Activation function
Definition. The activation function of a node in a neural network defines
the output of the node, given a set of predetermined inputs.
With the activation function, non-linearity is introduced into the neural
network.
Examples: $\varphi(x) = \tanh x$, $\varphi(x) = e^{-x^{2}}$
Time series analysis
