Transforming Instagram Data into Location Intelligence
1. Data Science Innovation:
Transforming Instagram Data
Into Location Intelligence and Internet of Things
April 2014
Suresh.sood@uts.edu.au
or
linkedin.com/in/sureshsood
2. Topic Areas
1. Statistics/Data mining or Data Science?
2. Data Science workflows/discovery
3. Research informing our thinking about location intelligence
4. Data Science innovation and exploratory analysis
5. Motivations for Instagram project
6. Pattern mining trajectories/Data mining
7. Instagram analytics tools
8. NoSQL: MongoDB
9. Datafication 3 back end (walkthrough)
10. Location Social Recommender system
11. Q&A
3. Statistics, Data Mining or Data Science?
• Statistics
– precise deterministic causal analysis over precisely collected data
• Data Mining
– deterministic causal analysis over re-purposed data carefully sampled
• Data Science
– trending/correlation analysis over existing data using the bulk of the
population, i.e. big data
Adapted from:
NIST Big Data taxonomy draft report (see http://bigdatawg.nist.gov/show_InputDoc.php)
5. Useful References Informing our Thinking about
Location Intelligence
Silva et al. (2013), A comparison of Foursquare and Instagram to the study of city
dynamics and urban social behavior, Proceedings of the 2nd ACM SIGKDD
International Workshop on Urban Computing
Instagram and Foursquare datasets might be compatible in finding popular regions of
a city
Chaoming Song, et al. (2010), Limits of Predictability in Human Mobility, Science
There is a potential 93% average predictability in user mobility, an exceptionally high
value rooted in the inherent regularity of human behavior. Yet it is not the 93%
predictability that we find the most surprising. Rather, it is the lack of variability in
predictability across the population.
Scellato et al. (2011), NextPlace: A Spatio-temporal Prediction Framework for
Pervasive Systems. Proceedings of the 9th International Conference on Pervasive
Computing (Pervasive'11)
Daily and weekly routines => Few significant places every day => Regularity in human
activities => Regularity leads to predictability
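The chain from routine to predictability can be illustrated with a minimal sketch. It assumes a hypothetical visit log (a list of made-up place labels per user) and uses the Shannon entropy of visit frequencies as a rough proxy for regularity. This is a simplification for illustration, not the estimator used by Song et al. or the NextPlace method.

```python
import math
from collections import Counter

def visit_entropy(visits):
    """Shannon entropy (bits) of the place-visit frequency distribution."""
    counts = Counter(visits)
    total = len(visits)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def top_k_share(visits, k=2):
    """Fraction of visits that fall in the k most frequently visited places."""
    counts = Counter(visits)
    return sum(c for _, c in counts.most_common(k)) / len(visits)

# Hypothetical observations for two users (labels are invented).
routine_user = ["home", "work"] * 6 + ["gym", "cafe"]   # few significant places
roaming_user = ["p%d" % i for i in range(14)]           # a new place every time

for name, visits in [("routine", routine_user), ("roaming", roaming_user)]:
    print(name, round(visit_entropy(visits), 2), round(top_k_share(visits), 2))
```

A lower entropy and a higher top-k share both indicate that a few significant places dominate, which is the situation in which the next location is easier to guess.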
6. De Domenico, M., Lima, A. and Musolesi, M. (2012), Interdependence and Predictability of Human
Mobility and Social Interactions. Proceedings of the Nokia Mobile Data Challenge
Workshop.
we have shown that it is possible to exploit the correlation between movement data and
social interactions in order to improve the accuracy of forecasting of the future geographic
position of a user. In particular, mobility correlation, measured by means of mutual
information, and the presence of social ties can be used to improve movement forecasting
by exploiting mobility data of friends. Moreover, this correlation can be used as indicator of
potential existence of physical or distant social interactions and vice versa.
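Mobility correlation measured by mutual information can be sketched as follows. The example assumes hypothetical location labels sampled at the same times for each user and uses a simple plug-in estimate. It is an illustration of the measure only, not the procedure from the cited paper.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two aligned label sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must be aligned and of equal length")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Hypothetical data: the friend moves in step with the user (different places,
# same rhythm); the stranger's movements are unrelated to the user's.
user     = ["home", "home", "work", "work", "home", "home", "work", "work"]
friend   = ["flat", "flat", "office", "office", "flat", "flat", "office", "office"]
stranger = ["a", "b", "a", "b", "a", "b", "a", "b"]

mi_friend = mutual_information(user, friend)
mi_stranger = mutual_information(user, stranger)
print(mi_friend, mi_stranger)
```

A high value for the friend and a value near zero for the stranger is the kind of signal that could be used to decide whose mobility data is worth adding to a forecast.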
Sadilek, A and Krumm, J. (2012) Far Out: Predicting Long-Term Human Mobility
Where are you going to be 285 days from now at 2pm? … we show that it is possible to
predict location of a wide variety of hundreds of subjects even years into the future and
with high accuracy.
7. “One of the most fascinating aspects of location-based
data is the stability and predictability of patterns that can
be mined from seemingly unrelated data. A cluster of
random dots on a map can represent a daily
transportation route, the most popular dating spots or
the neighborhoods with the highest concentration of
gang violence. These patterns, analyzed over time and in
large numbers, begin to allow for informed predictions of
behaviors and events.
For government, this analytical capability enables better
resource allocation and more effective outcomes”.
Interview with G. Edward DeSeve, former White House ARRA chief administrator,
December 15, 2011. Seen in “The power of zoom: Transforming government
through location intelligence” by Deloitte Consulting LLP
Source: https://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/Federal/us_fed_govlab_power_of_zoom_report_100212.pdf
8. Useful NSW Govt resources on Location Intelligence
• NSW Globe – globe.six.nsw.gov.au
– Uses Google Earth to explore spatial data and images
• NSW Location Intelligence Strategy (April 2014)
– http://www.finance.nsw.gov.au/ict/sites/default/files/NSW Location Intelliegence Strategy.pdf
• NSW Government datasets
– http://data.nsw.gov.au/
9. Data Science Innovation
Data Science innovation is something an
organization has not done before or even
something nobody anywhere has done before. A
data science innovation focuses on discovering
and using new or untraditional data sources to
solve new problems.
Adapted from:
Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons
10. The ANZ Heavy Traffic Index comprises flows
of vehicles weighing more than 3.5 tonnes
(primarily trucks) on 11 selected roads around
NZ. It is contemporaneous with GDP growth.
The ANZ Light Traffic Index is made up of light
or total traffic flows (primarily cars and
vans) on 10 selected roads around the
country. It gives a six-month lead on GDP
growth.
http://www.anz.co.nz/commercial-institutional/economic-markets-research/truckometer/
11. Discovery (Exploratory) Analytics
✓ Exploratory
– Unstructured
– Machine learning
– Data mining
– Complex analysis
– Data diversity
✓ Richness of new sources
X Business Intelligence
– Dashboard
– Real time decisioning
– Alerts
– Fresh data
– Response time
✓ Speed of Query
12. Data Science Innovation
New sources of information for data driven applications and Internet of Things
Number of journeys made
Distances travelled
Types of roads used
Speed
Time of travel
Levels of acceleration and braking
Any accidents which may occur
The Industrial Ecology Lab -
towards an integrated
Australian research platform
13. Black Box Insurance
• Telematics technology (black box) helps assess driving
behavior and deliver truly driver-centric premiums by
capturing:
– Number of journeys
– Distances travelled
– Types of roads
– Speed
– Time of travel
– Acceleration and braking
– Any accidents
• Benefits low-mileage, smooth and safe drivers
• Privacy vs. saving money on insurance (Canada)
– http://bit.ly/Black_box
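How a black box might turn raw readings into the quantities listed above can be sketched as follows. The `Sample` fields, the sampling scheme and the harsh braking and acceleration thresholds are all assumptions made for illustration. This is not an insurer's actual scoring method.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    t: float          # seconds since journey start
    speed_kmh: float  # speed reported by the black box

def journey_summary(samples, harsh_brake_ms2=-3.0, harsh_accel_ms2=3.0):
    """Summarise one journey: distance, top speed, harsh braking/acceleration counts."""
    distance_km = 0.0
    harsh_brakes = harsh_accels = 0
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip out-of-order or duplicate samples
        # Trapezoidal estimate of the distance covered between two samples.
        distance_km += (prev.speed_kmh + cur.speed_kmh) / 2 * dt / 3600
        accel = (cur.speed_kmh - prev.speed_kmh) / 3.6 / dt  # m/s^2
        if accel <= harsh_brake_ms2:
            harsh_brakes += 1
        elif accel >= harsh_accel_ms2:
            harsh_accels += 1
    return {
        "distance_km": distance_km,
        "max_speed_kmh": max(s.speed_kmh for s in samples),
        "harsh_brakes": harsh_brakes,
        "harsh_accels": harsh_accels,
    }

# Hypothetical journey: gentle acceleration, one sharp braking event, then cruising.
journey = [Sample(0, 0), Sample(10, 36), Sample(20, 72), Sample(22, 36), Sample(32, 36)]
summary = journey_summary(journey)
print(summary)
```

Summaries like this, accumulated per driver, are the kind of input a driver-centric premium could be based on; how they are weighted is a separate pricing decision.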
15. Smartphone, Google Glass or Apple Watch will
Know What You Want Before You Do
“…from 2014 your phone [glasses or watch] will
anticipate your needs, do the research, tell you
what you want to know – sometimes
before the question even occurs to you…”
Chapman, Jake (2013), The Wired World in 2014
19. Motivations for Instagram Project
• Trajectory data (not i.i.d. – independent and identically distributed)
• A new authentication approach based on trajectory
• Predictive capability for phones, glasses and watches
• Internet of Things (sensors, RFID, wheelchairs and drones)
• Indoor GPS
• Car parking “anywhere”
• Location-based services, e.g. advertising
• Tourist recommender system
• Food analytics and traceability (farm → fork)
• Mobile apps with trajectory data, e.g. Foursquare, Instagram, Nike+, EveryTrail
• Insurance: “pay as you drive” telematics black-box-based insurance policy
20. Pattern Mining Trajectories
Group of Trajectories
Trajectory Patterns:
1. Hot regions (basic unit)
2. A trajectory pattern is the relationships among regions
Opportunities: Location-based networks
Destination prediction
Car-pooling
Personal route planning
Group buying
Loyalty
Credit card data
Adapted from: Chang, Wei, Yeh and Peng, “Discovering Personalised Routes from Trajectories”,
ACM LBSN’11, Chicago, Illinois, USA, 1 November 2011
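The two ideas on this slide, hot regions as the basic unit and patterns as relationships among regions, can be sketched in a few lines. The sketch assumes a fixed grid (0.01 degrees, an arbitrary resolution), a support threshold of two trajectories, and invented coordinates. It is a simplified stand-in, not the algorithm of Chang et al.

```python
import math
from collections import Counter

def cell(lat, lon, size=0.01):
    """Map a coordinate to a grid cell; size is in degrees (assumed resolution)."""
    return (math.floor(lat / size), math.floor(lon / size))

def hot_regions(trajectories, min_support=2, size=0.01):
    """Cells visited by at least min_support distinct trajectories."""
    support = Counter()
    for points in trajectories.values():
        support.update({cell(lat, lon, size) for lat, lon in points})
    return {c for c, n in support.items() if n >= min_support}

def region_transitions(trajectories, hot, size=0.01):
    """Count moves between consecutive, distinct hot regions across trajectories."""
    moves = Counter()
    for points in trajectories.values():
        seq = [c for c in (cell(lat, lon, size) for lat, lon in points) if c in hot]
        for a, b in zip(seq, seq[1:]):
            if a != b:
                moves[(a, b)] += 1
    return moves

# Hypothetical geotagged photo sequences (lat, lon), one per user.
trajectories = {
    "user_a": [(-33.8565, 151.2155), (-33.8735, 151.2065), (-33.8915, 151.2765)],
    "user_b": [(-33.8568, 151.2152), (-33.8732, 151.2068)],
    "user_c": [(-33.8561, 151.2158), (-33.9505, 151.1815)],
}
hot = hot_regions(trajectories)
patterns = region_transitions(trajectories, hot)
print(len(hot), patterns.most_common(1))
```

The most frequent transitions are candidate trajectory patterns; destination prediction or route planning would build on counts of this kind.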
23. Why is Instagram Popular?
• Mobile photo sharing app + social network
• Mobile-first workflow:
– take or select picture => crop/filter => geo-tag/hashtag/description/share
• Instagram is “Twitter but with photo updates”
• Status updates are transformed photos
• By default, pictures and accounts are public
• Pictures include:
– Geolocation, hashtags, comments and likes
• Mobile app friendly vs. desktop
24. Instagram Analytics Tools (off the shelf)
• Statigram
– Lifetime likes
– Total comments
– New followers/last 7 days
– Most liked photos
• Simply Measured
– Total engagement Instagram, Facebook and Twitter
– Engaging photo/filter/location
– Top photos by date
– Active commenters
– Best time for engagement
– Best day for engagement
– Top filters
• Nitrogram
– Countries of followers
– Most engaging
– Most commented
– Likes and comments on a photo
25. MongoDB - An Innovation in Databases?
“MongoDB gets the job done”
“document-oriented NoSQL database”
“MongoDB is a natural choice when dealing with JSON”
“Same data model in code = same model in database”
“Data structure store to model applications”
“In MongoDB an Instagram post can be stored in a single collection, exactly as it is represented in the program, as one
object. In a relational database an Instagram post would occupy multiple tables.”
“MongoDB understands geo-spatial co-ordinates and supports geo-spatial indexing”
Initial MongoDB prototype on Red Hat OpenShift (Public/Private or Community “Platform as a Service”)
Recommendation engine integrating Mahout libraries and MongoDB (see Roadmap)
As discussed at “Journey to MongoDB: Trajectory Pattern Mining in Australian Instagram”
by Suresh Sood and Xinhua Zhu
Sydney MongoDB Meetup, 30 April 2013
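The "one post, one object" idea and the geospatial support can be sketched without a running database. The field names below are illustrative assumptions, not Instagram's API schema or the project's data model. The filter uses MongoDB's `$near`, `$geometry` and `$maxDistance` operators in the form documented for recent releases; the exact placement of `$maxDistance` may differ in the 2.4 series.

```python
import json

# One post as a single document; field names are illustrative assumptions.
post = {
    "user": "example_user",
    "created_time": "2014-04-01T09:30:00Z",
    "caption": "Morning at the harbour",
    "hashtags": ["sydney", "harbour"],
    "likes": 42,
    "comments": [{"user": "another_user", "text": "Great shot"}],
    # GeoJSON Point: coordinates are [longitude, latitude].
    "location": {"type": "Point", "coordinates": [151.2153, -33.8568]},
}

# The nested structure survives a JSON round trip unchanged, which is the
# "same data model in code = same model in database" point.
round_tripped = json.loads(json.dumps(post))

def near_query(lon, lat, max_metres):
    """Build a $near filter for a field that carries a 2dsphere index."""
    return {
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_metres,
            }
        }
    }

query = near_query(151.2153, -33.8568, 500)
print(json.dumps(query, indent=2))
```

With a driver, a filter like this would be passed to a find call on a collection that has a 2dsphere index on `location`; nothing is executed against a database here. In a relational design the comments and hashtags would typically live in separate tables joined back to the post.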
26. JSON Sources Driving Internet of Things
• RaZberry
– http://www.theregister.co.uk/Print/2013/09/16/zwave_pi_its_time_the_raspberry_pi_took_control/
• Teradata
– http://www.teradata.com.au/newsrelease.aspx?LangType=3081
• Google
– http://googledevelopers.blogspot.com.au/2012/10/got-big-json-bigquery-expands-data.html
27. MongoDB Analytics Support of Instagram Project
• Rich query language
• Native secondary indexes
• Geospatial indexes & search
• Text indexes & search
• Aggregation framework (see MongoDB documentation for Release 2.4.9)
• Map-Reduce (JavaScript) implementation
• Client-side analytics
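The difference between map-reduce and the aggregation framework can be seen on a small question such as "how many posts use each hashtag?". The sketch below assumes a hypothetical `hashtags` array field and runs the map-reduce logic in plain Python rather than in MongoDB's JavaScript engine. The aggregation pipeline is shown only as data, using the `$unwind`, `$group` and `$sort` stages.

```python
from collections import defaultdict

# Hypothetical post documents; only the field needed for the example is shown.
posts = [
    {"user": "a", "hashtags": ["sydney", "beach"]},
    {"user": "b", "hashtags": ["sydney"]},
    {"user": "c", "hashtags": ["beach", "surf", "sydney"]},
]

def map_post(post):
    """Map step: emit (hashtag, 1) for every hashtag on a post."""
    for tag in post.get("hashtags", []):
        yield tag, 1

def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def map_reduce(docs, mapper, reducer):
    """Group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

tag_counts = map_reduce(posts, map_post, reduce_counts)

# The same question phrased as an aggregation pipeline; not executed here.
pipeline = [
    {"$unwind": "$hashtags"},
    {"$group": {"_id": "$hashtags", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
print(tag_counts)
```

For simple counts and groupings the declarative pipeline is usually the shorter option, while map-reduce leaves room for custom logic in the map and reduce steps.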
28. Architectural Implementation using MongoDB
[Architecture diagram. Labels: Name Node; Mongo database distributed across shards; Data Collection (two instances); Stats (two instances); Map Reduce; Instagram via API]
39. MongoDB Mahout or Mortar Recommender
Recommended Trajectories
• Trajectories
• Points of Interest
• User profiles
• Image details
• Recommender engine (Mahout or Mortar)
Algorithms
MongoDB Connector for Hadoop Version 1.2.0
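What a recommender engine does with points of interest and user profiles can be sketched with a simple co-occurrence approach: places often visited by the same users as the places someone already knows are suggested first. The place names and visit data are invented, and this plain-Python sketch does not use or mirror the Mahout or Mortar APIs.

```python
from collections import Counter
from itertools import permutations

def cooccurrence(visits_by_user):
    """Count how often two points of interest are visited by the same user."""
    counts = Counter()
    for places in visits_by_user.values():
        counts.update(permutations(sorted(set(places)), 2))
    return counts

def recommend(user, visits_by_user, top_n=2):
    """Rank unseen places by co-occurrence with the places the user has visited."""
    seen = set(visits_by_user[user])
    scores = Counter()
    for (a, b), n in cooccurrence(visits_by_user).items():
        if a in seen and b not in seen:
            scores[b] += n
    # Sort by score, then by name, so ties are resolved deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [place for place, _ in ranked[:top_n]]

# Hypothetical visit histories derived from geotagged posts.
visits_by_user = {
    "u1": ["opera_house", "harbour_bridge", "bondi"],
    "u2": ["opera_house", "harbour_bridge"],
    "u3": ["opera_house", "bondi", "manly"],
    "u4": ["opera_house"],
}
print(recommend("u4", visits_by_user))
print(recommend("u2", visits_by_user))
```

Recommending whole trajectories rather than single places would apply the same idea to sequences of regions instead of individual points of interest.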
40. Supporting Documentation
• Instagram project documentation
– Data Model and Data Collection Procedure (V2.0)
• MongoDB Aggregation and Data Processing, Release 2.4.9