The document discusses data science innovation and the future of professions in light of new technologies. It describes how accounting work may be automated or replaced by computer-assisted techniques and predictive analytics software. This would allow accountants to shift from reactive to proactive work by leveraging accounting data and insights to predict client scenarios and advise clients. Key areas discussed include systems of insight using big data, machine engineering to create applications from insights, and the role of data science.
1. Data Science Innovation:
Systems of insight & Machine Engineering
@Soody
linkedin.com/in/sureshsood
http://www.slideshare.net/ssood/systemof-insight
3. The Future of the Professions
(Susskind & Susskind 2015)
• Tax and audit work replaced by computer assisted techniques
• Technology automating and innovating
• Accounting work reconfiguring
• New business models
• Move from bespoke to “off the peg”
• Mastery of data with new tools and techniques - Big Data
• Diversification
• Shift to proactivity from reactivity
• Professionals replaced by less expert people and high performing systems
• Post-professional society expertise available online
4. The Future of the Professions: How Technology Will Transform the Work of Human Experts, Richard Susskind and Daniel Susskind (2015)
5. ‘The Predictive Accountant’ Persona
1. CA SMP Practice and Member
2. Data savvy
3. Focus shifts from being reactive to proactive and predictive
4. Leverages accounting data and predictive analytics software to find patterns in data and insights
5. Uses the tools and dashboards to predict client scenarios before time: maximising opportunity,
limiting risks and proactively advising.
6. CA ANZ SMPs benefit from analytics by adding value when connecting SME client challenges and
opportunities to identified customer patterns. Sharing these insights delivers more value in the
accounting conversations and helps tackle the real business problems facing clients.
6. Key Drivers Informing Our Thinking
1. New ways of looking at traditional accounting & client data
2. Innovation from new data sources built on democratisation of data
3. Democratisation of data science - Predictive capability of big data
(correlations & data science)
4. Systems of Insight achieve machine engineering (insight to process or
application)
5. Embedded analytics, messaging and mobile impacts client experience
7. Open Source R
• A great NZ invention!
• Powerful statistical programming language
• Most widely used data analysis software
• 2M+ data scientists, statisticians and analysts
• Creates unique data visualizations
• New York Times, Twitter and Flowing Data
• Thriving open-source community
• Leading edge of analytics research
• Fill talent gap with new grads
• Highest paid IT skill (Dice.com, Jan 2014)
• Most-used data science language after SQL (O’Reilly, Jan 2014)
• Used by 70% of data miners (Rexer, Sep 2013)
• #15 of all programming languages (RedMonk, Jan 2014)
• Growing faster than any other language (KDnuggets, Aug 2013)
8. ‘The Predictive Accountant’ Portal
The Predictive Accountant Data Sources
• Predictive Analytics: Excel style dashboard
• Connected Practice: Digital Marketing / eNewsletters / Integrated business tools software
• Apps Marketplace: Accounting Analytic Apps
• Education: Analytic Training
10. 2020 Global Data Forecast (Bytes)
2020 estimates suggest four times more digital data than all the grains of sand on Earth
Source: Pg. 4, Building a Digital Analytics Organization: Create Value by Integrating Analytical Processes,
Technology, and People into Business Operations by Judah Phillips, FT Press, 30 Jul 2013
11. Data Science Innovation
Data science innovation is something an
organization or individual has not done
before using data. The innovation focuses
on discovery, using new or
nontraditional data sources to solve new
problems.
Adapted from:
Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons
12. Variety of Data Types & Big Data Challenge
1. Astronomical
2. Documents
3. Earthquake
4. Email
5. Environmental sensors
6. Fingerprints
7. Health (personal) Images
8. Graph data (social network)
9. Location
10. Marine
11. Particle accelerator
12. Satellite
13. Scanned survey data
14. Sound
15. Text
16. Transactions
17. Video
Big Data consists of extensive datasets primarily in the characteristics of
volume, variety, velocity, and/or variability that require a scalable
architecture for efficient storage, manipulation, and analysis.
Computational portability is the movement of the computation to the location of the data.
13. Hadoop Configurations (Single and Multi-Rack)
Adapted from: http://stackiq.com/
Cluster manager e.g. Apache Ambari, Apache Mesos, or Rocks
With 3 TB drives, an 18 data node
configuration represents 648 TB
of raw storage. With the HDFS standard
replication factor of 3, this gives
216 TB of usable storage.
Name/secondary/data nodes – 6 core 96 GB
Management node – 4 core 16 GB
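The storage figures above follow from simple arithmetic. A minimal Python sketch, assuming 12 drives per data node (a number not stated on the slide, but the one implied by 648 TB of raw storage across 18 nodes with 3 TB drives); the function name is hypothetical:

```python
def hdfs_capacity_tb(data_nodes, drives_per_node, drive_tb, replication=3):
    """Return (raw_tb, usable_tb) for a cluster of identical data nodes.

    Usable space is approximated as raw space divided by the HDFS
    replication factor; filesystem and OS overheads are ignored.
    """
    raw_tb = data_nodes * drives_per_node * drive_tb
    usable_tb = raw_tb / replication
    return raw_tb, usable_tb


# drives_per_node=12 is an assumption inferred from the slide's totals.
raw, usable = hdfs_capacity_tb(data_nodes=18, drives_per_node=12, drive_tb=3)
print(f"raw: {raw} TB, usable: {usable:.0f} TB")
```

Changing the replication factor or the node count shows how quickly usable capacity diverges from raw capacity.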
16. http://tacocopter.com/
New Sources of Information (Big data) : Social Media + Internet of Things Innovations
Gridded Data Sources
18. The following BigQuery query (note that the wildcard on "TAX_WEAPONS_SUICIDE_" catches suicide vests, suicide bombers, suicide bombings, suicide jackets, and so on):
SELECT DATE, DocumentIdentifier, SourceCommonName, V2Themes, V2Locations, V2Tone, SharingImage, TranslationInfo
FROM [gdeltv2.gkg]
WHERE (V2Themes LIKE '%TAX_TERROR_GROUP_ISLAMIC_STATE%'
       OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIL%'
       OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIS%'
       OR V2Themes LIKE '%TAX_TERROR_GROUP_DAASH%')
  AND (V2Themes LIKE '%TERROR%TERROR%'
       OR V2Themes LIKE '%SUICIDE_ATTACK%'
       OR V2Themes LIKE '%TAX_WEAPONS_SUICIDE_%')
The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record,
spanning the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest
open-access database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates
spanning over 12,900 days, making it one of the largest open-access spatio-temporal datasets as well.
GDELT + BigQuery = Query The Planet
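To make the filter logic of the query concrete, here is a rough Python sketch of the same predicate applied to a single V2Themes string. The function name and the sample strings are hypothetical and simplified; only the theme codes come from the query. The pattern '%TERROR%TERROR%' is modelled as "TERROR occurs at least twice".

```python
# Theme codes copied from the first group of LIKE conditions in the query.
GROUP_TAGS = (
    "TAX_TERROR_GROUP_ISLAMIC_STATE",
    "TAX_TERROR_GROUP_ISIL",
    "TAX_TERROR_GROUP_ISIS",
    "TAX_TERROR_GROUP_DAASH",
)


def matches_query(v2themes: str) -> bool:
    """Mimic the WHERE clause: (any group tag) AND (any attack signal)."""
    mentions_group = any(tag in v2themes for tag in GROUP_TAGS)
    mentions_attack = (
        v2themes.count("TERROR") >= 2          # LIKE '%TERROR%TERROR%'
        or "SUICIDE_ATTACK" in v2themes        # LIKE '%SUICIDE_ATTACK%'
        or "TAX_WEAPONS_SUICIDE_" in v2themes  # prefix acts as a wildcard
    )
    return mentions_group and mentions_attack


# Simplified, made-up theme strings for illustration only.
print(matches_query("TAX_TERROR_GROUP_ISIL;TAX_WEAPONS_SUICIDE_VEST"))
print(matches_query("TAX_TERROR_GROUP_ISIS"))
```

Note that each group tag already contains one "TERROR", so the double-TERROR condition only fires when a second terror-related theme is present.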
20. Black Box Insurance
• Big data transforms actuarial insurance from using probability methods to estimate premiums into dynamic risk management, using real data to generate
individually tailored premiums
• Estimate: for a 20 km journey to work or home, with a data point acquired every minute, the journey captures 12 points per km. Assuming 1,000 km of driving per month, this
generates 12,000 points per month, or 144,000 points per car per annum. Hence 1,000 cars generate 144 million points per annum.
• A telematics (black box) monitor helps assess driving behavior and price the policy on true driver-centric premiums by capturing:
– Number of journeys
– Distances travelled
– Types of roads
– Speed
– Time of travel
– Acceleration and braking
– Any accidents
– Location ?
• Benefits low mileage, smooth and safe drivers
• Privacy vs. saving money on insurance (Canada; http://bit.ly/Black_box)
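The data volume estimate on this slide can be reproduced with a few lines of Python. This is only a sketch of the slide's own arithmetic; the function and constant names are hypothetical, and the inputs (12 points per km, 1,000 km per month) are the slide's assumptions.

```python
POINTS_PER_KM = 12     # slide's assumption for points captured per km
KM_PER_MONTH = 1_000   # slide's assumption for monthly driving distance
MONTHS_PER_YEAR = 12


def points_per_year(cars, km_per_month=KM_PER_MONTH, points_per_km=POINTS_PER_KM):
    """Telematics data points generated per year by a fleet of `cars`."""
    return cars * km_per_month * points_per_km * MONTHS_PER_YEAR


print(points_per_year(1))      # one car
print(points_per_year(1_000))  # a 1,000-car portfolio
```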
21. ANZ Truckometer
The ANZ Heavy Traffic Index comprises flows of vehicles weighing more than 3.5 tonnes (primarily trucks) on 11 selected roads around NZ. It is contemporaneous with GDP growth.
The ANZ Light Traffic Index is made up of light or total traffic flows (primarily cars and vans) on 10 selected roads around the country. It gives a six-month lead on GDP growth in normal circumstances (but cannot predict sudden adverse events such as the Global Financial Crisis).
http://www.anz.co.nz/about-us/economic-markets-research/truckometer/
22. What is Machine Learning?
Machine learning is a scientific discipline that deals
with the construction and study of algorithms that
can learn from data. Such algorithms operate by
building a model based on inputs and using that to
make predictions or decisions, rather than following
only explicitly programmed instructions.
http://en.wikipedia.org/wiki/Machine_learning
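As a small illustration of "building a model based on inputs and using that to make predictions", the sketch below fits a straight line to a handful of points by ordinary least squares and then predicts an unseen value. It is one of the simplest possible learners, chosen for illustration; the toy data are made up and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned model to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept


# Toy training data (made up for illustration): points on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

model = fit_line(xs, ys)
print(model)              # learned parameters
print(predict(model, 5))  # prediction for an input not in the training data
```

Nothing in the code states the rule y = 2x + 1; the parameters are recovered from the data alone.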
24. Netflix – A Picture of A Data Driven Company
• ~75 million users
• 8.5 million events per second
• Zero loss?
• 550 billion events per day
• Hundreds of event types
• 1.3 PB/day
• 21 GB/sec (peak)
• 37% of peak US internet bandwidth
• Operates on Amazon Web Services
Source: http://techblog.netflix.com/2016/02/evolution-of-netflix-data-pipeline.html
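The daily and per-second figures on this slide can be cross-checked with some back-of-envelope Python. This sketch assumes decimal units (1 PB = 10^15 bytes) and treats the per-second event figure as a peak rate, which the slide does not state explicitly; the variable names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 550e9       # from the slide
bytes_per_day = 1.3e15       # 1.3 PB/day, assuming decimal petabytes
quoted_events_per_sec = 8.5e6  # from the slide, assumed here to be a peak

avg_events_per_sec = events_per_day / SECONDS_PER_DAY
avg_bytes_per_sec = bytes_per_day / SECONDS_PER_DAY
avg_bytes_per_event = bytes_per_day / events_per_day
peak_to_average = quoted_events_per_sec / avg_events_per_sec

print(f"average events/sec: {avg_events_per_sec / 1e6:.2f} million")
print(f"average throughput: {avg_bytes_per_sec / 1e9:.1f} GB/sec")
print(f"average event size: {avg_bytes_per_event:.0f} bytes")
print(f"quoted rate / average rate: {peak_to_average:.2f}")
```

The average rates come out below the quoted per-second figures, which is consistent with those being peak values.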
25. Square Kilometer Array (SKA)
• Data collected in a single day would take nearly two million years to play back on an MP3 player
• Central computer has processing power of about one hundred million PCs.
• SKA will use enough optical fiber linking up all the radio telescopes to wrap twice around the Earth.
• Dishes of SKA when fully operational will produce 10 times the global internet traffic as of 2013.
• Aperture arrays in the SKA could produce more than 100 times the global internet traffic as of 2013.
• The SKA will generate enough raw data to fill 15 million 64 GB MP3 players every day.
• The SKA supercomputer will perform 10^18 operations per second - equivalent to the number of stars in three million Milky
Way galaxies - in order to process all the data that the SKA will produce.
• So sensitive that it will be able to detect an airport radar on a planet 50 light years away.
• Thousands of antennas with collecting area of about one square kilometer (that's 1,000,000 square meters).
• Previous mapping of Centaurus A galaxy took a team 12,000 hours of observations or several years. SKA ETA: 5 minutes!
• In the first six hours of operation, SKA will generate more information than all previous radio telescopes in the world combined.
• The Square Kilometer Array will link 250,000 radio telescopes together, creating the most sensitive telescope.
To the scientists involved, however, the SKA is no testbed, it’s a transformative instrument which,
according to Luijten, will lead to “fundamental discoveries of how life and planets and matter all
came into existence. As a scientist, this is a once in a lifetime opportunity.”
Sources: http://bit.ly/amazin-facts & http://bit.ly/astro-ska
Centaurus A
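To put the "15 million 64 GB MP3 players every day" figure into more familiar units, a quick Python conversion follows. It is a sketch only, assuming decimal units (1 GB = 10^9 bytes); the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB = 10**9   # decimal gigabyte (assumption)
PB = 10**15  # decimal petabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)


def daily_volume_bytes(players, capacity_gb):
    """Raw bytes per day expressed as a number of full MP3 players."""
    return players * capacity_gb * GB


per_day = daily_volume_bytes(players=15_000_000, capacity_gb=64)
per_second = per_day / SECONDS_PER_DAY

print(f"{per_day / PB:.0f} PB per day")
print(f"{per_second / TB:.1f} TB per second (sustained average)")
```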
26. Square Kilometre Array
• Next generation radio telescope
• 100 x more sensitive & 1,000,000 x faster
• 5 square km of dish over 3000 km
• Two sites: Western Australia & Karoo Desert, RSA
• World's most ambitious IT project
• First real exascale-ready application
• Largest global big-data challenge
• SKA SDP exascale systems:
– 100,000 nodes
– 800 cabinets
– Consume 20 MW
– Expected failure rates of 300 nodes per week
http://www.ska.gov.au/
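The SDP system figures imply some useful per-node numbers. The sketch below derives them in Python, assuming failures are spread evenly over time and across nodes (a simplification) and using 52 weeks per year as an approximation; the function name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24
WEEKS_PER_YEAR = 52  # approximation


def sdp_stats(nodes, cabinets, power_watts, failures_per_week):
    """Derive per-node and per-cabinet figures from cluster totals."""
    return {
        "nodes_per_cabinet": nodes / cabinets,
        "watts_per_node": power_watts / nodes,
        "weekly_failure_fraction": failures_per_week / nodes,
        # Average time between failures anywhere in the cluster.
        "hours_between_failures": HOURS_PER_WEEK / failures_per_week,
        # Implied mean time between failures for a single node.
        "node_mtbf_years": nodes / failures_per_week / WEEKS_PER_YEAR,
    }


stats = sdp_stats(nodes=100_000, cabinets=800,
                  power_watts=20e6, failures_per_week=300)
for name, value in stats.items():
    print(f"{name}: {value:.3g}")
```

At this scale a node fails roughly every half hour somewhere in the system, even though each individual node would run for years between failures.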
28. 8 Steps Towards Building the Data Centric Business
1. Put digital service (Vargo & Lusch) at centre of business blurring distinction with
physical products via sensors and apps
2. Identify data and monetisation opportunities using business model canvas
3. Select unique sources of data to help drive innovation
4. Use data to drive interactions and customer experiences
5. Understand the data lifecycle from creation to storage
6. Value extraction from data (economic or social)
7. Review patterns of big data businesses
8. Get on top of big data technology trends and analytics software