This document discusses issues related to big data and artificial intelligence. It provides a brief history of computing and data storage, noting how data volumes and processing power have increased dramatically in recent decades. It then discusses some challenges for public consideration, including how data is used for predictive analysis and targeted advertising. The document also discusses how data from various sources can reveal personal details and be used to infer people's personalities and attributes.
1. LITTLE Issues with
big Data and AI
Jim Isaak
2015 SSIT Vice President
2010 Computer Society President
Nov. 2017 v2
2. Society on Social Implications of Technology
What’s coming?
A quick history of how big “big data” is, and
why the 21st century is not the same as
the previous millennium
And then some of the “So What?”
– Challenges the public needs to consider
– That technologists need to consider
– That Policy makers need to consider
– But first, a word from our Sponsor:
11/20/2017
3. Impacts of Technology on Society
www.IEEESSIT.org
4. When I was a boy … (1972)
Computers typically had 32k bytes of
RAM
And 2.5 MB disk drives
And took forever to do things we
consider commonplace now
Moore’s Law – density doubles every 2 years
(along with speed, at ½ the price)
5. But now…
Intel’s latest “desktop” chip runs at 4 GHz
(1,000,000,000 times faster than my 1970s system)
Consider a person walking at 4 mph:
sped up by the same factor, that is 6x the speed of light
My local storage has gone from two novels
to the Library of Congress
6. Bytes (8 bits each):
Item                                          Bytes
Short novel                                   1 Megabyte (1,000,000)
A pickup truck filled with books              1 Gigabyte (1,000,000,000)
The Library of Congress – print collection    10 Terabytes (10,000,000,000,000)
Note: storage is measured in bytes,
communications in bits …
“Broadband” network: typically 500 kilobits per second and up,
about 50,000 bytes per second – or 20 seconds per book
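The "20 seconds per book" figure can be checked with a short sketch. The function name and the 10 bits on the wire per byte (8 data bits plus roughly 2 bits of framing overhead) are assumptions made here to reproduce the slide's 50,000 bytes per second; they are not stated on the slide.

```python
def transfer_seconds(size_bytes, link_bits_per_second, bits_per_byte_on_wire=10):
    """Seconds needed to move size_bytes over a link.

    bits_per_byte_on_wire=10 is an assumption (8 data bits plus about
    2 bits of overhead) chosen to match the slide's 50,000 bytes/second.
    """
    bytes_per_second = link_bits_per_second / bits_per_byte_on_wire
    return size_bytes / bytes_per_second


SHORT_NOVEL = 1_000_000              # 1 Megabyte
PICKUP_OF_BOOKS = 1_000_000_000      # 1 Gigabyte
LIBRARY_OF_CONGRESS = 10 * 10**12    # 10 Terabytes
BROADBAND = 500_000                  # 500 kilobits per second

print(transfer_seconds(SHORT_NOVEL, BROADBAND))  # 20.0
print(f"{transfer_seconds(PICKUP_OF_BOOKS, BROADBAND) / 3600:.1f} hours per pickup truck")
print(f"{transfer_seconds(LIBRARY_OF_CONGRESS, BROADBAND) / 86400 / 365.25:.1f} years per Library of Congress")
```

At the same link speed the larger items take hours and years respectively, which is the point of measuring storage in bytes but communications in bits.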
7. A comparison in human terms
It takes me six seconds to get an
ingredient from the fridge (load from RAM)
(Ideally a single cycle for a processor)
For a 4 GHz processor, a rotational delay of
2 ms => 1.5 years!
Seek time (4 ms) plus rotational delay (2 ms)
=> 4.5 years, and 100 ms => 77 years
How many items can I get from the fridge
while waiting for one from “the store”
(and don’t even consider net latency!)
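The analogy scales one 4 GHz clock cycle up to the six seconds of a trip to the fridge. A rough sketch of that arithmetic follows; the function name, the 365.25-day year, and the 6 ms combined delay (4 ms seek plus 2 ms rotation) are illustrative assumptions rather than figures taken directly from the slide.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def human_years(delay_seconds, clock_hz=4e9, seconds_per_cycle=6.0):
    """Scale a hardware delay to human time, at six seconds per clock cycle."""
    cycles = delay_seconds * clock_hz
    return cycles * seconds_per_cycle / SECONDS_PER_YEAR


# Delays in seconds; the 6 ms case assumes 4 ms seek + 2 ms rotation.
delays = [
    ("rotational delay, 2 ms", 0.002),
    ("seek plus rotation, 6 ms", 0.006),
    ("slow access, 100 ms", 0.100),
]
for label, delay in delays:
    print(f"{label}: {human_years(delay):.1f} years")
```

Under these assumptions the results land close to the slide's 1.5, 4.5 and 77 years; small differences come from rounding and the choice of year length.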
8. The new challenge/limitation
Watts –
– Power requirements
– And BTUs of heat generated by
thousands of processors/disk drives
It’s why Google, Amazon, et al. are placing data
centers near hydro power & cooling options
… or at least cheap power
And – how are you going to use that much stuff?
9. An example – Bluffdale Utah, NSA
65 MegaWatts
(1 MW – 600+ homes)
(200MW Kennecott Utah Copper)
Aug. 2016 water use
6.6M Gallons for cooling
Est. 3-12 Exabytes
(3 Exabytes = 3,000,000,000,000,000,000 bytes)
(i.e. within the address space
of one 64-bit processor)
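To put the exabyte estimate next to a 64-bit address space, here is a small sketch. It assumes decimal exabytes (10^18 bytes) for the storage estimate and byte addressing for the processor; the constant names are made up for illustration.

```python
EXABYTE = 10**18               # decimal exabyte, in bytes
ADDRESS_SPACE_64BIT = 2**64    # bytes reachable with a 64-bit address

print(f"64-bit address space: {ADDRESS_SPACE_64BIT:,} bytes")
for estimate_eb in (3, 12):
    size = estimate_eb * EXABYTE
    share = size / ADDRESS_SPACE_64BIT
    print(f"{estimate_eb} EB = {size:,} bytes, about {share:.0%} of that space")
```

Both ends of the 3-12 exabyte estimate fit inside what a single 64-bit pointer can address, which is 16 binary exabytes (2^64 bytes).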
10. The 21st century realization
(Google, et al)
1. All data has value – and you don’t know
what will be useful in the future
(Bluffdale center: storing pocket litter)
2. Critically missing in traditional systems:
fault tolerance, massive scalability,
malleable schemas, flexible queries
=> Community development, e.g. Hadoop
11. Going Viral
(getting Real-Time)
“Google Flu Trends appeared to detect regional
outbreaks of influenza 7–10 days before conventional
Centers for Disease Control and Prevention surveillance
systems” Clinical Infectious Diseases (2009) doi: 10.1086/630200
Simple concept: track search trends related to symptoms, relate them to
location, and identify potential hot spots. This specific concept has since been
picked up and applied with more focused algorithms.
The Google experience did not work as well as might be desired
– big data hubris
An associate of mine was researching social media streams to track
potential ‘hot spots’ for civil unrest, terrorism, etc.
She now works for NSA
….. Now Trending ….
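The "simple concept" on this slide (symptom searches, tied to location, flagged as hot spots) can be sketched in a few lines. This is only an illustration, not how Google Flu Trends worked: the symptom vocabulary, the sample search log, the baseline counts and the 2x threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical symptom vocabulary; not taken from any real system.
SYMPTOM_TERMS = {"fever", "cough", "chills", "sore throat"}


def symptom_counts(searches):
    """Count symptom-related searches per region from (region, query) pairs."""
    counts = Counter()
    for region, query in searches:
        if any(term in query.lower() for term in SYMPTOM_TERMS):
            counts[region] += 1
    return counts


def hot_spots(this_week, baseline, ratio=2.0):
    """Regions whose symptom searches reach `ratio` times their usual level."""
    return sorted(
        region
        for region, count in this_week.items()
        if count >= ratio * baseline.get(region, 1)
    )


# Made-up search log and made-up "normal week" baseline.
searches = [
    ("Utah", "fever and chills remedy"),
    ("Utah", "how long does a cough last"),
    ("Utah", "sore throat medicine"),
    ("Utah", "flu fever in kids"),
    ("Ohio", "cheap flights"),
    ("Ohio", "cough syrup"),
]
baseline = {"Utah": 2, "Ohio": 1}

print(hot_spots(symptom_counts(searches), baseline))
```

A fixed keyword list and a fixed threshold are exactly the kind of shortcut that invites "big data hubris": search behavior shifts with news coverage, so real systems need recalibration against surveillance data.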
12. Patients Like Me
“We're unleashing the power of data for good
by empowering people to take control of their
health because we believe real-world evidence
can change the healthcare system”
Can trigger “instant” medical studies based on
400,000+ participants with 2500+ medical
conditions
Lithium, Bi-Polar and ALS – 16 patients in
journal article – but PLM found 69 in a day.
https://www.ted.com/talks/jamie_heywood_the_big_idea_my_brother_inspired
13. Now let’s see “Applications”
Summer 2016 CEO Cambridge Analytica
(11 minutes)
CEO of Cambridge Analytica March 17
(30 minutes)
Composite from these two
presentations (25 Min) (not online)
14. Election 2016
Democratic DB – every voter, likelihood of voting,
feedback from surveys on candidate preferences … do
everything you can to get the expected supporters out.
Trump “Project Alamo” w/ Cambridge Analytica:
Facebook “psych” survey + profiles + external data on
220,000,000 Americans w/4000+ data points each
“voter registration records, gun ownership records, credit
card purchase histories, and internet account
identities”=>
personally targeted ads to either:
– Gain support (funding, voting)
– Suppress turnout of targeted groups
15. What Data Sources?
Facebook profile
OCEAN-like personality test
Credit Cards
Credit Record
Browser Searches
Email “terms”
Church attendance
CATV viewing
Car registration
Home ownership
Magazine subscriptions
“and the beat goes on…”
16. OCEAN personality analysis
Openness, which refers to how readily an individual will
take on new experiences or acceptance of non-conventional
ideas, levels of creativity …
Conscientiousness, which applies to attention to
detail, vigilance, organization and a desire to complete a
Extraversion, which relates to assertiveness,
enjoyment of human interactions and risk-taking.
Agreeableness, which tends to be indicative of co-
operation, kindness and consideration for others.
Neuroticism, which relays levels of anxiety, ability to
deal with stress and maintaining calmness under pressure.
17. “They [the Trump campaign] were using
40–50,000 different variants of ad every day
that were continuously measuring responses
and then adapting and evolving based on that
response,”
– Martin Moore, director of King’s College London’s Centre for the
Study of Media, Communication and Power, told The
Guardian in early December.
18. Predictive analysis
Predictive analysis: finding and
quantifying hidden patterns in the data
using complex mathematical models
that can be used to predict future
outcomes.
“Amazon customers like you ….”
Think “Minority Report” …
without the prescient mediums
23. From the man who “Liked” OCEAN
Dr. Michal Kosinski found that just a few
Facebook “Likes” could match you to your
OCEAN profile with high probability.
3 Million Facebook Profiles (1/1000)
10+ “Likes”: you know a personality as well as
their co-workers do
100+: as well as family/friends
250+: you know them better than their spouse
• Michal’s Keynote on Privacy
24. Every friend you “like”
Sexual orientation 88%
Gender, political views, race (95%)
Age, IQ,
Birds of a feather – friends like friends
∑ trivial data points => non-trivial
– Facebook + credit-card + search…
Also language use …
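The "∑ trivial data points => non-trivial" line can be illustrated with a naive-Bayes-style sketch: many weak, independent signals add up in log-odds. This is a generic textbook technique swapped in for illustration, not the method used in the research cited on these slides, and the prior and likelihood ratios below are invented numbers.

```python
import math


def combine(prior, likelihood_ratios):
    """Posterior probability after combining signals by summing log-odds.

    Treating the signals as independent is an assumption of this sketch.
    """
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(math.log(ratio) for ratio in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical: each "Like" is only 1.3x more common in the target group.
PRIOR = 0.10
print(f"1 weak signal:   {combine(PRIOR, [1.3]):.2f}")
print(f"20 weak signals: {combine(PRIOR, [1.3] * 20):.2f}")
```

One such signal barely moves the estimate, while twenty of them push it above 90 percent under these assumed numbers, which is why individually trivial data points stop being trivial once they are merged.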
25. To summarize
Your face may disclose:
Humans can do gender, age, introvert, …
Political views, sexual orientation
Gay, liberal, atheist – capital crimes in some places
5 pictures sufficient to get ‘gay’ at 92%
Also captured:
– Location data, continuous
– Sensors – heart rate
26. I fed a sample from my web page
into the Cambridge tool
Test 1 – 10yr old text
22 yr old male
89% liberal
69% hard working
19% contemplative
51% team oriented
22% laid back
60% leader potential
INTJ “Jungian style”
Test 2 - Recent text
30 year old male
38% conservative
67% hardworking
27% contemplative
35% competitive
22% laid back
34% leader potential
ISTJ style
27. Save the Rhinos
Noseong Park, Edoardo Serra, and
V.S. Subrahmanian document their predictive analytics
software to save rhinos
IEEE Intelligent Systems, August 2015
Tracking, and then predicting rhino movement, and
poacher movement can help target drone and ranger
patrols to save more rhinos
Ends with the caveat that your rhinos may differ
28. What if?
We used all of the Cambridge Analytica
and other available data …
And analyzed which persons were most
likely to:
– Commit suicide (most common form of gun violence)
– Attack a church congregation
– Initiate a terrorist attack
“Subject 47 has bought 3 assault rifles in
the last week and 300 clips of ammo”
What would/should we do?
29. AI – coming of age
Less than “the movies” view, But
more than folks expect
Past the tipping point, so it’s hard to
see where it can lead
“Alexa …”
30. Recent in AI: Deep Learning
Watson has spoken
– It’s not just a game show any more
– It’s natural language in context
– It’s open ended responses to open
ended questions (Siri, Hello Barbie etc.)
And the AI folks are on board
– Deep Learning to go beyond
understanding data to modeling “you”
Prof. Pedro Domingos, UW, in his book “The Master
Algorithm: How the Quest for the Ultimate Learning
Machine Will Remake Our World”
31. Open Source Tools Emerging
For big data management
Analytics
AI methods
And emerging open-data sources
“Data wants to be free”
=> Letting a thousand flowers bloom
32. Bit Rot
Vint Cerf, “father of the Internet”
raises the concern
Consider the media (floppy disc), and the
associated reading device(s), and the encoding
technique (PC-DOS files in ASCII with data for
WordStar) and the required environment (DOS
2.0)
Will we be able to access the data?
33. Provenance
Credibility – or is it just the number of
times the lie is re-told?
– This is one rationale for ‘citations’ in
academic literature
– And for “reproducibility” in the scientific
method … But
For Big Data can there be quality
control, authority, chain of evidence,
credible source, validation..???
There will be “data jamming” attacks
34. The Right to be Forgotten
In 1998 the Spanish newspaper La
Vanguardia published an announcement
regarding the forced sale of properties
A property belonged to Mario Costeja
González, who was named
In 2009, Costeja contacted the
newspaper to complain that when his
name was entered in the Google search
engine it led to the announcements
In 2010 …
35. Jurisdictions
He took his concerns to the Spanish Agency of
Data Protection
From there it went to the EU Advocate General
Then to the EU Court of Justice
2014: Google’s online form lets EU citizens or EFTA
nationals request the removal of links if the
data linked is "inadequate, irrelevant or no
longer relevant, or excessive in relation to the
purposes for which they were processed"
36. POP Quiz
Can you name two politicians who would
like some of their history “forgotten”?
Or more challenging, can you name one
who would not like this to happen?
37. The Proxy Did It!
O’Neil, Cathy; Weapons of Math Destruction:
How Big Data Increases Inequality and
Threatens Democracy; Random House, 2016
(also Discover Mag, Oct 2016 issue)
“models and algorithms encode human prejudice”
38. E-Scores vs FICO
Fair Isaac credit scores are based on
YOUR personal financial history,
But cannot be used in sales/marketing
(just hiring, promotions, loans, etc.)
eScores are proxies for FICO in some
ways, matching you into “buckets” and
affecting YOUR job, credit, even your
time on hold to get service
“e-scores are arbitrary, unaccountable,
unregulated, and often unfair” (O’Neil)
39. Time on hold?????
Managing call center traffic
(“please dial 1 if you are rich, dial 2 to go on hold,
dial 3 to talk to someone in India, and 4 if you
just would like to dial in more numbers.”)
Ditto for credit card web sites – before
you even “look”, your browsing and purchasing
patterns are being evaluated.
These may not be your Friend
“People Like You…” (zip, job, search…)
40. e-Scores to “Score”?
CreditScoreDating.com
“at least the customers know what they
are getting into and why” (O’Neil)
Job Applicants are “researched” on the
web BEFORE any contact from the
company – eScores, Facebook, etc.
“The law stipulates employers must alert job
seekers when credit issues disqualify them…”
(O’Neil) Right….
41. Show of Hands:
Your wait time should be based on
e-score proxies --- folks like you….
Your wait time should be based on
specific knowledge about you …
(“tends to complain a lot, let him wait a bit
longer, play the subliminal message tape”)
42. How Big is BIG?
Microsoft and U. Washington have developed a
system to store binary data in DNA sequences.
– All of the data on the 2016 Internet could fit
into a shoe box
– Much lower energy, less risk of bit rot, but …
right now, real slow read/write times
Oct 28, 2016 WSJ insert “Fast Forward Tech”
• Memristors (HP/SanDisk term)/ReRAM:
not as dense as DNA storage, but significantly
faster, denser and lower power than SDRAM.
Jan 2017, IEEE Consumer Electronics Magazine
43. Opting Out
https://www.privacyrights.org/
http://www.stopdatamining.me/opt-out-list/
44. From the SSIT Blog
An Asian firm, “Deep Knowledge” has
appointed a virtual director to their Board. In
this case it is a construct designed to detect
trends that the human directors might miss.
One suspects that Apple might want a model
of Steve Jobs around for occasional
consultation, if not back in control again
45. AI and Ethics
The Partnership on AI Ethics
http://www.partnershiponai.org/
IBM, Google, Microsoft, Amazon, Facebook
IEEE Standards – Autonomous Systems Ethics
http://standards.ieee.org/news/2016/ieee_autonomous_systems.html
46. Resources
http://www.bigbrotherawards.org/
(European – Privacy International)
“Saving Rhinos with Predictive Analytics” IEEE
Computer Society “Edge”
IEEE Computer Magazine, April 2016
Special Issue on Big Data
http://bigdata.ieee.org/
https://sites.google.com/site/io/underneath-the-covers-at-google-current-systems-and-future-directions
https://applymagicsauce.com/ Cambridge Univ.
Evaluation tool
47. SSIT
IEEE’s Forum for Academic, Practical and Policy dialog
on the Impact of Technology on Society
Engineers and Technologists who care about how
their products, discoveries, and services will
affect humanity
• Conferences world wide
• Quarterly publication
• Ongoing social media interactions
• Perennial issues to consider as technology happens
Major topics include:
Privacy, Security, Health, Ethics, Equity, Quality of Life
As affected by technology such as:
NanoTech, Genomics, networks, computing, RFID, drones
48. Social Media – Public Dialog
Blog and comments
LinkedIn Group
Facebook Group
YouTube Channel
Twitter
WWW.IEEESSIT.ORG
49. Questions?
Answers???
Thank You
51. Alpha:
the first step towards Omega
1992: DEC introduces Alpha, the first 64 bit
commercial computer chip …
64 bits can directly address 16EB (Exabytes,
16 Billion GB) of “real” memory .. And Alpha
was the fastest chip – so could seriously index
lots of data
The Alpha App:
Altavista – 1995 the first web index
1997- IBM introduces 16GB disk array
52. Donald Knuth, Stanford
The Art of Computer Programming, Volume 3 (first ed. 1973)
Sorting and Searching, Second Edition
(Reading, Massachusetts: Addison-Wesley, 1998),
xiv+780pp.+foldout.
ISBN 0-201-89685-0
Advisor and mentor to two students:
Larry Page and Sergey Brin decided to
implement a full version – 1998
They call it “Google”
53. A side note on performance
Computer cycle times from MIPS to
GIPS (instructions per second)
Disk rotation latency (half turn average)
Seek Time (1/3 of disc surface average)
Solid State Drives change the game again
Add DNA and Intel’s new chip 3Dxxx?
Delay source              Delay      Instructions in that time
Rotation at 4,000 RPM     7.14 ms    7 million
Rotation at 15,000 RPM    2 ms       2 million
Seek                      100 ms     100 million
Seek                      4 ms       4 million
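The rotation figures follow from the "half turn average" rule, and the instruction counts correspond to a processor running at roughly 1 GIPS. A minimal sketch, with function names and the 1 GIPS default chosen here for illustration:

```python
def half_turn_ms(rpm):
    """Average rotational latency: the time for half a revolution, in ms."""
    return 0.5 * 60_000 / rpm


def instructions_during(delay_ms, instructions_per_second=1e9):
    """Instructions a processor could execute during delay_ms (1 GIPS assumed)."""
    return delay_ms / 1000 * instructions_per_second


for rpm in (4_000, 4_200, 15_000):
    delay = half_turn_ms(rpm)
    millions = instructions_during(delay) / 1e6
    print(f"{rpm:>6} RPM: {delay:.2f} ms, about {millions:.1f} million instructions")
```

By this rule 15,000 RPM gives the slide's 2 ms, while 7.14 ms corresponds to a 4,200 RPM drive; a 4,000 RPM drive would come out at 7.5 ms.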
54. Emerging “Tricks”
For highly compact storage (not fast)
DNA tools are being developed
massive storage – slow access
• Intel 3-D “Optane” memory:
• Pushing “flash ram” capabilities
into higher speed, more dense devices
• Data tools are now doing “memory
first” operations, expecting terabytes of
RAM
55. Data for Good “movement”
DataKind.org
– Harnessing the power of data science in
the service of humanity
– DataKind is a unique way to build your
skills and network with top data
scientists around the world
The Data for Good Exchange is part of a long
Bloomberg tradition of advocacy for using data
science and human capital to solve problems
at the core of society
Editor’s notes
LITTLE ISSUES WITH BIG DATA
Big Data is one of the areas in computing that is just, well, getting bigger.
However it exposes a number of issues that folks doing "Data Engineering" need to consider - from bit rot to legal issues like ownership, privacy, disclosure and liability. Presidential elections since 1960 have triggered big data driven investments. Consumer behavior is now tracked at the individual level, and tracking many more “data points” than you might anticipate. But Big Data is also a path towards environmental protection, detecting emergence of disease, and validating medical research.
Jim will provide some historical context that shines a light on recent, current and future issues facing corporations, software developers, but also managers, lawyers and the public in general.
Speaker
Jim Isaak
Past President of the IEEE Computer Society (Computer.org)
Past VP of the IEEE Society on the Social Implications of Technology (IEEESSIT.org)
Currently Blog mister for SSIT, Chair NH Life Member group, policy gadfly, and prescient skeptic.
News Alert – Ransomware Attack (May 2017)
Most recent – focused on Windows XP
Not other versions, Apple, Android
Encrypts your drives (and cloud) “Pay in Bitcoin” to get your data back
Hit hard on Hospitals, others with “slow acceptance” of new versions
Possible N.Korea – “some similarities”
Not “for profit” (minimal ROI)
Possible “shot over the bow”
What you can do?
Don’t follow email links unless you have high confidence in the source AND reason
Back up key information to “offline” media (DVDs or USB “thumb drives”) – regularly
Keep your OS and tools “updated” (auto update on Windows 10 not robust)
Keep your system(s) off when not in use
My “report” on Homo Deus:
30-39 year old male
78% liberal, 56% hard working, 26% contemplative, 35% competitive, 34% laid back
53% leadership potential, INTJ
------------------------------------------
Elaine:
40+ year old female
38% conservative, 70% hard working, 47% contemplative, 63% team, 27% laid back
55% leader, ESTJ
(600TB, or 600 thousand GB for the Titan supercomputer in 2012)