© 2020 SPLUNK INC.
Machine Learning for Social Good
Dr. Greg Ainslie-Malik – Machine Learning Architect
Forward-Looking Statements
During the course of this presentation, we may make forward-looking statements regarding future events or plans of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results may differ materially. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, it may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements made herein.
In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionalities described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Data-to-Everything, D2E, and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2020 Splunk Inc. All rights reserved.
Agenda
1. Introduction to Machine Learning
2. Common challenges with Machine Learning
3. Where have we seen Machine Learning used for social good?
   • Anomaly detection
   • Fraud detection
   • Learning analytics
4. What else are we doing to promote good use of Machine Learning?
Introduction to Machine Learning
What is Machine Learning?
[Diagram: Deep Learning shown inside Machine Learning, which is inside Artificial Intelligence (AI)]
• AI broadly refers to any type of algorithm or program that allows computers to mimic human behaviour
• ML is a subset of AI that allows machines to improve at a task over time by learning from data
• Deep Learning is a type of Machine Learning that is based on neural networks
What is Machine Learning?
Classic Programming: Data + Rules → Outcomes
Machine Learning: Data + Outcomes (supervised only) → Rules
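Here is a minimal Python sketch of the contrast above. It is illustrative only and is not Splunk code. `classic_rule` is a rule a person writes by hand. `learn_threshold` derives a rule, here a single cut-off value, from data plus known outcomes, which is supervised learning in miniature. Both function names, the cut-off of 100 and the sample data are hypothetical.

```python
from __future__ import annotations


def classic_rule(value: float) -> bool:
    """Classic programming: a person writes the rule (hypothetical cut-off)."""
    return value > 100.0


def learn_threshold(data: list[float], outcomes: list[bool]) -> float:
    """Supervised learning in miniature: derive the rule from data + outcomes.

    Tries every observed value as a candidate threshold t and keeps the one
    for which the rule `value > t` matches the most known outcomes.
    """
    best_t, best_correct = data[0], -1
    for t in sorted(set(data)):
        correct = sum((x > t) == y for x, y in zip(data, outcomes))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


# Hypothetical training data: each value is labelled with its known outcome.
values = [10.0, 20.0, 35.0, 60.0, 80.0, 95.0]
labels = [False, False, False, True, True, True]
threshold = learn_threshold(values, labels)
print(threshold)  # 35.0
```

The learned rule ("value > 35") comes out of the data rather than being typed in, which is the "Data + Outcomes → Rules" arrow.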
Why Use Machine Learning?
Observations from Splunk customers:
• Identify anomalies or ‘unknown unknowns’
• Improve alert accuracy
• Highlight weak relationships
How Machine Learning Fits into Splunk
[Diagram: data from OT and industrial assets, IT, security, and consumer and mobile devices flows into Splunk Search, where every search can use Machine Learning. Real-time alerts can then send an email, file a ticket, send a text, flash lights or trigger a process flow via email, ticketing systems, smartphones and devices, and third-party applications.]
Common Challenges with Machine Learning
Problem Statement
There is a lack of trust in Machine Learning.
This is largely caused by the limited transparency, or explainability, of most Machine Learning processes.
Therefore, it can be difficult to identify negative bias when applying Machine Learning.
Most Organizations’ Data Is Still Dark Data: Untapped, Unanalysed, Unowned
60% of organizations report that the majority of their data is still dark*
*Splunk Inc., “State of Dark Data Report”, May 2019
Our World Never Stops Evolving.
How can we handle the half-life of data?
Use of AI
Globally, 61%–67% of respondents saw value in AI for their organizations.
60%–70% of respondents believe that they will be using AI across IT, operations and talent management in the future.
And yet …
Only 10%–15% say their organizations are deploying AI for use cases today.
While only 12% say that AI is currently guiding their business strategy, 61% expect it to do so in the next five years.
Organizations admit they’re not ready for AI. Their top four concerns:
1. Lack of trained AI experts (81%)
2. Lack of understanding of AI (80%)
3. Not knowing what can be automated (78%)
4. Difficulty successfully wrangling the data (78%)
Do you know what’s happening?
Can you turn data into action?
How do you build for the future?
Key Takeaways
1. Try to gain as much visibility of your data as possible
2. Minimise the delivery time for that data
3. Invest in data skills
Machine Learning for Social Good: Example Case Studies
Finding Potential Cyber Security Incidents
Identifying anomalies in massive datasets
Use Case: Proxy Communication Investigation Workflow
[Figure: five-step investigation workflow, from https://conf.splunk.com/files/2019/slides/SEC1374.pdf]
Using the DensityFunction to Find Anomalies
| tstats count WHERE (index=botsv2) BY _time span=60m
| eval HourOfDay=strftime(_time, "%H")
| fit DensityFunction count by "HourOfDay" into df_bots_dns
| table _time count IsOutlier(count)
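The fit step above learns a separate model of the hourly event count for each HourOfDay value. The Python below is a rough sketch of that grouping idea only. It fits a normal distribution per hour of day and keeps the cardinality, mean and standard deviation, which are the statistics the later summary step reports. This is a simplification made under stated assumptions: DensityFunction can also choose other distribution types, and its internals are not reproduced here. The function name and sample rows are hypothetical.

```python
from __future__ import annotations

import statistics
from collections import defaultdict


def fit_by_group(rows: list[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """Fit a normal distribution to the values in each group (hypothetical helper).

    rows: (group, value) pairs, e.g. ("13", 4521.0) for the event count in
    the hour starting at 13:00. Returns, per group, the number of training
    points (cardinality), the mean and the population standard deviation.
    """
    grouped: dict[str, list[float]] = defaultdict(list)
    for group, value in rows:
        grouped[group].append(value)
    return {
        group: {
            "cardinality": len(values),
            "mean": statistics.mean(values),
            "std": statistics.pstdev(values),
        }
        for group, values in grouped.items()
    }


# Hypothetical hourly counts, keyed by HourOfDay as strftime("%H") returns it.
rows = [("09", 100.0), ("09", 110.0), ("09", 90.0), ("03", 10.0), ("03", 14.0)]
model = fit_by_group(rows)
print(model["03"])  # {'cardinality': 2, 'mean': 12.0, 'std': 2.0}
```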
Using the DensityFunction to Find Anomalies
| tstats count WHERE (index=botsv2) BY _time span=60m
| eval HourOfDay=strftime(_time, "%H")
| apply df_bots_dns threshold=0.03
| table _time count IsOutlier(count)
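How does threshold change what is flagged? This sketch assumes that threshold is the total tail area of the fitted distribution that counts as outlying. The Python applies that rule to a normal distribution and splits the area equally between the two tails. It uses the standard-library `statistics.NormalDist`. The function name, mean, standard deviation and counts are hypothetical.

```python
from statistics import NormalDist


def is_outlier(value: float, mean: float, std: float, threshold: float) -> bool:
    """Flag `value` if it lies in the tails of Normal(mean, std) that together
    hold `threshold` of the probability mass (half in each tail)."""
    if std == 0:
        return value != mean  # degenerate distribution: any other value is unusual
    dist = NormalDist(mean, std)
    lower = dist.inv_cdf(threshold / 2)
    upper = dist.inv_cdf(1 - threshold / 2)
    return value < lower or value > upper


# Hypothetical hour of day with mean 100 and standard deviation 10.
for threshold in (0.03, 0.003):
    print(threshold, is_outlier(125.0, 100.0, 10.0, threshold))
# 0.03 True
# 0.003 False
```

A count of 125 is flagged at threshold=0.03, where the cut-off is about 2.17 standard deviations from the mean. It is not flagged at threshold=0.003, where the cut-off is about 2.97 standard deviations. Lowering the threshold therefore leaves only the more extreme hours.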
Using the DensityFunction to Find Anomalies
| summary df_bots_dns
• Much bigger standard deviation
• Much higher mean than the other times of day
• None of the times of day have many training points
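The last point matters because estimates from few points are noisy. The standard error of a mean is std/√n, so with only a handful of training points per hour of day, the fitted mean (and hence the outlier boundaries) can be well off. Here is a short Python illustration; the function name and numbers are hypothetical.

```python
import math


def standard_error(std: float, n: int) -> float:
    """Standard error of a sample mean: the typical error in a mean estimated
    from n training points whose spread is std (hypothetical helper)."""
    return std / math.sqrt(n)


# Hypothetical: the same spread (std 20) with more and more training points.
for n in (4, 16, 64):
    print(n, standard_error(20.0, n))
# 4 10.0
# 16 5.0
# 64 2.5
```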
Using the DensityFunction to Find Anomalies
| tstats count WHERE (index=botsv2) BY _time span=60m
| eval HourOfDay=strftime(_time, "%H")
| apply df_bots_dns threshold=0.003 show_density=true
| where 'IsOutlier(count)'>0
| join HourOfDay [| summary df_bots_dns | table HourOfDay cardinality mean std]
| table _time count ProbabilityDensity(count) cardinality mean std
| eval distance_from_mean=abs(count-mean), deviations_from_mean=abs(count-mean)/std
• apply: reduce the threshold (from 0.03 to 0.003) so that only more extreme values are flagged, and include the probability density in the results (show_density=true)
• where: filter the data to show only the anomalies
• join: join with the summary data to include the cardinality, mean and standard deviation for each hour of the day
• eval: calculate some additional fields, using the mean and standard deviation, that describe how extreme each outlier is
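The join and final eval can be mirrored in plain Python. The sketch looks up each outlier's hour-of-day statistics from the model summary, then computes the absolute distance from the mean and how many standard deviations that distance represents. The output field names follow the search. The function name, data structures and sample values are hypothetical.

```python
from __future__ import annotations


def describe_outliers(outliers: list[dict], summary: dict[str, dict]) -> list[dict]:
    """Mirror the join + eval steps for a list of outlier rows (hypothetical helper).

    outliers: rows with "_time", "HourOfDay" and "count".
    summary: per-HourOfDay statistics ("cardinality", "mean", "std").
    """
    enriched = []
    for row in outliers:
        stats = summary.get(row["HourOfDay"])
        if stats is None:
            continue  # an inner join (join's default type) drops unmatched rows
        distance = abs(row["count"] - stats["mean"])
        enriched.append({
            **row,
            **stats,
            "distance_from_mean": distance,
            # SPL's eval returns null when dividing by zero; None stands in here.
            "deviations_from_mean": distance / stats["std"] if stats["std"] else None,
        })
    return enriched


# Hypothetical outlier and summary values.
outliers = [{"_time": "2017-08-31 03:00", "HourOfDay": "03", "count": 50}]
summary = {"03": {"cardinality": 5, "mean": 20.0, "std": 10.0}}
print(describe_outliers(outliers, summary)[0]["deviations_from_mean"])  # 3.0
```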
Identifying Fraud
Finding anomalies in credit card transactions, prescriptions and accesses to patient data
Credit Card Fraud Example: Group Like with Like
• Common for exploring transactional data
• Data is often “batch” loaded
• Often proactively searching for ‘unknown unknowns’
Enrich the Transactions
• Region change between card transactions?
• Calculate the time delta between card transactions
• Merchant change between card transactions?
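Here is a minimal Python sketch of the enrichment step. It assumes hypothetical field names: `card`, `time` (in seconds), `region` and `merchant`. Each transaction is compared with the previous transaction on the same card. In SPL, this kind of previous-event comparison is often done with streamstats by card, but the slides do not show the search used.

```python
from __future__ import annotations


def enrich_transactions(txns: list[dict]) -> list[dict]:
    """Compare each transaction with the previous one on the same card.

    Expects hypothetical fields: card, time (seconds), region, merchant.
    A card's first transaction gets time_delta None and no change flags.
    """
    previous: dict[str, dict] = {}
    enriched = []
    for txn in sorted(txns, key=lambda t: (t["card"], t["time"])):
        prev = previous.get(txn["card"])
        enriched.append({
            **txn,
            "time_delta": txn["time"] - prev["time"] if prev else None,
            "region_change": prev is not None and txn["region"] != prev["region"],
            "merchant_change": prev is not None and txn["merchant"] != prev["merchant"],
        })
        previous[txn["card"]] = txn
    return enriched


txns = [  # hypothetical transactions
    {"card": "A", "time": 600, "region": "US", "merchant": "m1"},
    {"card": "A", "time": 0, "region": "UK", "merchant": "m1"},
    {"card": "B", "time": 100, "region": "UK", "merchant": "m2"},
]
for row in enrich_transactions(txns):
    print(row["card"], row["time_delta"], row["region_change"], row["merchant_change"])
```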
Synthesize More Context
• Moving between regions too quickly?
• Moving between merchants too quickly?
• Average merchant/region change by number of transactions
• Aggregate counts per card
• Standard deviation of time delta/amount, relative to the averages
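Here is a sketch of the per-card aggregates. It reads "by number of transactions" and "relative to the averages" as "divided by"; that is our interpretation and the slides do not confirm it. The input is per-transaction rows with an `amount` field and the change flags and time deltas from the enrichment step. The one-hour cut-off for "too quickly" (`min_seconds`) and all function and field names are hypothetical.

```python
from __future__ import annotations

import statistics
from collections import defaultdict


def relative_spread(values: list[float]) -> float:
    """Standard deviation divided by the average (0.0 if undefined)."""
    if len(values) < 2 or statistics.mean(values) == 0:
        return 0.0
    return statistics.pstdev(values) / statistics.mean(values)


def card_features(rows: list[dict], min_seconds: float = 3600) -> dict[str, dict]:
    """Aggregate per-transaction rows into one feature row per card.

    Each row needs: card, amount, time_delta (None for a card's first
    transaction), region_change and merchant_change (booleans).
    min_seconds is a hypothetical cut-off for moving "too quickly".
    """
    by_card: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_card[row["card"]].append(row)

    features = {}
    for card, txns in by_card.items():
        n = len(txns)
        quick = [t for t in txns
                 if t["time_delta"] is not None and t["time_delta"] < min_seconds]
        features[card] = {
            "txn_count": n,
            "region_change_rate": sum(t["region_change"] for t in txns) / n,
            "merchant_change_rate": sum(t["merchant_change"] for t in txns) / n,
            "fast_region_changes": sum(t["region_change"] for t in quick),
            "fast_merchant_changes": sum(t["merchant_change"] for t in quick),
            "time_delta_spread": relative_spread(
                [t["time_delta"] for t in txns if t["time_delta"] is not None]),
            "amount_spread": relative_spread([t["amount"] for t in txns]),
        }
    return features


rows = [  # hypothetical enriched transactions for one card
    {"card": "A", "amount": 10.0, "time_delta": None, "region_change": False, "merchant_change": False},
    {"card": "A", "amount": 30.0, "time_delta": 600, "region_change": True, "merchant_change": True},
    {"card": "A", "amount": 20.0, "time_delta": 7200, "region_change": False, "merchant_change": True},
]
print(card_features(rows)["A"])
```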
Prep for Clustering and Visualization
1. StandardScaler: standardize each field to zero mean and unit variance
2. Principal Component Analysis (PCA): reduce to 3 dimensions
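Why scale first? KMeans and PCA work on distances and variances. Without scaling, a field with a large range (such as an amount) would dominate fields with small ranges (such as a change rate). Below is a stdlib Python sketch of what standard scaling does to each field. It follows scikit-learn's StandardScaler convention of using the population standard deviation. PCA is not reproduced here. The function name, column names and values are hypothetical.

```python
from __future__ import annotations

import statistics


def standard_scale(columns: dict[str, list[float]]) -> dict[str, list[float]]:
    """Rescale each column to mean 0 and standard deviation 1 (hypothetical helper).

    Uses the population standard deviation; a constant column is mapped to
    all zeros instead of dividing by zero.
    """
    scaled = {}
    for name, values in columns.items():
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        scaled[name] = [(v - mean) / std if std else 0.0 for v in values]
    return scaled


# Hypothetical per-card features on very different scales.
features = {
    "amount_total": [120.0, 4500.0, 980.0],
    "region_change_rate": [0.0, 0.5, 0.25],
}
for name, values in standard_scale(features).items():
    print(name, [round(v, 3) for v in values])
```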
Finally – Cluster with KMeans
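For intuition about the clustering step, here is a plain-Python sketch of Lloyd's algorithm, the classic k-means procedure. It is not MLTK's implementation. MLTK's KMeans is built on scikit-learn, which uses k-means++ initialization by default, whereas this sketch picks k random starting points with a fixed seed. The function name and points are hypothetical.

```python
from __future__ import annotations

import math
import random


def kmeans(points: list[tuple[float, ...]], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: assign each point to its nearest centroid, move each
    centroid to the mean of its points, and repeat until assignments settle."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical 2-D points (e.g. two of the PCA components) forming two groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```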
https://medcitynews.com/2019/02/splunk-and-newyork-presbyterian/
https://www.healthcareitnews.com/news/newyork-presbyterian-working-machine-learning-analytics-combat-opioid-crisis
“At a time when overdose deaths are at crisis levels across the country and in New York City, largely due to the opioid epidemic, healthcare providers have a responsibility to safeguard against any potential diversion of drugs. NewYork-Presbyterian is taking a leading role in protecting the public by implementing highly effective controls to avoid the illegitimate use of controlled substances. Ultimately, we hope that other hospitals benefit from this new platform as well.”
Jennings Aske, senior vice president and chief information security officer at NewYork-Presbyterian
Together, NewYork-Presbyterian and Splunk are also creating an enhanced data analytics solution that investigates unauthorized access to patient records.
Detect the anomaly…
…drill down into that user…
Predicting Student Outcomes
Predicting student grades based on their digital interactions with university IT, and identifying students who are at risk of dropping out
What Data Scientists Really Do
Data Preparation accounts for about 80% of the work of data scientists
“Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says”, Forbes Mar 23, 2016
Predicting Student Outcomes
index=oulad code_module=AAA
| eval weighted_score=score*(weight/100),
  student_code=id_student."_".code_module."_".code_presentation
| bin _time span=1mon
| stats sum(sum_click) as sum_clicks sum(weighted_score) as month_score avg(score) as average_score
  by student_code _time
| streamstats sum(month_score) as cumulative_score last(average_score) as last_average count
  by student_code
| eventstats max(count) as course_length
| eval average_score=if(average_score>0,average_score,if(last_average>0,last_average,0)),
  cumulative_score=if(cumulative_score>0,cumulative_score,0),
  module_perc_complete=count/course_length
| join student_code [| inputlookup student_info.csv
  | eval student_code=id_student."_".code_module."_".code_presentation
  | table student_code age_band highest_education imd_band studied_credits final_result]
| table _time student_code sum_clicks average_score cumulative_score module_perc_complete
  studied_credits age_band highest_education imd_band final_result
| outputlookup oulad_aaa.csv
1. eval: calculate a weighted score and create a unique identifier for each student and module combination
2. bin and stats: calculate the number of clicks, total score and average score for each student in each month
3. streamstats: calculate the cumulative score over time for each student, get the most recent average score for each month and create a rolling count
4. eventstats: find the highest rolling count to use as the course length
5. eval: fill in empty average and cumulative results and calculate the module percentage complete
6. join: enrich the data with additional context for each student
7. table and outputlookup: select only the fields we are interested in and save them to a lookup
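To make the streamstats, eventstats and eval steps concrete, here is a Python sketch of the same per-student logic. It starts from rows that have already been binned by month and aggregated with stats. Like the search, it treats an average of 0 or less as missing (assuming scores are non-negative) and computes the course length across all students. The function name, row layout and sample values are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def prepare_features(monthly: list[dict]) -> list[dict]:
    """Mirror the streamstats, eventstats and eval steps (hypothetical helper).

    monthly: one row per student per month with student_code, month (any
    sortable value), sum_clicks, month_score and average_score; the two
    score fields are None in months without an assessment.
    """
    rows = sorted(monthly, key=lambda r: (r["student_code"], r["month"]))
    running = defaultdict(lambda: {"cumulative": 0.0, "last_avg": None, "count": 0})
    out = []
    for r in rows:
        state = running[r["student_code"]]
        state["count"] += 1  # rolling count per student
        if r["month_score"] is not None:
            state["cumulative"] += r["month_score"]  # cumulative_score
        if r["average_score"] is not None:
            state["last_avg"] = r["average_score"]  # most recent average
        out.append({
            **r,
            "cumulative_score": state["cumulative"],
            # Like if(average_score>0, ..., if(last_average>0, ..., 0)).
            "average_score": r["average_score"] or state["last_avg"] or 0,
            "count": state["count"],
        })
    course_length = max(r["count"] for r in out)  # eventstats max(count)
    for r in out:
        r["module_perc_complete"] = r["count"] / course_length
    return out


monthly = [  # hypothetical monthly aggregates
    {"student_code": "s1_AAA_2013J", "month": 1, "sum_clicks": 50, "month_score": 10.0, "average_score": 80.0},
    {"student_code": "s1_AAA_2013J", "month": 2, "sum_clicks": 20, "month_score": None, "average_score": None},
    {"student_code": "s2_AAA_2013J", "month": 1, "sum_clicks": 5, "month_score": None, "average_score": None},
]
for r in prepare_features(monthly):
    print(r["student_code"], r["average_score"], r["cumulative_score"], r["module_perc_complete"])
```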
Predicting Student Outcomes
Train model:
| inputlookup oulad_aaa.csv
| search final_result!="Withdrawn"
| sample partitions=10 seed=42
| search partition_number<7
| fit RandomForestClassifier final_result from average_score cumulative_score
  module_perc_complete studied_credits sum_clicks age_band highest_education imd_band
  into rf_oulad_aaa
1. Remove data for withdrawn students
2. Select a random sample of 70% of the data (partitions 0 to 6 of 10)
3. Train a random forest classifier on the data
Test model:
| inputlookup oulad_aaa.csv
| search final_result!="Withdrawn"
| sample partitions=10 seed=42
| search partition_number>6
| apply rf_oulad_aaa
Apply the random forest classifier to the remaining 30% of the data (partitions 7 to 9). Because both searches use the same seed on the same input, each row gets the same partition number in both, so the test set does not overlap the training set.
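The split can be sketched in Python to show why the seed matters. A seeded random generator assigns each row a partition number from 0 to 9. Rows in partitions 0 to 6 (about 70%) form the training set, and partitions 7 to 9 form the test set. This does not reproduce the partition numbers that MLTK's sample command would assign. The function names are hypothetical.

```python
from __future__ import annotations

import random


def assign_partitions(rows: list[dict], partitions: int = 10, seed: int = 42) -> list[dict]:
    """Give each row a partition_number from 0 to partitions-1 using a seeded
    generator, so the same seed and row order always give the same result."""
    rng = random.Random(seed)
    return [{**row, "partition_number": rng.randrange(partitions)} for row in rows]


def train_test_split(rows: list[dict], seed: int = 42) -> tuple[list[dict], list[dict]]:
    """Training set: partitions 0-6 (about 70%); test set: partitions 7-9."""
    parted = assign_partitions(rows, seed=seed)
    train = [r for r in parted if r["partition_number"] < 7]
    test = [r for r in parted if r["partition_number"] > 6]
    return train, test


rows = [{"id": i} for i in range(1000)]  # hypothetical rows
train, test = train_test_split(rows)
print(len(train), len(test))
```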
Predicting Student Outcomes
Train model:
| inputlookup oulad_aaa.csv
| eval withdrawn=if(final_result="Withdrawn","Yes","No")
| sample partitions=10 seed=42
| search partition_number<7
| fit RandomForestClassifier withdrawn from average_score cumulative_score
  module_perc_complete studied_credits sum_clicks age_band highest_education imd_band
  into rf_withdrawn_oulad_aaa
Test model:
| inputlookup oulad_aaa.csv
| eval withdrawn=if(final_result="Withdrawn","Yes","No")
| sample partitions=10 seed=42
| search partition_number>6
| apply rf_withdrawn_oulad_aaa
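The test search returns the model's prediction alongside the actual withdrawn value; MLTK names the prediction field predicted(withdrawn) by default. One way to judge the result is a confusion matrix. The Python sketch below computes accuracy, precision and recall for the "Yes" class. For spotting at-risk students, recall matters: a missed at-risk student is a false negative. The function name and sample labels are hypothetical.

```python
from __future__ import annotations


def binary_metrics(actual: list[str], predicted: list[str], positive: str = "Yes") -> dict[str, float]:
    """Confusion-matrix counts plus accuracy, precision and recall for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical actual withdrawn values and model predictions.
actual = ["Yes", "No", "Yes", "No", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "Yes"]
print(binary_metrics(actual, predicted))
```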
What Can Be Done to Promote Good Use of Machine Learning?
UK Government First to Pilot AI Procurement Guidelines Co-Designed with World Economic Forum
https://www.weforum.org/press/2019/09/uk-government-first-to-pilot-ai-procurement-guidelines-co-designed-with-world-economic-forum/
“Splunk has supported the development of these guidelines and worked closely with the WEF and UK Government. We will help pilot them in the UK and believe the guidance will enable Governments across the world transform citizen services and deliver ethically sound and beneficial AI based solutions.”
Lenny Stein, Senior Vice President, Global Affairs, Splunk
Work with the WEF
Intent: Provide information to non-specialists so that they can assess the suitability of ML for a given problem/solution
Current solution: Procurement guidance for ‘unlocking public sector AI’
• High-level procurement processes
• Best practices when evaluating an RFP
• Map for creating AI-related RFPs
Unlocking Public Sector AI go-live: Expected in the coming months
Thank You!

More Than Monitoring: How Observability Takes You From Firefighting to Fire P...DevOps.com
 
H2020 finsec-ort-webinar-ml-dl-cybersecurity-july 2020
H2020 finsec-ort-webinar-ml-dl-cybersecurity-july 2020H2020 finsec-ort-webinar-ml-dl-cybersecurity-july 2020
H2020 finsec-ort-webinar-ml-dl-cybersecurity-july 2020innov-acts-ltd
 
Crime Prediction and Analysis
Crime Prediction and AnalysisCrime Prediction and Analysis
Crime Prediction and AnalysisIRJET Journal
 
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNINGCRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNINGIRJET Journal
 
Predictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RatePredictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RateIRJET Journal
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Person Acquisition and Identification Tool
Person Acquisition and Identification ToolPerson Acquisition and Identification Tool
Person Acquisition and Identification ToolIRJET Journal
 
IRJET- Credit Card Fraud Detection using Machine Learning
IRJET- Credit Card Fraud Detection using Machine LearningIRJET- Credit Card Fraud Detection using Machine Learning
IRJET- Credit Card Fraud Detection using Machine LearningIRJET Journal
 
Employment Performance Management Using Machine Learning
Employment Performance Management Using Machine LearningEmployment Performance Management Using Machine Learning
Employment Performance Management Using Machine LearningIRJET Journal
 
Itsi in-the-wild-why-micron-chose-splunk-it-service-intelligence-and-lessons-...
Itsi in-the-wild-why-micron-chose-splunk-it-service-intelligence-and-lessons-...Itsi in-the-wild-why-micron-chose-splunk-it-service-intelligence-and-lessons-...
Itsi in-the-wild-why-micron-chose-splunk-it-service-intelligence-and-lessons-...Michael Scully
 
Merging forensics w data analytics
Merging forensics w data analyticsMerging forensics w data analytics
Merging forensics w data analyticschris75308
 

Ähnlich wie Machine learning for social good: Case studies in anomaly detection and fraud prevention (20)

The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
 
A Risk Based Approach to Security Detection and Investigation by Kelby Shelton
A Risk Based Approach to Security Detection and Investigation by Kelby SheltonA Risk Based Approach to Security Detection and Investigation by Kelby Shelton
A Risk Based Approach to Security Detection and Investigation by Kelby Shelton
 
Data Analysis - Making Big Data Work
Data Analysis - Making Big Data WorkData Analysis - Making Big Data Work
Data Analysis - Making Big Data Work
 
Blue Eye Technology
Blue Eye TechnologyBlue Eye Technology
Blue Eye Technology
 
SplunkLive! Paris 2018: Splunk And AI 101
SplunkLive! Paris 2018: Splunk And AI 101SplunkLive! Paris 2018: Splunk And AI 101
SplunkLive! Paris 2018: Splunk And AI 101
 
28022017 Simen Munter Mindfields
28022017 Simen Munter Mindfields28022017 Simen Munter Mindfields
28022017 Simen Munter Mindfields
 
Hyperlink
HyperlinkHyperlink
Hyperlink
 
Splunk 4 Ninja ITSI Workshop
Splunk 4 Ninja ITSI WorkshopSplunk 4 Ninja ITSI Workshop
Splunk 4 Ninja ITSI Workshop
 
Fake News and Message Detection
Fake News and Message DetectionFake News and Message Detection
Fake News and Message Detection
 
More Than Monitoring: How Observability Takes You From Firefighting to Fire P...
More Than Monitoring: How Observability Takes You From Firefighting to Fire P...More Than Monitoring: How Observability Takes You From Firefighting to Fire P...
More Than Monitoring: How Observability Takes You From Firefighting to Fire P...
 
H2020 finsec-ort-webinar-ml-dl-cybersecurity-july 2020
H2020 finsec-ort-webinar-ml-dl-cybersecurity-july 2020H2020 finsec-ort-webinar-ml-dl-cybersecurity-july 2020
H2020 finsec-ort-webinar-ml-dl-cybersecurity-july 2020
 
Crime Prediction and Analysis
Crime Prediction and AnalysisCrime Prediction and Analysis
Crime Prediction and Analysis
 
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNINGCRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING
 
Predictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RatePredictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime Rate
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Person Acquisition and Identification Tool
Person Acquisition and Identification ToolPerson Acquisition and Identification Tool
Person Acquisition and Identification Tool
 
IRJET- Credit Card Fraud Detection using Machine Learning
IRJET- Credit Card Fraud Detection using Machine LearningIRJET- Credit Card Fraud Detection using Machine Learning
IRJET- Credit Card Fraud Detection using Machine Learning
 
Employment Performance Management Using Machine Learning
Employment Performance Management Using Machine LearningEmployment Performance Management Using Machine Learning
Employment Performance Management Using Machine Learning
 
Itsi in-the-wild-why-micron-chose-splunk-it-service-intelligence-and-lessons-...
Itsi in-the-wild-why-micron-chose-splunk-it-service-intelligence-and-lessons-...Itsi in-the-wild-why-micron-chose-splunk-it-service-intelligence-and-lessons-...
Itsi in-the-wild-why-micron-chose-splunk-it-service-intelligence-and-lessons-...
 
Merging forensics w data analytics
Merging forensics w data analyticsMerging forensics w data analytics
Merging forensics w data analytics
 

Mehr von Splunk

.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routineSplunk
 
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTVSplunk
 
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica).conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica)Splunk
 
.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank InternationalSplunk
 
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett .conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett Splunk
 
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär).conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)Splunk
 
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu....conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...Splunk
 
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever....conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...Splunk
 
.conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex).conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex)Splunk
 
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)Splunk
 
Splunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11ySplunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11ySplunk
 
Splunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go KölnSplunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go KölnSplunk
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk
 
Data foundations building success, at city scale – Imperial College London
 Data foundations building success, at city scale – Imperial College London Data foundations building success, at city scale – Imperial College London
Data foundations building success, at city scale – Imperial College LondonSplunk
 
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...Splunk
 
SOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSplunk
 
.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session.conf Go 2022 - Observability Session
.conf Go 2022 - Observability SessionSplunk
 
.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - KeynoteSplunk
 
.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform SessionSplunk
 
.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security SessionSplunk
 

Mehr von Splunk (20)

.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine
 
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
 
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica).conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
 
.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International
 
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett .conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
 
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär).conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
 
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu....conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
 
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever....conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
 
.conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex).conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex)
 
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
 
Splunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11ySplunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11y
 
Splunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go KölnSplunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go Köln
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go Köln
 
Data foundations building success, at city scale – Imperial College London
 Data foundations building success, at city scale – Imperial College London Data foundations building success, at city scale – Imperial College London
Data foundations building success, at city scale – Imperial College London
 
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
 
SOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security Webinar
 
.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session
 
.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote
 
.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session
 
.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session
 

Kürzlich hochgeladen

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Kürzlich hochgeladen (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

Machine learning for social good: Case studies in anomaly detection and fraud prevention

  • 1. © 2020 SPLUNK INC. Machine Learning for Social Good. Dr. Greg Ainslie-Malik – Machine Learning Architect
  • 2. Forward-Looking Statements: During the course of this presentation, we may make forward-looking statements regarding future events or plans of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results may differ materially. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, it may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements made herein. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionalities described or to include any such feature or functionality in a future release. Splunk, Splunk>, Data-to-Everything, D2E, and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2020 Splunk Inc. All rights reserved.
  • 3. Agenda: 1. Introduction to Machine Learning. 2. Common challenges with Machine Learning. 3. Where have we seen Machine Learning used for social good? (Anomaly detection, fraud detection, learning analytics.) 4. What else are we doing to promote good use of Machine Learning?
  • 4. Introduction to Machine Learning
  • 5. What is Machine Learning? Artificial Intelligence (AI) ⊃ Machine Learning ⊃ Deep Learning • AI refers to any type of algorithm or program that allows computers to mimic human behaviour • ML is a subset of AI that allows machines to improve over time by learning from data • Deep Learning is a type of Machine Learning based on neural networks
  • 6. What is Machine Learning? Classic programming: Data + Rules → Outcomes. Machine Learning: Data + Outcomes (supervised only) → Rules.
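To make the contrast on slide 6 concrete, here is a minimal Python sketch that is not taken from the deck. The classic rule uses a cut-off an analyst picked by hand; the supervised version derives its cut-off from data plus known outcomes. The function names, the cut-off of 100 and the sample data are all hypothetical.

```python
from __future__ import annotations


# Classic programming: a human writes the rule; the program only applies it.
def classic_rule(count: int) -> bool:
    # Hypothetical hand-picked cut-off chosen by an analyst.
    return count > 100


# Supervised ML: the program derives the rule from data plus known outcomes.
def learn_threshold(counts: list[int], labels: list[bool]) -> int:
    """Return the cut-off t for which the rule 'count > t' agrees with the
    most labelled examples (label True means anomalous)."""
    best_t, best_correct = None, -1
    for t in sorted(set(counts)):
        # Count how many labelled examples the candidate rule gets right.
        correct = sum((c > t) == y for c, y in zip(counts, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


counts = [12, 15, 20, 18, 250, 300, 14, 280]
labels = [False, False, False, False, True, True, False, True]
print(learn_threshold(counts, labels))  # 20
```

The learned cut-off changes whenever the labelled data changes. That is the "machines make improvements over time" idea from slide 5, reduced to its simplest form.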
  • 7. Why Use Machine Learning? Observations from Splunk customers: identify anomalies or ‘unknown unknowns’; improve alert accuracy; highlight weak relationships.
  • 8. How Machine Learning Fits into Splunk Search: Every search can use Machine Learning. Data sources shown: OT and industrial assets, IT, consumer and mobile devices, security. Real-time alerts can trigger actions (send an email, file a ticket, send a text, flash lights, trigger a process flow) through email, tickets, smartphones and devices, and third-party applications.
  • 9. Common Challenges with Machine Learning
  • 10. Problem Statement: There is a lack of trust in Machine Learning, largely caused by the limited transparency or explainability of most Machine Learning processes. As a result, it can be difficult to identify negative bias when applying Machine Learning.
  • 11. Most Organizations’ Data Is Still Dark Data: untapped, unanalysed, unowned. 60% of organizations report that the majority of their data is still dark. *Splunk Inc., “State of Dark Data Report”, May 2019
  • 12. Our World Never Stops Evolving. How can we handle the half-life of data?
  • 13. Use of AI: Globally, 61%–67% saw value in AI for their organizations, and 60%–70% of respondents believe that they will be using AI across IT, operations and talent management in the future. And yet only 10%–15% say their organizations are deploying AI for use cases today. While only 12% say that AI is currently guiding their business strategy, 61% of respondents expect it to do so in the next five years. Organizations admit they’re not ready for AI. Their top four concerns: 1. Lack of trained AI experts (81%) 2. Lack of understanding of AI (80%) 3. Not knowing what can be automated (78%) 4. Difficulty successfully wrangling the data (78%)
  • 14. Do you know what’s happening? Can you turn data into action? How do you build for the future?
  • 15. Key Takeaways: 1. Try to gain as much visibility of your data as possible. 2. Minimise the delivery time for that data. 3. Invest in data skills.
  • 16. Machine Learning for Social Good: Example Case Studies
  • 17. Finding Potential Cyber Security Incidents: Identifying anomalies in massive datasets
  • 18. Use Case: Proxy Communication Investigation Workflow (five-step workflow diagram). Source: https://conf.splunk.com/files/2019/slides/SEC1374.pdf
  • 19. Using the DensityFunction to Find Anomalies
      | tstats count WHERE (index=botsv2) BY _time span=60m
  • 20. Using the DensityFunction to Find Anomalies
      | tstats count WHERE (index=botsv2) BY _time span=60m
      | eval HourOfDay=strftime(_time, "%H")
      | fit DensityFunction count by "HourOfDay" into df_bots_dns
      | table _time count IsOutlier(count)
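The fit step on slide 20 learns one distribution per HourOfDay. Below is a minimal plain-Python sketch of the same idea. It assumes a normal distribution per hour (one of the distribution types DensityFunction supports) and UTC timestamps. `fit_by_hour` is a hypothetical helper, and MLTK's actual estimator may differ. It returns the cardinality, mean and std for each hour, which are the fields that `| summary df_bots_dns` reports on slides 22 and 23.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean, pstdev


def fit_by_hour(events):
    """events: iterable of (epoch_seconds, count) pairs, one per hourly bucket
    (the output of the tstats step). Returns
    {HourOfDay: {"cardinality", "mean", "std"}}, one normal model per hour."""
    groups = defaultdict(list)
    for ts, count in events:
        # Analogue of eval HourOfDay=strftime(_time, "%H"); UTC is an assumption,
        # whereas Splunk formats in the searching user's time zone.
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H")
        groups[hour].append(count)
    model = {}
    for hour, counts in groups.items():
        model[hour] = {
            "cardinality": len(counts),  # number of training points for this hour
            "mean": mean(counts),
            # Population std (defined even for one point); MLTK's exact estimator is an assumption.
            "std": pstdev(counts),
        }
    return model


# Two days of hourly counts for hours 00 and 01 (hypothetical data).
model = fit_by_hour([(0, 10), (3600, 50), (86400, 14), (90000, 54)])
print(model["00"])  # {'cardinality': 2, 'mean': 12, 'std': 2.0}
```

With only two points per hour, every estimate here rests on very little data. This is the same weakness the callouts on slide 23 point out.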
  • 21. Using the DensityFunction to Find Anomalies:
      | tstats count WHERE (index=botsv2) BY _time span=60m
      | eval HourOfDay=strftime(_time, "%H")
      | apply df_bots_dns threshold=0.03
      | table _time count IsOutlier(count)
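To make the fit/apply steps more concrete, here is a minimal Python sketch of the same idea under stated assumptions: events are counted per hour, one normal distribution is fitted per HourOfDay, and a bucket is flagged when its two-tailed probability falls below threshold. MLTK's DensityFunction can fit other distribution types and may define its outlier region differently, so treat this as an illustration rather than its actual algorithm; all function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import NormalDist, mean, stdev


def hourly_counts(timestamps):
    """Count events per 60-minute bucket (the idea behind BY _time span=60m)."""
    counts = defaultdict(int)
    for ts in timestamps:
        counts[int(ts) // 3600 * 3600] += 1
    return dict(counts)


def hour_of_day(bucket_start):
    """Like strftime(_time, "%H"); evaluating in UTC is an assumption of this sketch."""
    return datetime.fromtimestamp(bucket_start, tz=timezone.utc).strftime("%H")


def fit_density(counts):
    """Fit one normal distribution per hour of day (simplified stand-in for fit)."""
    groups = defaultdict(list)
    for start, count in counts.items():
        groups[hour_of_day(start)].append(count)
    model = {}
    for hour, values in groups.items():
        # a normal fit needs at least two points with some spread
        if len(values) > 1 and stdev(values) > 0:
            model[hour] = NormalDist(mean(values), stdev(values))
    return model


def apply_density(model, counts, threshold=0.01):
    """Return (bucket_start, count, is_outlier) for each bucket with a fitted hour."""
    results = []
    for start, count in sorted(counts.items()):
        dist = model.get(hour_of_day(start))
        if dist is None:
            continue  # no model for this hour of day
        cdf = dist.cdf(count)
        # two-tailed probability of a value at least this far from the mean
        tail_probability = 2 * min(cdf, 1 - cdf)
        results.append((start, count, int(tail_probability < threshold)))
    return results
```

In this sketch, lowering threshold (for example from 0.03 to 0.003) shrinks the tail region, so fewer buckets are flagged.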
  • 22. Using the DensityFunction to Find Anomalies: | summary df_bots_dns
  • 23. Using the DensityFunction to Find Anomalies: | summary df_bots_dns. The summary highlights a much bigger standard deviation and a much higher mean than at the other times of day, and shows that none of the times of day have many training points.
  • 24. Using the DensityFunction to Find Anomalies. Reduce the threshold and include the probability density in the results (apply df_bots_dns threshold=0.003 show_density=true):
      | tstats count WHERE (index=botsv2) BY _time span=60m
      | eval HourOfDay=strftime(_time, "%H")
      | apply df_bots_dns threshold=0.003 show_density=true
      | where 'IsOutlier(count)'>0
      | join HourOfDay [| summary df_bots_dns | table HourOfDay cardinality mean std]
      | table _time count ProbabilityDensity(count) cardinality mean std
      | eval distance_from_mean=abs(count-mean), deviations_from_mean=abs(count-mean)/std
  • 25. Using the DensityFunction to Find Anomalies. Filter the data to show only the anomalies: | where 'IsOutlier(count)'>0
  • 26. Using the DensityFunction to Find Anomalies. Join with the summary data to include the cardinality, mean and standard deviation: | join HourOfDay [| summary df_bots_dns | table HourOfDay cardinality mean std]
  • 27. Using the DensityFunction to Find Anomalies. Calculate some additional fields, using the mean and standard deviation, that describe how extreme the outlier is: | eval distance_from_mean=abs(count-mean), deviations_from_mean=abs(count-mean)/std
  • 28. Using the DensityFunction to Find Anomalies (the complete search from slide 24)
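The join-and-eval steps from slides 26 and 27 can be mirrored in Python as a rough sketch. It assumes training rows are (HourOfDay, count) pairs and computes a per-hour cardinality, mean and standard deviation as a stand-in for the model summary (this sketch assumes the sample standard deviation). Outliers without a matching hour are dropped, as in an inner join; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(rows):
    """Per-hour cardinality, mean and std, similar in spirit to | summary."""
    groups = defaultdict(list)
    for hour, count in rows:
        groups[hour].append(count)
    return {
        hour: {"cardinality": len(v), "mean": mean(v), "std": stdev(v)}
        for hour, v in groups.items()
        if len(v) > 1  # stdev needs at least two points
    }


def describe_outliers(outliers, summary):
    """Join outliers to the summary on hour and add distance/deviation fields."""
    enriched = []
    for hour, count in outliers:
        stats = summary.get(hour)
        if stats is None:  # inner join: keep only hours present in the summary
            continue
        distance = abs(count - stats["mean"])
        enriched.append({
            "HourOfDay": hour,
            "count": count,
            **stats,
            "distance_from_mean": distance,
            "deviations_from_mean": distance / stats["std"] if stats["std"] else float("inf"),
        })
    return enriched
```

A deviations_from_mean of 4, for instance, means the count sits four standard deviations from that hour's mean; with few training points per hour, that mean and standard deviation are themselves rough estimates.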
  • 29. Identifying Fraud: Finding anomalies in credit card transactions, prescriptions and accesses to patient data
  • 30. Credit Card Fraud Example: Group like with like, which is common for exploring transactional data. The data is often “batch” loaded, and we are often proactively searching for unknown unknowns.
  • 31. Enrich the Transactions: Did the region change between card transactions? Calculate the time delta between card transactions. Did the merchant change between card transactions?
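One way to compute the enrichment fields on this slide, sketched in Python under assumptions: each transaction is a dict with hypothetical keys card, time (epoch seconds), region, merchant and amount, and each transaction is compared with the same card's previous transaction. In SPL, a similar per-card comparison is commonly built with streamstats current=f ... by card.

```python
def enrich(transactions):
    """Add time_delta, region_change and merchant_change relative to the card's previous txn."""
    last_seen = {}
    enriched = []
    # process each card's transactions in time order
    for txn in sorted(transactions, key=lambda t: (t["card"], t["time"])):
        prev = last_seen.get(txn["card"])
        row = dict(txn)
        row["time_delta"] = txn["time"] - prev["time"] if prev else None
        row["region_change"] = int(prev is not None and prev["region"] != txn["region"])
        row["merchant_change"] = int(prev is not None and prev["merchant"] != txn["merchant"])
        last_seen[txn["card"]] = txn
        enriched.append(row)
    return enriched
```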
  • 32. Synthesize More Context: Aggregate counts per card. Average merchant/region changes by number of transactions. Standard deviation of time delta/amount by averages. Moving too quickly between regions? Moving too quickly between merchants?
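The per-card aggregation can be sketched the same way. This assumes rows already carry time_delta, region_change and merchant_change fields, reads "standard deviation by averages" as the standard deviation divided by the mean (a coefficient of variation), and uses a hypothetical one-hour min_seconds cut-off for "too quickly". All of these are interpretations for illustration, not the presenter's definitions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def card_profile(enriched, min_seconds=3600):
    """Aggregate enriched transactions into one feature row per card.

    Rows need: card, amount, time_delta (None for a card's first transaction),
    region_change and merchant_change (0/1 flags). min_seconds is hypothetical.
    """
    by_card = defaultdict(list)
    for row in enriched:
        by_card[row["card"]].append(row)

    def cv(values):
        # coefficient of variation: standard deviation divided by the average
        if len(values) < 2 or mean(values) == 0:
            return 0.0
        return pstdev(values) / mean(values)

    def too_fast(rows, flag):
        # 1 if any change of this kind happened within min_seconds of the previous txn
        return int(any(r[flag] and r["time_delta"] is not None and r["time_delta"] < min_seconds
                       for r in rows))

    profiles = {}
    for card, rows in by_card.items():
        n = len(rows)
        profiles[card] = {
            "txn_count": n,
            "region_change_rate": sum(r["region_change"] for r in rows) / n,
            "merchant_change_rate": sum(r["merchant_change"] for r in rows) / n,
            "time_delta_cv": cv([r["time_delta"] for r in rows if r["time_delta"] is not None]),
            "amount_cv": cv([r["amount"] for r in rows]),
            "too_fast_between_regions": too_fast(rows, "region_change"),
            "too_fast_between_merchants": too_fast(rows, "merchant_change"),
        }
    return profiles
```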
  • 33. Prep for Clustering and Visualization: 1. StandardScaler: standardize each feature to zero mean and unit variance 2. Principal Component Analysis (PCA): reduce to 3 dimensions
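The two preparation steps can be illustrated in plain Python. This is a minimal sketch: StandardScaler is approximated by subtracting each column's mean and dividing by its population standard deviation, and PCA by power iteration with deflation on the sample covariance matrix. A library implementation would normally be used instead; the helper names are hypothetical.

```python
from statistics import mean, pstdev


def standard_scale(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]  # guard constant columns
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def _matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def _normalise(vector):
    norm = sum(x * x for x in vector) ** 0.5
    return [x / norm for x in vector]


def pca(rows, k=3, iterations=500, tol=1e-12):
    """Project rows onto (up to) their top-k principal components.

    Power iteration with deflation: a teaching sketch, not a robust library PCA.
    """
    n, d = len(rows), len(rows[0])
    means = [mean(col) for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    components = []
    for c in range(min(k, d)):
        v = _normalise([1.0 / (i + 1 + c) for i in range(d)])  # arbitrary non-zero start
        for _ in range(iterations):
            w = _matvec(cov, v)
            norm = sum(x * x for x in w) ** 0.5
            if norm < tol:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        if eigenvalue < tol:
            break  # no meaningful variance left: return fewer than k components
        components.append(v)
        # deflation: subtract this component so the next pass finds the next one
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x * w for x, w in zip(row, comp)) for comp in components] for row in centred]
```

If the data has fewer than three independent directions of variance, this sketch returns fewer than three components rather than padding with noise.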
  • 34. Finally – Cluster with KMeans
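KMeans is usually explained via Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the assignments stop changing. A minimal stdlib sketch follows; initialising the centroids by sampling k input points with a fixed seed is an assumption of this sketch, not necessarily how MLTK's KMeans initialises.

```python
import random


def kmeans(points, k, iterations=100, seed=42):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # assumed initialisation
    labels = None
    for _ in range(iterations):
        # assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```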
  • 35. “At a time when overdose deaths are at crisis levels across the country and in New York City, largely due to the opioid epidemic, healthcare providers have a responsibility to safeguard against any potential diversion of drugs. NewYork-Presbyterian is taking a leading role in protecting the public by implementing highly effective controls to avoid the illegitimate use of controlled substances. Ultimately, we hope that other hospitals benefit from this new platform as well.” (Jennings Aske, senior vice president and chief information security officer at NewYork-Presbyterian). Sources: https://medcitynews.com/2019/02/splunk-and-newyork-presbyterian/ and https://www.healthcareitnews.com/news/newyork-presbyterian-working-machine-learning-analytics-combat-opioid-crisis
  • 36.
  • 37.
  • 38. Together, NewYork-Presbyterian and Splunk are also creating an enhanced data analytics solution that investigates unauthorized access to patient records.
  • 39.
  • 40. Detect the anomaly…
  • 41. …drill down into that user…
  • 42. Predicting Student Outcomes: Predicting student grades based on their digital interactions with university IT, and identifying students who are at risk of dropping out
  • 43.
  • 44. What Data Scientists Really Do: Data preparation accounts for about 80% of the work of data scientists (“Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says”, Forbes, Mar 23, 2016)
  • 45. Predicting Student Outcomes: index=oulad code_module=AAA
  • 46. Predicting Student Outcomes:
      index=oulad code_module=AAA
      | eval weighted_score=score*(weight/100)
      | eval student_code=id_student."_".code_module."_".code_presentation
      | bin _time span=1mon
      | stats sum(sum_click) as sum_clicks sum(weighted_score) as month_score avg(score) as average_score by student_code _time
      | streamstats sum(month_score) as cumulative_score last(average_score) as last_average count by student_code
      | eventstats max(count) as course_length
      | eval average_score=if(average_score>0,average_score,if(last_average>0,last_average,0)), cumulative_score=if(cumulative_score>0,cumulative_score,0), module_perc_complete=count/course_length
      | join student_code [| inputlookup student_info.csv | eval student_code=id_student."_".code_module."_".code_presentation | table student_code age_band highest_education imd_band studied_credits final_result]
      | table _time student_code sum_clicks average_score cumulative_score module_perc_complete studied_credits age_band highest_education imd_band final_result
      | outputlookup oulad_aaa.csv
  • 47. Predicting Student Outcomes. Calculate a weighted score and create a unique identifier for each student and module combination: | eval weighted_score=score*(weight/100), student_code=id_student."_".code_module."_".code_presentation
  • 48. Predicting Student Outcomes. Calculate the number of clicks, total score and average score for each student in each month: | bin _time span=1mon | stats sum(sum_click) as sum_clicks sum(weighted_score) as month_score avg(score) as average_score by student_code _time
  • 49. Predicting Student Outcomes. Calculate the cumulative score over time for each student, get the previous average score for each month and create a rolling count: | streamstats sum(month_score) as cumulative_score last(average_score) as last_average count by student_code
  • 50. Predicting Student Outcomes. Find the highest rolling count to use as the course length: | eventstats max(count) as course_length
  • 51. Predicting Student Outcomes. Fill in empty average and cumulative results and also calculate the module percentage complete: | eval average_score=if(average_score>0,average_score,if(last_average>0,last_average,0)), cumulative_score=if(cumulative_score>0,cumulative_score,0), module_perc_complete=count/course_length
  • 52. Predicting Student Outcomes. Enrich the data with additional context for each student: | join student_code [| inputlookup student_info.csv | eval student_code=id_student."_".code_module."_".code_presentation | table student_code age_band highest_education imd_band studied_credits final_result]
  • 53. Predicting Student Outcomes. Select only the fields we are interested in and save them to a lookup: | table _time student_code sum_clicks average_score cumulative_score module_perc_complete studied_credits age_band highest_education imd_band final_result | outputlookup oulad_aaa.csv
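For readers without the OULAD data in Splunk, the core of this feature engineering can be sketched in Python. It loosely mirrors the search: events are grouped per student code and month, weighted scores accumulate into a running cumulative_score, the latest monthly average is carried forward when a month has no assessments, and module_perc_complete divides each row's running month count by the longest course length. Event keys not in the search (such as a pre-binned month string) are hypothetical, and edge cases such as a zero average score are handled slightly differently from the eval on slide 51.

```python
from collections import defaultdict


def student_features(events):
    """Monthly per-student features; events use hypothetical keys incl. a binned 'month'."""
    monthly = defaultdict(lambda: {"sum_clicks": 0, "month_score": 0.0, "scores": []})
    for e in events:
        code = f'{e["id_student"]}_{e["code_module"]}_{e["code_presentation"]}'
        bucket = monthly[(code, e["month"])]
        bucket["sum_clicks"] += e.get("sum_click", 0)
        if e.get("score") is not None:
            # weighted_score=score*(weight/100), summed per month
            bucket["month_score"] += e["score"] * (e.get("weight", 0) / 100)
            bucket["scores"].append(e["score"])

    rows, running = [], {}
    for code, month in sorted(monthly):
        bucket = monthly[(code, month)]
        state = running.setdefault(code, {"cumulative": 0.0, "last_avg": 0.0, "count": 0})
        state["count"] += 1  # rolling count per student, like streamstats count
        state["cumulative"] += bucket["month_score"]
        if bucket["scores"]:
            state["last_avg"] = sum(bucket["scores"]) / len(bucket["scores"])
        rows.append({
            "student_code": code,
            "month": month,
            "sum_clicks": bucket["sum_clicks"],
            "average_score": state["last_avg"],  # carried forward when no assessment
            "cumulative_score": state["cumulative"],
            "count": state["count"],
        })

    # course_length: the highest rolling count, like eventstats max(count)
    course_length = max((r["count"] for r in rows), default=1)
    for r in rows:
        r["module_perc_complete"] = r.pop("count") / course_length
    return rows
```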
  • 54. Predicting Student Outcomes
  • 55. Predicting Student Outcomes:
      | inputlookup oulad_aaa.csv
      | search final_result!="Withdrawn"
      | sample partitions=10 seed=42
      | search partition_number<7
      | fit RandomForestClassifier final_result from average_score cumulative_score module_perc_complete studied_credits sum_clicks age_band highest_education imd_band into rf_oulad_aaa
  • 56. Predicting Student Outcomes. Remove data for withdrawn students: | search final_result!="Withdrawn"
  • 57. Predicting Student Outcomes. Select a random sample of about 70% of the data (7 of 10 random partitions): | sample partitions=10 seed=42 | search partition_number<7
  • 58. Predicting Student Outcomes. Train a random forest classifier on the data: | fit RandomForestClassifier final_result from average_score cumulative_score module_perc_complete studied_credits sum_clicks age_band highest_education imd_band into rf_oulad_aaa
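The sample/search pair amounts to a reproducible random split: every row gets a partition number from 0 to 9, partitions 0 to 6 are used for training and 7 to 9 for testing. The sketch below reproduces that idea with Python's seeded random module; the assignments will not match Splunk's sample command row for row, and the function names are hypothetical.

```python
import random


def assign_partitions(rows, partitions=10, seed=42):
    """Attach a reproducible partition_number in [0, partitions) to every row."""
    rng = random.Random(seed)
    return [dict(row, partition_number=rng.randrange(partitions)) for row in rows]


def split_by_partition(rows, train_partitions=7, **kwargs):
    """Partitions below train_partitions train the model; the rest test it."""
    parted = assign_partitions(rows, **kwargs)
    train = [r for r in parted if r["partition_number"] < train_partitions]
    test = [r for r in parted if r["partition_number"] >= train_partitions]
    return train, test
```

Because the seed is fixed, rerunning the split yields the same train and test sets, which is what lets the training and testing searches use the same partitions.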
  • 59. Predicting Student Outcomes:
      | inputlookup oulad_aaa.csv
      | search final_result!="Withdrawn"
      | sample partitions=10 seed=42
      | search partition_number>6
      | apply rf_oulad_aaa
  • 60. Predicting Student Outcomes. Apply the random forest classifier to the remaining 30% of the data: | search partition_number>6 | apply rf_oulad_aaa
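To judge the predictions on the held-out 30%, a simple check is overall accuracy plus a confusion matrix of actual versus predicted outcomes. MLTK's apply typically writes the prediction to a field named predicted(final_result); the sketch assumes rows carrying that field and is not part of the original search.

```python
from collections import Counter


def evaluate(rows, actual="final_result", predicted="predicted(final_result)"):
    """Overall accuracy and a confusion matrix keyed by (actual, predicted)."""
    confusion = Counter((row[actual], row[predicted]) for row in rows)
    correct = sum(count for (a, p), count in confusion.items() if a == p)
    accuracy = correct / len(rows) if rows else 0.0
    return accuracy, confusion
```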
  • 61. Predicting Student Outcomes
  • 62. Predicting Student Outcomes. Train model:
      | inputlookup oulad_aaa.csv
      | eval withdrawn=if(final_result="Withdrawn","Yes","No")
      | sample partitions=10 seed=42
      | search partition_number<7
      | fit RandomForestClassifier withdrawn from average_score cumulative_score module_perc_complete studied_credits sum_clicks age_band highest_education imd_band into rf_withdrawn_oulad_aaa
    Test model:
      | inputlookup oulad_aaa.csv
      | eval withdrawn=if(final_result="Withdrawn","Yes","No")
      | sample partitions=10 seed=42
      | search partition_number>6
      | apply rf_withdrawn_oulad_aaa
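For the withdrawal model, the positive class ("Yes") is the one that matters when looking for at-risk students, so precision and recall for that class can be more informative than accuracy alone, especially if withdrawals are a minority. A minimal sketch, assuming parallel lists of actual and predicted labels; the function names are hypothetical.

```python
def withdrawn_label(final_result):
    """Binary target as in the search: 'Yes' if the student withdrew, else 'No'."""
    return "Yes" if final_result == "Withdrawn" else "No"


def precision_recall(actual, predicted, positive="Yes"):
    """Precision and recall for the positive class (students predicted to withdraw)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

High recall means few at-risk students are missed; high precision means few interventions are aimed at students who would have stayed anyway.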
  • 63. Predicting Student Outcomes
  • 64. What Can Be Done to Promote Good Use of Machine Learning?
  • 65. UK Government First to Pilot AI Procurement Guidelines Co-Designed with World Economic Forum (https://www.weforum.org/press/2019/09/uk-government-first-to-pilot-ai-procurement-guidelines-co-designed-with-world-economic-forum/). “Splunk has supported the development of these guidelines and worked closely with the WEF and UK Government. We will help pilot them in the UK and believe the guidance will enable Governments across the world [to] transform citizen services and deliver ethically sound and beneficial AI based solutions.” (Lenny Stein, Senior Vice President, Global Affairs, Splunk)
  • 66. Work with the WEF. Intent: provide information to non-specialists so that they can assess the suitability of ML for a given problem or solution. Current solution: procurement guidance for ‘unlocking public sector AI’, covering high-level procurement processes, best practices when evaluating an RFP, and a map for creating AI-related RFPs. Unlocking Public Sector AI go-live: expected in the coming months.
  • 67. Thank You!