Max Shron of Polynumeral shares techniques adapted from the worlds of design, consulting, the humanities, and the social sciences that improve focus, communication, and results for data science campaigns.
5. Big picture
• Data is too much fun, too easy to rabbit-hole
• Specialized knowledge is hard to communicate
• Not all statistics is well-adapted to the real world
• We need techniques to handle that
6. Big picture
• Design — UX, consulting, etc.
• Humanities — philosophy, law, etc.
• Social science — sociology, psychology, etc.
8. Scoping
• First set of techniques: scoping.
• The world gives us vague requests.
• We should have things clear before we start, or
we end up with uninteresting questions.
• Write things down or say them out loud.
9. Scoping
• Imagine we are working with a company with a
subscription business. The CEO asks us for a
churn model.
• Bad scope: “We will use R to create a logistic
regression to predict who will quit using the
product.”
• Not actionable, irrelevant detail.
11. Scoping
• Context
• Who are we working with? What are the big
picture, long term goals?
• “The company has a subscription model. The CEO’s
goal is to improve profitability.”
12. Scoping
• Need
• What is the particular knowledge we are
missing?
• “We want to understand who drops off early
enough so that we can intervene.”
13. Scoping
• Vision
• What would it look like to solve the problem?
• “We will build a predictive model using
behavioral data to predict who will drop off —
early enough to be useful.”
• Sources of data: important. Kinds of offers:
important. Kind of experimentation: important.
Kind of model: unimportant.
14. Scoping
• Outcome
• Who will be responsible for next steps? How will we
know if we are correct?
• “The tech team will implement the model in a batch
process to run daily, automatically sending out email
offers. We will calculate success metrics (precision
and recall) on held out users, and send a weekly
email of stats to stay on top.”
• We need a control group!
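The outcome above names concrete success metrics. A minimal sketch of computing precision and recall on held-out users (the labels and predictions here are hypothetical; a real pipeline would pull them from the daily batch process):

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary churn labels (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical held-out users: actual churn vs. model prediction.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
p, r = precision_recall(actual, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision answers "of the users we flagged, how many actually churned?"; recall answers "of the users who churned, how many did we flag?" The control group tells us whether the email offers themselves change behavior.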
15. Scoping
• How do we develop a CoNVO?
• interviews
• kitchen sink interrogation
• roleplaying
• story-telling
• mockups
16. Scoping
• Clearer vision with mockups
• Mockup: “John Smith is 36 years old, he has seen
40 different pages over the past two weeks, and he
has a 20% chance to convert.”
17. Scoping
• Context We are hired to work in a hospital system with
250K patients over 20 years. Report to CEO, who is
interested in building a tool for reducing medical issues.
• Need After talking to some doctors, some believe
that there is overuse of antibiotics, but it is hard to detect.
• Vision A pilot investigation. If we find signal, repeatable
flagging tool.
• Outcome CMO will decide if pilot is valuable based on
report. Automated tool would be run by CMO on demand.
19. Arguments
• Data is not a ray gun!
• People need to be convinced, including you.
• The world does not run on deductive logic; we
need a theory that accounts for people having minds.
• Trusting a tool, making a point with a graph,
coming to terms on a definition, convincing
someone to act differently, and so on.
20. Arguments
• The general model is semi-deductive: we move from
what is known and agreed upon towards what is not
yet known.
• Patterns of reasoning help us make stronger cases
in less time and effort. Take advantage of two
thousand years of research.
22. Arguments
• Example: Predicting Δ poverty from satellite data
• It takes 5-10 years to get small scale poverty
estimates in poor countries.
• The vision: predict whether the poverty estimates
will go up or down ahead of time, using cheap
satellite data.
• The outcome: use to informally guide policy
decisions, keeping track of interventions.
23. Arguments
• Claim - Your audience does not believe it yet, but
you think you can make a case for it.
• “Poverty can be modeled effectively with satellite
data.”
• Prior knowledge - Things your audience already
believes before the case is started.
24. Arguments
• Evidence - Where data enters an argument. We
transform data into evidence. Counts, models, graphs,
etc. make up the evidence.
• Justification - The reasoning why the evidence should
cause us to believe the claim.
• “These graphs indicate that the residuals for our
model are as we had anticipated.”
• Rebuttal - Any of the reasons why the justification might
not hold in this particular case. It is usually smart to know these.
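The justification above appeals to residuals behaving as anticipated. A minimal sketch of turning data into that kind of evidence: checking that residuals are centered near zero (the observed values and predictions here are hypothetical):

```python
# Hypothetical observed values and model predictions.
observed  = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1]

# Residual = observed - predicted; a well-behaved model has
# residuals centered near zero with no systematic pattern.
residuals = [o - p for o, p in zip(observed, predicted)]
mean_resid = sum(residuals) / len(residuals)
print(f"mean residual = {mean_resid:+.3f}")
```

In practice the graphs would plot residuals against fitted values, since a near-zero mean alone can hide systematic structure.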
27. Arguments
• Disputes of fact — getting the details straight
• “The F1 for this model is 0.7”
• Two stock issues:
• What is a reasonable truth condition?
• Is it satisfied?
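One way to spell out a truth condition for a claim like "the F1 is 0.7": F1 is the harmonic mean of precision and recall, computed on an agreed-upon test set. A minimal sketch (the precision and recall values are hypothetical):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# With precision = recall = 0.7, F1 is also 0.7.
print(round(f1_score(0.7, 0.7), 2))
```

Settling "is it satisfied?" then reduces to agreeing on the test set and checking the arithmetic.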
28. Arguments
• Disputes of definition — relating words to math
• “Poverty is defined as FGT, α = 2”
• Three stock issues:
• Does this definition make a useful distinction?
• How consistent is this definition with prior ideas?
• What, if any, are the reasonable alternatives?
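The definition in dispute above is the Foster-Greer-Thorbecke (FGT) family of poverty measures; with α = 2 it becomes the squared poverty gap, which weights the poorest households more heavily. A minimal sketch (the incomes and poverty line are hypothetical):

```python
def fgt(incomes, z, alpha=2):
    """FGT poverty index: sum of ((z - y) / z) ** alpha over
    incomes y below the poverty line z, averaged over everyone."""
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / len(incomes)

incomes = [0.5, 0.9, 1.5, 2.0, 3.0]  # hypothetical household incomes
print(round(fgt(incomes, z=1.0), 3))  # squared poverty gap, alpha = 2
```

With α = 0 the same formula reduces to the headcount ratio, so the choice of α is exactly where this definition "makes a useful distinction."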
29. Arguments
• Disputes of value — making the right trade-offs
• “Our model is simple enough.”
• Two stock issues:
• How do our goals determine which values are
most important?
• Have the values been properly applied here?
30. Arguments
• Disputes of policy — the right course of action
• “We should use this model to informally guide our decisions
between official estimates.”
• Four stock issues:
• Is there a problem? (ill)
• Where is credit or blame due? (blame)
• Will the proposal solve it? (cure)
• Will it be better on balance? (cost)
31. Summary
• Take half the math and tools and twice the listening
to what people actually need.
• This is the tip of the iceberg. In general, we have a
lot to learn from other fields.
• Let’s talk! @mshron