From Volume to Value - A Guide to Data Engineering
Created by Astronomer, Inc. 2017 (astronomer.io)
Table of Contents
Introduction
Information Overload
Talent Gap
A New Role: Data Engineering
Data Maturity Goals
Starting to Climb
Next Steps
Connect and Route Your Data with Astronomer
Conclusion (TL;DR)
About Astronomer
Sources
Introduction
In today’s digital age, getting ahead depends on leveraging data better than competitors. Take
Amazon’s acquisition of Whole Foods, which caused competitors’ stock to drop significantly. Why?
Because shareholders understand that when Amazon adds this plethora of storefront data to its
abundance of virtual-buyer data, it will discover exclusive insights to drive business.[1]
And while reaching the peak of success and retaining the lead in the race to the summit look
different based on industry, geography and other factors, some commonalities hold true. At
Astronomer, we’ve mapped out the journey to becoming more mature with data—in other
words, the path to gaining a competitive advantage.
No matter where organizations are on their journey, the next steps will bring more data sets to
deal with and more preparation to ready that data for analytics. Before moving toward the
summit, it’s important to consider some key questions:
• What metrics are most important to measure in my business?
• What data sets are needed to measure them?
• How can those data sets be accessed?
• Who’s responsible for cleaning, reformatting, organizing, transforming and otherwise preparing
the data for analysis?
Answering these questions is certainly challenging, which perhaps explains why only 4% of
companies actively use their data. The remaining 96% includes thousands of companies that
collect data but haven’t quite figured out how to derive maximum value from it.[2] Those who
have, however, will quickly gain a competitive advantage and see their early efforts pay off in
the long run.
In this guide, we’ll discuss three things to get you there:
1. Core challenges to extracting value from data
2. Practical ways to overcome those challenges and get to value
3. Actionable next steps for your organization
Only 4 percent of companies are actively using
their data. Are you?
(Bain and Company)
Information Overload
According to a McKinsey Global Institute (MGI) report, “data have
swept into every industry and business function and are now an
important factor of production, alongside labor and capital.” MGI
estimated that retailers using big data to its fullest potential could
increase operating margins by more than 60 percent, and that
both businesses and consumers would benefit from leveraging the
exponentially increasing data sets.[3] And that was back in 2011.
In 2016, a Gartner analysis further defined the need for data:
organizations that provide agile, curated internal and external data
sets for a variety of content authors will realize twice the business
benefits of those that don’t.[4]
So why isn’t everybody curating these data sets and enabling individual analysts to not only
access information but also contribute back to models? Because the many data sets available to
companies across legacy systems, cloud-based tools, CRMs, databases, websites and other
data-generating sources create a mass of structured, unstructured and siloed data sets that
don’t “talk” to each other. Consolidating data is a critical first step, but it costs companies
countless hours of cleaning, enriching, and formatting.
Simply put, data is a mess.
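To make the consolidation work concrete, here is a minimal Python sketch of cleaning and merging records from two siloed sources. Everything in it is an assumption for illustration: the two sources, the field names, the date formats and the sample rows are hypothetical and do not come from any particular tool.

```python
from datetime import date, datetime

# Hypothetical records from two siloed sources; field names and formats differ.
crm_rows = [
    {"Email": " Ada@Example.com ", "SignupDate": "03/15/2017", "Plan": "Pro"},
    {"Email": "bob@example.com", "SignupDate": "04/02/2017", "Plan": "free"},
]
web_rows = [
    {"email": "ada@example.com", "last_seen": "2017-06-30"},
    {"email": "carol@example.com", "last_seen": "2017-07-04"},
]


def clean_email(raw: str) -> str:
    """Normalize an email so the same person matches across sources."""
    return raw.strip().lower()


def parse_date(raw: str, fmt: str) -> date:
    """Parse a source-specific date string into a date object."""
    return datetime.strptime(raw, fmt).date()


def consolidate(crm, web):
    """Merge both sources into one record per email address."""
    merged = {}
    for row in crm:
        key = clean_email(row["Email"])
        merged.setdefault(key, {"email": key}).update(
            signup_date=parse_date(row["SignupDate"], "%m/%d/%Y"),
            plan=row["Plan"].lower(),
        )
    for row in web:
        key = clean_email(row["email"])
        merged.setdefault(key, {"email": key}).update(
            last_seen=parse_date(row["last_seen"], "%Y-%m-%d"),
        )
    return merged


customers = consolidate(crm_rows, web_rows)
for record in customers.values():
    print(record)
```

Even in this toy case, two sources need two date formats, inconsistent capitalization and stray whitespace handled before the records can "talk" to each other, which hints at why the hours add up at real scale.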
Do you have data in a ...
• legacy system?
• cloud-based tool?
• CRM?
• database?
• data lake?
• website?
• app?
• more than one of any of the above?
It’s likely you have a LOT of data. In
various forms. Accumulating quickly.
Talent Gap
Of course, any mess can be cleaned up. But the state of the mess,
commonly described as the “three V’s of data” (volume, velocity
and variety), isn’t the only obstacle. There’s another problem:
the deep technical skills required to build, deploy and maintain
a modern data infrastructure that can handle big data, and fast,
are rare. In fact, the MGI analysis predicted that by 2018, the
United States alone could face a shortage of 140,000 to 190,000
people with deep analytical skills and a shortage of 1.5 million
managers and analysts who understand how to make effective
decisions based on data.
To contend with this, many companies have created a new
role: the data scientist. Data scientists, according to the
Harvard Business Review, are a “hybrid of data hacker, analyst,
communicator and trusted adviser” with skills like programming, multivariable calculus and
linear algebra and an understanding of machine learning. They can find patterns and extract
insights from a giant body of data and write algorithms to run over these data sets.[5] Becoming
mature with data is impossible without these capabilities.
There’s just one problem: data scientists aren’t
spending their time creating algorithms, mining
data for patterns or interpreting insights.
Do you have a data
scientist on staff? Ask them
how much time they spend ...
• Building training sets
• Cleaning and organizing data
• Collecting data sets
• Mining data for patterns
• Refining algorithms
• Articulating analysis
If you don’t have a data scientist on staff,
who does these tasks? And how much of
their time is devoted to each one?
Eighty percent of a data scientist’s time is spent collecting data
sets and cleaning and organizing them.[6] That work takes a high
level of skill, but it’s not data science.
So having a data science team isn’t enough. Every company
must take a step back and clean, enrich, reformat and otherwise
prepare data for the data scientists and analysts. All these
activities fall into the category of data engineering.
To maximize insights from
data and get to value faster,
forward-thinking organizations
are creating a new role: the data
engineer.
Data engineering
[dat-uh en-juh-neer-ing]:
noun. the act of accessing, processing, enriching, cleaning
and/or otherwise orchestrating data analysis
A New Role: Data Engineering
So what is data engineering, exactly? And why is it so important? Data engineering is the act
of accessing, processing, enriching, cleaning and/or otherwise orchestrating data analysis.
“Data engineers build tools, infrastructure, frameworks, and services. In smaller
companies, where no data infrastructure team has yet been formalized, the data
engineering role may also cover the workload around setting up and operating the
organization’s data infrastructure.”
(Maxime Beauchemin, Airbnb, “The Rise of the Data Engineer”)
Maxime joined Facebook as a business intelligence engineer in 2011 and left as a data
engineer two years later. The need for more complex, code-based ETL and changing data
modeling drove the demand for data engineering.[7] Even though data engineering alone doesn’t
reveal insights, it readies your data to be analyzed reliably. Without it, there’s no possibility for
meaningful analysis or data science.
In simple terms, data engineers and data scientists work together like this:
Data Engineers
• Prepare data for analysis
• Process raw data
• Function behind the scenes
• Build infrastructure to consolidate and enrich numerous data sets
• Handle large-scale data processing
• Monitor and maintain systems
Data Scientists
• Probe for insights
• Deliver results to business users
• Apply machine learning, algorithms and other analytics approaches
• Uncover meaning in large amounts of data
• Articulate analysis, often visually
• Interpret results of analysis
When both data engineering and data science are priorities for an organization,
getting more mature with data is inevitable.
Data Maturity Goals
In considering how to become more mature with data, it can be
helpful to look to practical examples of companies that have done
it well. Airbnb is near the summit of the data maturity mountain. It
has reached heights most companies can’t yet fathom: $3.5 billion
in projected earnings in 2020, which exceeds the bottom lines of
85% of Fortune 500 companies.[8]
For them, data engineering isn’t a black box; it’s cultural.[9] Access
to data and the ability to contribute to business logic have been
democratized.
As the company’s size and reach (and number of employees)
increased, so did its available data sets. Making the right
data available across the organization required strategic data engineering. First, Airbnb
established what they called “Core Data,” a single source of truth for everyone.
To do this, they created Airflow, a workflow management system for programmatically authoring,
scheduling and monitoring dependency-based data pipelines without running tasks unnecessarily.
This technology allows them to schedule all their data to flow to a single data space.[10] They also
built a data portal for employees, a “search and discovery tool” through which they can pull the
numbers they need on their own. It puts the power of real-time data analytics into the hands of
everyone working to make the company successful.
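To illustrate what a dependency-based pipeline means in practice, here is a minimal sketch using only Python’s standard library. It is not Airflow’s API and not Airbnb’s implementation; the task names and the `run_pipeline` helper are hypothetical, and a real workflow system adds scheduling, retries and monitoring on top of this ordering idea.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_bookings": set(),
    "extract_listings": set(),
    "clean_bookings": {"extract_bookings"},
    "join_core_data": {"clean_bookings", "extract_listings"},
    "publish_to_portal": {"join_core_data"},
}


def run_pipeline(dependencies, already_done=frozenset()):
    """Run tasks in dependency order, skipping any that are already done."""
    executed = []
    for task in TopologicalSorter(dependencies).static_order():
        if task in already_done:
            continue  # nothing to redo for this task
        executed.append(task)  # a real task would do its work here
    return executed


order = run_pipeline(pipeline)
print(order)

# A second run that skips the extracts that have already completed.
rerun = run_pipeline(pipeline, already_done={"extract_bookings", "extract_listings"})
print(rerun)
```

Declaring what each task depends on, rather than hard-coding a sequence, is what lets a scheduler work out a valid order and skip work that is already complete.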
Now everyday decision-makers have access to information on the spot, but at the same
time, a data engineering team maintains quality control by managing data warehousing,
enhancing the performance of core data infrastructure, integrating data flow between
systems and tools and looking for new ways to automate their tasks.[11]
Airbnb is near the
summit of the data
maturity mountain.
With $3.5 billion in projected earnings,
what do they do differently?
Democratize data.
How? A single source of truth that is
searchable for everyone and a
“Data University” to make
sure everyone knows
how to use it.
Of course, even the most reliable data portal is only as good as it
is useful, so the Airbnb data science team went a step further and
tracked the weekly active users (WAUs) logging into the portal,
then created a “Data University” with courses to teach those
employees how to use the portal and mine the data it holds.[12]
This has allowed the company to operate under a philosophy
of data democratization, giving every employee access to
up-to-date data and the power to make decisions based on that
data. And all of that happens without an Airbnb data scientist
in every department, because each employee is empowered
to find and use data and, thanks to the Data University,
understands exactly how to do that.
Now, 45% of Airbnb employees are WAUs, and that particular
economy of scale has eliminated an information bottleneck and
freed up the data science team to focus on the most pressing
problems.
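The weekly active user figure is simple arithmetic: the number of distinct employees who logged in during a given week, divided by headcount. A minimal sketch follows; the login log, the headcount and the `weekly_active_share` helper are hypothetical and are not Airbnb’s data or tooling.

```python
from datetime import date, timedelta

# Hypothetical login log: (employee_id, login_date) pairs.
logins = [
    ("e1", date(2017, 7, 3)),
    ("e2", date(2017, 7, 5)),
    ("e1", date(2017, 7, 6)),   # repeat login, counted once
    ("e3", date(2017, 6, 20)),  # outside the week, not counted
]
headcount = 4  # hypothetical number of employees


def weekly_active_share(logins, week_start, headcount):
    """Fraction of employees with at least one login in the 7 days from week_start."""
    week_end = week_start + timedelta(days=7)
    active = {emp for emp, day in logins if week_start <= day < week_end}
    return len(active) / headcount


share = weekly_active_share(logins, date(2017, 7, 3), headcount)
print(f"{share:.0%} of employees were weekly active users")
```

Counting distinct employees rather than raw logins keeps one heavy user from inflating the number.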
Airbnb is far from the only company to understand the appeal of data democratization. Other
tech giants like Facebook have pioneered the trend, but many others are jumping on board,
including companies like Finish Line[13] and Chobani[14], and even the government.[15]
TL;DR
*Some practical steps Airbnb took to get to
the summit
• Hired a data engineer
• Consolidated all data in one place
• Made data fully accessible
• Taught their employees to query
• Allowed multiple content authors
• Took action based on data
• Watched revenue grow
*Though this guide doesn’t get technical,
if you’re wondering how data flows,
Airbnb uses Apache Airflow, a workflow
management system.
Starting to Climb
Implementing a world-class culture of data engineering within your company requires scaling the
data maturity mountain.
If that seems daunting, take heart: remember that 96% of companies are not maximizing their
data’s value. There are many points in between the base camp and the summit, and organizations can
pick up and move to the next campsite anytime. The first step is determining where you stand now:
0.0 Camp Flying Blind
Data initiatives are most likely not a priority for you, which means you’re probably not reading this.
1.0 Camp Frustrated
You collect data, but probably aren’t sure how to extract actionable business intelligence from it.
2.0 Camp In Control
Here, you’re using some tools to aggregate data and likely understand how to access the
information you need for your role. But you’re not totally sure it’s reliable and have no idea what
other teams are doing.
3.0 Camp Activated
With connected data, you’re looking for new and relevant data sets that you can
plug in for even greater insights. You’ve got basic
algorithms in place and are starting to explore
data science. But you’re spending more time
preparing data for analytics than analyzing it.
[Figure: the data maturity mountain, with competitive advantage rising through camps 0.0 Flying Blind, 1.0 Frustrated, 2.0 In Control and 3.0 Activated]
4.0 Camp Intelligent
At this stage, you offer data visualization in several forms across your organization and rely on
predictive analytics—and maybe machine learning and artificial intelligence (AI) technology.
You’re probably enabling better data science through intentional, improved data engineering.
5.0 Camp Insane - Summit
Your organization is devoted to data engineering and data science, and insights drive and
define every decision you make for your business. To enable that, there is a single source of truth
that is accessible to everyone. Anyone from marketers to data scientists can contribute back
to business logic.
If you’re not exactly sure which camp
you’re in, take the 60-second self-assessment.
astronomer.io/data-assessment
No matter where you’ve mapped yourself, remember: very few businesses
have reached the summit of “Insane”—and few are still stuck in the
doldrums “flying blind” at the zero spot—so it’s fair to assume that your
business’s data strategy, and that of your biggest competitors, is somewhere
in between these two extremes. And that’s a good thing; it means you can
scale up whenever you like.
[Figure: the data maturity mountain, continued, with camps 4.0 Intelligent and 5.0 Insane Mode!]
Next Steps
As Stephen Covey says, begin with the end in mind. If Airbnb’s culture of data engineering
represents the summit, here’s a checklist of steps for getting there:
• Read this guide
• Commit to getting value from your data
• Consider hiring a data scientist
• Create a data engineering capability in your organization (this is where Astronomer can help!)
• Consolidate all data in one place
• Route data to give decision-makers full access
• Teach them to query (if necessary)
• Empower business users to contribute to core tables
• Once you trust and understand the data, probe for insights
• Take action
• Grow your revenue!
How does Astronomer fit in?
The rapid, agile, secure data routing and prep required for this to-do list rely on specialized
tools. For Airbnb, that’s Apache Airflow. Astronomer’s data engineering platform incorporates
all the strength of Apache Airflow with all the power of Astronomer to empower teams to
construct the data infrastructure they need for cross-organizational data democratization.
Astronomer’s data engineering platform streamlines and
amplifies your data engineering capabilities.
Connect and Route Your Data
with Astronomer
Astronomer is a data engineering platform that connects
data from legacy systems, BI tools, databases and other
sources—and routes it where it can be analyzed.
Astronomer offers complete customizability through its
use of open-source software, including Airbnb’s Apache
Airflow, and offers both a library of standard data
pipelines and full access to developers to write custom
pipelines, defined as code. A business user can set up
a standard pipe, like sending Facebook Ads to Redshift,
in minutes. Or a data scientist, analyst or data engineer
can author, schedule and monitor their own
dependency-based data pipelines to centralize and route data
from analytics tools, legacy systems, apps and more.
Whatever camp you’re currently in,
Astronomer meets you where you are and
helps you get ahead.
Conclusion (TL;DR)
• Digital Darwinism threatens every organization.
• For most companies, data is a mess.
• There is a shortage of folks with the skills to deal with data.
• Companies that get ahead now have a serious advantage.
• Getting ahead looks like:
1. making data engineering a priority.
2. consolidating data into a single source of truth.
3. democratizing data for the entire organization.
About Astronomer
Since our beginning in 2015, we have said we are with the machines. We believe the future of
work looks like machines + humans operating in their respective strengths and accomplishing
more, together. By assembling a world-class team of data engineers to program machines
to connect, process and route large amounts of data, we free humans up to do what they do
best: analyze data to discover insights and make essential decisions.
Learn more at astronomer.io or connect with us at humans@astronomer.io.
Sources
1. “Big Prize in Amazon-Whole Foods Deal: Data” by Laura Stevens and Heather Haddon, Wall Street Journal,
2017, astrnmr.co/2uTXNdc
2. “The Value of Big Data: How analytics differentiates winners” by Rasmus Wegener and Velu Sinha, Bain &
Company, 2013, astrnmr.co/2uTRE0y
3. “Big data: The Next Frontier for Innovation, Competition and Productivity” by James Manyika, Michael Chui,
Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh and Angela Hung Byers, McKinsey and
Company, 2011, astrnmr.co/2sPDMrK
4. “Market Guide for Self-Service Data Preparation” by Rita L. Sallam et al, Gartner, 2016, astrnmr.co/2tzriSo
5. “Data Scientist: The Sexiest Job of the 21st Century” by Thomas H. Davenport and D.J. Patil, Harvard Business
Review, 2012, astrnmr.co/2syVbAW
6. “Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says” by Gil Press,
Forbes, 2016, astrnmr.co/2uzVgWx
7. “The Rise of the Data Engineer” by Maxime Beauchemin, 2017, astrnmr.co/2uTRiqV
8. “Airbnb’s Profits to Top $3 Billion by 2020” by Leigh Gallagher, Fortune, 2017, astrnmr.co/2syKtKR
9. “Democratizing Data at Airbnb” by Chris Williams, Eli Brumbaugh, Jeff Feng, John Bodley, and Michelle
Thomas, Airbnb, 2017, astrnmr.co/2uzEt5V
10. “Airflow: A Workflow Management Platform” by Maxime Beauchemin, Airbnb, 2015, astrnmr.co/2uA286c
11. “How Airbnb Democratized Data” by Olivia Timson, Innovation Enterprise, 2016, astrnmr.co/2sPjEpI
12. “How Airbnb Democratizes Data with Data University” by Jeff Feng, Erin Coffman and Elena Grewal, Airbnb,
2017, astrnmr.co/2v2hY8F
13. “The Value of Democratizing Data” by Samuel Greengard, Baseline, 2015, astrnmr.co/2vBVVcn
14. “How Data Democratization Can Deliver a Healthy Breakfast” by Errol Apostolopoulos, DataInformed,
2016, astrnmr.co/2vBsffB
15. “Democratizing Big Data to Bring Government Ahead of the Curve” by Quinton Alsbury, Wired,
astrnmr.co/2vB4Uuf