1. Gaining Support for Hadoop
in a Large Corporate
Environment
Tuesday, June 3, 2014
Hadoop for Business Apps, Hadoop Summit
2. Overview.
©2014 Sprint. This information is subject to Sprint policies regarding use and is the property of Sprint and/or its relevant affiliates and may contain restricted, confidential or privileged materials intended for the sole use of the intended recipient. Any review, use, distribution or disclosure is prohibited without authorization.
• Create the team
- Who are We
• Research challenge.
• Evaluate the data
- Resource Evaluation
• What did we learn?
- New Analytics
- New Benefits
- New Data
- New Infrastructure
• How did we move out of Research and into the Enterprise?
3. About Me.
• Jennifer Lim has over 14 years of experience in large enterprise data warehousing and
analytics. Most recently, she was a Research Scientist for the Sprint Advanced Analytics Lab
and is now acting as a Lead Technology Architect, focusing on upgrading the enterprise
analytics infrastructure in support of all those great use cases being discovered in the research
lab. She has an MBA from Avila University, with a BS from Iowa State University.
Jennifer.Lim@sprint.com
About Sprint.
• Sprint is widely recognized for developing, engineering and deploying innovative technologies, including the first wireless 4G service from a national carrier in the United States; offering industry-leading mobile data services, leading prepaid brands including Virgin Mobile USA, Boost Mobile, and Assurance Wireless; instant national and international push-to-talk capabilities; and a global Tier 1 Internet backbone. www.sprint.com
4. The Team.
Advanced Analytics Lab
• The CTO took a team focused on Network Technology Research and refocused them onto the
new “gold”: Data.
• Data Research Scientists and RF Engineers engaged in
  - Mobile Internet Research
    • Security & Privacy
    • Location: location accuracy, population estimation
    • Social Connection: social networks, influence, churn
  - Network Research
    • Wireless and IP Networks
    • Wireless and wireline security: fraud prevention
  - Architecture Research
    • Performing data platform & tool evaluations
  - Prototype Development
    • Use Case Development
    • Demonstration of new technologies & capabilities
Summer 2011
5. Our Journey…
from data optimization
to a research idea
to a realization - was our data in the right place?
to developing a Hadoop-based analysis environment
to enhancing the technical capabilities of the enterprise data warehouse
…to create Actionable Insights.
6. Historically: Data utilized for Optimization Tasks.
7. The Research Challenge.
XDRs: Voice, Texting, IP, Video
Websites Visited, Location, Applications Used, Social Networks, Calls & Texts
Find Insights Available Nowhere Else
Find New Use Cases
8. Proof of Concept.
Transition: from optimizing to asking questions about the data
October 2011
9. Prototype Infrastructure.
• Current Enterprise infrastructure couldn’t be used to build the prototypes
- No formal IT project, so we couldn’t use IT resources.
- We didn’t have the funding to buy the latest & greatest.
- We needed something that could store a lot of data without a lot of prep.
- We wanted to experiment.
• Current Lab infrastructure couldn’t be used to build the prototypes
- Network focused
- File based, focused on finding specific traffic in same geo-location
• We looked around, found some servers, dusted them off, and grabbed open source Hadoop.
- 5 TB of storage; our servers were all memory & no disk
- 5 data nodes & 1 manager node
10. What Did We Learn?
New Analytics
New Benefits
New Data
New Infrastructure
11. New Analytics.
Finding ways to take network events…
and creating this…
Using Network data to create new Products, increase Customer Satisfaction, and attract new Customers by providing actionable insights to Customers and Enterprise decision makers
12. New Benefits. New Data.
• Incorporation of new, insightful data sets
• Incorporation of new, specialized business rules
• Geospatial techniques
• Examination of new Business Intelligence and Visualization tools
Becoming the Advocates & Demonstrators for new analytics
13. New Infrastructure. Lab Cluster.
• Trials of distributions & server setups.
• Training of internal resources. Big Data User Group.
• Expansion of teams able to run Prototypes on the cluster.
- Usage Based Cost / Finance
- Application data transforms / Product
- Location Accuracy Improvement / Network
- Pathing Analysis / Marketing
- Device Behavior Analysis / Device
- Customer Text Analytics / Care
- ….
- Approximately 1 petabyte; our servers have 4 TB data drives and 256 GB RAM
- 30 nodes: 23 data nodes, plus management nodes & visualization nodes
June 2013
14. New Infrastructure. Production Cluster.
• Standard Visualizations and Analytics Tools Integrated.
• Funding Proven Use Cases.
• IT process & controls related to continuous data loading, transformations, and reliability.
• Standards established.
• Resources scaled: from a team of 5 supporting the lab cluster to more than 5 teams responsible for the system.
- Over 2 petabytes; our servers have 4 TB data drives and 256 GB RAM (same as the lab cluster)
- 52 data nodes, with management nodes & visualization nodes
May 2014
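As a rough sanity check on the two cluster sizes quoted above, the per-node storage they imply can be worked out with simple arithmetic. This is only a sketch: it assumes decimal units (1 PB = 1000 TB) and that the quoted capacity is raw disk spread evenly across the data nodes. The slides do not state the number of drives per node or the replication factor, and "approximately" and "over" make the inputs estimates.

```python
# Back-of-the-envelope estimate from the figures quoted on the slides.
# Assumptions (not stated in the slides): decimal units, raw capacity,
# and storage spread evenly across data nodes only.
TB_PER_PB = 1000
DRIVE_TB = 4  # drive size quoted for both clusters

clusters = {
    "lab (June 2013)": {"capacity_pb": 1, "data_nodes": 23},
    "production (May 2014)": {"capacity_pb": 2, "data_nodes": 52},
}

def per_node_estimate(capacity_pb, data_nodes, drive_tb=DRIVE_TB):
    """Return (TB per data node, implied number of drives per data node)."""
    tb_per_node = capacity_pb * TB_PER_PB / data_nodes
    drives_per_node = tb_per_node / drive_tb
    return tb_per_node, drives_per_node

for name, c in clusters.items():
    tb, drives = per_node_estimate(c["capacity_pb"], c["data_nodes"])
    print(f"{name}: ~{tb:.1f} TB per data node, ~{drives:.1f} drives of {DRIVE_TB} TB")
```

Under these assumptions both clusters come out at roughly ten to eleven 4 TB drives per data node, which is consistent with the slide's note that the production servers match the lab servers.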
15. Enterprise Analytics Architecture. Changes.
Agile:
• Enable faster development cycle
• Deal with structured & unstructured data
Scalable Hadoop environment:
• Billions of objects, high read/write volume, terabytes / petabytes
• Distribution model & consistency
Partnering Across the Enterprise. Big Data User Group.
• Marketing – Loyalty & Retention
• Network Development & Engineering
• Network Planning & Forecasting
• Finance Accounting
• Product – Consumer Apps & Entertainment
• Product – Messaging & Instant Communications
• Enterprise Architecture
• IT Application Development & Operations
• IT Data Management…
Editor's Notes
As a presenter of advanced analytics proofs of concept to other corporations, I am questioned most frequently on the “how” by my audiences. Not the “how” about the technology or the data we used, but “how” we were able to gain momentum and support in a large corporate enterprise to incorporate new technology and practices in analytics. I will share with you how a major telecommunications company, Sprint, created a research team of just 8 people who were able to infect the Enterprise with new infrastructure, new data, and new analytics and transform them into new business benefits.
When I speak with other companies on advanced analytics proofs of concept, the focus of their questions skips quickly past the “what” to the “how”: how did we gain support, how did we find success, how did we decide which technology to select. I will share with you some of the lessons we learned as well as answer many of these questions. This discussion will showcase how Sprint, a major telecommunications company, went from issuing a research challenge to enabling the entire enterprise in the area of analytics. I’ll walk you through how we repurposed an existing team and started with our first Proof of Concept on Hadoop. We are now in the midst of setting up a multi-petabyte, enterprise-supported Hadoop system with multiple funded projects, are augmenting our research facilities, and have a long list of use case trials in the works.
Capture data “before” it is processed by the Enterprise databases
Merge streaming Data with static data from existing databases
Include geospatial tools from the start
Support a standard query language so that anyone can access & use the data
Make it easy to create UDFs
Use off the shelf hardware and open source where possible
Use off the shelf visualization tools
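To make the "merge streaming data with static data" guideline above concrete, here is a minimal sketch of enriching raw events with reference data as they arrive. It is an illustration only, not Sprint's implementation: the `CELL_SITES` table, the `enrich` function, the field names, and the coordinates are all hypothetical, and a dictionary stands in for a table exported from an existing database.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical static reference data, standing in for a table exported
# from an existing enterprise database.
CELL_SITES: Dict[str, Dict[str, float]] = {
    "site-001": {"lat": 39.0, "lon": -94.6},
    "site-002": {"lat": 41.6, "lon": -93.6},
}

def enrich(events: Iterable[dict], sites: Dict[str, Dict[str, float]]) -> Iterator[dict]:
    """Attach site coordinates to each raw event as it streams past."""
    for event in events:
        site = sites.get(event.get("site_id"))
        merged = dict(event)  # copy so the raw event is left untouched
        merged["lat"] = site["lat"] if site else None
        merged["lon"] = site["lon"] if site else None
        yield merged

# Hypothetical raw events captured before any enterprise processing.
raw_events = [
    {"event": "text", "site_id": "site-001"},
    {"event": "voice", "site_id": "site-999"},  # unknown site stays in the stream
]

for row in enrich(raw_events, CELL_SITES):
    print(row)
```

Events with no matching reference row are kept with empty coordinates rather than dropped, so the raw capture stays complete and location data is available to geospatial tools from the first step.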