3. About Conduit
Over 250 million active end users
More than 260,000 publishers
Over 3 billion monthly user interactions
Deployed in 120 countries
Founded in 2005
Acquired Wibiya in 2011
6. Tip #1
Don't buy the hype of 'big data' and throw millions of dollars away, but don't stand still.
7. Tip #1
Select one well-defined use case
A small, super-smart team
Experiment in the cloud
Quantify the effort and value for your organization
‘fail faster while failing forward’
10. Conduit’s Data Platform in Numbers
• Hardware:
125 Nodes (+70 after DR) on 6 racks
500 TB used / 1.2 PB total
• Daily processed data:
50,000 files
500,000,000 records
700 GB
• Daily jobs submitted: Over 5,000
• Data freshness: 60 minutes
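
A quick back-of-envelope check on those numbers (derived only from the figures above; the 64/128 MB HDFS block sizes mentioned in the comments are the usual defaults, not a stated detail): the average record is small and the average file sits well below one HDFS block, so file counts and job counts matter as much as raw volume.

# Derived from the daily figures above.
files_per_day = 50000
records_per_day = 500000000
bytes_per_day = 700 * 1024**3                          # 700 GB

avg_record_bytes = bytes_per_day / records_per_day     # ~1.5 KB per record
avg_file_mb = bytes_per_day / files_per_day / 2**20    # ~14 MB per file, far below
                                                       # a 64/128 MB HDFS block
jobs_per_hour = 5000 / 24                              # ~200 jobs per hour

print("avg record: %.0f B, avg file: %.0f MB, jobs/hour: %.0f"
      % (avg_record_bytes, avg_file_mb, jobs_per_hour))
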
11. Tip #2
Data is turning challenges into business opportunities.
12. Use Cases
[Bar chart: survey of Hadoop use cases, each cited by roughly 8%–19% of respondents: mine data for business intelligence, reduce cost of data analysis, log analysis, ETL, improve scientific research, include more semi-structured/unstructured info into decision making, customer intelligence for more targeted marketing, analyze complete rather than partial data sets, other.]
14. But…
Hadoop in the Enterprise ecosystem – a lot of the features enterprises need or want take a back seat
Hadoop is NOT cheap (hardware & operations cost) – make sure the company's decision makers are on board
Hadoop is still rough around the edges – tooling may not be as mature as enterprises are used to
Data access is batch oriented
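
To make "batch oriented" concrete: in this model even a simple question is answered by a scheduled MapReduce job over a day's worth of files rather than a low-latency query. Below is a minimal sketch using the open source mrjob library (an illustration of the pattern, not Conduit's actual code); the tab-separated log layout is an assumption.

from mrjob.job import MRJob

class DailyEventsPerCountry(MRJob):
    # Counts a day's events per country from tab-separated log lines.
    # Assumed layout: timestamp <TAB> user_id <TAB> country <TAB> event.

    def mapper(self, _, line):
        parts = line.split("\t")
        if len(parts) >= 3:              # tolerate short or dirty lines
            yield parts[2], 1

    def reducer(self, country, counts):
        yield country, sum(counts)

if __name__ == "__main__":
    # Typically launched by a scheduler against an HDFS path, e.g.:
    #   python daily_events.py -r hadoop hdfs:///logs/2013-06-01/
    DailyEventsPerCountry.run()
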
16. Tip #3
Nurture your ‘big brains’
Hadoop is cutting-edge technology – investment in related skills and training is crucial
Good data scientists are "unicorns"
Embrace the open source culture – it will pay off
The BI team is essential for connecting the dots
17. Data Roles @ Conduit
[Org chart: central Data Infra, Data BI, and Data Science teams, with a BI analyst and a data scientist aligned to each product area (Mobile, Wibiya, Quick Launch, Toolbar, and other products).]
19. Tip #4
Complex decision making is time consuming, so you can't react in real time
Real time is expensive!
Tailor the right solution to accommodate the required data freshness
Focus on big things!
20. Data Maturity vs. Freshness @Conduit
[Chart: data freshness in minutes (0, 10, 60) vs. data maturity (Low, Medium, High – structured, cleansed & complete). Plotted items: real-time monitoring, Hue/Hive, reporting service, advanced analytics models, and the business objective, spread across the Kafka, Hadoop, and DWH platforms.]
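
Purely as an illustration of "tailor the solution to the required data freshness", here is a toy routing rule read off this chart; the thresholds and the structured-vs-unstructured split are assumptions, not Conduit's actual logic.

def serving_tier(max_staleness_minutes, needs_structured_data):
    # Pick a platform along the maturity-vs-freshness trade-off (illustrative).
    if max_staleness_minutes < 10:
        return "Kafka"     # near-real-time monitoring on low-maturity data
    if needs_structured_data:
        return "DWH"       # structured, cleansed & complete, but least fresh
    return "Hadoop"        # Hue/Hive and reporting at roughly 10-60 minute freshness

assert serving_tier(0, False) == "Kafka"
assert serving_tier(60, True) == "DWH"
assert serving_tier(30, False) == "Hadoop"
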
22. Tip #5
Data will be dirty, schema-less, and without foreign keys
And yet, we are standing on a mountain of gold!
Do your best cleaning the data, but know when to shift to analysis
Tune your algorithms to tolerate data deficiencies, then hunt for insights
Big data is not a data warehouse
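
A minimal sketch of what "tolerating data deficiencies" can look like in code (the field names and defaults are made up for illustration): parse every raw record defensively, fall back to defaults, and count bad rows instead of crashing on them.

import json

def parse_record(line):
    # Return a cleaned dict, or None if the record is unusable.
    try:
        raw = json.loads(line)
    except ValueError:
        return None                                  # malformed line: skip, don't crash
    user = raw.get("user_id")                        # no schema guarantees the field exists
    if not user:
        return None
    try:
        clicks = int(raw.get("clicks") or 0)
    except (TypeError, ValueError):
        clicks = 0                                   # coerce dirty numerics to a default
    return {
        "user_id": str(user),
        "country": raw.get("country", "unknown"),    # default instead of failing
        "clicks": clicks,
    }

def clean_stream(lines):
    good, bad = [], 0
    for line in lines:
        rec = parse_record(line)
        if rec is None:
            bad += 1                                 # track deficiencies, keep going
        else:
            good.append(rec)
    return good, bad
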
28. Tip #6
Break down the barriers that prevent your users and applications from using their valuable data more effectively to glean meaningful insights
Provide your users with advanced self-service tools to access the data
The Hadoop ecosystem is evolving as we speak
Your performance is measured by the tools' effectiveness and ease of use
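
As one concrete form of self-service, analysts can query Hive directly, through Hue's web UI or programmatically. Here is a minimal sketch using the open source PyHive client; the host, table, and column names below are made-up examples, not Conduit's.

from pyhive import hive   # pip install 'pyhive[hive]'

# Hypothetical connection details and schema, for illustration only.
conn = hive.Connection(host="hive.example.internal", port=10000, username="analyst")
cursor = conn.cursor()

# An ad-hoc question answered without a ticket to the data infra team.
cursor.execute("""
    SELECT country, COUNT(DISTINCT user_id) AS users
    FROM daily_events
    WHERE dt = '2013-06-01'
    GROUP BY country
    ORDER BY users DESC
    LIMIT 20
""")
for country, users in cursor.fetchall():
    print(country, users)
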
29. To Summarize…
• Start small
• Identify the opportunities
• Invest in people & related skills
• Adjust processes to the organization's needs
• Know your data limits
• Self-service tools are extremely important
Hadoop in the Enterprise Ecosystem
Hadoop is designed to solve Big Data problems encountered by web and social companies. In doing so, a lot of the features enterprises need or want take a back seat. For example, HDFS does not offer native support for security and authentication.
Hadoop is NOT cheap
Hardware cost – let's say a Hadoop node costs $5,000; a 100-node cluster would then be $500,000 for hardware.
IT and operations costs – teams such as network admins, IT, security admins, and system admins. One also needs to think about operational costs like data center expenses: cooling, electricity, etc.
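
A back-of-envelope version of that reasoning; only the $5,000-per-node price and the 100-node cluster come from the note above, while the staffing and data-center figures are placeholder assumptions.

# Only NODES and HW_PER_NODE come from the note; the rest are illustrative guesses.
NODES = 100
HW_PER_NODE = 5000        # $ per node (from the note)
OPS_STAFF = 3             # assumed admins / ops engineers
COST_PER_PERSON = 120000  # assumed fully loaded annual cost per person, $
DC_PER_NODE_YEAR = 500    # assumed power, cooling and rack cost per node per year, $

hardware_once = NODES * HW_PER_NODE                                   # $500,000 one-time
ops_per_year = OPS_STAFF * COST_PER_PERSON + NODES * DC_PER_NODE_YEAR
print("hardware: $%d, ops per year: $%d" % (hardware_once, ops_per_year))
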
Hadoop is still rough around the edges
The development and admin tools for Hadoop are still pretty new. Companies like Cloudera, Hortonworks, MapR, and Karmasphere have been working on this issue. However, the tooling may not be as mature as enterprises are used to (say, Oracle admin tools, etc.).