SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Scaling Big Data Mining Infrastructure:
The Smart Protection Network Experience
黃振修 (Chris Huang)
SPN 主動式雲端截毒技術架構師
About Me
• SPN 主動式雲端截毒技術架構師
• SPN Hadoop 基礎運算架構師
• Hadoop in Taiwan 2013 講師
• Hadoop.TW 活躍成員
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 2
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc.
The Journey to Big Data
3
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 4
YesterdayYesterdayYesterdayYesterday
~40 Hadoop nodes
~15 Service/user accounts
3 Teams
<50 TB storage
<100 Jobs per day
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 5
TodayTodayTodayToday
~200 Hadoop nodes
~130 Service/user accounts
11 Teams
~500 TB storage
>16000 Jobs per day
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 6
1 MapReduce Job
Submitted
Each 5.4 Seconds
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 7
Why?Why?Why?Why?
Raw Data
Actionable
Intelligence
Collaboration in the underground
網路威脅呈現爆炸性的成長
New Unique Malware Discovered
各式各樣的變種病毒、垃圾郵件、不明的下載來源等等,這些來自網路上
的威脅,躲過傳統安全防護系統的偵測,一直持續呈現爆炸性的成長,形
成嚴重的資安威脅
1M
unique
Malwares
every
month
1M
unique
Malwares
every
month
Reality Check
2011
New Unique Threats per Hour
(worldwide estimate*)
Network
Worms
Threats Found in Enterprises
(Real-world data from 150+ assessments*)
Data-Stealing
Malware
IRC
Bots
Targeting
Malware
COMPLEXITY
DANGER
Dangerous RisksSkyrocketing Volume Avoiding Detection
42%
56%
77%
100%
2010200920082007
12600
NEW
Threat Every
0.28
Seconds
2400
• 52% of companies failed to report or remediate a cyber breach
in 2011. --- SAIC, 2011
• Two new pieces of malwares are created every second. ---
Trend Micro, 2012
• A cyber intrusion occurs every 5 minutes. --- US CERT 2012
Traditional approach is no more sufficient!
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc.
Big Data Exploration
17
New approach for cyber threat solution
Web CrawlerWeb Crawler
Trend Micro
Endpoint Protection
Trend Micro
Endpoint Protection
Trend Micro
Mail Protection
Trend Micro
Mail Protection
Trend Micro
Web Protection
Trend Micro
Web Protection
HoneypotHoneypot
CDN / xSPCDN / xSP Researcher
Intelligence
Researcher
Intelligence
3+ Billion Worldwide Sensors
SPN: Smart Protection Network
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 19
Collects
Protects
Identifies
BIG
DATA
ANALYTICS
(Data Mining,
Machine Learning,
Modeling, Correlation)
DAILY STATS:
• 7.2 TB data correlated
• 1B IP addresses
• 90K malicious
threats identified
• 100+M good files
SPN High Level Architecture
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 20
Receiver
Trend Message Exchange (Message Bus)
Hadoop Distributed File System (HDFS)
HBaseMapReduce
Adhoc-Query (Pig)
Oozie
CDN/xSP
Log
Honey
Pot
SPN
Feedback
Data SourcingData Sourcing
APP 1
MySPN Platform
Solr Cloud
API Server/Portal
Service Platform
APP 2
Service DeliveryService Delivery
MySPN Ecosystem
Portal
& API
Single
Entry-Point
SPN Infrastructure
APT KB Service
TopCVE Service
APT KB
VE DB
FB Logs
Census
MySPN
Market Place
Service Platform
SSO
New App
OPS RD / Team
Monitor SDK
All My
Guard
Threat
Connect
Dashboard
Service
Catalog
Census
Profile Alert
New App
Dispatcher
Access
Login
Trender
Need
Solution
Customer
Publish
ImplementOperate
Develop
Solution
backed-by
Data Catalogue
SPN Solution Architecture
File
URL
Web /
URL
Email
Domain
IP
File Reputation ServiceFile Reputation Service
Email Reputation ServiceEmail Reputation Service
Customer
SmartProtection
Community Intelligence
(Feedback loop)
Web Reputation ServiceWeb Reputation Service
Sourcing
Processing
& Analysis
Validate &
Create Solution
Quality
Assurance
Solution
Distribution
Solution
Adoption
SPN Correlation
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc.
Big Data Case Study
23
Internet Web Server
4. Access page
1. Intercept URL
SPN Cloud
9/10/2013 24
200K+ new URL created every day
Case Study: Web Reputation Services
8+ billions URL process daily
User Traffic / Sourcing
CDN vender
Rating Server for Known
Threats
Unknown & Prefilter
Page Download
Threat
Analysis
8 billions/day
4.8 billions/day
860 millions/day
40% filtered
82% filtered
25,000 malicious URL /day
99.98% filtered
Trend Micro
Products / Technology
CDN Cache
High Throughput Web Service
Hadoop Cluster
Web Crawling
Machine Learning
Data Mining
Technology Process Operation
Block malicious URL within 15 minutes once it goes online!
WRS Architecture Overview
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc.
Big Data Lesson Learned
27
How to Scale?
• Un-structure data first
• If you really need structure data
– Use Google Protocol Buffers or
– JSON string
• Purify your data before processing
• Leverage HBase more
– Well design row key to prevent hot-spot
• Use MapReduce to create Lucene index
• Leverage SolrCloud for complex real-time use cases
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 28
Our Learning
• Has clear strategy first
• Start small, scale quickly
• Chose right solution for right problem
Q&A
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 30
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 31
Big Challenge
Big Opportunity
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBaseCarol McDonald
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataCarol McDonald
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Carol McDonald
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Carol McDonald
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeSpark Summit
 
AI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionAI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionDatabricks
 
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...Jeff Hung
 
[DataCon.TW 2019] Graph Query on Big-data, REST API, and Live Analysis Systems
[DataCon.TW 2019] Graph Query on Big-data, REST API, and Live Analysis Systems[DataCon.TW 2019] Graph Query on Big-data, REST API, and Live Analysis Systems
[DataCon.TW 2019] Graph Query on Big-data, REST API, and Live Analysis SystemsJeff Hung
 
Predicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningPredicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningCarol McDonald
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Carol McDonald
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Data Con LA
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesCarol McDonald
 
Practical Machine Learning: Innovations in Recommendation Workshop
Practical Machine Learning:  Innovations in Recommendation WorkshopPractical Machine Learning:  Innovations in Recommendation Workshop
Practical Machine Learning: Innovations in Recommendation WorkshopMapR Technologies
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Spark Summit
 
Blue Pill/Red Pill: The Matrix of Thousands of Data Streams
Blue Pill/Red Pill: The Matrix of Thousands of Data StreamsBlue Pill/Red Pill: The Matrix of Thousands of Data Streams
Blue Pill/Red Pill: The Matrix of Thousands of Data StreamsDatabricks
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsSriram Krishnan
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionDataWorks Summit
 
How Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaHow Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaSpark Summit
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon
 

Was ist angesagt? (20)

Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
AI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionAI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat Detection
 
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...
 
[DataCon.TW 2019] Graph Query on Big-data, REST API, and Live Analysis Systems
[DataCon.TW 2019] Graph Query on Big-data, REST API, and Live Analysis Systems[DataCon.TW 2019] Graph Query on Big-data, REST API, and Live Analysis Systems
[DataCon.TW 2019] Graph Query on Big-data, REST API, and Live Analysis Systems
 
Predicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningPredicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine Learning
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Practical Machine Learning: Innovations in Recommendation Workshop
Practical Machine Learning:  Innovations in Recommendation WorkshopPractical Machine Learning:  Innovations in Recommendation Workshop
Practical Machine Learning: Innovations in Recommendation Workshop
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
 
Blue Pill/Red Pill: The Matrix of Thousands of Data Streams
Blue Pill/Red Pill: The Matrix of Thousands of Data StreamsBlue Pill/Red Pill: The Matrix of Thousands of Data Streams
Blue Pill/Red Pill: The Matrix of Thousands of Data Streams
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific Applications
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in Action
 
How Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaHow Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-Shma
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
 

Ähnlich wie Scaling big-data-mining-infra2

Best Practices in Porting & Developing Enterprise Applications to the Cloud u...
Best Practices in Porting & Developing Enterprise Applications to the Cloud u...Best Practices in Porting & Developing Enterprise Applications to the Cloud u...
Best Practices in Porting & Developing Enterprise Applications to the Cloud u...ActiveState
 
Big Data Ecosystem at InMobi, Nasscom ATC 2013 Noida
Big Data Ecosystem at InMobi, Nasscom ATC 2013 NoidaBig Data Ecosystem at InMobi, Nasscom ATC 2013 Noida
Big Data Ecosystem at InMobi, Nasscom ATC 2013 NoidaSharad Agarwal
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeDataWorks Summit
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantageAmazon Web Services
 
Real-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicReal-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicAmazon Web Services
 
Big Data for Security - DNS Analytics
Big Data for Security - DNS AnalyticsBig Data for Security - DNS Analytics
Big Data for Security - DNS AnalyticsMarco Casassa Mont
 
Infochimps: Cloud for Big Data
Infochimps: Cloud for Big DataInfochimps: Cloud for Big Data
Infochimps: Cloud for Big Datainside-BigData.com
 
Webinar splunk cloud saa s plattform für operational intelligence
Webinar splunk cloud   saa s plattform für operational intelligenceWebinar splunk cloud   saa s plattform für operational intelligence
Webinar splunk cloud saa s plattform für operational intelligenceGeorg Knon
 
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...Manish Harsh
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationDataWorks Summit
 
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivotalOpenSourceHub
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockJeffrey T. Pollock
 
Hong Kong AWS Summit 2017 - Keynote
Hong Kong AWS Summit 2017 - KeynoteHong Kong AWS Summit 2017 - Keynote
Hong Kong AWS Summit 2017 - KeynoteAmazon Web Services
 
Cloudera's Original Pitch Deck from 2008
Cloudera's Original Pitch Deck from 2008Cloudera's Original Pitch Deck from 2008
Cloudera's Original Pitch Deck from 2008Accel
 
Sukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud ManagementSukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud ManagementSukumar Nayak
 
The Hive Think Tank: "Stream Processing Systems" by M.C. Srivas of MapR
The Hive Think Tank: "Stream Processing Systems" by M.C. Srivas of MapRThe Hive Think Tank: "Stream Processing Systems" by M.C. Srivas of MapR
The Hive Think Tank: "Stream Processing Systems" by M.C. Srivas of MapRThe Hive
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data IntegrationJeffrey T. Pollock
 

Ähnlich wie Scaling big-data-mining-infra2 (20)

Best Practices in Porting & Developing Enterprise Applications to the Cloud u...
Best Practices in Porting & Developing Enterprise Applications to the Cloud u...Best Practices in Porting & Developing Enterprise Applications to the Cloud u...
Best Practices in Porting & Developing Enterprise Applications to the Cloud u...
 
Big Data Ecosystem at InMobi, Nasscom ATC 2013 Noida
Big Data Ecosystem at InMobi, Nasscom ATC 2013 NoidaBig Data Ecosystem at InMobi, Nasscom ATC 2013 Noida
Big Data Ecosystem at InMobi, Nasscom ATC 2013 Noida
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-time
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Real-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicReal-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo Logic
 
Big Data for Security - DNS Analytics
Big Data for Security - DNS AnalyticsBig Data for Security - DNS Analytics
Big Data for Security - DNS Analytics
 
Aioug big data and hadoop
Aioug  big data and hadoopAioug  big data and hadoop
Aioug big data and hadoop
 
Big Data in the Cloud
Big Data in the Cloud Big Data in the Cloud
Big Data in the Cloud
 
Meetup Spark UDF performance
Meetup Spark UDF performanceMeetup Spark UDF performance
Meetup Spark UDF performance
 
Infochimps: Cloud for Big Data
Infochimps: Cloud for Big DataInfochimps: Cloud for Big Data
Infochimps: Cloud for Big Data
 
Webinar splunk cloud saa s plattform für operational intelligence
Webinar splunk cloud   saa s plattform für operational intelligenceWebinar splunk cloud   saa s plattform für operational intelligence
Webinar splunk cloud saa s plattform für operational intelligence
 
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software Integration
 
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff Pollock
 
Hong Kong AWS Summit 2017 - Keynote
Hong Kong AWS Summit 2017 - KeynoteHong Kong AWS Summit 2017 - Keynote
Hong Kong AWS Summit 2017 - Keynote
 
Cloudera's Original Pitch Deck from 2008
Cloudera's Original Pitch Deck from 2008Cloudera's Original Pitch Deck from 2008
Cloudera's Original Pitch Deck from 2008
 
Sukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud ManagementSukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud Management
 
The Hive Think Tank: "Stream Processing Systems" by M.C. Srivas of MapR
The Hive Think Tank: "Stream Processing Systems" by M.C. Srivas of MapRThe Hive Think Tank: "Stream Processing Systems" by M.C. Srivas of MapR
The Hive Think Tank: "Stream Processing Systems" by M.C. Srivas of MapR
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration
 

Mehr von Chris Huang

Data compression, data security, and machine learning
Data compression, data security, and machine learningData compression, data security, and machine learning
Data compression, data security, and machine learningChris Huang
 
Kks sre book_ch10
Kks sre book_ch10Kks sre book_ch10
Kks sre book_ch10Chris Huang
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2Chris Huang
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorialChris Huang
 
Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...Chris Huang
 
Hbase status quo apache-con europe - nov 2012
Hbase status quo   apache-con europe - nov 2012Hbase status quo   apache-con europe - nov 2012
Hbase status quo apache-con europe - nov 2012Chris Huang
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012Chris Huang
 
重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)Chris Huang
 
重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)Chris Huang
 
重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)Chris Huang
 
重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2Chris Huang
 
重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1Chris Huang
 
重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)Chris Huang
 
重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)Chris Huang
 
重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)Chris Huang
 
重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)Chris Huang
 
重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)Chris Huang
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsChris Huang
 
Hw5 my house in yong he
Hw5 my house in yong heHw5 my house in yong he
Hw5 my house in yong heChris Huang
 

Mehr von Chris Huang (20)

Data compression, data security, and machine learning
Data compression, data security, and machine learningData compression, data security, and machine learning
Data compression, data security, and machine learning
 
Kks sre book_ch10
Kks sre book_ch10Kks sre book_ch10
Kks sre book_ch10
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorial
 
Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...
 
Wissbi osdc pdf
Wissbi osdc pdfWissbi osdc pdf
Wissbi osdc pdf
 
Hbase status quo apache-con europe - nov 2012
Hbase status quo   apache-con europe - nov 2012Hbase status quo   apache-con europe - nov 2012
Hbase status quo apache-con europe - nov 2012
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)
 
重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)
 
重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)
 
重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2
 
重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1
 
重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)
 
重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)
 
重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)
 
重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)
 
重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed Systems
 
Hw5 my house in yong he
Hw5 my house in yong heHw5 my house in yong he
Hw5 my house in yong he
 

Kürzlich hochgeladen

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Kürzlich hochgeladen (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Scaling big-data-mining-infra2

  • 1. Scaling Big Data Mining Infrastructure: The Smart Protection Network Experience 黃振修 (Chris Huang) SPN 主動式雲端截毒技術架構師
  • 2. About Me • SPN 主動式雲端截毒技術架構師 • SPN Hadoop 基礎運算架構師 • Hadoop in Taiwan 2013 講師 • Hadoop.TW 活躍成員 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 2
  • 3. 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. The Journey to Big Data 3
  • 4. 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 4 YesterdayYesterdayYesterdayYesterday ~40 Hadoop nodes ~15 Service/user accounts 3 Teams <50 TB storage <100 Jobs per day
  • 5. 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 5 TodayTodayTodayToday ~200 Hadoop nodes ~130 Service/user accounts 11 Teams ~500 TB storage >16000 Jobs per day
  • 6. 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 6 1 MapReduce Job Submitted Each 5.4 Seconds
  • 7. 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 7 Why?Why?Why?Why? Raw Data Actionable Intelligence
  • 8. Collaboration in the underground
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. 網路威脅呈現爆炸性的成長 New Unique Malware Discovered 各式各樣的變種病毒、垃圾郵件、不明的下載來源等等,這些來自網路上 的威脅,躲過傳統安全防護系統的偵測,一直持續呈現爆炸性的成長,形 成嚴重的資安威脅 1M unique Malwares every month 1M unique Malwares every month
  • 15. Reality Check 2011 New Unique Threats per Hour (worldwide estimate*) Network Worms Threats Found in Enterprises (Real-world data from 150+ assessments*) Data-Stealing Malware IRC Bots Targeting Malware COMPLEXITY DANGER Dangerous RisksSkyrocketing Volume Avoiding Detection 42% 56% 77% 100% 2010200920082007 12600 NEW Threat Every 0.28 Seconds 2400 • 52% of companies failed to report or remediate a cyber breach in 2011. --- SAIC, 2011 • Two new pieces of malwares are created every second. --- Trend Micro, 2012 • A cyber intrusion occurs every 5 minutes. --- US CERT 2012
  • 16. Traditional approach is no more sufficient!
  • 17. 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. Big Data Exploration 17
  • 18. New approach for cyber threat solution Web CrawlerWeb Crawler Trend Micro Endpoint Protection Trend Micro Endpoint Protection Trend Micro Mail Protection Trend Micro Mail Protection Trend Micro Web Protection Trend Micro Web Protection HoneypotHoneypot CDN / xSPCDN / xSP Researcher Intelligence Researcher Intelligence 3+ Billion Worldwide Sensors
  • 19. SPN: Smart Protection Network 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 19 Collects Protects Identifies BIG DATA ANALYTICS (Data Mining, Machine Learning, Modeling, Correlation) DAILY STATS: • 7.2 TB data correlated • 1B IP addresses • 90K malicious threats identified • 100+M good files
  • 20. SPN High Level Architecture 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 20 Receiver Trend Message Exchange (Message Bus) Hadoop Distributed File System (HDFS) HBaseMapReduce Adhoc-Query (Pig) Oozie CDN/xSP Log Honey Pot SPN Feedback Data SourcingData Sourcing APP 1 MySPN Platform Solr Cloud API Server/Portal Service Platform APP 2 Service DeliveryService Delivery
  • 21. MySPN Ecosystem Portal & API Single Entry-Point SPN Infrastructure APT KB Service TopCVE Service APT KB VE DB FB Logs Census MySPN Market Place Service Platform SSO New App OPS RD / Team Monitor SDK All My Guard Threat Connect Dashboard Service Catalog Census Profile Alert New App Dispatcher Access Login Trender Need Solution Customer Publish ImplementOperate Develop Solution backed-by Data Catalogue
  • 22. SPN Solution Architecture File URL Web / URL Email Domain IP File Reputation ServiceFile Reputation Service Email Reputation ServiceEmail Reputation Service Customer SmartProtection Community Intelligence (Feedback loop) Web Reputation ServiceWeb Reputation Service Sourcing Processing & Analysis Validate & Create Solution Quality Assurance Solution Distribution Solution Adoption SPN Correlation
  • 23. 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. Big Data Case Study 23
  • 24. Internet Web Server 4. Access page 1. Intercept URL SPN Cloud 9/10/2013 24 200K+ new URL created every day Case Study: Web Reputation Services
  • 25. 8+ billions URL process daily User Traffic / Sourcing CDN vender Rating Server for Known Threats Unknown & Prefilter Page Download Threat Analysis 8 billions/day 4.8 billions/day 860 millions/day 40% filtered 82% filtered 25,000 malicious URL /day 99.98% filtered Trend Micro Products / Technology CDN Cache High Throughput Web Service Hadoop Cluster Web Crawling Machine Learning Data Mining Technology Process Operation Block malicious URL within 15 minutes once it goes online!
  • 27. 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. Big Data Lesson Learned 27
  • 28. How to Scale? • Un-structure data first • If you really need structure data – Use Google Protocol Buffers or – JSON string • Purify your data before processing • Leverage HBase more – Well design row key to prevent hot-spot • Use MapReduce to create Lucene index • Leverage SolrCloud for complex real-time use cases 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 28
  • 29. Our Learning • Has clear strategy first • Start small, scale quickly • Chose right solution for right problem
  • 30. Q&A 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 30
  • 31. 9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 31 Big Challenge Big Opportunity Thank You