[코세나, kosena] Machine Learning Use Cases in the Financial Sector

AI, Machine Learning / Deep Learning
- Financial sector

Machine Learning Analysis Techniques: Application Example
Pipeline: Raw Time Series → Signal Normalization → Normalized Signals → Feature Extraction / Dimension Reduction → Temporal Features → Clustering / Outlier Detection → Clusters → Classification / Prediction → Decision Making
Other diagram labels: Adaptation Feedback; Facts / Truth; Conditions, Unknowns
Techniques named: signal processing techniques + ICA (Independent Component Analysis); K-means; Random Forest
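The stages named on this slide (normalization, temporal features, clustering / outlier detection, decision) can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the pipeline the deck's authors built: the function names, the window width, the farthest-first initialisation, and the rule "treat the smallest cluster as outliers" are all hypothetical choices, and the ICA and Random Forest steps from the slide are left out.

```python
import statistics

def normalize(series):
    """Signal normalization: z-score to zero mean and unit variance."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series) or 1.0  # guard against a flat signal
    return [(x - mean) / std for x in series]

def window_features(series, width):
    """Temporal features: (mean, std) of each non-overlapping window."""
    windows = [series[i:i + width] for i in range(0, len(series) - width + 1, width)]
    return [(statistics.fmean(w), statistics.pstdev(w)) for w in windows]

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

def flag_outlier_windows(raw, width=4, k=2):
    """Decision step (assumed rule): windows in the smallest cluster are outliers."""
    feats = window_features(normalize(raw), width)
    labels, _ = kmeans(feats, k)
    sizes = {c: labels.count(c) for c in set(labels)}
    rare = min(sizes, key=sizes.get)
    return [i for i, label in enumerate(labels) if label == rare]

# Made-up series: five quiet windows and one burst (window index 3).
raw = [1.0, 1.1, 0.9, 1.0] * 3 + [5.0, 9.0, 4.0, 8.0] + [1.0, 0.9, 1.1, 1.0] * 2
print(flag_outlier_windows(raw))  # [3]
```

In a real deployment the flagged windows would feed a supervised classifier and the feedback loop shown on the slide; here the sketch stops at the unsupervised step.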

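Open Source
• Deep learning engines are being open-sourced
• Open APIs are moving to the cloud
• Algorithms are being open-sourced
Applications: recommendation; motion recognition (Kinect); image, speech, and video recognition; spam detection; advertising (AdSense); disease diagnosis
Different learning algorithms are applied depending on the type of data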

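Why AI Is Gaining Attention
Innovations in neural network training methods
Massive datasets
Fast processors (GPUs)
Breaking through the limits of earlier methods
Rapid progress over the past three years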
http://deview.kr/2013/detail.nhn?topicSeq=39

How to implement machine learning:
AI Platform

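How to implement deep learning:
- NVIDIA dedicated deep learning servers:
a deep learning platform based on the Pascal GPU architecture,
providing the new 'TensorRT' software and 'DeepStream' architecture, which accelerate AI inference workloads together with the 'Tesla P4, P40' accelerators
-IBM/HP…..????
Dell server (?) + NVIDIA GPU board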

Algorithms Supported by Machine Learning / Deep Learning Frameworks
Google TensorFlow https://github.com/carpedm20/DCGAN-tensorflow
Mahout http://mahout.apache.org/users/basics/algorithms.html
- Classification: Naïve Bayes, Hidden Markov Models, logistic regression, Random Forest
- Clustering: k-Means, Canopy, Fuzzy k-Means, Streaming k-Means, spectral clustering
Spark https://spark.apache.org/docs/1.1.0/mllib-guide.html
- Classification and regression: linear models (SVM, logistic regression, linear regression), decision tree, Naïve Bayes
- Clustering: k-means
- Collaborative filtering: alternating least squares (ALS)
Microsoft Azure ML http://azure.microsoft.com/ko-kr/documentation/articles/machine-learning-algorithm-choice/
- Clustering: K-means
- Classification: Decision Tree, SVM (Support Vector Machines), Naïve Bayes
- Regression: Bayesian linear regression, boosted decision tree regression, decision forest regression, linear regression, neural network regression, ordinal regression, Poisson regression

Bigger Data. Better Insights.™
Skytree is an enterprise-grade machine learning specialist.
CONFIDENTIAL

Skytree is a machine learning specialist.
Launched: 2012, Alexander Gray, Ph.D., Associate Professor, Georgia Tech (Berkeley, Carnegie Mellon, NASA Jet Propulsion Lab)
Product: Software platform that provides enterprise-class machine learning for big data, letting data scientists and BI analysts create more accurate predictive models in less time
Business Model: Freemium download, software subscription, node-based pricing model. On-prem or in-cloud deployment.
Investors: (not present in the extracted text)
Technical Advisory Board (Natl. Academy members):
• Prof. Michael Jordan, UC Berkeley: machine learning 'godfather'
• Prof. David Patterson, UC Berkeley: systems (inventor RISC, RAID)
• Prof. Pat Hanrahan, Stanford: data visualization (Tableau, Pixar)
• Prof. James Demmel, UC Berkeley: high-performance computing (LAPACK)
Academic Network: Extended scientific advisory council consists of top ML professors from 20 or so universities (CMU, Princeton, Caltech, Purdue, Cambridge, etc.)

Financial Services
Manufacturing Healthcare
Technology Services Other
Information Providers
Skytree's Customers

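Machine learning is…
a modern science that uses statistics, data mining, pattern recognition, and advanced predictive analytics and algorithms to find patterns in data and make predictions.
Machine learning algorithms (or software) are programmed to give highly accurate, best-possible answers to hundreds or thousands of questions about data.
These algorithms adapt to new data and grow smarter, establishing new relationships and patterns and producing new predictions and recommendations.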

Machine Learning Application Areas
• Predict categories and classes: Classification
• Predict values and numbers: Regression
• Grouping and segmentation: Clustering
• Detection and characterization: Density Estimation
• Visualization and reduction: Dimension Reduction
• Find similar items: Multidimensional Querying
Example Skytree Algorithms: Random Decision Forests, Gradient Boosting Machines, Nearest Neighbor, Kernel Density Estimation, K-means, Linear Regression, Support Vector Machine, 2-point Correlation, Decision Tree, Singular Value Decomposition, Range Search, Logistic Regression
Recommendations, Predictions, Outlier Detection

Skytree is accurate, fast, and scalable.
Speed and Scale (Run More Experiments)
Skytree runs in a parallel, distributed computing environment.
Algorithms are restructured so that they can be computed with fewer operations.
Algorithms are written in C++ and open source scripting.

More Data (See More Indicators)
All of the data is used.
There is no limit on sample data size.
Structured + unstructured data

Automation (Ease of Use and Interpretability)
Automatic documentation
No more black boxes.
Advanced algorithms are provided.
More factors can be used in the analysis.

Skytree solves most business problems.
With full scalability, it delivers more accurate results faster than any other product.
Examples of High Value Analytics Use Cases
Customer
• Segmentation
• Recommendation
• Churn
• Lead Scoring
• Pricing
• Credit Scoring
Risk & Security
• Fraud Analysis
• Risk Analysis
• Anomaly Detection
• Cyber Security
• Situational Profiling
• Pattern of Life
Operational
• Prescriptive Maintenance
• Default/Fault Detection
• Supply Chain
• Cost Forecasting
• Operational Analysis
• Failure Analysis

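Machine learning? Actually doing it is too hard.
"We need models with better accuracy."
"We lose auditability and transparency across the whole analysis process."
"Data must not be duplicated."
"We want to apply machine learning more easily by simplifying and automating the work."
"Machine learning must not consume all of our CPU and memory resources."
"For performance, we have to shrink the data by sampling."
"Overall training performance is too slow."
"We lack the model interpretability needed to trust that the results of a rebuilt model are the best ones."
"We spend too much time building and deploying models."
"We wish routine tasks took less time so that data scientists could spend more time solving prediction problems."
"Because interpretability and model performance are poor, the methods we can apply are basic, and accuracy suffers badly as a result."
"It is hard to hire data scientists to do the work."
"Time constraints keep our data scientists from running more experiments."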

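Typical Data Analysis: Disparate Tools, Manual Processes
Skill level: PhDs throughout
Timeline (Months/Quarters)
• Data Prep: transform and merge data to fit a variety of tools
• Validation: accuracy verification work continues
• Deployment: implement the model that will run in production
• Method Selection: selected manually, then reworked
• Parameter Selection: repeated with different parameters to find the best result
• Sampling: only part of the data is used because of limited performance
• New Data: separate validation data is prepared
• Prediction / Results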

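Skytree Provides a Unified Approach: Automate & Sustain
Skill level: Data Analysts, freeing PhDs to focus on high-leverage challenges
Better Results - Much Faster & Easier
Unified Skytree Environment: Automated Project Oriented Workspace
Single Click AutoModel™: one-step train-tune-test to shorten working time
Timeline (Months/Quarters)
• Diverse data is integrated to deliver improved results
• Shorter data transformation time
• Automated method selection, parameter selection, and validation in less time
• Improved results from using the full data set
• Automatic extraction of the production model
• Automated model documentation through auditing of the whole process
• New Data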

Customer 360° View
Data sources: Internal Data, External Data, Data Warehouse, E-Mail, CRM, Big Data Environment
Single Customer View: improved decision-making capabilities based on customer data
Big Data: enabling innovative products & services, customer satisfaction
Analytics: churn propensity and prevention, product sentiment, recommendations, and more.

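About Skytree
Skytree meets the specific analytics requirements of data scientists and IT organizations.
Skytree is Hadoop-certified on Cloudera, Hortonworks, MapR, and Amazon EMR.
Skytree provides familiar machine learning methods together with industry-standard deployment options, including PMML and JAR files.
Skytree's machine learning transformations fuse and support structured and unstructured data. It also provides Spark integration, YARN support, and a project-based GUI.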

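- Highly scalable algorithms: Skytree speeds up machine learning methods by up to 150x compared with open source options.
- Using deeply optimized algorithms, Skytree performs analysis in memory and uses the latest high-performance computing techniques.
- By reducing the mathematical steps needed to reach the same result, Skytree has been proven to be the fastest machine learning software on the market.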

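- Artificial intelligence for data scientists: with Skytree's breakthrough AutoModel technology, even generalist data scientists can build accurate machine learning models.
- By automating algorithm and parameter selection with patent-pending global optimization analysis, Skytree saves weeks or months of effort.
- Instead of manually running hundreds of experiments to determine the best algorithm and parameters, Skytree does it with a single click.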
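To make the idea of automated algorithm and parameter selection concrete, here is a minimal sketch of the general technique. It is not AutoModel: the slide mentions a patent-pending global optimization whose details are not given, so this example substitutes a plain exhaustive grid search scored on a hold-out set. The two toy algorithms, the function names, and the data are all hypothetical.

```python
from itertools import product

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def stump_predict(train, x, threshold):
    """Decision stump: predict 1 when the feature exceeds a fixed threshold."""
    return 1 if x > threshold else 0

def accuracy(predict, train, valid, **params):
    """Fraction of hold-out rows that the candidate predicts correctly."""
    hits = sum(predict(train, x, **params) == y for x, y in valid)
    return hits / len(valid)

def auto_select(train, valid, search_space):
    """Try every (algorithm, parameters) combination; keep the best hold-out score."""
    best = None
    for name, (predict, grid) in search_space.items():
        keys = list(grid)
        for values in product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            score = accuracy(predict, train, valid, **params)
            if best is None or score > best[0]:  # ties keep the earlier candidate
                best = (score, name, params)
    return best

# Made-up 1-D data: (feature, label)
train = [(0.1, 0), (0.2, 0), (0.35, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
valid = [(0.15, 0), (0.3, 0), (0.55, 1), (0.8, 1)]
search_space = {
    "knn": (knn_predict, {"k": [1, 3, 5]}),
    "stump": (stump_predict, {"threshold": [0.2, 0.5, 0.8]}),
}
print(auto_select(train, valid, search_space))  # (1.0, 'knn', {'k': 1})
```

The loop is what an analyst would otherwise run by hand, one experiment at a time; smarter search strategies change how candidates are proposed, not how they are scored.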

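Self-Documenting Models
- Skytree lets data scientists visualize and understand the logic behind ML decisions.
- Skytree provides visual documentation that records every dataset used, every data split performed, every transformation applied, every algorithm run, and the results obtained for every model built with Skytree.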

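Model Interpretability
- With Skytree's unique interpretation tools, results are easy to explain and justify to colleagues, management, and regulators.
- By gaining visibility into the logic of machine learning results, including variable importance, scientists can better understand and reproduce the decisions of automated modeling.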

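End-to-End Platform
- Skytree is not just another toolset or collection of machine learning libraries. It is an end-to-end enterprise platform for machine learning on big data.
- Our software is designed to solve hard prediction problems through data preparation capabilities, advanced machine learning algorithms, and options to build and deploy models in a variety of formats.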

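Programmatic and GUI Access
- Skytree can be accessed through a GUI that managers, senior data scientists, and citizen data scientists can all adopt easily, or programmatically from Java, Python, or the Skytree Command Line Interface.
- Because models are self-documenting, there is a full audit trail that records every dataset used, data split completed, transformation performed, algorithm run, and result.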
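The audit trail described above lists what gets recorded: datasets, data splits, transformations, the algorithm run, and results. A minimal sketch of such a record follows. The class name, field names, and every value are hypothetical placeholders chosen for illustration; they are not Skytree's schema or output.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ModelAuditRecord:
    """Hypothetical audit record; fields mirror the items named on the slide."""
    datasets: list
    split: dict
    transforms: list
    algorithm: str
    parameters: dict
    results: dict = field(default_factory=dict)

    def to_json(self):
        """Serialize with sorted keys so two runs can be diffed line by line."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)

# Placeholder values only; none of these come from a real run.
record = ModelAuditRecord(
    datasets=["transactions.csv"],
    split={"train": 0.8, "holdout": 0.2},
    transforms=["drop_nulls", "zscore(amount)"],
    algorithm="gradient_boosting",
    parameters={"trees": 200, "depth": 4},
)
record.results["holdout_gini"] = 0.60
print(record.to_json())
```

Storing the record as sorted JSON is one simple design choice that makes a model reproducible and explainable to a reviewer: anyone can see exactly which inputs and settings produced a given result.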

Skytree's Unique Differentiation: Fundamental Technical Innovation
Complexity of state-of-the-art machine learning methods:
1. Querying: all-nearest-neighbors O(N²)
2. Density estimation: kernel density estimation O(N²), kernel conditional density estimation O(N³)
3. Classification: logistic regression, decision tree, neural nets, nearest-neighbor classifier O(N²), kernel discriminant O(N²), support vector machine O(N³)
4. Regression: linear regression, LASSO, kernel regression O(N²), regression tree, Gaussian process regression O(N³)
5. Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N³), maximum variance unfolding O(N³); Gaussian graphical models, discrete graphical models
6. Clustering: k-means, mean-shift O(N²), hierarchical clustering O(N³)
7. Testing and matching: MST O(N³), bipartite cross-matching O(N³), n-point correlation 2-sample testing O(N^n), n = 2, 3, 4, …
► Unfortunately, O(N²) and O(N³) are computationally prohibitive for big data.
Skytree has invented a way to reduce the complexity of the above methods from O(N²) and O(N³) to O(N) or O(N log N).
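The slide does not say how the complexity reduction is achieved, so the following is only an illustration of the general idea on the first item in the list, all-nearest-neighbors. In one dimension, sorting the points first means each point's nearest neighbour is one of its two sorted neighbours, which turns the O(N²) pairwise scan into an O(N log N) computation. Higher-dimensional data usually needs space-partitioning structures such as k-d trees instead. The function names are hypothetical.

```python
def all_nearest_brute(values):
    """O(N^2): compare every point with every other point."""
    result = []
    for i, x in enumerate(values):
        result.append(min(abs(x - y) for j, y in enumerate(values) if j != i))
    return result

def all_nearest_sorted(values):
    """O(N log N): after sorting, the nearest neighbour is adjacent in sorted order."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        gaps = []
        if rank > 0:
            gaps.append(values[idx] - values[order[rank - 1]])
        if rank + 1 < len(order):
            gaps.append(values[order[rank + 1]] - values[idx])
        result[idx] = min(gaps)
    return result

values = [7, 1, 12, 4, 20]  # needs at least two points
print(all_nearest_brute(values))   # [3, 3, 5, 3, 8]
print(all_nearest_sorted(values))  # [3, 3, 5, 3, 8]
```

Both functions return the distance from each point to its nearest neighbour, in the original input order, so the faster version is a drop-in replacement rather than an approximation.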

Performance Comparison
Up to 10,000x speedups (on one CPU)

How Skytree Does It
• Deep knowledge of algorithms: drawing from the latest from academia
• Smart programming: efficient ways to compute order N² and N³
• Distributed systems: take advantage of parallel computing speed

Customer Use Case: Recommendation Engine (like Netflix)
Major Pain Points: Speed & Accuracy
Current Solution: Hadoop/Mahout, Homegrown
Results
• Runtime: LEGACY 97 min (5,820 sec), SKYTREE 4 sec
• Precision: LEGACY 35%, SKYTREE 42%
Skytree Impact
• ~1500x execution speedup
• 20% improvement in recommendation relevance
"We are literally speechless" - Skytree Customer
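The headline figures on this slide can be checked against its raw numbers (97 minutes versus 4 seconds, precision 35% versus 42%). The short calculation below is a reader's sanity check, not part of the original deck; the helper names are hypothetical.

```python
def speedup(legacy_seconds, new_seconds):
    """How many times faster the new runtime is."""
    return legacy_seconds / new_seconds

def relative_gain(before, after):
    """Improvement expressed relative to the starting value."""
    return (after - before) / before

runtime_speedup = speedup(97 * 60, 4)       # 5,820 s / 4 s
precision_gain = relative_gain(0.35, 0.42)  # 7 percentage points on a base of 35

print(f"speedup: {runtime_speedup:.0f}x")       # speedup: 1455x
print(f"precision gain: {precision_gain:.0%}")  # precision gain: 20%
```

So the "~1500x" figure is the measured 1455x rounded up, and the "20%" improvement is relative: in absolute terms, precision rose by 7 percentage points.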

Customer Use Case: Micro-Targeting Application
Major Pain Points: Speed & Accuracy
Current Solution: SAS, Hadoop, Homegrown
Legacy Environment: 100-node Hadoop cluster, 1,200 cores; Runtime: 100 minutes; Accuracy (Gini): 57%
SKYTREE Server: single server, 12 cores; Runtime: 8 minutes; Accuracy (Gini): 60%
Skytree Impact
• 12.5x improvement on 1 node, ~1200x expected improvement on 100 nodes
• 5% improvement in accuracy
"I want our analysts to create models with Skytree rather than writing software" - Skytree Customer
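Two readings of this slide's numbers are worth working through, both of them assumptions rather than statements from the deck. First, the "~1200x" figure is close to what results from comparing core-minutes (1,200 cores for 100 minutes versus 12 cores for 8 minutes). Second, "Accuracy (Gini)" is assumed here to be the Gini coefficient commonly used for scoring models, which relates to ROC AUC by Gini = 2·AUC − 1.

```python
def gini_to_auc(gini):
    """Convert a Gini coefficient to ROC AUC, using Gini = 2 * AUC - 1."""
    return (gini + 1) / 2

legacy = {"cores": 1200, "minutes": 100, "gini": 0.57}
skytree = {"cores": 12, "minutes": 8, "gini": 0.60}

# Wall-clock speedup: 100 min / 8 min
wall_clock = legacy["minutes"] / skytree["minutes"]

# Core-minute ratio: (1200 * 100) / (12 * 8)
core_minutes = (legacy["cores"] * legacy["minutes"]) / (skytree["cores"] * skytree["minutes"])

# Relative Gini improvement: 60 / 57 - 1
gini_gain = skytree["gini"] / legacy["gini"] - 1

print(f"wall-clock speedup: {wall_clock}x")      # 12.5x
print(f"core-minute ratio: {core_minutes:.0f}x")  # 1250x
print(f"relative Gini gain: {gini_gain:.1%}")     # 5.3%
print(f"AUC: {gini_to_auc(legacy['gini']):.3f} -> {gini_to_auc(skytree['gini']):.3f}")
```

Under the AUC reading, the move from 57% to 60% Gini corresponds to an AUC of roughly 0.785 rising to 0.800, which is a useful way to compare against models reported in AUC terms.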

Customer Use Case - FDS (fraud detection)
Customer Pain: $500M+/year in fraud
Before Skytree:
• Fraud model updated annually
• Internally developed algorithms
• Model accuracy maxed out
• Model developed on Linux servers
With Skytree:
• Models updated weekly
• Model accuracy improved ~10%
Client Win:
• Business Impact: Chargebacks greatly reduced
• Operational Impact: 300X shorter threat response time
• Financial Impact: $50M+ savings annually
"Skytree opened our eyes to what machine learning can achieve. Through a deeper, more personalized understanding of our customers, it changed the thinking of the entire risk management department. By using Skytree, we improved the fraud detection performance we had been building since 1990 by 10%. That is a big deal."

Customer Use Case - Insurance: Profit Optimization
Application: Profit optimization through
• Loss Prediction
• Binding
• Retention
• Price Elasticity
Major Pain Points: Speed & Scale
Current Solution: R, Hadoop, Homegrown
Skytree Impact (R vs. Skytree)
• Speed-up: Test Suite 1: 20-88x execution speed-up on same data sets
• Scale: Test 2: >50,000x increase in data size and ran to completion
Using up to 450 million rows and 450 attributes

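Detecting Abnormal Insurance Claims
Business challenge
A large insurance company may face many fraudulent and abnormal transactions but lacks the resources to investigate them all.
Fraud cases are missed.
Skytree solution
A machine learning solution that draws on multiple data sources about accounts, customers, and transactions to identify fraudulent cases while allowing a tolerance for false alarms.
Specific patterns that suggest fraudulent transactions
Business benefits
More fraudulent transactions are caught
Identified patterns are used to set up investigation steps that prevent particular kinds of fraud in the future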
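The trade-off described on this slide, catching more fraud while tolerating some false alarms with limited investigators, can be sketched with a simple outlier score. This is a hypothetical stand-in, not the Skytree solution: it scores a single made-up variable (claim amount) by its distance from the median in units of the median absolute deviation, whereas the slide describes combining several data sources. The `threshold` and `budget` parameters are illustrative names.

```python
import statistics

def robust_scores(amounts):
    """Distance of each claim from the median, in units of the MAD."""
    median = statistics.median(amounts)
    mad = statistics.median([abs(a - median) for a in amounts]) or 1.0
    return [abs(a - median) / mad for a in amounts]

def flag_claims(amounts, threshold=3.5, budget=None):
    """Return indices of suspicious claims, most anomalous first.

    threshold: a lower value catches more fraud but raises more false alarms.
    budget: optional cap on how many claims investigators can review.
    """
    scores = robust_scores(amounts)
    flagged = [i for i, score in enumerate(scores) if score > threshold]
    flagged.sort(key=lambda i: scores[i], reverse=True)
    return flagged if budget is None else flagged[:budget]

# Made-up claim amounts; indices 5 and 8 are far from the rest.
claims = [120, 135, 110, 128, 140, 2500, 125, 131, 900, 118]
print(flag_claims(claims))            # [5, 8]
print(flag_claims(claims, budget=1))  # [5]
```

Ranking by score before applying the budget means that when investigators can only review a few cases, they see the most anomalous ones first.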

Customer Use Case - Datacenter Optimization
Customer Pain:
• High data center equipment costs
• Outages hurt user satisfaction
• Huge & rapidly growing machine data volume from thousands of feeds
Before Skytree:
• Overprovision to cover anticipated peaks
• CapEx waste
• Outages went unnoticed until customers complained
• Reactive
Enabled With Skytree:
• Provision only what's really required
• Monitor thousands of systems, social media feeds at >25TB/hour
• Take action before merchants complain
Client Win:
• Business Impact: Higher user and merchant satisfaction
• Operational Impact: Next gen architecture enabled
• Financial Impact: Estimated $20-30M savings/year
"Our goal is to detect patterns and anomalies and take action so that users do not have a bad experience. With Skytree, we now collect and aggregate essentially all of our data and correlate events across every stream."

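Skytree Solves Complex, Difficult Problems
Using the full data set matters in cases like these:
When the first event of fraudulent activity must be detected in real time (finding the needle).
• When the data is sampled down to a small size, the phenomenon can hardly be found.
When the next product must be recommended with greater accuracy.
• Customer behavior changes in complex ways over time ("Rare Item" or "Hot Seller").
When churning customers must be predicted earlier and more accurately.
• In a small data set, the weak signals that lead to churn cannot be found.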
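The "needle" argument can be made quantitative. If a data set holds a handful of rare records and a uniform random sample is drawn without replacement, the chance that the sample contains none of them follows from the hypergeometric distribution. The sketch below uses made-up sizes (one million records, five rare ones) purely for illustration; the function name is hypothetical.

```python
def prob_sample_misses_all(total, rare, sample_size):
    """P(a uniform sample without replacement contains none of the rare records).

    Equals C(total - rare, sample_size) / C(total, sample_size), computed as a
    product of `rare` factors to avoid huge binomial coefficients.
    """
    if sample_size > total - rare:
        return 0.0  # the sample is too large to avoid every rare record
    p = 1.0
    for i in range(rare):
        p *= (total - sample_size - i) / (total - i)
    return p

total, rare = 1_000_000, 5  # illustrative sizes, not from the deck
for sample_size in (10_000, 100_000, 1_000_000):
    miss = prob_sample_misses_all(total, rare, sample_size)
    print(f"sample of {sample_size:>9,}: P(no rare record in sample) = {miss:.3f}")
```

With these sizes, a 1% sample misses every rare record about 95% of the time and a 10% sample about 59% of the time, while the full data set by definition contains all five.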

Skytree Architecture - Architected for Speed and Accuracy
Diagram: deeply optimized machine learning algorithms, parallelized across nodes, each with in-memory execution on multiple CPUs; Skytree Fast Internode Communication
• Deeply optimized algorithms (n, n log(n) calculations versus n² and n³)
• Parallel processing with no loss of accuracy
• Runs directly on Hadoop nodes
• In-memory execution
• Large-scale Hadoop scaling (Hadoop scaling w/ TrueScale™)
• Minimized internode traffic

Skytree Performance Data: Speed & Efficiency
Chart: GBTR, single node, 13 million rows (time in 1000s of seconds); Skytree compared with scikit-learn, R, and MLlib; speedup labels 26x, 128x, 153x
Chart: GBTR, multi-node, 10M-100M rows (time in 1000s of seconds); single node and 8 nodes; labels: "MLlib 71x slower", "MLlib - did not complete"
Chart: Skytree Deep Optimizations, time vs. n for O(n²), O(n³) vs. O(n), O(n log(n))
• Outperforms competing products in a single-node environment.
• Outperforms competing products on multi-node clusters.
• Performance scales as O(n log(n)) with the data.

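Skytree's Advantages
• Easy configuration changes through the GUI
• Simplified ML data preparation, model development, and deployment
• Audit trail of changes and of who made them
• AutoModel & SmartSearch: one-step train-tune-test
• Toolset: GUI, CLI, Python & Java SDKs, REST APIs, ML transformations, feature extraction, extensive model selection
• GUI: model comprehension, variable importance, tree visualization; result views for model training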

Skytree's High-Level Architecture
Flexible Delivery: On Premises, Cloud
Production

Skytree: Machine Learning Built for the Enterprise
Contact: 이승훈, General Manager, kosena21@naver.com, 010-9338-6400
