A Development of Log-based Game AI using Deep Learning

Introduction
Career:

•LINE+

•Samsung Electronics

•NCSoft

•Joyon

• S/W
Interests:

•Beer

•BTS, Oasis

•Games

•3D Graphics Engine

•Deep Learning

Index

1. Background

2. AI Requirements

3. Deep Learning

•Dataset, Preprocessing

•Neural Networks

•Train/Test, Inference

4. Demo

5. A/B Test

Background
Rule-based game AI is commonly built as an FSM (Finite State Machine).

Problem
Every State that is added multiplies the number of cases to handle:
number of cases × number of cases × number of cases × …

Game planner:

“Can you make an AI that plays like a person?”
Developer:
“Sorry? A human-like AI?
‘Human-likeness’ is hard to define, and there are far too many cases...”

Project
A log-based, human-like game AI.
An AI agent that plays for offline players.

Why?
Our game planners needed a human-like defense AI
to play on behalf of players while they were offline.

AI requirements
1. When
It predicts the timing at which a unit should be deployed.
2. Where
It predicts the location on the map (grid size: 42x35)
where a unit should be deployed.
3. What
It predicts which card should be picked.
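To see why hand-written rules struggle with these three requirements, one can count the choices available for a single deployment. A minimal sketch, assuming that the 10 output classes of the What network correspond to the card choices; timing multiplies this count again on every frame.

```python
GRID_W, GRID_H = 42, 35   # map grid size from the "Where" requirement
NUM_CARDS = 10            # assumption: the 10 output classes of the "What" network

def decision_space(grid_w: int, grid_h: int, num_cards: int) -> int:
    """Number of (card, cell) choices for one deployment, ignoring timing."""
    return grid_w * grid_h * num_cards

cells = GRID_W * GRID_H
choices = decision_space(GRID_W, GRID_H, NUM_CARDS)
print(cells, choices)  # 1470 14700
```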

Dataset
“Each action state is a vector of 1446 + 1 floating-point values.”
Index          Meaning                                                        Length
0              Remain Cost                                                    1
1 ~ 45         Unit Deck (Code, Level, Cost per unit)                         45
46 ~ 945       Units on Map (Code, Level, X, Y, Dead Rate, Player Type)       900
946 ~ 1445                                                                    500
1446           Label (= Class, Action)                                        1

Index     0             1       2       3       4~6           7~9           10~12    •••
Meaning   Remain Cost   Unit Deck 1             Unit Deck 2   Unit Deck 3   Unit Deck 4
                        Code    Level   Cost
Sample    12            14      8       1

Index     46      47      48    49    50          51            52~57    •••    1446
Meaning   Unit 1 on Map                                         Unit 2          Y (Class)
          Code    Level   X     Y     Dead Rate   Player Type
Sample    10      6       21    13    0.78497     1

X DATA: indices 0 ~ 1445; Y DATA: index 1446
Y = WX + B
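The index layout can be made concrete by slicing one row into X, Y and per-unit views. A minimal sketch under stated assumptions: the grouping of three values per deck unit and six values per map unit is inferred from the sample rows, the label value 7 is hypothetical, and indices 946~1445 are left undecoded because the slide does not say what they hold.

```python
import numpy as np

ROW_LEN = 1447              # 1446 features + 1 label
LABEL_IDX = 1446
DECK = slice(1, 46)         # indices 1~45: unit deck
MAP_UNITS = slice(46, 946)  # indices 46~945: units on map

def split_row(row):
    """Split one action-state row into X, Y and per-unit views (layout assumed from the samples)."""
    row = np.asarray(row, dtype=np.float64)
    assert row.shape == (ROW_LEN,)
    x = row[:LABEL_IDX]                        # X DATA: indices 0~1445
    y = int(row[LABEL_IDX])                    # Y DATA: index 1446 (class)
    remain_cost = row[0]                       # index 0
    deck = row[DECK].reshape(-1, 3)            # assumed [Code, Level, Cost] per deck unit
    map_units = row[MAP_UNITS].reshape(-1, 6)  # [Code, Level, X, Y, Dead Rate, Player Type]
    return x, y, remain_cost, deck, map_units

# Build a row from the sample values on the slide (label 7 is hypothetical)
row = np.zeros(ROW_LEN)
row[0] = 12
row[1:4] = [14, 8, 1]
row[46:52] = [10, 6, 21, 13, 0.78497, 1]
row[LABEL_IDX] = 7
x, y, remain_cost, deck, map_units = split_row(row)
print(x.shape, y, deck.shape, map_units.shape)  # (1446,) 7 (15, 3) (150, 6)
```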

Log format
1. => Players

2. => FrameCount

3. => BattleMap { ( / / /AI), ( / / / )}

4. => Action { ( ) }

Preprocessing
Raw file size            321.12 GB
Number of battles        450,967
Tools                    Spark, Mesos, Zeppelin, Hadoop
Language                 Scala
System                   CPU 4 cores, RAM 256 GB, HDD 14.5 TB

Parsing
The logs are parsed with Spark, Zeppelin and Scala.

Preprocessing
Raw data file size: 321.12 GB
Preprocessed file size: 48.8 MB
4 cores, RAM 256 GB: 7 hours!!

Preprocessing
Raw file size            321.12 GB
Number of battles        450,967
Process time             21 mins
Preprocessed file size   48.8 MB
Tools                    Spark, Mesos, Zeppelin, Hadoop
Language                 Scala
System                   CPU 160 cores, RAM 512 GB, HDD 14.5 TB
Time                     4 cores, RAM 256 GB: 7 hours
                         160 cores, RAM 512 GB: 21 mins
Speed-up                 x20
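The x20 figure follows from the two run times. A small check that also relates it to the 40x increase in cores; the efficiency figure is only rough, since RAM also doubled (256 GB to 512 GB) between the runs.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """How many times faster the second run was."""
    return before_minutes / after_minutes

before = 7 * 60              # 4 cores, RAM 256 GB: 7 hours, in minutes
after = 21                   # 160 cores, RAM 512 GB: 21 minutes
s = speedup(before, after)
core_ratio = 160 / 4         # 40x more cores
efficiency = s / core_ratio  # fraction of ideal linear scaling (ignores the RAM change)
print(s, core_ratio, efficiency)  # 20.0 40.0 0.5
```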

Neural Networks
What & Where & When
• What (Card): multinomial classification using CNN, 1D Convolution + Softmax
• Where (Location): regression using CNN, 1D Convolution + Regression
• When (Timing): binary classification using LSTM, Stacked LSTM + Softmax

What (Card)
Layer                            Kernel     Output shape
Input                                       1446 x 1
1D Conv                          (1 x 5)    1446 x 32
Max Pooling                      (1 x 2)    723 x 32
1D Conv                          (1 x 5)    723 x 64
Max Pooling                      (1 x 2)    362 x 64
1D Conv                          (1 x 5)    362 x 128
Flatten                                     (1 x 1) x 46336
Fully connected, 512 neurons                (1 x 1) x 512
Fully connected, 128 neurons                (1 x 1) x 128
Fully connected, 10 neurons                 (1 x 1) x 10
Deck Mask                                   (1 x 1) x 10
Softmax                                     (1 x 1) x 10

Softmax output (probability of each class 0 ~ 9):
0.5, 0.01, 0.03, 0.01, 0.05, 0.01, 0.07, 0.02, 0.2, 0.1 (Sum = 1.0)
If Argmax = 0, the predicted class (one-hot encoding index) is 0.
Class labels of example inputs (one-hot encoding index): 0, 7, 5, 2
Loss: Cross-Entropy
tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=Y’))
• Log data: 1 dimension.

• Property SNAPSHOT data.

• Not SCREENSHOT (image) data.

So the network uses 1D convolutions rather than 2D image convolutions.
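The softmax, argmax and cross-entropy steps can be reproduced on the slide's example distribution. The Deck Mask part is an assumption: the slide does not say how the mask is applied, so this sketch excludes cards that are not in the deck by setting their logits to -inf before the softmax. The mask values are hypothetical.

```python
import numpy as np

def softmax(logits, mask=None):
    """Softmax over classes; classes with mask == 0 get probability 0 (at least one must be allowed)."""
    logits = np.asarray(logits, dtype=np.float64)
    if mask is not None:
        logits = np.where(np.asarray(mask) > 0, logits, -np.inf)
    shifted = logits - logits.max()   # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

def cross_entropy(probs, label):
    """Cross-entropy for one example whose one-hot label index is `label`."""
    return float(-np.log(probs[label]))

# Example distribution from the slide, turned back into logits (log-probabilities)
probs = np.array([0.5, 0.01, 0.03, 0.01, 0.05, 0.01, 0.07, 0.02, 0.2, 0.1])
logits = np.log(probs)
p = softmax(logits)
print(int(p.argmax()), round(cross_entropy(p, 0), 4))  # 0 0.6931

# Hypothetical deck mask: card 0 is not in the deck
mask = np.ones(10)
mask[0] = 0
pm = softmax(logits, mask)
print(int(pm.argmax()), pm[0])  # 8 0.0
```

Once card 0 is masked out, the remaining probabilities are renormalized, so class 8 (0.2 of the remaining 0.5) becomes the prediction with probability 0.4.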

Where (Location)
Layer                            Kernel     Output shape
Input                                       1446 x 1
1D Conv                          (1 x 5)    1446 x 32
Max Pooling                      (1 x 2)    723 x 32
1D Conv                          (1 x 5)    723 x 64
Max Pooling                      (1 x 2)    362 x 64
1D Conv                          (1 x 5)    362 x 128
Flatten                                     (1 x 1) x 46336
Fully connected, 1024 neurons               (1 x 1) x 1024
Fully connected, 128 neurons                (1 x 1) x 128
Fully connected, 2 neurons                  (1 x 1) x 2
Regression

Output index 0 = X, index 1 = Y; e.g. 10 (X), 10 (Y) for cell (10, 10) on the <Game Map Grid>.
Loss: L2 Distance
tf.reduce_mean(tf.squared_difference(POS_X, POS_X’)) + tf.reduce_mean(tf.squared_difference(POS_Y, POS_Y’))
• Log data: 1 dimension.

• Property SNAPSHOT data.

• Not SCREENSHOT (image) data.
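The location loss adds the mean squared error on X to the mean squared error on Y, which is the mean squared L2 distance between predicted and true cells. A minimal NumPy sketch with hypothetical predictions. `mean_distance` is an illustrative metric in grid cells; the distance reported in the experimental results may be computed differently.

```python
import numpy as np

def location_loss(pred_xy, true_xy):
    """Mean squared error on X plus mean squared error on Y (mean squared L2 distance)."""
    pred_xy = np.asarray(pred_xy, dtype=np.float64)
    true_xy = np.asarray(true_xy, dtype=np.float64)
    loss_x = np.mean((pred_xy[:, 0] - true_xy[:, 0]) ** 2)
    loss_y = np.mean((pred_xy[:, 1] - true_xy[:, 1]) ** 2)
    return float(loss_x + loss_y)

def mean_distance(pred_xy, true_xy):
    """Mean Euclidean distance between predicted and true cells."""
    diff = np.asarray(pred_xy, dtype=np.float64) - np.asarray(true_xy, dtype=np.float64)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

# Hypothetical predictions for two samples whose target cell is (10, 10)
pred = [[10.0, 10.0], [13.0, 14.0]]
true = [[10.0, 10.0], [10.0, 10.0]]
print(location_loss(pred, true), mean_distance(pred, true))  # 12.5 2.5
```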

When (Timing)
Stacked LSTM, sequence length: 3
Input: states at time t-3, t-2, t-1 (e.g. “Action”, “Wait”, “Wait”); sample (30 x 10)
LSTM -> LSTM -> Fully connected layer -> Softmax -> prediction (“Output”) at time t
Class (one-hot encoding index): 0 = Wait, 1 = Action
Example output: 0.28, 0.72. If Argmax = 1, the prediction is Action.
Loss: Cross-Entropy
tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=Y’))
http://cs231n.stanford.edu/slides/2016/winter1516_lecture10.pdf
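Training the When network needs (t-3, t-2, t-1) -> t pairs. A minimal sketch of building those windows from per-frame data; the 6 frames and 4 features per frame are hypothetical, and the labels use the slide's classes (0 = Wait, 1 = Action).

```python
import numpy as np

SEQ_LEN = 3  # sequence length from the slide

def make_windows(states, labels, seq_len=SEQ_LEN):
    """Pair states at frames t-seq_len .. t-1 with the label at frame t (needs > seq_len frames)."""
    states = np.asarray(states)
    labels = np.asarray(labels)
    xs = [states[t - seq_len:t] for t in range(seq_len, len(states))]
    ys = labels[seq_len:]
    return np.stack(xs), ys

# Hypothetical per-frame data: 6 frames, 4 features each; 0 = Wait, 1 = Action
states = np.arange(24, dtype=np.float32).reshape(6, 4)
labels = np.array([1, 0, 0, 1, 0, 0])
X_seq, y_seq = make_windows(states, labels)
Y_one_hot = np.eye(2)[y_seq]  # one-hot targets for the 2-class softmax
print(X_seq.shape, y_seq.tolist())  # (3, 3, 4) [1, 0, 0]
```

The first window covers frames labeled Action, Wait, Wait, matching the example on the slide, and its target is the label of the following frame.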

When
(Regression, Distribution)
Figure: predicted class (0: Wait, 1: Action) plotted against frame count.

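import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph API

# CNN for "What" (card classification); shapes in comments follow the diagram.
# ConvLayer, PReLU, PoolingLayer, FCLayer and ReLUDropout are the author's helper functions (not shown).
# X, IS_TRAINING and num_of_classes are defined with the training code.
base_ch = 32
# conv1: 1446 x 1 -> 1446 x 32, pool1: -> 723 x 32
layer1 = ConvLayer(X, IS_TRAINING, input_ch=1, output_ch=base_ch, kernel_size=5, name='conv1')
layer1 = PReLU(layer1, shape=base_ch, name='relu1')
layer1 = PoolingLayer(layer1, kernel_size=2, name='pooling1')
# conv2: -> 723 x 64, pool2: -> 362 x 64
layer2 = ConvLayer(layer1, IS_TRAINING, input_ch=base_ch, output_ch=base_ch * 2, kernel_size=5, name='conv2')
layer2 = PReLU(layer2, shape=base_ch * 2, name='relu2')
layer2 = PoolingLayer(layer2, kernel_size=2, name='pooling2')
# conv3: -> 362 x 128
layer3 = ConvLayer(layer2, IS_TRAINING, input_ch=base_ch * 2, output_ch=base_ch * 4, kernel_size=5, name='conv3')
layer3 = PReLU(layer3, shape=base_ch * 4, name='relu3')
# Flatten: 362 * 128 = 46336 features per sample
flattened_shape = int(np.prod(layer3.get_shape().as_list()[1:]))
flatten = tf.reshape(layer3, [-1, flattened_shape], name="flatten")
# Fully connected: 46336 -> 512 -> 128 -> num_of_classes
fc1 = FCLayer(flatten, name="fc1", n_out=512)
fc1 = ReLUDropout(fc1)
fc2 = FCLayer(fc1, name="fc2", n_out=128)
fc2 = ReLUDropout(fc2)
fc3 = FCLayer(fc2, name="fc3", n_out=num_of_classes)
output_layer = fc3  # logits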
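The flatten size of 46336 in the diagrams can be traced by hand. A minimal sketch, assuming stride-1 convolutions with SAME padding and 2-wide, stride-2 max pooling with SAME padding. With VALID pooling, 723 would shrink to 361 instead, so the 362 in the diagram suggests SAME padding.

```python
import math

def conv_stack_shapes(length=1446, channels=(32, 64, 128)):
    """Trace (length, channels) through conv -> pool -> conv -> pool -> conv.

    Assumes SAME-padded stride-1 convolutions (length unchanged) and
    SAME-padded 2-wide, stride-2 max pooling (length rounded up).
    """
    shapes = [(length, 1)]
    n = length
    for i, ch in enumerate(channels):
        shapes.append((n, ch))            # 1D conv (1 x 5)
        if i < len(channels) - 1:
            n = math.ceil(n / 2)          # max pooling (1 x 2)
            shapes.append((n, ch))
    flat = shapes[-1][0] * shapes[-1][1]  # size after Flatten
    return shapes, flat

shapes, flat = conv_stack_shapes()
print(shapes)  # [(1446, 1), (1446, 32), (723, 32), (723, 64), (362, 64), (362, 128)]
print(flat)    # 46336
```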
Train

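# Placeholders (TensorFlow 1.x); learning_rate, sess and the batch arrays are defined elsewhere
X = tf.placeholder(tf.float32, [None, num_of_feature_elements, 1], name="input")
Y = tf.placeholder(tf.int32, [None, 1])
MASK = tf.placeholder(tf.float32, [None, num_of_classes], name="mask")  # 1 = card in deck, 0 = not
IS_TRAINING = tf.placeholder(tf.bool, [], name="is_training")
DROPOUT_KEEP_PROB = tf.placeholder(tf.float32, name="dropout_keep_prob")
# Integer labels [None, 1] -> one-hot [None, num_of_classes]
Y_one_hot = tf.reshape(tf.one_hot(Y, num_of_classes), [-1, num_of_classes])
# Deck mask: push logits of cards not in the deck far down so softmax gives them ~0.
# (Multiplying logits by 0 would leave them at logit 0, which still gets probability.)
output_layer = output_layer + (1.0 - MASK) * -1e9
HYPOTHESIS = tf.nn.softmax(output_layer)
COST = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output_layer, labels=Y_one_hot))
OPTIMIZER = tf.train.AdamOptimizer(learning_rate).minimize(COST)
PREDICTION = tf.argmax(HYPOTHESIS, 1, name="prediction")
correct_prediction = tf.equal(PREDICTION, tf.argmax(Y_one_hot, 1))
ACCURACY = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# One training step on a shuffled batch
shuffle_idx = np.random.permutation(len(x_batch_data))
x_batch_data = x_batch_data[shuffle_idx]
y_batch_data = y_batch_data[shuffle_idx]
mask_batch_data = mask_batch_data[shuffle_idx]
_, loss, acc = sess.run([OPTIMIZER, COST, ACCURACY],
                        feed_dict={X: np.expand_dims(x_batch_data, axis=2),  # [batch, 1446] -> [batch, 1446, 1]
                                   Y: y_batch_data, MASK: mask_batch_data,
                                   DROPOUT_KEEP_PROB: 0.7, IS_TRAINING: True})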
Train
Deck
Backpropagation.
<https://en.wikipedia.org/wiki/Gradient_descent, https://lazyprogrammer.me/tag/gradient-descent>
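The linked pages describe gradient descent. As a minimal illustration of the update rule that backpropagation-based optimizers such as Adam build on (not the author's training code), this fits the single-feature form of Y = WX + B from the dataset slide by gradient descent on mean squared error. The data are synthetic.

```python
import numpy as np

def gradient_descent(x, y, lr=0.1, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        err = (w * x + b) - y              # prediction error per sample
        grad_w = 2.0 / n * np.dot(err, x)  # d(MSE)/dw
        grad_b = 2.0 / n * err.sum()       # d(MSE)/db
        w -= lr * grad_w                   # step against the gradient
        b -= lr * grad_b
    return w, b

# Synthetic data generated from w = 3.0, b = 0.5
x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 0.5
w, b = gradient_descent(x, y)
print(round(w, 3), round(b, 3))  # 3.0 0.5
```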

Test
# Evaluate one batch: no dropout, inference mode
x_batch_data = x_data[batch_mask]
y_batch_data = y_data[batch_mask]
mask_batch_data = mask_data[batch_mask]
pred, pred_values = sess.run([PREDICTION, HYPOTHESIS],
                             feed_dict={X: np.expand_dims(x_batch_data, axis=2),
                                        MASK: mask_batch_data, DROPOUT_KEEP_PROB: 1.0,
                                        IS_TRAINING: False})
# Per class: how often it was predicted, and how often that prediction was correct
for p, y in zip(pred, y_batch_data.flatten()):
    class_try_count_list[p][0] += 1
    if p == int(y):
        total_correct_count += 1
        class_true_count_list[int(y)][0] += 1
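The counting loop above tallies, per class, how often it was predicted and how often that prediction was right. A vectorized NumPy equivalent makes the derived per-class precision explicit; the function name and the example predictions are hypothetical.

```python
import numpy as np

def per_class_stats(preds, labels, num_classes):
    """Return per-class prediction counts, per-class correct counts, accuracy and precision."""
    preds = np.asarray(preds, dtype=np.int64)
    labels = np.asarray(labels, dtype=np.int64)
    correct = preds == labels
    try_count = np.bincount(preds, minlength=num_classes)             # like class_try_count_list
    true_count = np.bincount(labels[correct], minlength=num_classes)  # like class_true_count_list
    accuracy = float(correct.mean())
    with np.errstate(divide="ignore", invalid="ignore"):
        # precision per class; classes that were never predicted get 0.0
        precision = np.where(try_count > 0, true_count / try_count, 0.0)
    return try_count, true_count, accuracy, precision

# Hypothetical predictions and ground truth for 3 classes
try_count, true_count, accuracy, precision = per_class_stats(
    [0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 2, 0], num_classes=3)
print(try_count.tolist(), true_count.tolist(), round(accuracy, 3))  # [2, 1, 3] [1, 1, 2] 0.667
```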

Save training logs
import os
import shutil
import datetime

# One output folder per run, named by timestamp (root_path is defined elsewhere)
timestamp = datetime.datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
save_path = root_path + "LV10_Save_CNN_WHAT_" + timestamp + "/"
result_train_file_path = save_path + "result_" + timestamp + ".txt"
if os.path.exists(save_path):
    shutil.rmtree(save_path)
os.makedirs(save_path)
train_log_file = open(result_train_file_path, 'w')

def SaveAndPrint(file, str_message):
    """Print a message and write it to the log file immediately."""
    print(str_message, end="")
    file.write(str_message)
    file.flush()

Training logs
Hyperparameters
Input file
“What was the hidden size?”
“What were the X, Y matrix shapes?”

Training logs
Class
“Step?”
“Class?”
“Ground truth?”

Training logs
Model
Log DB.

Training logs
At each step/epoch, validation code checks the model;
training is stopped when needed.
Hyperparameters to tune:
(Batch size, Hidden size, Learning rate, RNN Sequence length, RNN Multi count, CNN layer count, CNN Filter size, ...)
AutoML would help..
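Before reaching for AutoML, a random search over the hyperparameters listed above is a simple baseline. A minimal sketch, not the author's setup: the candidate values are hypothetical, and `evaluate` is a placeholder for "train the model, then return its validation accuracy".

```python
import random

# Hypothetical candidate values for the hyperparameters listed on the slide
SEARCH_SPACE = {
    "batch_size": [64, 128, 256],
    "hidden_size": [128, 256, 512],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "rnn_sequence_length": [3, 5, 10],
    "rnn_multi_count": [1, 2, 3],
    "cnn_layer_count": [2, 3, 4],
    "cnn_filter_size": [3, 5, 7],
}

def sample_configs(space, n, seed=0):
    """Draw n random configurations; a fixed seed makes the search repeatable."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()} for _ in range(n)]

def random_search(space, evaluate, n=10, seed=0):
    """Return the configuration with the highest validation score."""
    best_cfg, best_score = None, float("-inf")
    for cfg in sample_configs(space, n, seed):
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy evaluate function standing in for training plus validation
best_cfg, best_score = random_search(SEARCH_SPACE, lambda cfg: -abs(cfg["learning_rate"] - 1e-3), n=20)
print(best_cfg, best_score)
```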

Experimental results
Gathering period          2018/03/13 ~ 04/24 (43 days)
Number of battles         450,967
Total file size           321.12 GB
Neural Networks           CNN (Convolutional Neural Network)
                          LSTM (Long Short-Term Memory)
Tools                     Spark, Mesos, Zeppelin, Hadoop, TensorFlow, Unity3D
Language                  Python, Scala, C#, C
System                    GTX 1080, CPU i7-6700K, RAM 32 GB, HDD 2 TB
Accuracy for Card         83.99%
Accuracy for Timing       70.94%
Accuracy for Location     4.15 Distance (cm) (= 1 )
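The 43-day gathering period can be checked by counting both end dates; the per-day average is derived from the table's battle total.

```python
from datetime import date

start = date(2018, 3, 13)
end = date(2018, 4, 24)
days = (end - start).days + 1    # count both the first and the last day
per_day = round(450_967 / days)  # average battles gathered per day
print(days, per_day)  # 43 10488
```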

Inference
LineGameTensorFlow library
libLineGameTensorflow.dylib / .so
A native library that uses the TensorFlow library for model inference.

Inference: Game Loop
Hybrid
• RuleSet
• AI On/Off
Card PickCard(DeckInfo userDeckInfo)
{
    // Default: the rule-based pick
    Card card = RuleSet.RandomPick(userDeckInfo);
    if (bUseDeepAI) {
        // Prefer the deep learning pick whenever the model returns one
        Card deepAICard = PickByDeepAI(userDeckInfo);
        if (deepAICard != null) {
            card = deepAICard;
        }
    }
    return card;
}
Real-time inference
Inference runs inside a C# MonoBehaviour during the game loop.

Demo
Rule-based (RuleSet) AI
vs.
Deep learning AI
A deep learning AI was trained on the player logs.
So, does it play like a human?
Let’s watch the clips!

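A/B Test
Player comments on the algorithm (rule) AI and the deep learning AI:
· Uses strategies at the right timing
· Uses a different strategy every time
· Uses divide-and-conquer tactics
· Immediate
· Quick judgment (response)
· Plays like a core user
· Skillful
· Makes you think
· Waits, then summons units
· Deploys defense units after checking the attacker's units
· Puts tanks in front and summons damage dealers behind them
· Summons appropriate units depending on the user's choices
· Sometimes deploys to unexpected positions
· Deploys units in inappropriate places
· Does not respond well to varied attack directions and progressions
· Long-term response
· Hard to clear
· Simple patterns
· Summons appropriate units
· Summons meaningless units
· Summons units randomly
· Summons units inefficiently
· Summons units that cannot attack the opponent
· Summons appropriate units when a building is destroyed
· Summons units in advance
· Feels like it makes fewer mistakes
· Appropriate placement
· Feels like someone is watching
· Keeps holding the line
· Aggressive patterns
· Short-term response
· Similar play patterns
· Different patterns
· Dumb
· Tension
· Summons counter units
· Handles each situation well
· Proactive (active) response
· Hard to tell apart
· Both are fun
Algorithm AI vs. Deep learning AI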
