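From Points and Lines to Planes: Exploring the Composition and Style of Comics from an Image Analysis Perspective
(由點、線至面：從影像分析角度探討漫畫的組成與風格)
Wei-Ta Chu (朱威達)
Department of Computer Science and Information Engineering, National Chung Cheng University
wtchu@ccu.edu.tw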

Fair Use Declaration
• This statement is submitted for elaborating the legitimate status for illustrating all the “Screen Printings” and
“Comics” in the “由點、線至面：從影像分析角度探討漫畫的組成與風格” are cited under the doctrine of “Fair
Use” for research purpose if copyright protection applies on them.
• The legal doctrine establishes globally that originality is needed to be seen for a work pursuing copyright protection,
namely, the originality is the very essence of creation in intellectual domain. On account of that, an automatically
recorded screen motion of the interactive computer games can be deemed as no copyright protection on it,
therefore it can be lawfully applied in the “由點、線至面：從影像分析角度探討漫畫的組成與風格” as part of
the research materials without the written permission from the copyright owner of the computer games. However,
some people might treat them as copyright protected materials still for the drawings or similar creations in the
background of the animations or comics, if that applies, according to the international intellectual property
agreements and copyright law in respective jurisdictions, such as Agreement on Trade-Related Aspects of
Intellectual Property Rights (TRIPS) article 13, Berne Convention for the Protection of Literary and Artistic Works
article 9(2), EU Copyright Directive article 5(5), Copyright Law of the United States of America section 107 and
Taiwan Copyright Act article 65, the Fair Use and Fair Dealing of a copyrighted work based on teaching, scholarship,
or research shall apply under the circumstances to sustain the citation for all the “Screen Printings” and “Comics”
in the “由點、線至面：從影像分析角度探討漫畫的組成與風格” as legitimate action abided by at law, which do
not conflict with a normal exploitation of the works and do not unreasonably prejudice the legitimate interests of
the right holders.

Introduction
• Comics-based presentation of movies, animation, and photos has emerged recently.
• Comics are believed to be an ideal medium for visual storytelling because of their rich expressivity, high interactivity, and high portability.
Sample generated comic pages from the animation “Neon Genesis
Evangelion” (top) and from the animation “Summer Wars” (bottom).

Introduction
Three key constituents of manga [1]:
1. Drawing (絵)
2. Language (言葉)
3. Panel (コマ)
[Diagram labels: point (點), line (線), plane (面); Drawing, Panel.]
[1] 夏目房之介 (1997). マンガはなぜ面白いのか―その表現と文法. NHKライブラリー.

Outline
• Part 1: Manga Style Analysis
• Part 2: Comics-based Storytelling

Motivation
• As the internet and mobile devices have become popular, digital manga has become widely accessible.
• Different manga may have different styles. We focus on which features can be used to distinguish manga styles.

Panel Feature Extraction
From the bounding box of each panel, we extract features that describe the characteristics of the layout.
1) average panel height (derived from bounding boxes)
2) average panel width
3) standard deviation of

5) ratio of the total panel area to the area of the whole page
6) average panel area
8) average slope of vertical panel boundaries
9) average slope of horizontal panel boundaries
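A minimal sketch of how some of these layout features could be computed from panel bounding boxes. The `(x, y, w, h)` box format, the function name `panel_layout_features`, and the sample page are assumptions made for illustration; the slope features are omitted because they need the panel boundary lines rather than the boxes.

```python
from statistics import mean

def panel_layout_features(panels, page_w, page_h):
    """Layout features from panel bounding boxes given as (x, y, w, h)."""
    widths = [w for _, _, w, _ in panels]
    heights = [h for _, _, _, h in panels]
    areas = [w * h for w, h in zip(widths, heights)]
    return {
        "avg_height": mean(heights),
        "avg_width": mean(widths),
        "area_ratio": sum(areas) / (page_w * page_h),  # total panel area / page area
        "avg_area": mean(areas),
    }

# Hypothetical 100 x 100 pixel page with three panels.
features = panel_layout_features(
    [(0, 0, 100, 50), (0, 50, 50, 50), (50, 50, 50, 50)], page_w=100, page_h=100
)
```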

Top row: sample manga pages from three different artists.
Bottom row: panel feature distributions corresponding to these pages.

Screentone Detection
• Screentone is a technique for applying textures and shades to drawings, used as an alternative to hatching.
• Different artists have different habits in using screentone.

Screentone Detection
1. Image binarization.
2. Dilation.
3. Delete small areas.
4. Get screentone areas.
5. Extract patches from screentone areas.
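The first four steps can be sketched in plain Python on a small grayscale grid. This is an illustrative sketch, not the implementation used in the talk: the threshold of 128, the 3x3 dilation, the 4-connectivity, and the `min_area` value are all assumptions.

```python
from collections import deque

def binarize(gray, thresh=128):
    """Step 1: mark dark (ink) pixels as 1 and background as 0."""
    return [[1 if v < thresh else 0 for v in row] for row in gray]

def dilate(mask):
    """Step 2: 3x3 dilation, so neighbouring screentone dots merge."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                        out[y][x] = 1
    return out

def large_regions(mask, min_area):
    """Steps 3 and 4: keep 4-connected regions with at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                queue, region = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    region.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(region) >= min_area:
                    regions.append(region)
    return regions

# Toy 8 x 8 image: a regular dot pattern on the left and one isolated speck.
gray = [[255] * 8 for _ in range(8)]
for y in (0, 2, 4, 6):
    for x in (0, 2):
        gray[y][x] = 0
gray[7][7] = 0
regions = large_regions(dilate(binarize(gray)), min_area=10)
```

In this toy input the dots merge into one region after dilation, while the isolated speck stays small and is discarded.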

Screentone Feature Extraction
• Two screentone features are proposed:
– The ratio of screentone areas to the whole panel area.
– Bag of screentone.
• Gabor wavelet texture
• Use affinity propagation to cluster features, and use the bag-of-words model to describe screentone.
B.J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 2007.
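A minimal sketch of the bag-of-words step, assuming the exemplars (cluster centres) have already been produced by a clustering method such as affinity propagation and that each screentone patch is described by a numeric texture vector. The function name and the two-dimensional toy descriptors are hypothetical.

```python
def bag_of_words(descriptors, exemplars):
    """Assign each descriptor to its nearest exemplar and return a normalized histogram."""
    counts = [0] * len(exemplars)
    for d in descriptors:
        # Squared Euclidean distance to every exemplar.
        dists = [sum((a - b) ** 2 for a, b in zip(d, e)) for e in exemplars]
        counts[dists.index(min(dists))] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Toy example: two exemplars and three patch descriptors.
hist = bag_of_words([(0.1, 0.0), (0.9, 1.0), (1.0, 0.8)], [(0.0, 0.0), (1.0, 1.0)])
```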

Screentone Feature Extraction
Top row: sample manga pages from three different artists.
Bottom row: the BoP distributions corresponding to these artists.

Character Detection
• Apply the eye detection model in a sliding-window manner to detect eyes.
• Expand the areas around the detected eye regions. The minimum bounding box covering all expanded regions is taken as the character’s head region.
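The expansion and bounding-box step might look like the sketch below. The expansion factor `scale=3.0`, the `(x, y, w, h)` box format, and the function names are assumptions; the slide does not state how far the eye regions are expanded.

```python
def expand_box(box, scale, page_w, page_h):
    """Grow an (x, y, w, h) box around its centre and clip it to the page."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    half_w, half_h = w * scale / 2, h * scale / 2
    return (max(0, cx - half_w), max(0, cy - half_h),
            min(page_w, cx + half_w), min(page_h, cy + half_h))

def head_region(eye_boxes, scale, page_w, page_h):
    """Minimum bounding box (x0, y0, x1, y1) covering all expanded eye regions."""
    expanded = [expand_box(b, scale, page_w, page_h) for b in eye_boxes]
    return (min(e[0] for e in expanded), min(e[1] for e in expanded),
            max(e[2] for e in expanded), max(e[3] for e in expanded))

# Two hypothetical eye detections on a 200 x 200 page.
head = head_region([(40, 40, 10, 10), (70, 40, 10, 10)], scale=3.0, page_w=200, page_h=200)
```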

Line Feature Extraction
• Canny edge detection
• Edge linking
P. Kovesi, School of Computer Science & Software Engineering, The University of Western Australia, http://www.csse.uwa.edu.au, 2001.
[Figure: (1) face image, (2) Canny edge image, (3) edge linking, (4) straight line segmentation.]

• Included angle between lines: For two spatially adjacent line segments, we calculate the included angle between them. The feature is represented as a 12-dimensional histogram.
[Figure: examples for shonen and shojo manga.]

• Line orientation: The orientation of a line segment is defined as the included angle between it and the horizontal axis. The feature is represented as a 12-dimensional orientation histogram.
[Figure: examples for Mitsuru Adachi and Terajima Yuji.]
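A minimal sketch of the orientation histogram and the included angle between two segments. Segments are assumed to be pairs of endpoints; the 15-degree bin width follows from 12 bins over 180 degrees, and normalizing the histogram is an assumption.

```python
import math

def orientation(seg):
    """Angle in [0, 180) degrees between a segment and the horizontal axis."""
    (x0, y0), (x1, y1) = seg
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0

def orientation_histogram(segments, bins=12):
    """Normalized histogram of segment orientations."""
    hist = [0] * bins
    width = 180.0 / bins
    for seg in segments:
        hist[min(int(orientation(seg) / width), bins - 1)] += 1
    total = sum(hist)
    return [c / total for c in hist] if total else hist

def included_angle(seg_a, seg_b):
    """Smaller angle (0 to 90 degrees) between the lines through two segments."""
    diff = abs(orientation(seg_a) - orientation(seg_b))
    return min(diff, 180.0 - diff)

segments = [((0, 0), (10, 1)), ((10, 1), (0, 0)), ((0, 0), (1, 10))]
hist = orientation_histogram(segments)
angle = included_angle(((0, 0), (10, 1)), ((0, 0), (-10, 1)))
```

Taking the angle modulo 180 makes the orientation independent of the order in which the endpoints are listed.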

• Density of line segments: For each line segment, we count the number of lines in its neighborhood, and the counts over all line segments are gathered to form the feature. It is represented as a 20-dimensional histogram.
[Figure: examples for Mitsuru Adachi and Terajima Yuji.]
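One possible reading of this feature, sketched under assumptions: the neighborhood is a circle of hypothetical radius 20 pixels around each segment's midpoint, and counts of 19 or more fall into the last of the 20 bins. The slide does not define either choice.

```python
import math

def midpoint(seg):
    (x0, y0), (x1, y1) = seg
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def neighbor_counts(segments, radius):
    """For each segment, count the other segments whose midpoint lies within radius."""
    mids = [midpoint(s) for s in segments]
    return [
        sum(1 for j, (xj, yj) in enumerate(mids)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius)
        for i, (xi, yi) in enumerate(mids)
    ]

def density_histogram(segments, radius=20.0, bins=20):
    """Normalized histogram of neighbor counts; large counts share the last bin."""
    hist = [0] * bins
    for n in neighbor_counts(segments, radius):
        hist[min(n, bins - 1)] += 1
    total = sum(hist)
    return [c / total for c in hist] if total else hist

segments = [((0, 0), (4, 0)), ((0, 2), (4, 2)), ((0, 4), (4, 4)), ((100, 100), (104, 100))]
hist = density_histogram(segments)
```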

• Orientation of nearby lines: The orientations of a line segment’s nearby lines are calculated and represented as a 12-dimensional orientation histogram.
[Figure: examples for Mitsuru Adachi and Terajima Yuji.]

• Number of nearby lines with similar orientation: For a line segment L, we calculate the number of its nearby lines that have an orientation similar to that of L. This information over all line segments is gathered to form a 20-dimensional histogram.
[Figure: examples for shonen and shojo manga.]

• Variation in line strength: We use twenty different threshold settings for Canny edge detection. The ratio of the detection result under each setting to the result under the standard setting forms the feature, a 20-dimensional vector.
[Figure: examples for shonen and shojo manga.]

Feature Analysis
• Comparison between manga from different types of magazines.
Shonen manga: 3 different titles, 300 pages in total.
(1) “Nisekoi”, Komi Naoshi.
(2) “Yamada-kun and the seven witches”, Miki Yoshikawa.
(3) “Agatsuma's my daughter”, Nishikida Keikokorozashi.
Shojo manga: 3 different titles, 300 pages in total.
(4) “I love flowers and Mr.”, Kumaoka Fuyuyu.
(5) “The first love honey”, Minase Ai.
(6) “From me to you”, Shiina Karuho.

Feature Analysis
– Comparison between manga with the same topic but drawn by different artists.
– Use statistical comparison to analyze the proposed features.
Baseball manga: 3 different titles, 300 pages in total.
(1) “Ace of Diamond”, Terajima Yuji.
(2) “Mix”, Mitsuru Adachi.
(3) “Big Windup”, Mizushima Tsutomu.

Feature Analysis
– Comparison between manga from different types of magazines (shonen vs. shojo).
– P-values, one per feature: 0.039, 0.414, 0.151, 0.429, 0.017, 0.003, 0.044, 0.000

Feature Analysis
• Distance maps (shonen manga vs. shojo manga) for four features, with p-values 0.017, 0.003, 0.414, and 0.429.
[Figure: distance maps with rows and columns grouped into shonen and shojo.]

Feature Analysis
– Comparison between manga with the same topic but drawn by different artists.
– P-values, one per feature:
TY vs. MA   0.037  0.183  0.000  0.277  0.000  0.000  0.6   0.000
TY vs. MT   0.105  0.006  0.075  0.007  0.199  0.074  0.47  0.000
MA vs. MT   0.325  0.091  0.161  0.061  0.011  0.000  0.14  0.000
Abbreviations: Terajima Yuji (TY), Mitsuru Adachi (MA), Mizushima Tsutomu (MT).

Feature Analysis
• Spider chart (based on skewness of features):

Feature Analysis
• Comparison between manga from different types of magazines.
– SVM test: 5-fold cross-validation.
Accuracy (%):  71.6  61.5  60.5  56.8  70.3  74.5  82  75  79.3  80
• Comparison between manga with the same topic but drawn by different artists.
– SVM test: 5-fold cross-validation.
Accuracy (%):
TY vs. MA   74.2  64.2  70    62.1  77.8  90.7  90    63.3  71.6  72
TY vs. MT   65    72.8  62.8  72.8  50    69.2  76.1  56.6  88    88
MA vs. MT   71.4  67.1  64.2  68.5  74.2  86.4  86.1  66.6  82    81
Latent Style Model
• Developing a style model based on Latent Dirichlet Allocation (LDA)
to discover style elements.
• Documents can be represented as mixtures of latent topics, where
each topic is formed by a distribution over words.
[Diagram: LDA model relating documents, topics, and words.]
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3,
993-1022.
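For reference, the standard generative process of LDA as given by Blei et al. can be written as follows. The symbols $\alpha$, $\beta$, $\theta_d$, $z_{d,n}$, and $w_{d,n}$ follow common LDA notation and are not necessarily the notation used on the original slide.

```latex
\begin{aligned}
\theta_d &\sim \mathrm{Dirichlet}(\alpha) && \text{topic mixture of document } d \\
z_{d,n} &\sim \mathrm{Multinomial}(\theta_d) && \text{topic of the } n\text{-th word in document } d \\
w_{d,n} &\sim \mathrm{Multinomial}(\beta_{z_{d,n}}) && \text{observed word drawn from that topic}
\end{aligned}
```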

Latent Style Model
Correspondence between the two models (document; topic; word):
Latent Dirichlet Allocation: text document; latent topics; word.
Latent Style Model: manga pages of the same artist (manga page); latent style elements; visual word.
Given a set of documents with the observed visual words, we can efficiently learn the model with the Gibbs sampling algorithm.
The style probabilities of a document can then be estimated, which enables us to represent a document as a distribution over style elements.
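A compact collapsed Gibbs sampler for LDA, shown only as a sketch of the learning step. The hyperparameters `alpha` and `beta`, the iteration count, and the toy documents of visual-word indices are assumptions, not values from the talk.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Return per-document style (topic) distributions estimated by collapsed Gibbs sampling."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial assignment of a topic to every visual word.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                z = assign[d][n]
                # Remove the current assignment, then resample it.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]

# Toy documents: lists of visual-word indices from a vocabulary of size 4.
docs = [[0, 0, 1, 1, 0], [2, 3, 3, 2, 2], [0, 1, 2, 3]]
theta = lda_gibbs(docs, n_topics=2, vocab_size=4)
```

Each row of `theta` is the estimated distribution of style elements for one document.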

Style Element Distributions
Top: sample manga pages from three different documents.
Bottom: style element distributions corresponding to these documents.

Artists in Dataset 1
(A) “Fairy Tail”, 真島ヒロ.
(B) “ヤンキー君とメガネちゃん”, 吉河美希.
(C) “うしおととら”, 藤田和日郎.
(D) “金色のガッシュ!!”, 雷句誠.
(E) “呪法解禁!!”, 麻生羽呂.
(F) “天地を喰らう”, 本宮ひろ志.
(G) “北斗の拳”, 原哲夫.
(H) “魁!!男塾”, 宮下あきら.
100 manga pages from each of eight different artists, 800 manga pages in total.

Artist Style Element Distributions
(B) “ヤンキー君とメガネちゃん”, 吉河美希.
(E) “呪法解禁!!”, 麻生羽呂.
(G) “北斗の拳”, 原哲夫.
Top: sample manga pages from three different artists.
Bottom: style element distributions corresponding to these artists.

Style-Based Art Movement Retrieval
Given a query, we would like to retrieve manga documents produced by artists of the same movement.
[Chart: MAP@10 (axis range 0.65 to 0.9) versus number of styles (10, 20, 30, 40) for histogram intersection and chi-square distances on line features and on all features.]
MAP@10:
features       distance measure         10 styles  20 styles  30 styles  40 styles
line features  histogram intersection   0.7093     0.7152     0.7329     0.7158
line features  chi square               0.7024     0.719      0.7443     0.7383
all features   histogram intersection   0.8413     0.8472     0.8483     0.8125
all features   chi square               0.8358     0.8518     0.8544     0.8196
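A minimal sketch of the two distance measures named in the table and of an average-precision score at rank 10. The exact MAP@10 definition used in the talk is not stated; the variant below, which averages precision over the relevant items found in the top 10, is one common choice and is an assumption here.

```python
def histogram_intersection(p, q):
    """Similarity of two normalized histograms; higher means more similar."""
    return sum(min(a, b) for a, b in zip(p, q))

def chi_square(p, q, eps=1e-12):
    """Chi-square distance of two histograms; lower means more similar."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))

def average_precision_at_k(ranked_labels, query_label, k=10):
    """Average of precision values at the ranks of relevant items in the top k."""
    hits, score = 0, 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label == query_label:
            hits += 1
            score += hits / rank
    return score / hits if hits else 0.0

def rank_by_chi_square(query, documents):
    """Indices of documents sorted from most to least similar to the query."""
    return sorted(range(len(documents)), key=lambda i: chi_square(query, documents[i]))

# Toy style-element distributions for a query and three documents.
docs = [[0.5, 0.5], [0.1, 0.9], [0.45, 0.55]]
order = rank_by_chi_square([0.5, 0.5], docs)
ap = average_precision_at_k(["A", "B", "A"], "A")
```

MAP@10 is then the mean of this score over all queries.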

Style-Based Artist Retrieval
Given an artist’s manga document, we would like to retrieve other documents produced by the same artist.
[Chart: MAP@10 (axis range 0.55 to 0.85) versus number of styles.]
MAP@10:
features       distance measure         10 styles  20 styles  30 styles  40 styles
line features  histogram intersection   0.6401     0.6404     0.6460     0.6323
line features  chi square               0.6385     0.6457     0.6541     0.6537
all features   histogram intersection   0.7627     0.7663     0.7854     0.7654

Artwork Period Retrieval
We take the manga JoJo's Bizarre Adventure (ジョジョの奇妙な冒険) for analysis, which Hirohiko Araki has been creating since 1987. The dataset contains 300 pages in total.
Part 1 (1987)
Part 3 (1989-1992)
Part 8 (2011-ongoing)

Sample results of the query and top returned documents.

Given an artist’s manga document, we would like to retrieve other documents produced in the same period.
[Chart: MAP@10 (axis range 0.5 to 0.75) versus number of styles.]
MAP@10:
features       distance measure         10 styles  20 styles  30 styles  40 styles
line features  histogram intersection   0.5703     0.6247     0.6377     0.6428
line features  chi square               0.5779     0.6446     0.6581     0.6622
all features   histogram intersection   0.6321     0.6521     0.6698     0.6751

Summary
• Manga style analysis
– Manga-specific features
– Based on LDA, implicit style elements are discovered in a
probabilistic framework.
– Analysis can be achieved at the style level rather than the
feature level.
• Applications
– Style-based browsing
– Influence discovery
– Relationship between style and other properties

Part 2: Comics-Based Storytelling
Wei-Ta Chu (朱威達)
Department of Computer Science and Information Engineering, National Chung Cheng University
wtchu@ccu.edu.tw

Comics-Based Storytelling
• Goal: Develop a systematic framework to enable comics-based
storytelling of temporal image sequences
– Comic design theory
– Formulate core components as optimization problems and
systematically solve them
– Interactivity

Challenges
• Q1. How to segment the given temporal image sequence, so that images in the same subsequence present similar semantics/events/scenes and are appropriate to put on the same comic page?

Challenges
• Q2. What is the best layout to arrange panels in the same page?

Challenges
• Q3. How to place speech balloons, so that important content in images is not occluded by balloons, and the balloons’ positions direct the viewer’s gaze to build a pleasing reading trajectory?

Optimized Page Allocation
• Allocate an appropriate number of comic pages, each of which may include a different number of panels.
– Visual coherence: Consecutive or similar visual content tends to be put on the same comic page.
– Browsing pace: Keyframes conveying high motion tend to be put on pages containing more panels, to build a tense browsing experience.
• A labeling problem with a temporal continuity constraint
– Solution: Genetic algorithm (GA)
Example labeling (page index per keyframe): 1 1 1 2 2 2 2 3 3 4 4 4
Q1. How to segment the given temporal image sequence?
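A small genetic algorithm for this labeling problem, given as a sketch. Encoding a solution as a binary cut vector (1 means a page break between two consecutive keyframes) keeps the labels temporally contiguous by construction. The fitness function, the `max_panels` limit of 8, and all GA settings are assumptions for illustration; the talk's actual objective combines visual coherence and browsing pace in a form not shown here.

```python
import random

def cuts_to_labels(cuts):
    """Binary cut vector -> contiguous page labels such as 1 1 1 2 2 3."""
    labels, page = [1], 1
    for c in cuts:
        page += c
        labels.append(page)
    return labels

def fitness(cuts, sims, max_panels=8):
    """Reward keeping similar neighbours together and cutting between dissimilar ones."""
    score = sum(1 - s if c else s for c, s in zip(cuts, sims))
    labels = cuts_to_labels(cuts)
    oversize = sum(max(0, labels.count(p) - max_panels) for p in set(labels))
    return score / len(sims) - oversize

def allocate_pages(sims, pop_size=30, generations=100, p_mut=0.05, seed=0):
    """sims[i] is the visual similarity in [0, 1] between keyframes i and i + 1."""
    rng = random.Random(seed)
    n = len(sims)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, sims), reverse=True)
        nxt = pop[:2]  # elitism: keep the two best individuals
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            a, b = (max(rng.sample(pop, 3), key=lambda c: fitness(c, sims)) for _ in range(2))
            point = rng.randrange(1, n) if n > 1 else 0
            child = a[:point] + b[point:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return cuts_to_labels(max(pop, key=lambda c: fitness(c, sims)))

# Hypothetical similarities between 12 consecutive keyframes.
sims = [0.9, 0.9, 0.1, 0.9, 0.9, 0.9, 0.2, 0.9, 0.1, 0.9, 0.9]
labels = allocate_pages(sims)
```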

Optimized Page Allocation
1 1 1 2 2 2 2 3 3 4 4 4

Objective Function (Fitness)
[Chart: best, average, and worst fitness (axis range 0.7 to 0.95) versus iteration (1 to 101), with sample page allocations shown at the 5th, 20th, and 90th iterations.]

Optimized Layout Selection
• Desired properties
– More important images should be allocated larger panels.
– Keyframes extracted from the same shot, or photos taken consecutively in the same place, are better placed in the same row of panels.
– Keyframes with more subtitle words, or photos with more annotations, should be allocated larger panels.
• Idea
– Determine the images-layout pair that has the most similar “importance” distributions.
Q2. What is the best layout to arrange panels in the same page?

Image Importance
• From each keyframe, the region of interest (ROI) is extracted based on color contrast [Cheng’11].
• Assume that the keyframes are assigned to the same page. The importance value of a keyframe is defined in terms of:
– the ratio of the area of the ROI
– the ratio of the number of subtitle words
– the minimum color histogram distance from this frame to other frames
M.-M. Cheng, G.-X. Zhang, N.J. Mitra, X. Huang, and S.-M. Hu. “Global contrast based salient region detection.” Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 409-416, 2011.

Layout Design
[Figure: candidate layouts with 1 to 8 panels per page.]

Layout Importance
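• Layout importance: for the jth panel, the ratio of the area of that panel to the area of the whole page.
• To measure how appropriately a layout matches the given image sequence
– Inner product of the image importance and layout importance distributions
[Figure: example three-panel layouts with panel area ratios (1/3, 1/3, 1/3), (0.5, 0.25, 0.25), and (0.25, 0.25, 0.5).]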
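A minimal sketch of matching an image importance distribution to candidate layouts with an inner product. The function names, the normalization step, and the sample importance values are assumptions; the candidate layouts reuse the panel area ratios shown on the slide.

```python
def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values] if total else [1 / len(values)] * len(values)

def layout_score(image_importance, panel_areas):
    """Inner product between the image importance and layout importance distributions."""
    return sum(i * a for i, a in zip(normalize(image_importance), normalize(panel_areas)))

def select_layout(image_importance, layouts):
    """Pick the layout (with a matching panel count) that has the highest score."""
    candidates = [l for l in layouts if len(l) == len(image_importance)]
    return max(candidates, key=lambda l: layout_score(image_importance, l))

layouts = [[1 / 3, 1 / 3, 1 / 3], [0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]
best = select_layout([0.2, 0.2, 0.6], layouts)
```

With the most important image last, the layout whose last panel is largest scores highest.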

Layout Importance
• Binary vectors show how panels are arranged into rows
– How well different panel arrangements fit the shots
• Importance distribution in terms of the number of spoken words
– Inner product
[Figure: example layouts with rows labeled 1st to 3rd, and row vectors r1=(01100), r2=(00110), r3=(01000).]

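Layout Selection
[Figure: example three-panel layouts with panel area ratios (1/3, 1/3, 1/3), (0.5, 0.25, 0.25), and (0.25, 0.25, 0.5), and row vectors r1=(011) r2=(010) r1=(001); the three images have shot numbers 1, 2, 2 and q = (0 1 0).]
The best layout is selected by: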

Composition
[Figure: composition pipeline with steps labeled find ROI, find center, extend, crop, and paste + resize.]

Layout Selection Comparison
Example 1: Layout selected by the proposed method (a) and by equal allocation (b).
Example 2: Layout selected by the proposed method (c) and two different equally-allocated layouts (d)(e).

Balloon Placement
• Optimal positions are determined by jointly considering the following factors:
– Balloons should not overlap with the regions of interest (ROIs) in images.
– Balloons should be placed as close as possible to the ROIs in images.
– When there are multiple balloons in a panel, sentences spoken earlier should be placed closer to the top-left corner of the panel, to maintain the correct reading order.
– Balloons should not overlap with each other.
– A reading trajectory should be built so that the reading order is not only correct but also vivid.
Q3. How to place speech balloons?

Optimized Speech Balloon Placement
• Finally, the five factors are linearly combined.
• This problem maps naturally to one that the particle swarm optimization (PSO) algorithm solves efficiently.
[Figure labels: local region, global region.]
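A sketch of PSO for placing a single balloon, using only two of the five factors (no overlap with the ROI, closeness to the ROI). The cost weights, the inertia and acceleration constants (0.7 and 1.5), the swarm size, and the box format `(x0, y0, x1, y1)` are assumptions made for illustration and do not come from the talk.

```python
import random

def overlap_area(a, b):
    """Overlap area of two boxes given as (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def cost(pos, size, roi, w_overlap=10.0, w_dist=1.0):
    """Linear combination of ROI overlap and centre-to-centre distance."""
    x, y = pos
    balloon = (x, y, x + size[0], y + size[1])
    bcx, bcy = x + size[0] / 2, y + size[1] / 2
    rcx, rcy = (roi[0] + roi[2]) / 2, (roi[1] + roi[3]) / 2
    dist = ((bcx - rcx) ** 2 + (bcy - rcy) ** 2) ** 0.5
    return w_overlap * overlap_area(balloon, roi) + w_dist * dist

def pso_place(size, roi, panel, n_particles=20, iters=200, seed=0):
    """Search for the top-left balloon position with the lowest cost inside the panel."""
    rng = random.Random(seed)
    max_x, max_y = panel[0] - size[0], panel[1] - size[1]
    pos = [[rng.uniform(0, max_x), rng.uniform(0, max_y)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    best = [p[:] for p in pos]  # personal best positions
    gbest = min(best, key=lambda p: cost(p, size, roi))[:]  # global best position
    for _ in range(iters):
        for i in range(n_particles):
            for d, hi in enumerate((max_x, max_y)):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), hi)
            if cost(pos[i], size, roi) < cost(best[i], size, roi):
                best[i] = pos[i][:]
                if cost(best[i], size, roi) < cost(gbest, size, roi):
                    gbest = best[i][:]
    return tuple(gbest)

# Hypothetical 200 x 200 panel with an ROI in the centre and a 40 x 20 balloon.
roi = (80, 80, 120, 120)
size = (40, 20)
x, y = pso_place(size, roi, panel=(200, 200))
```

Further factors, such as balloon-to-balloon overlap and reading order, would enter as additional weighted terms in `cost`.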

Optimized Speech Balloon Placement
Left: demonstration of PSO in 200 iterations. Right: ROI of comic page.
Comparison of balloon placement considering different factors. (a)(c) The placement results if all factors are jointly considered. (b) The placement result if overlapping between balloons is not taken into account. (d) The placement result if overlapping between balloons and ROIs is not taken into account.

Summary
• We have presented a system that automatically
transforms temporal image sequences into comics-based
storytelling.
– Optimized page allocation
– Optimized layout selection
– Optimized speech balloon placement
• Future work
– ROI analysis techniques specially designed for animation
– Investigation of semantics on automatic comics generation

Questions?
Wei-Ta Chu (朱威達)
National Chung Cheng University
wtchu@ccu.edu.tw
