2. Fair Use Declaration
• This statement explains the legal basis on which all the “Screen Printings” and “Comics” in “From Points and
Lines to Planes: Exploring the Composition and Style of Comics from an Image Analysis Perspective”
(由點、線至面:從影像分析角度探討漫畫的組成與風格) are cited. To the extent that copyright protection
applies to them, they are cited under the doctrine of “Fair Use” for research purposes.
• It is established legal doctrine worldwide that a work must show originality to obtain copyright protection;
originality is the very essence of creation in the intellectual domain. Accordingly, an automatically recorded
screen motion of an interactive computer game can be deemed to carry no copyright protection, and can therefore
be lawfully used in this presentation as research material without the written permission of the copyright
owner of the game. However, some people might still treat such material as copyright protected because of the
drawings or similar creations in the background of the animations or comics. If that applies, Fair Use and Fair
Dealing of a copyrighted work for teaching, scholarship, or research shall apply under the international
intellectual property agreements and the copyright laws of the respective jurisdictions, such as Article 13 of
the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS), Article 9(2) of the Berne
Convention for the Protection of Literary and Artistic Works, Article 5(5) of the EU Copyright Directive,
Section 107 of the Copyright Law of the United States of America, and Article 65 of the Taiwan Copyright Act.
These provisions sustain the citation of all the “Screen Printings” and “Comics” in this presentation as a
lawful action, since the citations do not conflict with a normal exploitation of the works and do not
unreasonably prejudice the legitimate interests of the right holders.
3. Introduction
• Comics-based presentation of
movies, animation, and photos
has emerged recently.
• Comics are believed to be an ideal
medium for visual storytelling
because of rich expressivity, high
interactivity, and high portability.
Sample generated comic pages from the animation “Neon Genesis
Evangelion” (top) and from the animation “Summer Wars” (bottom).
4. Introduction
Three key constituents of manga [1].
1. Drawing/絵
2. Language/言葉
3. Panel/コマ
[1] 夏目 房之介 (1997). マンガはなぜ面白いのか―その表現と文法. NHKライブラリー.
Diagram labels: 點 (point), 線 (line), 面 (plane); Drawing; Panel.
5. Outline
• Part 1: Manga Style Analysis
• Part 2: Comics-based Storytelling
6. Motivation
• As the internet and mobile devices become popular, digital mangas
are widely accessible.
• Different mangas may have different styles. We focus on which
features can be used to distinguish different manga styles.
7. Panel Feature Extraction
• From the bounding box of each panel, we extract features that describe
the characteristics of the layout.
1) : average panel height
(derived from bounding boxes)
2) : average panel width
3) : standard deviation of
4) : standard deviation of
8. Panel Feature Extraction
5) : the ratio of total panel area to the whole page
6) : average panel area
7) : standard deviation of
8) : average slope of vertical panel boundaries
9) : average slope of horizontal panel boundaries
10) : standard deviation of
11) : standard deviation of
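A minimal sketch of how the bounding-box features above might be computed. The function name, the `(x, y, width, height)` box format, and the dictionary keys are assumptions made for illustration. Because the operands of the "standard deviation of" items were lost, the sketch assumes they refer to panel height, width, and area. The slope features are omitted because they need the panel boundary lines, not only the bounding boxes.

```python
from statistics import mean, pstdev

def panel_layout_features(boxes, page_width, page_height):
    """Layout statistics from panel bounding boxes given as (x, y, width, height)."""
    heights = [h for _, _, _, h in boxes]
    widths = [w for _, _, w, _ in boxes]
    areas = [w * h for _, _, w, h in boxes]
    return {
        "mean_height": mean(heights),
        "mean_width": mean(widths),
        "std_height": pstdev(heights),   # assumed operand: panel height
        "std_width": pstdev(widths),     # assumed operand: panel width
        "area_ratio": sum(areas) / (page_width * page_height),
        "mean_area": mean(areas),
        "std_area": pstdev(areas),       # assumed operand: panel area
    }
```

The population standard deviation (`pstdev`) is used here; whether the original work used the population or the sample form is not stated.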
9. Panel Feature Extraction
Top row: sample manga pages from three different artists.
Bottom row: panel feature distributions corresponding to these pages.
10. Screentone Detection
• Screentone is a technique for applying textures
and shades to drawings, used as an alternative to hatching.
• Different authors use screentone in different ways.
11. Screentone Detection
1. Image binarization.
2. Dilation.
3. Delete small areas.
4. Get screentone areas.
5. Extract patches from screentone
areas.
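The first steps above can be sketched in plain Python on a grayscale image stored as a list of rows. This is an illustration under stated assumptions, not the authors' implementation: the threshold, the 3x3 dilation kernel, the 4-connectivity, and the `min_area` value are all hypothetical choices, and the function names are invented for this example. The idea is that dilation merges dense screentone dots into large blobs, while small isolated marks stay small and are deleted.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Step 1: mark dark (ink) pixels as 1 and background as 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def dilate(mask):
    """Step 2: 3x3 binary dilation, so nearby dots merge into one region."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        out[ny][nx] = 1
    return out

def remove_small_areas(mask, min_area):
    """Step 3: keep only 4-connected components with at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            component, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_area:
                for y, x in component:
                    out[y][x] = 1
    return out

def candidate_screentone_mask(gray, threshold=128, min_area=20):
    """Steps 1-3 chained; surviving regions are candidate screentone areas."""
    return remove_small_areas(dilate(binarize(gray, threshold)), min_area)
```

Patches for step 5 would then be cropped from the original image inside the surviving regions.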
12. Screentone Feature Extraction
• Two screentone features are proposed:
– The ratio of screentone areas to the whole panel area ( ).
– Bag of screentone ( ).
• Gabor wavelet texture
• Use affinity propagation to cluster the features, and use the
bag-of-words model to describe screentone.
B.J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 2007.
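A small sketch of the bag-of-words step, assuming the cluster exemplars have already been obtained (for example by affinity propagation) and that each screentone patch is described by a numeric texture vector. The function name and the squared Euclidean distance are assumptions for illustration only.

```python
def bag_of_words(descriptors, exemplars):
    """Normalized histogram of nearest-exemplar assignments.

    descriptors: texture vectors of the patches on one page (hypothetical input)
    exemplars:   cluster exemplars, assumed to come from a prior clustering step
    """
    counts = [0] * len(exemplars)
    for d in descriptors:
        # Squared Euclidean distance to every exemplar (an assumed metric).
        dists = [sum((a - b) ** 2 for a, b in zip(d, e)) for e in exemplars]
        counts[dists.index(min(dists))] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

Each page (or panel) then becomes a fixed-length histogram whose length equals the number of exemplars.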
13. Screentone Feature Extraction
Top row: sample manga pages from three different artists.
Bottom row: the BoP distributions corresponding to these artists.
14. Character Detection
• Apply the eye detection model in a sliding window manner to detect eyes.
• Expand the detected eye regions. The expanded regions from all detected eyes
are then covered by a minimum bounding box, which is taken as the
character’s head region.
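The expansion and bounding-box step can be sketched as follows. The eye detector itself is not shown, and the `scale` factor and the `(x0, y0, x1, y1)` box format are hypothetical; the slides do not state how far the eye regions are expanded.

```python
def head_region(eye_boxes, scale=2.0):
    """Expand each eye box about its center, then return the minimum
    bounding box (x0, y0, x1, y1) covering all expanded boxes."""
    if not eye_boxes:
        return None
    expanded = []
    for x0, y0, x1, y1 in eye_boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        half_w = (x1 - x0) / 2 * scale
        half_h = (y1 - y0) / 2 * scale
        expanded.append((cx - half_w, cy - half_h, cx + half_w, cy + half_h))
    return (
        min(b[0] for b in expanded),
        min(b[1] for b in expanded),
        max(b[2] for b in expanded),
        max(b[3] for b in expanded),
    )
```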
15. Line Feature Extraction
• Canny edge detection
• Edge linking
P. Kovesi, School of Computer Science & Software Engineering, The
University of Western Australia, http://www.csse.uwa.edu.au, 2001.
(1) Face image (2) Canny edge image (3) Edge linking (4) Straight line segmentation
16. Line Feature Extraction
• Included angle between lines ( ): For each pair of spatially adjacent line
segments, we calculate the included angle between them. The feature can be
represented as a 12-dimensional histogram.
Shonen
Shojo
17. Line Feature Extraction
• Line orientation ( ): Orientation of a line segment is defined as
the included angle between it and the horizontal axis. The feature
can be represented as a 12-dimensional orientation histogram.
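A minimal sketch of the orientation histogram, assuming line segments are given as endpoint pairs, orientations are folded into [0, 180) degrees so that direction does not matter, and the 12 bins are equal 15-degree ranges. The binning and the normalization are assumptions; the slides only state the dimensionality.

```python
import math

def orientation_histogram(segments, bins=12):
    """segments: list of ((x0, y0), (x1, y1)). Returns a normalized histogram."""
    counts = [0] * bins
    bin_width = 180.0 / bins
    for (x0, y0), (x1, y1) in segments:
        # Angle to the horizontal axis, folded so opposite directions coincide.
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
        counts[min(int(angle / bin_width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```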
Mitsuru Adachi
Terajima Yuji
18. Line Feature Extraction
• Density of line segments ( ): For each line segment, we count the lines in its
neighborhood; the counts over all line segments are gathered to form the
feature, which can be represented as a 20-dimensional histogram.
Mitsuru Adachi
Terajima Yuji
19. Line Feature Extraction
• Orientation of nearby lines ( ): The orientations of each line segment’s
nearby lines are calculated and represented as a 12-dimensional
orientation histogram.
Mitsuru Adachi
Terajima Yuji
20. Line Feature Extraction
• Number of nearby lines with similar orientation ( ): For each line
segment L, we count its nearby lines that have an orientation
similar to that of L. These counts over all line segments are
gathered to form a 20-dimensional histogram.
Shonen
Shojo
21. Line Feature Extraction
• Line strength variation ( ): We run Canny edge detection with twenty
different threshold settings. The feature is the ratio of each detection
result to the standard result, which gives a 20-dimensional vector.
Shonen
Shojo
22. Feature Analysis
• Comparison between mangas of different types of magazines.
Shonen manga: 3 different mangas, 300 pages in total.
(1) “Nisekoi”, Komi Naoshi.
(2) “Yamada-kun and the seven witches”, Miki Yoshikawa.
(3) “Agatsuma's my daughter”, Nishikida Keikokorozashi.
Shojo manga: 3 different mangas, 300 pages in total.
(4) “I love flowers and Mr.”, Kumaoka Fuyuyu.
(5) “The first love honey”, Minase Ai.
(6) “From me to you”, Shiina Karuho.
23. Feature Analysis
– Comparison between mangas with the same topic but
drawn by different artists.
– Use statistical comparison to analyze the proposed
features.
Baseball manga: 3 different mangas, 300 pages in total.
(1) “Ace of Diamond”, Terajima Yuji.
(2) “Mix”, Mitsuru Adachi.
(3) “Big Windup”, Mizushima Tsutomu.
24. Feature Analysis
– Comparison between mangas of different types of
magazines.
P-value 0.039 0.414 0.151 0.429 0.017 0.003 0.044 0.000
Shonen Shojo
28. Feature Analysis
• Comparison between mangas of different types of magazines.
– SVM test: 5-fold cross-validation.
accuracy   71.6  61.5  60.5  56.8  70.3  74.5  82    75    79.3  80
• Comparison between mangas with the same topic but drawn by
different artists (TY: Terajima Yuji, MA: Mitsuru Adachi, MT: Mizushima Tsutomu).
– SVM test: 5-fold cross-validation.
TY vs. MA  74.2  64.2  70    62.1  77.8  90.7  90    63.3  71.6  72
TY vs. MT  65    72.8  62.8  72.8  50    69.2  76.1  56.6  88    88
MA vs. MT  71.4  67.1  64.2  68.5  74.2  86.4  86.1  66.6  82    81
29. Latent Style Model
• Developing a style model based on Latent Dirichlet Allocation (LDA)
to discover style elements.
• Documents can be represented as mixtures of latent topics, where
each topic is formed by a distribution over words.
Diagram labels: Document, Topic, Word.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3,
993-1022.
30. Latent Style Model
Attribute of Latent Dirichlet Allocation: text document; latent topics; word.
Attribute of Latent Style Model: manga pages of the same artist; latent style elements;
visual word (manga page).
Given a set of documents with the observed visual words, we can
efficiently learn the model with the Gibbs sampling algorithm.
The style probabilities of a document can then be estimated, which lets
us represent a document as a distribution over style elements.
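As an illustration of how such a model might be learned, here is a compact collapsed Gibbs sampler for standard LDA in plain Python. It is a sketch under stated assumptions, not the authors' implementation: documents are lists of integer visual-word ids, and the hyperparameters `alpha` and `beta`, the iteration count, and the function name are hypothetical.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Return per-document distributions over latent style elements (topics)."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]           # counts n(d, k)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # counts n(k, w)
    topic_total = [0] * n_topics                         # counts n(k)
    assign = []
    for d, doc in enumerate(docs):                       # random initialization
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]                         # remove current assignment
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [                              # conditional p(z = k | rest)
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z                         # add the new assignment
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
```

Each returned row is the estimated style distribution of one document and sums to 1.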
31. Style Element Distributions
Top: sample manga pages from three different documents.
Bottom: style element distributions corresponding to these documents.
32. Artists in Dataset 1
100 manga pages from each of eight different artists, 800 manga
pages in total.
(A) “Fairy Tail”, 真島 ヒロ.
(B) “ヤンキー君とメガネちゃん”, 吉河 美希.
(C) “うしおととら”, 藤田 和日郎.
(D) “金色のガッシュ!!”, 雷句 誠.
(E) “呪法解禁!!”, 麻生 羽呂.
(F) “天地を喰らう”, 本宮 ひろ志.
(G) “北斗の拳”, 原 哲夫.
(H) “魁!!男塾”, 宮下 あきら.
34. Artist Style Element Distributions
(B) “ヤンキー君とメガネちゃん”, 吉河 美希.
(E) “呪法解禁!!”, 麻生 羽呂.
(G) “北斗の拳”, 原 哲夫.
Top: sample manga pages from three different artists.
Bottom: style element distributions corresponding to these artists.
35. Style-Based Art Movement Retrieval
Given a query, we would like to retrieve manga documents
produced by artists of the same movement.
MAP@10 by number of style elements (chart and table):
features       distance measure        10 styles  20 styles  30 styles  40 styles
line features  histogram intersection  0.7093     0.7152     0.7329     0.7158
line features  chi-square              0.7024     0.719      0.7443     0.7383
all features   histogram intersection  0.8413     0.8472     0.8483     0.8125
all features   chi-square              0.8358     0.8518     0.8544     0.8196
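The two measures and the evaluation metric named above can be sketched as follows, assuming each document is represented by its style element distribution. The function names are invented for this example, and the average precision here is normalized by the number of relevant items found in the top k, which is one common convention; the slides do not say which variant was used. MAP@10 is the mean of this value over all queries.

```python
def histogram_intersection(p, q):
    """Similarity: larger means more alike (1.0 for identical distributions)."""
    return sum(min(a, b) for a, b in zip(p, q))

def chi_square_distance(p, q, eps=1e-12):
    """Distance: smaller means more alike; eps avoids division by zero."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))

def rank_by_chi_square(query, database):
    """Indices of database entries, most similar first."""
    return sorted(range(len(database)),
                  key=lambda i: chi_square_distance(query, database[i]))

def average_precision_at_k(ranked_relevance, k=10):
    """ranked_relevance: booleans for the returned list, best match first."""
    hits, score = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance[:k], start=1):
        if relevant:
            hits += 1
            score += hits / rank
    return score / hits if hits else 0.0
```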
36. Style-Based Artist Retrieval
Given an artist’s manga document, we would like to retrieve
other documents produced by the same artist.
MAP@10 by number of style elements (chart and table):
features       distance measure        10 styles  20 styles  30 styles  40 styles
line features  histogram intersection  0.6401     0.6404     0.6460     0.6323
line features  chi-square              0.6385     0.6457     0.6541     0.6537
all features   histogram intersection  0.7627     0.7663     0.7854     0.7654
all features   chi-square              0.7470     0.7553     0.7939     0.7824
37. Artwork Period Retrieval
We analyze the manga JoJo's Bizarre Adventure (ジョジョの奇妙な冒険), created by
Hirohiko Araki from 1987 to the present; 300 pages in total.
JoJo's Bizarre Adventure Part 1 (1987)
JoJo's Bizarre Adventure Part 3 (1989-1992)
JoJo's Bizarre Adventure Part 8 (2011-ongoing)
38. Artwork Period Retrieval
Sample results of the query and top returned documents.
39. Artwork Period Retrieval
Given an artist’s manga document, we would like to retrieve
other documents produced in the same period.
MAP@10 by number of style elements (chart and table):
features       distance measure        10 styles  20 styles  30 styles  40 styles
line features  histogram intersection  0.5703     0.6247     0.6377     0.6428
line features  chi-square              0.5779     0.6446     0.6581     0.6622
all features   histogram intersection  0.6321     0.6521     0.6698     0.6751
all features   chi-square              0.6376     0.6641     0.6781     0.6899
40. Summary
• Manga style analysis
– Manga-specific features
– Based on LDA, implicit style elements are discovered in a
probabilistic framework.
– Analysis can be achieved at the style level rather than the
feature level.
• Applications
– Style-based browsing
– Influence discovery
– Relationship between style and other properties
42. Comics-Based Storytelling
• Goal: Develop a systematic framework to enable comics-based
storytelling of temporal image sequences
– Comic design theory
– Formulate core components as optimization problems and
systematically solve them
– Interactivity
43. Challenges
• Q1. How to segment the given temporal image sequence, so that
images in the same subsequence present similar
semantics/events/scenes and are appropriate to put on the
same comic page?
44. Challenges
• Q2. What is the best layout to arrange panels in the same page?
45. Challenges
• Q3. How to place speech balloons, so that important content in
images is not occluded by balloons, and the balloons’ positions direct
the viewer’s gaze along a pleasing reading trajectory?
46. Optimized Page Allocation
• Allocate an appropriate number of comic pages, each of which may include
a different number of cells.
– Visual coherence: Consecutive or similar visual content tends to be put on
the same comic page.
– Browsing pace: Keyframes conveying high motion tend to be put on
pages containing more panels, to build a tense browsing experience.
• A labeling problem, with the temporal continuity constraint
– Solution: Genetic algorithm (GA)
1 1 1 2 2 2 2 3 3 4 4 4
Q1. How to segment the given temporal image sequence?
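A small sketch of the labeling representation only, not of the genetic algorithm or its fitness function, which the slides do not spell out. A candidate solution assigns a page number to every image in temporal order; the temporal continuity constraint means the labels never decrease and never skip a page. The function names are hypothetical.

```python
def labels_from_page_sizes(page_sizes):
    """Turn per-page image counts into one page label per image."""
    labels = []
    for page, size in enumerate(page_sizes, start=1):
        labels.extend([page] * size)
    return labels

def is_temporally_continuous(labels):
    """True if labels start at page 1 and each step stays or moves to the next page."""
    return (
        bool(labels)
        and labels[0] == 1
        and all(b - a in (0, 1) for a, b in zip(labels, labels[1:]))
    )
```

For example, page sizes `[3, 4, 2, 3]` produce the labeling `1 1 1 2 2 2 2 3 3 4 4 4` shown on the slide. A genetic algorithm could encode candidates as page sizes so that every candidate satisfies the constraint by construction.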
48. Objective Function (Fitness)
Figure: best, average, and worst fitness over iterations 1 to 101 (fitness axis from 0.7 to 0.95),
with the page allocations found at the 5th, 20th, and 90th iterations.
49. Optimized Layout Selection
• Desired properties
– More important images should be allocated larger panels
– Keyframes extracted from the same shot, or photos taken
consecutively in the same place, are better placed in the
same row of panels
– Keyframes with more subtitle words, or photos with more
annotation, should be allocated larger panels.
• Idea
– Determine the images-layout pair that has the most similar
“importance” distributions.
Q2. What is the best layout to arrange panels in the same page?
50. Image Importance
• From each keyframe, the region of interest (ROI) is extracted based
on color contrast [Cheng’11].
• Assume that a set of keyframes has been assigned to
the same page. The importance value of a keyframe is defined from three terms:
– the ratio of the area of the ROI
– the ratio of the number of subtitle words
– the minimum color histogram distance from this frame to the other frames
M.-M. Cheng, G.-X. Zhang, N.J. Mitra, X. Huang, and S.-M. Hu. “Global contrast based salient region detection.” Proc. of IEEE Conference on Computer Vision
and Pattern Recognition, pp. 409-416, 2011.
52. Layout Importance
• Layout importance
• To measure how appropriately a layout matches with the given
image sequence
– Inner product:
Example layout importance vectors: (1/3, 1/3, 1/3), (0.5, 0.25, 0.25), (0.25, 0.25, 0.5), and so on.
: the ratio of the area of the jth panel to the area
of the whole page.
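A minimal sketch of layout selection by inner product, assuming the image importance values and each layout's panel area ratios are given as vectors of equal length. The exact formula on the slide was lost in extraction, so the plain, unnormalized inner product used here is an assumption, and the function name is hypothetical.

```python
def select_layout(image_importance, layouts):
    """Return the index of the layout whose panel-area ratios have the
    largest inner product with the image importance vector."""
    def inner(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Only layouts with one panel per image are candidates.
    candidates = [i for i, layout in enumerate(layouts)
                  if len(layout) == len(image_importance)]
    if not candidates:
        raise ValueError("no layout has the same number of panels as images")
    return max(candidates, key=lambda i: inner(image_importance, layouts[i]))
```

With importance `[0.5, 0.3, 0.2]` and the three example layouts `(1/3, 1/3, 1/3)`, `(0.5, 0.25, 0.25)`, and `(0.25, 0.25, 0.5)`, the inner products are about 0.333, 0.375, and 0.300, so the second layout is selected: the most important image gets the largest panel.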
53. Layout Importance
• Binary vectors to show how panels are arranged into rows
– How different panel arrangements fit with shot:
• Importance distribution in terms of numbers of spoken words
– Inner product:
r1=(01100) r2=(00110) r3=(01000)
56. Layout Selection Comparison
Example 1: Layout selected by the proposed method (a) and by equal
allocation (b).
Example 2: Layout selected by the proposed method (c) and two
different equally-allocated layouts (d)(e).
57. Balloon Placement
• Optimal positions are determined by jointly considering the
following factors:
– Balloons should not overlap with the regions of interest (ROIs) in images.
– Balloons should be placed as close as possible to the ROIs in images.
– When there are multiple balloons in a panel, sentences spoken earlier
should be placed closer to the top-left corner of the panel, to maintain
the correct reading order.
– Balloons should not overlap with each other.
– Reading trajectory should be built so that reading order is not only correct
but also vivid.
Q3. How to place speech balloons?
58. Optimized Speech Balloon Placement
• Finally, the five factors are linearly
combined:
• This problem can be intuitively
mapped to the one efficiently solved
by the particle swarm optimization
algorithm (PSO).
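To illustrate the optimization step, here is a generic particle swarm optimizer in plain Python together with a toy two-term cost. Everything in the cost function is hypothetical (a circular ROI, an overlap penalty, a distance term, and their weights); the five factors and their weights from the slides are not reproduced. The inertia and acceleration constants are common textbook values, not values from the source.

```python
import math
import random

def pso_minimize(cost, bounds, n_particles=20, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize cost(position) inside box bounds [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]   # global best so far
    for _ in range(iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Clamp so the balloon position stays inside the panel.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

def toy_balloon_cost(p, roi_center=(60.0, 40.0), roi_radius=15.0,
                     w_overlap=10.0, w_dist=1.0):
    """Hypothetical linear combination: penalize overlapping the ROI,
    and prefer positions close to it."""
    dist = math.hypot(p[0] - roi_center[0], p[1] - roi_center[1])
    overlap = max(0.0, roi_radius - dist)
    return w_overlap * overlap + w_dist * dist
```

Calling `pso_minimize(toy_balloon_cost, [(0.0, 100.0), (0.0, 80.0)])` searches a 100 by 80 panel; for this toy cost the minimum lies on the ROI boundary, where the balloon is as close as possible without overlapping.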
Figure labels: local region, global region.
59. Optimized Speech Balloon Placement
Left: demonstration of PSO in 200 iterations. Right: ROI of comic page.
Comparison of balloon placement considering different factors. (a)(c) The placement results if all factors are jointly
considered. (b) The placement result if overlapping between balloons is not taken into account. (d) The placement result
if overlapping between balloons and ROIs is not taken into account.
61. Summary
• We have presented a system that automatically
transforms temporal image sequences into comics-based
storytelling.
– Optimized page allocation
– Optimized layout selection
– Optimized speech balloon placement
• Future work
– ROI analysis techniques specially designed for animation
– Investigation of semantics on automatic comics generation