Optimization of Video Compression Parameters through Genetic Algorithms (2008)
Carlo "zED" Caputo
globo.com – webmedia
Avenida das Americas, 500, Bl. 18, Sala 103
Rio de Janeiro, Brazil
carlo.caputo@corp.globo.com
ABSTRACT
Applied research of an automated method to quickly evolve better encoding profiles, using Genetic Algorithms with many source video samples to converge to the specified limits of processing time, video quality and file size.

Categories and Subject Descriptors
I.4.2 [Data]: Image Processing and Computer Vision—Compression (Coding); I.2.m [Computing Methodologies]: Artificial Intelligence—Miscellaneous, Evolutionary computing and genetic algorithms; C.2.4 [Computer Systems Organization]: Computer-Communication Networks—Distributed Systems

Keywords
Video Compression, Encoding Profile, 2-Pass Encoding, Bitrate, ffmpeg, Peak Signal Noise Ratio (PSNR), Structural SIMilarity (SSIM), Genetic Algorithms, Two-point Crossover, Roulette Wheel Selection, Global Optimization, Local Maxima, Master-Worker Paradigm, Distributed Encoding, UDP Broadcast, Webmedia

1. INTRODUCTION
Traditionally, encoding profile tuning has been performed as a time-consuming craft, by manual tinkering with the parameters and intense observation. In this process the artisan needs a good eye to spot what each parameter does to the encoded video; such labor is quite misleading and error-prone, because some combinations, despite bringing a little quality improvement, can cause a great slowdown, and a seemingly harmless change can crash the encoder on some unusual source video.

As there was not much time, nor interest, to understand which parameters produced the desired result, I had to conceive a tool capable of accelerating the research of profiles under ever-changing technological restrictions: something that mixed smart fine-tuning of the encoded videos with robust stress testing of the profile under the encoder of choice [1].

While I am not against manual tinkering, and believe it is the only way to reach a truly pleasant video quality, I also believe that a profile must match the source being encoded, and ideally each source video should have a specific profile fine-tuned just for it. Clearly, that is not a task for human endeavor.

2. METHOD
Given the difficulty of optimizing more than one variable [2] and the short time we had to implement a solution, it seemed necessary to use an evolutionary method to make the necessary automation possible.

Genetic Algorithms were chosen for the first prototype, using two-point crossover and roulette wheel selection, to avoid getting stuck at a local maximum. Most of the work was then spent finding the appropriate variables with which to forge an adequate fitness function.

2.1 Encoding Parameters to Optimize
The worst problem was the difficulty of finding the relation between the encoder tool parameters [1] and the final variables to optimize. A raw search would explore a domain of 3.18981e+78 combinations of 91 parameters (for 1st and 2nd pass), whose values could usually range from 0 to 1 or even from 0 to millions, but which in our case were represented by a discrete sample of meaningful values.

2.2 Evaluation Variables to Optimize
The result of each encoding was evaluated into the following few variables, which were optimized without worrying about tracking their complex relation to the overwhelming number of encoding parameters.

2.2.1 Size of Encoded Video File (bitrate)
The main variable is bitrate, whether for network or storage restrictions; it can be a great concern for those who deal with a large audience or production. In any circumstance, it would not make sense to compare video quality without controlling the bitrate.

2.2.2 Time of Encoding (timerate)
This variable really changes from profile to profile and was the hardest to optimize by hand; it is easy to achieve great quality with an extremely long processing time, but very hard to achieve it with a very short encoding time.

This variable was called timerate, for short, and is calculated as encoding time over the duration of the encoded video.

At first, on the single-process prototype, wall-clock time was used as the encoding time; later, as there were multiple machines with many processes each, CPU time was used.
2.2.3 Picture Quality (PSNR)
After controlling bitrate and timerate, there is the video quality, roughly represented by the PSNR of the encoded frames against the equivalent source ones. In our case, a great advantage of using PSNR is that the encoder had it built-in and the evaluation cost was almost insignificant. There are problems associated with this method, though, especially when operations of a different nature are applied to the frames (like the blurriness noted below, on the problem between scaling and compression); if this becomes unacceptable, an option is to implement an independent SSIM evaluation [4].

But in all cases, to use it in the fitness formula the PSNR of each video frame must be combined into one single number. Averaging them will not do, because small slips in every frame would look the same as huge damage in a few frames, and the latter is very undesirable. So, as the intra-frame evaluation used PSNR, the inter-frame evaluation must use it as well; let us call it PSNR' and define it as follows:

  MSE(S, E) = (1/n) · Σ_{k=0}^{n−1} max(0, PSNRe − PSNR(S(k), E(k)))²

  PSNR'(S, E) = 20 · log10(PSNRe / √MSE(S, E))

S(k) and E(k) are the kth source and encoded frames; n is the number of video frames encoded; MSE(S, E) is the mean squared error of the encoded frames against the source ones; PSNRe, or target PSNR, is the maximum expected PSNR for each frame.

In other words, the encoded video's PSNR' is the PSNR of each frame's PSNR against a target PSNR.

It should be noted that, if multiple source videos were encoded for each profile tested, it would be necessary to pick the PSNR' of each video and generate a PSNR'' in a similar way as before, and this new variable is the one that should be used in the fitness calculation. In this project that did not happen, because of an optimization described below.

It should also be noted that, to compare source and encoded frames using PSNR, both must have the same number of pixels. And, since the resize must happen before this comparison, the scale method – which has many parameters itself – is not taken into account by the PSNR calculation. So, if those parameters are being optimized, the evolution will make sure the scaling method used produces the smallest amount of compression artifacts; for this reason the blurriest scaler will be automatically selected. A possible solution to this problem is to use the best scaler you have to generate reference frames as close as possible to the source ones.

2.2.4 Other Optimizations
• Have a working profile, above all – it is critical to distinguish which profiles are working; for that there were two implicit bands of fitness values: (a) 0 ≤ fitness < 10 for profiles that broke at some point of the process and (b) fitness ≥ 10 for those that processed all the way to the end of the requested duration.

• Maximize duration of source encoded – the longer the source video processed, the higher the fitness value; in fact, the more a profile was tested, the surer one can be that its fitness is valid. This is necessary for two reasons: (a) there is an optimization that reduces the duration of the source video tested to speed up the evolution; (b) even if there is an error in the middle of the processing, the profiles that went farther are benefited.

• Minimize the errors and warnings – in a similar spirit to the above, profiles are penalized by the amount of error and warning messages, either to minimize problems on working profiles or to make broken ones approach a working state.

• Minimize the number of parameters – slightly avoid useless combinations of parameters that would not bring any improvement and may raise the risk of instability by bringing seldom-tested options into the profiles.

2.3 Hacks to Speedup
• Abort soon if too slow – monitor encodings on-the-fly and abort if one is taking longer than minimally acceptable for the expected total encoding time.

• One encode per individual tested – pick at random only one source video to process for each individual, since processing all of them would make the evolution many times slower. As expected, good profiles appeared in the genes of many individuals and so were verified against multiple sources, which made them usable globally.

• Process small bits at the beginning – encode only a small part of each source video at the beginning of the evolution, when the entropy is higher and there is a lot of wasted processing on completely broken profiles. Then, as the fitness rises, encode longer segments of the source videos to fine-tune the surviving profiles.

• Profile injection – profiles could be injected on-the-fly at the beginning of any generation, including the first one, which needs it most. No process had to be stopped for this: a file with the command line of the encoding tool just had to be placed in a watch folder, and the control process would map its parameters to the genes of new individuals. This is very streamlined, because the state of each generation is also stored as the command line of each individual, commented with its fitness value. This way, even changes to the definition of the genetic strip do not break the gene pool, because its state can be loaded as usual, mapping the parameters to the appropriate genes. For the same reason, injections happen just as smoothly for profiles foreign to this system (e.g., exchanged on video forums). Usually, injecting a good profile into the gene pool would bring its qualities – be it faster processing, better quality or more stability – to many individuals in subsequent generations.

2.4 Distributed Processing
The first prototype could only run as a single process and, despite having greatly improved the testing of candidate profiles for one source video, the resulting profile was only capable of encoding that video well. This behavior was expected, but to overcome it the gene pool had to be evolved with multiple source videos. That would use a lot more processing
power per generation and add a lot of entropy, requiring the population limit per generation to be raised, to avoid losing valuable genes in the chaos. Again, more individuals means even more processing power, so we had to quickly integrate the evolution control process with distributed workers [3] to perform the encoding.

At start, every worker process binds a UDP port and keeps standing by, waiting for the control process to broadcast a job offering. Upon the offer, a simple protocol verifies that only one worker gets the job. This way, the control process generated new individuals and asked the workers to perform the evaluation, giving them the profiles as the command line of the encoder – much like they would be in production – and waiting for the processing log, which had all the information necessary to compose the individual's fitness. All the job control is handled over UDP, but the videos, profiles and logs are transferred through common LAN file sharing. The whole design is such that plugging or unplugging workers on-the-fly does not compromise the evaluation of any individual in the gene pool.

2.5 Evolution
Assume that there are values in a gene, genes in an individual, individuals in a generation and generations in an evolution (v ∈ g ∈ I ∈ G ∈ E). In other words, each gene g holds a position on the genetic strip of the individual I and can assume some values v preselected from a reasonable range, to reduce the search space. At the end of each generation G, the number of individuals must be trimmed to a maximum population. This is accomplished by sorting the individuals by the value of the fitness function and discarding the less fit. The target at each generation is to select the fittest individuals, to determine the highest fitness of the generation (fg) and ultimately to achieve the determined target fitness of the whole evolution (fe). At the beginning, the system loads the individuals from the saved gene pool file, or generates them randomly, to fill the maximum population. After that, it looks for profiles in the injection watch folder, as it does at every generation start.

2.5.1 Fitness Function
The design of the fitness function was highly empirical, so, to simplify its formulation – along with the well-known max() and min() – the following helper function and variables were used:

  clamp(a, x, b) = a, if x < a;  b, if x > b;  x, if a ≤ x ≤ b

  pi = min(PSNR'(Si, Ei), pe)
  bi = max(0.001, (sizei · 8) / (durationi · 1024))
  ti = max(0.001, cputimei / durationi)

Si are the frames of the source video sample; Ei are the encoded frames of that video; sizei is the encoded file size (in bytes); durationi is the duration (in seconds) of the encoded video; cputimei is the time the process took to execute; and pe is the target PSNR'.

Finally, the fitness of a working individual is:

  fi = 10 +
       min(0.5 + (fg/fe), 1) ·
       pi ·
       min(be/bi, 1)² ·
       min(te/ti, 1)² ·
       ((1 − 2 · clamp(te/2, ti, te)/te) · 0.001 + 0.999) ·
       ((1 − gi/ge) · 0.0001 + 0.9999)

pe, be and te are the PSNR', bitrate and timerate targets for the evolution; the starting value of 10 is the base fitness for working profiles; fg is the best fitness in the last closed generation and fe is the target fitness of the whole evolution, so fg/fe represents how mature the evolution is, ensuring that no steep jump in fitness is accepted – this may avoid local maxima, especially because the start of the evolution is processed with smaller samples, whose fitness values are worth less (see tlg, below); gi is the number of genes (parameters) that assumed non-default values and ge is the total number of genes per individual in this run.

In case of fatal errors the fitness gets a special value:

  fi = 1 + 9 ·
       min(0.5 + (fg/fe), 1) ·
       (1 / (1 + errorsi/99)) ·
       (pi/pe) ·
       min(progressi/durationi, 1)

errorsi is the number of errors and warnings received from the encoder (fatal errors add 99 to this value and warnings add only 1); progressi is how much of the requested work was completed – it was designed to work with 2-pass encodings, so the 1st pass accounts for 25% of the whole progress and the 2nd pass starts at 25% and goes up to 100%.

2.5.2 Generation End
At this point we know which is the fittest individual of the generation (fg), the population can be trimmed to the maximum allowed and its gene pool can be saved to disk. Besides that, some variables have to be readjusted:

• timelimit – the duration of the source videos to be evaluated per individual profile. As seen before, fg/fe gauges the proximity to the end of the evolution, so the advance of this value is a smooth iterative process, growing from tl0 to tl1:

  tlg = 1 + tl0 + min(fg/fe, 1) · (tl1 − tl0 − 1)

• timeout timerate – this is the variable that kills an encoding if it is taking too long to process. Based on the same fg/fe, it shrinks from tt0 to tt1:

  ttg = tt0 + min(fg/fe, 1) · (tt1 − tt0)
Figure 1: evolution chart displaying a normalized combination of the main variables optimized (pi / (bi · ti)) over about 5 days

2.6 Reference Run
The Genetic Algorithm had a crossover rate of 80%, mutation at 2.5% and population pruning to 250 individuals. The basic targets and ranges were set like this: the target bitrate, on this reference run, was 600 kbps; the target timerate was 2, so CPU time could be around two times the duration of the encoded video; the target PSNR was set to 40 dB, above which any improvement would not be significant, and the target PSNR' to 60; the target fitness was 70, the target PSNR' plus the base fitness for working profiles (10); timelimit ranged from 5 to 45, and timeout timerate ranged from 50 to 2, from the loosest to the strictest timeout policy at the end.

The evolution started with one process and finished on six machines, using a total of thirteen 3 GHz cores. One machine was used for the genetic control and the others, with two worker processes each, were dedicated to encoding. More processes were plugged in on-the-fly as the machines became available. And, at the beginning, three profiles were injected: the best-quality one (extremely slow to encode), the fastest one (of subpar quality) and the one used in production (more stable). Finally, the winner choice took into account the profiles that had the best fitness for the most source videos.

Many breakthroughs are distinguishable on the progress chart (Figure 1), some associated with changes in the circumstances, marked here by the birth count: at the beginning there was one process, only one source video was evaluated and the bitrate constraint was 420 kbps; at individual 4000 it changed to 600 kbps; at 7000 many machines entered the game and there were several changes in the code; at 20000 the fitness function changed to allow working profiles with warnings to evolve freely, as they can work surprisingly well in some cases.

2.7 Perceptual Evaluation
It is strongly recommended that, at any point of the evolution, the encoded videos be verified by eye and adjustments applied to the gene structure or the automated evaluation method. And the final selection of profiles should be tested against as many sources as possible, of the types of content that will be used in production (e.g., baseball games, soap operas, news reports, or all of them, for general usage). For that, the final test was performed using the winner profile to encode 675 source videos – representing an average daily production. All the encoded videos had their resulting bitrate and timerate verified, and they were played back in a loop on the monitor screens around the work area for scrutiny by all Webmedia personnel.

3. CONCLUSION
Genetic Algorithms can quickly bring great improvement to the quality of hand-crafted video profiles; from the design to the end of the first run, it took less than two weeks.

But the clear disadvantage is that the profile can fail badly on sources outside the trained content type. On the other side, using too many source videos to train it would not only slow down the process, but could also generate mediocre profiles. A better solution is to create a profile for every different source type, like every TV show or every different sports game.

It is important to note that the greatest benefit of the first run of this project was to successfully bring the legacy codec – Sorenson Spark, which had a more widespread adoption among our user base – on par with a newer technology – On2 VP6, which only had proprietary tools that did not integrate well with our production system. This bought us time for the stabilization of the technology and the adoption of H.264 as the standard encoding, instead of lesser, proprietary alternatives that pressed quality improvements into the gap between the standardization of the convergence codec and the stability of the legacy one.

Finally, for the organization, it brings the security of charted evolution, above the insecurity of sole subjective evaluation.

4. ACKNOWLEDGMENTS
The distributed processing would not be possible without the help of Fernando Luiz Valente de Souza.

5. REFERENCES
[1] F. Bellard. FFmpeg Documentation. http://www.ffmpeg.org/, 2004-2008.
[2] D. E. Goldberg, K. Deb, and J. Horn. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[3] J.-P. Goux, J. Linderoth, and M. Yoder. Metacomputing and the master-worker paradigm. Preprint ANL/MCS-P792-0200, Mathematics and Computer Science Division, Argonne National Laboratory, 2000.
[4] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13:600–612, 2004.