Optimization of Video Compression Parameters through Genetic Algorithms

Carlo "zED" Caputo
globo.com – webmedia
Avenida das Americas, 500, Bl. 18, Sala 103
Rio de Janeiro, Brazil
carlo.caputo@corp.globo.com

ABSTRACT
Applied research of an automated method to quickly evolve better encoding profiles, using Genetic Algorithms with many source video samples to converge on specified limits of processing time, video quality and file size.

Categories and Subject Descriptors
I.4.2 [Data]: Image Processing and Computer Vision—Compression (Coding); I.2.m [Computing Methodologies]: Artificial Intelligence—Miscellaneous, Evolutionary computing and genetic algorithms; C.2.4 [Computer Systems Organization]: Computer-Communication Networks—Distributed Systems

Keywords
Video Compression, Encoding Profile, 2-Pass Encoding, Bitrate, ffmpeg, Peak Signal Noise Ratio (PSNR), Structural SIMilarity (SSIM), Genetic Algorithms, Two-point Crossover, Roulette Wheel Selection, Global Optimization, Local Maxima, Master-Worker Paradigm, Distributed Encoding, UDP Broadcast, Webmedia
1. INTRODUCTION
Traditionally, encoding profile tuning has been performed as a time-consuming craft: manual tinkering with the parameters and intense observation. In this process the artisan needs a good eye to spot what each parameter does to the encoded video; such labor is quite misleading and error prone, because some combinations, despite bringing a small quality improvement, can cause a great slowdown, and a seemingly harmless change can crash the encoder on some unusual source video.

As there was not much time, nor interest, to understand which parameters produced the desired result, I had to conceive a tool capable of accelerating the search for profiles under ever-changing technological restrictions: something that mixed smart fine tuning of the encoded videos with robust stress testing of the profile under the encoder of choice [1].

While I am not against manual tinkering, and believe it is the only way to reach a truly pleasant video quality, I also believe that a profile must match the source being encoded; ideally, each source video would have a specific profile fine-tuned just for it. Clearly, that is not a task for human endeavor.

2. METHOD
Given the difficulty of optimizing more than one variable [2] and the short time we had to implement a solution, it seemed necessary to use an evolutionary method to make the required automation possible.

Genetic Algorithms were chosen for the first prototype, using two-point crossover and roulette wheel selection to avoid getting stuck at a local maximum. Most of the work was therefore spent finding the appropriate variables and forging an adequate fitness function.
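For illustration, the two genetic operators named above can be sketched as follows (Python; this is not the original implementation, and it assumes an individual is simply a fixed-length list of gene values):

    import random

    def two_point_crossover(parent_a, parent_b):
        """Swap the slice between two random cut points of two equal-length genomes."""
        assert len(parent_a) == len(parent_b)
        i, j = sorted(random.sample(range(len(parent_a)), 2))
        child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
        child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
        return child_a, child_b

    def roulette_wheel_select(population, fitnesses):
        """Pick one individual with probability proportional to its fitness."""
        total = sum(fitnesses)
        pick = random.uniform(0, total)
        running = 0.0
        for individual, fitness in zip(population, fitnesses):
            running += fitness
            if running >= pick:
                return individual
        return population[-1]  # guard against floating-point round-off

Because selection probability is proportional to fitness rather than rank, mediocre individuals keep a nonzero chance of reproducing, which is part of what helps the search escape local maxima.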

2.1 Encoding Parameters to Optimize
The worst problem was the difficulty of finding the relation between the encoder tool parameters [1] and the final variables to optimize. A raw search would explore a domain of 3.18981e+78 combinations across 91 parameters (for the 1st and 2nd pass), whose values could range from 0 to 1 or even from 0 to millions, but which in our case were represented by a discrete sample of meaningful values.
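For illustration, such a discretized search space can be represented directly as the gene definition. The sketch below is hypothetical Python: the parameter names and value sets are examples, not the profile actually evolved.

    import random

    # Each gene is one encoder parameter restricted to a few meaningful values.
    GENE_SPACE = {
        "qmin":      [1, 2, 4, 8],
        "qmax":      [15, 31, 51],
        "g":         [30, 60, 120, 250],     # GOP size
        "me_method": ["dia", "hex", "umh"],
        "subq":      [1, 3, 5, 7],
    }

    def random_individual(space=GENE_SPACE):
        """Build an individual (a candidate profile) by picking one allowed value per gene."""
        return {name: random.choice(values) for name, values in space.items()}

    def to_command_line(individual):
        """Render an individual as encoder command-line flags."""
        return " ".join(f"-{name} {value}" for name, value in individual.items())

Restricting every parameter to a short list of sensible values is what shrinks the raw 10^78-sized domain into something a population of a few hundred individuals can explore.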
2.2 Evaluation Variables to Optimize
The result of each encoding is evaluated into the following few variables, which are optimized without worrying about tracking their complex relation to the overwhelming number of encoding parameters.

2.2.1 Size of Encoded Video File (bitrate)
The main variable is bitrate, whether for network or storage restrictions, which can be a great concern for those who deal with a large audience or production.

In any circumstance, it would not make sense to compare video quality without controlling the bitrate.

2.2.2 Time of Encoding (timerate)
This variable really changes from profile to profile and was the hardest to optimize by hand; it is easy to achieve great quality with an extremely long processing time, but very hard to keep quality high with a very short encoding time.

This variable was called timerate, for short, and is calculated as encoding time over the duration of the encoded video.

At first, on the single-process prototype, wall-clock time was used as the encoding time; later, as there were multiple machines with many processes each, CPU time was used.

2.2.3 Picture Quality (PSNR)
After controlling bitrate and timerate, there is the video quality, roughly represented by the PSNR of the encoded frames against the equivalent source ones. In our case, a great advantage of using PSNR is that the encoder has it built in and the evaluation cost is almost insignificant. There are problems associated with this method, especially when operations of a different nature are applied to the frames (like the blurriness noted below, in the interplay between scaling and compression); if this becomes unacceptable, an option is to implement an independent SSIM evaluation [4].

But in all cases, to use it in the fitness formula, the PSNR of each video frame must be combined into one single number. Averaging will not do, because small slips in every frame would look the same as huge damage concentrated in a few frames, and the latter is far more undesirable. So, as the intra-frame evaluation used PSNR, the inter-frame aggregation uses it as well; let us call it PSNR' and define it as follows:

    MSE(S, E) = \frac{1}{n} \sum_{k=0}^{n-1} \max\bigl(0,\; PSNR_e - PSNR(S(k), E(k))\bigr)

    PSNR'(S, E) = 20 \cdot \log_{10}\!\left(\frac{PSNR_e}{\sqrt{MSE(S, E)}}\right)

S(k) and E(k) are the k-th source and encoded frames; n is the number of video frames encoded; MSE(S, E) is the mean, over all frames, of the per-frame PSNR shortfall relative to the target (clipped at zero); PSNR_e, the target PSNR, is the maximum expected PSNR for each frame.

In other words, the encoded video's PSNR' treats the per-frame PSNR values as a signal of their own and measures its PSNR against the target PSNR.

It should be noted that, if multiple source videos were encoded for each profile tested, it would be necessary to take the PSNR' of each video and combine them into a PSNR'' in a similar way, and this new variable would be the one used in the fitness calculation. In this project that did not happen, because of an optimization described below.

It should also be noted that, to compare source and encoded frames using PSNR, both must have the same number of pixels. Since the resize must happen before this comparison, the scaling method (which has many parameters itself) is not taken into account by the PSNR calculation. So, if those parameters are being optimized, the evolution will favor whichever scaling method yields the smallest amount of compression artifacts, and for this reason the blurriest scaler will be automatically selected. A possible solution to this problem is to use the best scaler available to generate reference frames that are as close as possible to the source ones.
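For illustration, the PSNR' aggregation defined above translates directly into code. The sketch below (Python) assumes the per-frame PSNR values have already been parsed from the encoder log:

    import math

    def psnr_prime(frame_psnrs, target_psnr=40.0):
        """Combine per-frame PSNR values into the single PSNR' score.

        Only shortfalls below the target contribute, so a few badly damaged
        frames hurt the score much more than a uniform, slight degradation.
        """
        n = len(frame_psnrs)
        mse = sum(max(0.0, target_psnr - p) for p in frame_psnrs) / n
        if mse == 0.0:
            return float("inf")  # every frame met or exceeded the target
        return 20.0 * math.log10(target_psnr / math.sqrt(mse))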
2.2.4 Other Optimizations

  • Have a working profile, above all – it is critical to distinguish which profiles are working; for that, there are two implicit bands of fitness values: (a) 0 ≤ fitness < 10 for profiles that broke at some point of the process and (b) fitness ≥ 10 for those that processed all the way to the end of the requested duration.

  • Maximize the duration of source encoded – the longer the source video processed, the higher the fitness value; in fact, the more the profile was tested, the surer one can be that its fitness is valid. This is necessary for two reasons: (a) there is an optimization that reduces the duration of the source video tested, to speed up the evolution; (b) even if there is an error in the middle of the processing, the profiles that went farther are benefited.

  • Minimize errors and warnings – in a similar spirit to the above, profiles are penalized by the number of error and warning messages, either to minimize problems in working profiles or to push broken ones toward a working state.

  • Minimize the number of parameters – slightly penalize useless combinations of parameters that bring no improvement and may raise the risk of instability by pulling seldom-tested options into the profiles.

2.3 Hacks to Speed Up

  • Abort soon if too slow – monitor encodings on-the-fly and abort any that is taking longer than is minimally acceptable for the expected total encoding time (see the sketch after this list).

  • One encode per individual tested – pick at random only one source video to process for each individual, since processing all of them would make the evolution many times slower. As expected, good profiles appeared in the genes of many individuals and were thus verified against multiple sources, which made them usable globally.

  • Process small bits at the beginning – encode only a small part of each source video at the beginning of the evolution, when the entropy is higher and a lot of processing is wasted on completely broken profiles. Then, as the fitness rises, encode longer segments of the source videos to fine-tune the surviving profiles.

  • Profile injection – profiles can be injected on-the-fly at the beginning of any generation, including the first one, which needs it most. No process has to be stopped for this: a file with the command line of the encoding tool is placed in a watch folder and the control process maps its parameters onto the genes of new individuals. This is very streamlined, because the state of each generation is also stored as the command line of each individual, commented with its fitness value. This way, even changes to the definition of the genetic strip do not break the gene pool, because its state can be loaded as usual, mapping the parameters to the appropriate genes. For the same reason, injection works smoothly for profiles foreign to this system (e.g., exchanged on video forums). Injecting a good profile into the gene pool usually brings its qualities (faster processing, better quality or more stability) to many individuals in subsequent generations.
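As referenced in the first item above, a minimal sketch of the abort-if-too-slow watchdog might look like this (Python; the budget rule and names are assumptions, with the encoder launched as a subprocess):

    import subprocess
    import time

    def run_with_budget(cmd, clip_duration, timeout_timerate, poll=1.0):
        """Run an encoder command (argv list) but kill it once it exceeds its budget.

        The budget is timeout_timerate * clip_duration: an encoding that takes
        far longer than the clip itself is not worth finishing.
        """
        budget = timeout_timerate * clip_duration
        start = time.monotonic()
        proc = subprocess.Popen(cmd)
        while proc.poll() is None:
            if time.monotonic() - start > budget:
                proc.kill()
                return None  # aborted: this profile is too slow
            time.sleep(poll)
        return proc.returncode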
2.4 Distributed Processing
The first prototype could only run as a single process and, despite greatly improving the testing of candidate profiles for one source video, the resulting profile was only capable of encoding that one video well. This behavior was expected, but to overcome it the gene pool had to be evolved with multiple source videos. That would use a lot more processing power per generation and add a lot of entropy, requiring the population limit per generation to be raised to avoid losing valuable genes in the chaos. Again, more individuals means even more processing power, so we had to quickly integrate the evolution control process with distributed workers [3] to perform the encoding.

At start, every worker process binds a UDP port and stands by, waiting for the control process to broadcast a job offering. Upon the offer, a simple protocol verifies that only one worker gets the job. This way, the control process generates new individuals and asks the workers to perform the evaluation, handing them the profiles as the command line of the encoder (much like they would be used in production) and waiting for the processing log, which carries all the information necessary to compose the individual's fitness. All the job control is handled over UDP, but the videos, profiles and logs are transferred through common LAN file sharing. The whole design is such that plugging or unplugging workers on-the-fly does not compromise the evaluation of any individual in the gene pool.
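For illustration, the job-offering handshake could be sketched as below (Python). This is not the original protocol; the port, message format and claim/grant exchange are assumptions, shown only to convey the "broadcast an offer, first claim wins" shape of the design:

    import json
    import socket

    PORT = 50007                                # assumed control port
    BROADCAST = ("255.255.255.255", PORT)

    def offer_job(job_id, timeout=2.0):
        """Master side: broadcast a job offer and grant it to the first worker that claims it."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(json.dumps({"type": "offer", "job": job_id}).encode(), BROADCAST)
        try:
            _, worker_addr = sock.recvfrom(4096)            # first claim wins
        except socket.timeout:
            return None                                     # no idle worker; retry later
        sock.sendto(json.dumps({"type": "grant", "job": job_id}).encode(), worker_addr)
        return worker_addr

    def worker_loop(handle_job):
        """Worker side: stand by, claim offers, run granted jobs (one worker per port)."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", PORT))
        while True:
            data, master_addr = sock.recvfrom(4096)
            msg = json.loads(data.decode())
            if msg.get("type") == "offer":
                sock.sendto(json.dumps({"type": "claim", "job": msg["job"]}).encode(), master_addr)
            elif msg.get("type") == "grant":
                handle_job(msg["job"])          # encode, then publish the log on the LAN share

The videos, profiles and logs themselves still travel over the shared filesystem, as described above; UDP only carries the small control messages.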
2.5 Evolution
Assume that there are values in a gene, genes in an individual, individuals in a generation and generations in an evolution (v ∈ g ∈ I ∈ G ∈ E). In other words, each gene g holds a position on the genetic strip of the individual I and can assume values v preselected from a reasonable range, to reduce the search space. At the end of each generation G, the number of individuals must be trimmed to a maximum population. This is accomplished by sorting the individuals by the value of the fitness function and discarding the least fit. The target at each generation is to select the fittest individuals, determine the highest fitness of the generation (f_g) and ultimately achieve the target fitness of the whole evolution (f_e). At the beginning, the process loads the individuals from the saved gene pool file, or generates them randomly, to fill the maximum population. After that, it looks for profiles in the injection watch folder, as it does at the start of every generation.
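For illustration, one generation of this loop could be sketched as follows (Python; evaluate, breed and inject_profiles are placeholders for the encoder evaluation, the crossover/mutation step and the watch-folder handling):

    MAX_POPULATION = 250   # value used in the reference run (Section 2.6)

    def run_generation(population, evaluate, breed, inject_profiles):
        """One generation: inject, evaluate, trim by fitness, then breed survivors."""
        population = population + inject_profiles()            # watch-folder injections
        scored = sorted(((evaluate(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        survivors = [ind for _, ind in scored[:MAX_POPULATION]]
        best_fitness = scored[0][0]                            # this becomes f_g
        return survivors + breed(survivors), best_fitness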
2.5.1 Fitness Function
The design of the fitness function was highly empirical, so, to simplify its formulation, the following helper function and variables were used along with the well-known max() and min():

    clamp(a, x, b) = \begin{cases} a & (x < a) \\ b & (x > b) \\ x & (a \le x \le b) \end{cases}

    p_i = \min\bigl(PSNR'(S_i, E_i),\; p_e\bigr)

    b_i = \max\!\left(0.001,\; \frac{size_i \cdot 8}{duration_i \cdot 1024}\right)

    t_i = \max\bigl(0.001,\; cputime_i / duration_i\bigr)

S_i are the frames of the source video sample; E_i are the encoded frames of that video; size_i is the encoded file size (in bytes); duration_i is the duration (in seconds) of the encoded video; cputime_i is the time the process took to execute; and p_e is the target PSNR'.

Finally, the fitness of a working individual is:

    f_i = 10 + \min(0.5 + f_g/f_e,\, 1)
             \cdot p_i
             \cdot \min(b_e/b_i,\, 1)^2
             \cdot \min(t_e/t_i,\, 1)^2
             \cdot \bigl((1 - 2 \cdot clamp(t_e/2,\, t_i,\, t_e)/t_e) \cdot 0.001 + 0.999\bigr)
             \cdot \bigl((1 - g_i/g_e) \cdot 0.0001 + 0.9999\bigr)

p_e, b_e and t_e are the PSNR', bitrate and timerate targets for the evolution; the starting value of 10 is the base fitness for working profiles; f_g is the best fitness of the last closed generation and f_e is the target fitness of the whole evolution, so f_g/f_e expresses how mature the evolution is and keeps steep jumps in fitness from being accepted early on. This may avoid local maxima, especially because the start of the evolution is processed with smaller samples, whose fitness values are worth less (see tl_g below); g_i is the number of genes (parameters) that assumed non-default values and g_e is the total number of genes per individual in this run.

In case of fatal errors, the fitness gets a special value:

    f_i = 1 + 9 \cdot \min(0.5 + f_g/f_e,\, 1) \cdot \frac{1}{1 + errors_i/99} \cdot \frac{p_i}{p_e} \cdot \min\bigl(progress_i/duration_i,\, 1\bigr)

errors_i is the number of errors and warnings received from the encoder (fatal errors add 99 to this value, warnings add only 1); progress_i is how much of the requested work was completed. The system was designed for 2-pass encodings, so the 1st pass accounts for 25% of the whole progress, and the 2nd pass starts at 25% and goes up to 100%.
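For illustration, the two formulas above transcribe directly into code (Python sketch; the measured inputs are assumed to come from the encoder log):

    def clamp(a, x, b):
        """Constrain x to the closed interval [a, b]."""
        return max(a, min(x, b))

    def fitness_working(p_i, b_i, t_i, g_i, g_e, f_g, f_e, p_e, b_e, t_e):
        """Fitness of a profile that encoded the whole requested duration."""
        return 10 + (
            min(0.5 + f_g / f_e, 1)
            * p_i
            * min(b_e / b_i, 1) ** 2
            * min(t_e / t_i, 1) ** 2
            * ((1 - 2 * clamp(t_e / 2, t_i, t_e) / t_e) * 0.001 + 0.999)
            * ((1 - g_i / g_e) * 0.0001 + 0.9999)
        )

    def fitness_broken(errors_i, p_i, p_e, progress_i, duration_i, f_g, f_e):
        """Fitness of a profile that hit a fatal error before finishing."""
        return 1 + 9 * (
            min(0.5 + f_g / f_e, 1)
            * (1 / (1 + errors_i / 99))
            * (p_i / p_e)
            * min(progress_i / duration_i, 1)
        )

Note how the bitrate and timerate ratios are squared, punishing budget overruns hard, while the last two factors (the clamp term and the gene-count term) only nudge the result, acting as tie-breakers.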

                                                                     • timelimit – duration of the source videos to be evalu-
               pi = min(P SN R (Si , Ei ), pe )
                                                                       ated per individual profile. As seen before, fg /fe reg-
                  „       „                       ««                   ulates the proximity to the end of the evolution, so the
                                  sizei · 8                            advance of this value is a smooth interactive process,
          bi = max 0.001,
                               durationi · 1024                        grows from tl0 to tl1 :

           ti = max(0.001, cputimei /durationi )                                 tlg = 1 + tl0 + min(fg /fe , 1) ∗ (tl1 − tl0 − 1)

Si are the frames of the source video sample; Ei are the             • timeout timerate – this is the variable that kill the en-
encoded frames of that video; sizei is the encoded file size (in        coding if it’s taking too long to process. Based on the
bytes); durationi is the duration (in seconds) of the encoded          same fg /fe , it shrinks from tt0 to tt1 :
video; cputimei is the time that the process took to execute
and pe is the target PSNR’.                                                          ttg = tt0 + min(fg /fe , 1) ∗ (tt1 − tt0 )
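These two schedules are simple enough to state directly (Python sketch; the defaults are the endpoints used in the reference run of Section 2.6):

    def next_timelimit(f_g, f_e, tl_0=5, tl_1=45):
        """Grow the evaluated source duration as the evolution matures."""
        return 1 + tl_0 + min(f_g / f_e, 1) * (tl_1 - tl_0 - 1)

    def next_timeout_timerate(f_g, f_e, tt_0=50, tt_1=2):
        """Tighten the encoding timeout as the evolution matures."""
        return tt_0 + min(f_g / f_e, 1) * (tt_1 - tt_0)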
[Figure 1: evolution chart displaying a normalized combination of the main optimized variables, p_i / (b_i · t_i), over about 5 days.]

2.6 Reference Run
The Genetic Algorithm ran with a crossover rate of 80%, mutation at 2.5% and population pruning to 250 individuals. The basic targets and ranges were set as follows (they are gathered into a configuration sketch at the end of this section): the target bitrate, in this reference run, was 600 kbps; the target timerate was 2, so CPU time could be around twice the duration of the encoded video; the target PSNR was set to 40 dB, above which any improvement would not be significant, and the target PSNR' to 60; the target fitness was 70, the target PSNR' plus the base fitness for working profiles (10); timelimit ranged from 5 to 45 and timeout timerate from 50 to 2, from the loosest to the strictest timeout policy at the end.

The evolution started with one process and finished on six machines, using a total of thirteen 3 GHz cores. One machine was used for the genetic control and the others, with two worker processes each, were dedicated to encoding. More processes were plugged in on-the-fly as machines became available. At the beginning, three profiles were injected: the best-quality one (extremely slow to encode), the fastest one (of subpar quality) and the one used in production (more stable). Finally, the winner was chosen among the profiles that had the best fitness for the most source videos.

Many breakthroughs are distinguishable in the progress chart (Figure 1), some associated with changes in circumstances, indexed by the birth count: at the beginning there was one process, only one source video was evaluated and the bitrate constraint was 420 kbps; at individual 4000 it changed to 600 kbps; at 7000 many machines entered the game and there were several changes to the code; at 20000 the fitness function changed to allow working profiles with warnings to evolve freely, as they can work surprisingly well in some cases.
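For reference, the run's settings can be gathered into a single configuration sketch (Python; the key names are illustrative, not the original tool's options):

    REFERENCE_RUN = {
        "crossover_rate": 0.80,
        "mutation_rate": 0.025,
        "max_population": 250,
        "target_bitrate_kbps": 600,
        "target_timerate": 2,              # CPU time / clip duration
        "target_psnr_db": 40,
        "target_psnr_prime": 60,
        "target_fitness": 70,              # target PSNR' + base fitness (10)
        "timelimit_range_s": (5, 45),
        "timeout_timerate_range": (50, 2),
    }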
2.7 Perceptual Evaluation
It is strongly recommended that, at any point of the evolution, the encoded videos be verified by eye and adjustments applied to the gene structure or to the automated evaluation method. The final selection of profiles should also be tested against as many sources as possible, covering the types of content that will be used in production (e.g., baseball games, soap operas, news reports, or all of them, for general usage). For that, the final test was performed using the winning profile to encode 675 source videos, representing an average daily production. All the encoded videos had their resulting bitrate and timerate verified, and they were played back in a loop on the monitor screens around the work area for scrutiny by all Webmedia personnel.

3. CONCLUSION
Genetic Algorithms can quickly bring great improvement to the quality of hand-crafted video profiles; from the design to the end of the first run, it took less than two weeks.

The clear disadvantage is that a profile can fail badly on sources outside the trained content type. On the other hand, using too many source videos to train it would not only slow down the process but could also generate mediocre profiles. A better solution is to create a profile for every different source type, such as each TV show or each different sport.

It is important to note that the greatest benefit of the first run of this project was to successfully bring the legacy codec (Sorenson Spark, which had more widespread adoption among our user base) on par with a newer technology (On2 VP6, which only had proprietary tools that did not integrate well with our production system). This bought us time for the stabilization of the technology and the adoption of H.264 as the standard encoding, instead of lesser, proprietary alternatives that pressed quality improvements into the gap between the standardization of the convergence codec and the stability of the legacy one.

Finally, for the organization, it brings the security of charted evolution, over the insecurity of purely subjective evaluation.

4. ACKNOWLEDGMENTS
The distributed processing would not have been possible without the help of Fernando Luiz Valente de Souza.

5. REFERENCES
[1] F. Bellard. FFmpeg Documentation. http://www.ffmpeg.org/, 2004-2008.
[2] D. E. Goldberg, K. Deb, and J. Horn. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[3] J.-P. Goux, J. Linderoth, and M. Yoder. Metacomputing and the master-worker paradigm. Preprint MCS/ANL-P792-0200, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, 2000.
[4] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13:600–612, 2004.
