MM-4096, x265: Open Source H.265/HEVC Video Encoder, by Steve Borho

X265 OPEN SOURCE H.265 ENCODER

OPTIMIZATION DETAILS

H.265/HEVC FINALIZED JANUARY 25, 2013

NOTABLE CHANGES FROM H.264

• H.264’s 16x16 macroblocks are replaced with 64x64 CUs and quadtrees
  ‒ The coding quadtree can be recursively split down to 8x8 blocks
  ‒ At all levels, the coding blocks can choose inter or intra prediction
  ‒ The final coding blocks can be further split
  ‒ The residual is signaled in a second quadtree, which can have more depth than the coding quadtree
• Inter prediction has more accuracy
  ‒ The HPEL filter has 8 taps and the QPEL filter has 7 taps (H.264 has a 6-tap HPEL filter and averaged QPEL)
  ‒ Merge candidates replace H.264’s direct and skip modes
  ‒ AMVP allows motion prediction to be selected from a list; in H.264 it was entirely implicit
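The 8-tap and 7-tap filters mentioned above can be illustrated with a minimal sketch. The tap values are the HEVC luma interpolation coefficients; everything else (the function name `interpolate`, the 1-D row, the edge replication, and the direct 8-bit rounding) is a simplifying assumption for illustration, not how x265 or the standard's two-stage process is implemented.

```python
# HEVC luma interpolation taps; each set sums to 64.
HPEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)  # 8 taps, half-sample position
QPEL_TAPS = (-1, 4, -10, 58, 17, -5, 1)       # 7 taps, quarter-sample position


def interpolate(row, x, taps):
    """Return the fractional sample to the right of integer position x.

    Hypothetical helper: works on one row of 8-bit samples and replicates
    edge samples when the filter window runs off either end.
    """
    acc = 0
    for k, coeff in enumerate(taps):
        # The window starts three samples to the left of x.
        i = min(max(x - 3 + k, 0), len(row) - 1)
        acc += coeff * row[i]
    # Round, normalize by 64, and clip to the 8-bit range.
    return min(max((acc + 32) >> 6, 0), 255)


ramp = [10 * i for i in range(16)]        # 0, 10, 20, ...
half = interpolate(ramp, 5, HPEL_TAPS)    # between samples 50 and 60
quarter = interpolate(ramp, 5, QPEL_TAPS)
```

On a linear ramp the half-sample result lands on the midpoint of its two neighbors, which is a quick sanity check that the taps are applied with the right alignment.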

PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL

H.265/HEVC FINALIZED JANUARY 25, 2013

NOTABLE CHANGES FROM H.264

• More intra predictions
  ‒ DC and planar modes, similar to H.264
  ‒ 33 angular predictions, with emphasis on near-vertical and near-horizontal angles
  ‒ 35 predictions in total, available at all block sizes from 32x32 to 4x4, with few special cases
• Sample Adaptive Offset loop filter for reduced compression artifacts
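To make the "emphasis on near-vertical and near-horizontal angles" concrete, here is a small sketch of the 35-mode layout. The mode numbering (0 planar, 1 DC, 2 to 34 angular) and the angle table follow the HEVC specification, with angles in 1/32-sample units; the helper names and the simplified DC predictor (no edge smoothing) are assumptions made for this illustration.

```python
PLANAR, DC = 0, 1
ANGULAR_MODES = range(2, 35)  # 33 angular modes

# Displacement per row/column in 1/32 sample units, for modes 2..34.
INTRA_PRED_ANGLE = [
    32, 26, 21, 17, 13, 9, 5, 2, 0, -2, -5, -9, -13, -17, -21, -26,
    -32, -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32,
]


def angle_for_mode(mode):
    """Look up the angle of an angular mode (hypothetical helper)."""
    return INTRA_PRED_ANGLE[mode - 2]


def dc_predict(above, left):
    """Fill an NxN block with the rounded mean of its neighbors.

    Simplified: the edge smoothing applied to small luma blocks is omitted.
    """
    n = len(above)
    dc = (sum(above) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


total_modes = 2 + len(ANGULAR_MODES)
# Step between neighboring angles: fine near horizontal, coarse near diagonal.
step_near_horizontal = abs(angle_for_mode(11) - angle_for_mode(10))
step_near_diagonal = abs(angle_for_mode(3) - angle_for_mode(2))
```

Modes 10 and 26 have angle 0 (pure horizontal and pure vertical), and the spacing between adjacent angles grows as the direction moves toward the diagonals.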

H.265/HEVC PARALLELIZATION CONSIDERATIONS

NOTABLE CHANGES FROM H.264

• WaveFront Parallel Processing
  ‒ Each row of largest CU blocks can be encoded in parallel, with a two-block lag behind the row above
  ‒ The CABAC state of block 2 is communicated to block 0 of the row below
  ‒ <1% loss of compression efficiency, much more efficient than slices or tiles
• Tiles: split each frame into regular rectangular parts, encode each in parallel
• Deblocking only on 8x8 boundaries, and better ordering of operations
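The two-block lag described above can be sketched as a dependency schedule. This is a toy model, not x265's scheduler: it assumes one worker per row, a uniform cost per block unless a `cost` function is supplied, and the function name `wpp_schedule` is hypothetical. A block may start once its left neighbor and the block above and to the right have finished.

```python
def wpp_schedule(rows, cols, cost=lambda r, c: 1):
    """Return (start, finish) time grids for a wavefront-ordered encode.

    Block (r, c) waits for (r, c-1) in its own row and for (r-1, c+1) in
    the row above, which yields the two-block lag between rows.
    """
    start = [[0] * cols for _ in range(rows)]
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = finish[r][c - 1] if c > 0 else 0
            if r > 0:
                # In the last column the dependency clamps to the row end.
                ready = max(ready, finish[r - 1][min(c + 1, cols - 1)])
            start[r][c] = ready
            finish[r][c] = ready + cost(r, c)
    return start, finish


# 1080p with 64x64 blocks: 17 rows of 30 blocks.
start, finish = wpp_schedule(17, 30)
wavefront_time = finish[-1][-1]
serial_time = 17 * 30
```

With uniform costs each row starts two time units after the one above it, so the whole frame completes in far fewer steps than the serial block count. Passing a non-uniform `cost` shows how slow blocks delay every row below them.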


H.265/HEVC PARALLELIZATION CONSIDERATIONS

THE FINE PRINT

• Larger block sizes reduce the effectiveness of frame parallelism
  ‒ Only a quarter as many block rows are available as in H.264 for the same resolution video
  ‒ After accounting for deblocking and SAO, there is a three-row (192-line) lag between references
  ‒ Wavefront analysis or tiles must be used in conjunction with frame parallelism to make up for this
  ‒ A high percentage of B frames to P frames alleviates this bottleneck
• Large blocks increase serial operations and add longer data dependencies
  ‒ Each CU in the quadtree must be analyzed in Z-scan order
  ‒ Since each CU can choose intra, all prior blocks must generate recon pixels, with no shortcuts
  ‒ Variations in CU encode times reduce the effectiveness of wavefront analysis by causing stalls
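The row-count arithmetic and the Z-scan ordering above can be checked with a short sketch. Both helpers are hypothetical names introduced here, and `z_scan` assumes every block is split all the way down to the minimum size, whereas a real encoder decides the split per CU.

```python
import math


def block_rows(height, block_size):
    """Number of block rows needed to cover a frame of the given height."""
    return math.ceil(height / block_size)


def z_scan(x, y, size, min_size):
    """Yield the top-left corners of min_size blocks in Z-scan order.

    Visits the four quadrants as top-left, top-right, bottom-left,
    bottom-right, recursing into each before moving to the next.
    """
    if size == min_size:
        yield (x, y)
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            yield from z_scan(x + dx, y + dy, half, min_size)


h264_rows = block_rows(1080, 16)   # 16x16 macroblock rows at 1080p
hevc_rows = block_rows(1080, 64)   # 64x64 CU rows at 1080p
lag_lines = 3 * 64                 # three-row reference lag in pixel lines
order = list(z_scan(0, 0, 64, 32))
```

Because each block in the Z-scan may depend on reconstructed pixels from the blocks before it, this ordering is inherently serial inside a 64x64 CU.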


X265 – A SHORT HISTORY

• x265 Consortium founded in April of 2013
  ‒ Dual commercial and GPLv2+ license
  ‒ Development primarily centered in Chennai, India, with contributions from China and the US
  ‒ Started from the HEVC reference encoder (HM); less than half of the HM source remains today
  ‒ Achieved 1080p at 15fps in June
  ‒ Public announcement and first open source release in July
• Optimizations
  ‒ WPP wavefront CTU analysis and frame parallelism
  ‒ Compiler intrinsic SIMD based performance primitives
  ‒ Hand-written assembly performance primitives
  ‒ Data flow improvements, early outs, RDO reductions
• Today
  ‒ 1080p@30fps or 720p@200fps on a 16-core SandyBridge Xeon
9
|

PRESENTATION
TITLE

|

NOVEMBER
19,
2013

|

CONFIDENTIAL

X265 – A SHORT HISTORY

• Ecosystem
  ‒ Licensed to reuse x264 source code and algorithms
  ‒ Open development on mailing list and IRC
  ‒ Public repositories on Bitbucket and VideoLan.org
  ‒ Integration into VLC, libav, ffmpeg, and Handbrake in various stages of completion
• x264 feature adoption
  ‒ Lookahead / slicetype decision and scene cut detection
  ‒ Motion estimation and bitcost functions
  ‒ CLI interface and public C interface
  ‒ Assembly primitives for SAD, SATD, SSD, etc.
  ‒ ABR and CRF rate control; VBV adoption in progress by an open source contributor
• It took eight years for x264 to dominate the H.264 encoding market
  ‒ We would like to achieve dominance in the HEVC market sooner


GPU CONSIDERATIONS

A SAD HISTORY

• Historically, GPUs have been poor for video encoding
  ‒ Intra prediction requires blocks above and to the left to be fully encoded and decoded
  ‒ Inter prediction requires blocks above and to the left to be fully analyzed
  ‒ Rate distortion optimizations require all blocks to be encoded in scan order
  ‒ Together, these dependencies severely limit the amount of parallelism that can be exposed to the GPU
• Encoder data dependencies are complex
  ‒ The cost of copying data to and from GPU device memory generally outweighs any performance improvements
  ‒ Even zero-copy memory is insufficient; the CPU and GPU must share structures at full speed
• Previous attempts at GPU encoding take shortcuts
  ‒ One can ignore some of these dependencies at the cost of compression efficiency and quality
  ‒ In x264, we only used the GPU for lookahead analysis, which has no intra and RDO dependencies


APU CONSIDERATIONS

A WELL BALANCED COMPUTE PROCESSOR

• Heterogeneous architecture
  ‒ GPU compute units can perform high-bandwidth and highly parallel operations
  ‒ The CPU performs the necessary serial and logistical operations
  ‒ The CPU and GPU can see each other’s memory
• x265 opportunity
  ‒ Via WPP and frame parallelism we can expose two dozen parallel CU blocks to be encoded
  ‒ Each parallel CU block requires recursive analysis
  ‒ Control must transfer between the CPU and GPU many times to complete analysis
  ‒ The GPU performs all cost estimates for intra and inter compression, loop filters, and pixel weighting
  ‒ The CPU makes QT split and encode decisions and handles entropy encoding and dependency tracking
  ‒ Many CUs can be busy on the GPU at once; only four may use the CPU cores at a time
  ‒ Making use of the GPU compute units with minimal CPU overhead is the key


DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
