Strange x265 transcoding speed difference between avidemux 2.7.7 and handbrake

Started by mitigating, February 17, 2021, 07:20:13 PM

mitigating

Hi,

I switched to nightly builds of Avidemux and Handbrake because I was trying to encode a ProRes 422 HQ file and neither read the video properly. After switching, I noticed that Avidemux is about 2x slower than Handbrake when using x265 for this particular video. I thought it was related to the x265 settings, but both are the same (except for the versions). I just wanted to point this out, as it may be a bug, the use of an older gcc, or an older x265 lib. Avidemux ran at 9 fps and Handbrake at 20 fps, which is a huge difference. Hardware decoding is off and threads are set to auto.

Source: ProRes 422 HQ (apch), 1088x816, 25fps, 10bit depth, 95mbit/sec
Output: x265 as CRF 20, slow preset

Avidemux: 2.7.7 210215_545c3b0cebe-fflibs 4.24
Handbrake: Nightly 20210214151010-0edac3da8-master (2021021401)

Avidemux x265 log
x265 [info]: HEVC encoder version 3.4
x265 [info]: build info [Windows][GCC 5.5.0][64 bit] 8bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-3.1 (Main tier)
x265 [info]: Thread pool created using 16 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 4 / wpp(13 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : star / 57 / 3 / 3
x265 [info]: Keyframe min / max / scenecut / bias  : 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 25 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 4 / on / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-20.0 / 0.60
x265 [info]: tools: rect limit-modes rd=4 psy-rd=2.00 rdoq=2 psy-rdoq=1.00
x265 [info]: tools: rskip mode=1 signhide tmvp strong-intra-smoothing lslices=4
x265 [info]: tools: deblock sao

Handbrake x265 log
x265 [info]: HEVC encoder version 3.5+dev-681c05e83
x265 [info]: build info [Windows][GCC 10.2.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-3.1 (Main tier)
x265 [info]: Thread pool created using 16 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 4 / wpp(13 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : star / 57 / 3 / 3
x265 [info]: Keyframe min / max / scenecut / bias  : 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 25 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 4 / on / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-20.0 / 0.60
x265 [info]: tools: rect limit-modes rd=4 psy-rd=2.00 rdoq=2 psy-rdoq=1.00
x265 [info]: tools: rskip mode=1 signhide tmvp strong-intra-smoothing lslices=4
x265 [info]: tools: deblock sao
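
Comparing two logs like these by eye is error-prone, so here is a minimal sketch of how the comparison could be automated. The function names are made up for illustration, and the two sample strings are short excerpts from the logs above, not the full logs.

```python
PREFIX = "x265 [info]:"


def info_lines(log_text):
    """Return the x265 [info] lines with runs of whitespace collapsed."""
    lines = []
    for raw in log_text.splitlines():
        line = " ".join(raw.split())
        if line.startswith(PREFIX):
            lines.append(line[len(PREFIX):].strip())
    return lines


def log_differences(log_a, log_b):
    """Return (only_in_a, only_in_b), keeping the original line order."""
    a, b = info_lines(log_a), info_lines(log_b)
    set_a, set_b = set(a), set(b)
    only_a = [line for line in a if line not in set_b]
    only_b = [line for line in b if line not in set_a]
    return only_a, only_b


# Excerpts copied from the two logs in this post (not the full logs).
AVIDEMUX_LOG = """\
x265 [info]: HEVC encoder version 3.4
x265 [info]: build info [Windows][GCC 5.5.0][64 bit] 8bit
x265 [info]: Thread pool created using 16 threads
x265 [info]: Rate Control / qCompress            : CRF-20.0 / 0.60
"""

HANDBRAKE_LOG = """\
x265 [info]: HEVC encoder version 3.5+dev-681c05e83
x265 [info]: build info [Windows][GCC 10.2.0][64 bit] 8bit+10bit+12bit
x265 [info]: Thread pool created using 16 threads
x265 [info]: Rate Control / qCompress            : CRF-20.0 / 0.60
"""

only_avidemux, only_handbrake = log_differences(AVIDEMUX_LOG, HANDBRAKE_LOG)
for line in only_avidemux:
    print("Avidemux only: ", line)
for line in only_handbrake:
    print("Handbrake only:", line)
```

On these excerpts only the encoder version and build info lines are reported, which matches what the full logs show: the encoder settings are identical and only the library version and compiler differ.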

eumagga0x2a

Could you please provide an Avidemux project script containing x265 settings?

A different gcc is off-limits; we use whatever is in MXE (in a version of MXE which works). I have no idea which cross-build environment the Handbrake project is using.

eumagga0x2a

I've finally managed to clone the x265 source (the Mercurial repository at videolan.org was stubbornly misbehaving) and to try a win64 Avidemux build using it. With my sample video, the development version of libx265 was roughly 30% faster than the 3.4 release with the same slow preset (4 fps vs 3 fps); the output file size was practically the same.

Unless someone helps me to understand which particular changes are responsible for the improved performance and why this particular point in the development is safe enough to be included in Avidemux, I am not inclined to request a (cumbersome) update of x265 in the win64 build node right now. Obviously, I look forward to the next x265 release.
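
For reference, the two speed comparisons in this thread can be put on the same scale with a quick calculation. This is only a sketch: the helper name is made up, and the fps figures are the rounded values quoted in the posts, so the ratios are approximate.

```python
def speed_ratio(faster_fps, slower_fps):
    """Return how many times faster the first encode ran than the second."""
    if slower_fps <= 0:
        raise ValueError("slower_fps must be positive")
    return faster_fps / slower_fps


# Figures as reported in the thread (rounded fps, different source videos).
handbrake_vs_avidemux = speed_ratio(20, 9)  # Handbrake nightly vs Avidemux 2.7.7
dev_vs_release = speed_ratio(4, 3)          # x265 development build vs 3.4 release

print(f"Handbrake vs Avidemux:   {handbrake_vs_avidemux:.2f}x")
print(f"x265 dev vs 3.4 release: {dev_vs_release:.2f}x")
```

The gap in the first post (about 2.2x) is noticeably larger than the gap measured between the development and 3.4 builds of libx265 (about 1.3x), although the two measurements used different source videos and are not directly comparable.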

alexstorm

In the logs listed above, the first two lines differ between Avidemux and Handbrake. You guys already saw that, right?
Avidemux
x265 [info]: HEVC encoder version 3.4
x265 [info]: build info [Windows][GCC 5.5.0][64 bit] 8bit

Handbrake
x265 [info]: HEVC encoder version 3.5+dev-681c05e83
x265 [info]: build info [Windows][GCC 10.2.0][64 bit] 8bit+10bit+12bit

I found a changelog at: https://www.videohelp.com/software/x265-Encoder/version-history

Maybe the three 3.5 "Enhancements to existing features" can explain the encoding speed difference?

Quote:
Version 3.5
===========

Release date - 16th March, 2021.

New feature
-----------
1. Real-time VBV for ABR (Average BitRate) encodes in --pass 2 using :option:`--vbv-live-multi-pass`: Improves VBV compliance with no significant impact on coding efficiency.

Enhancements to existing features
---------------------------------
1. Improved hist-based scene cut algorithm: Reduces false positives by leveraging motion and scene transition info.
2. Support for RADL pictures at IDR scene cuts: Improves coding efficiency with no significant impact on performance.
3. Bidirectional scene cut aware Frame Quantizer Selection: Saves bits than forward masking with no noticeable perceptual quality difference.

API changes
-----------
1. Additions to x265_param structure to support the newly added features and encoder enhancements.
2. New x265_param options :option:`--min-vbv-fullness` and :option:`--max-vbv-fullness` to control min and max VBV fullness.

Bug fixes
---------
1. Incorrect VBV lookahead in :option:`--analysis-load` + :option:`--scale-factor`.
2. Encoder hang when VBV is used with slices.
3. QP spikes in the row-level VBV rate-control when WPP enabled.
4. Encoder crash in :option:`--abr-ladder`.


Version 3.4
===========

Release date - 29th May, 2020.

New features
------------
1. Edge-aware quadtree partitioning to terminate CU depth recursion based on edge information. --rskip level 2 enables the feature and --rskip-edge-threshold denotes the minimum expected edge-density percentage within the CU, below which the recursion is skipped. Experimental feature.
2. Application-level feature --abr-ladder for automating efficient ABR ladder generation. Shows ~65% savings in the over-all turn-around time required for the generation of a typical Apple HLS ladder in Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz over a sequential ABR-ladder generation approach that leverages save-load architecture.

Enhancements to existing features
---------------------------------
1. Improved efficiency in 2-pass rate-control algorithm. The savings in the bitrate is ~1.72% with visual improvement in quality in the initial 1-2 secs.

Encoder enhancements
--------------------
1. Faster ARM64 encodes enabled by ASM contributions from Huawei. The speed-up over no-asm version for 1080p encodes @ medium preset is ~15% in a 16 core H/W.
2. Strict VBV conformance in zone encoding.

Bug fixes
---------
1. Multi-pass encode failures with --frame-dup.
2. Corrupted bitstreams with --hist-scenecut when input depth and internal bit-depth differ.
3. Incorrect analysis propagation in multi-level save-load architecture.
4. Failure in detecting NUMA packages installed in non-standard directories.