HW Accel & Display settings - HW DXVA2+OpenGL display work best for me, somehow

Started by Urik, December 01, 2019, 05:10:10 PM


Urik

I've been using Avidemux for a few months now, and so far it's the best tool I've used for trimming GOP video.
I like the scrubbing by keyframes, frame type indicator, and intelligent warning system if you're trying to cut/edit in wrong place.

I'm on a recent version of Win10, with an i7 7700k and a GTX1080Ti, using the current stable 2.7.5 VC++ 64bit.
I'm mostly dealing with 4k 60p gameplay footage in h264 at approx. 100-130 Mbps. It has a GOP length of half the framerate, with only P-frames in between (Nvidia Share).

From the start, I've played around with the settings until I found the ones that work best for me:

  • HW Accel                    > decode using DXVA2 (Windows)
  • Video display               > OpenGL (best)
  • GUI Rendering Options > Enable openGl support

Somehow, I've found that only with these exact settings do I get the best playback performance and correct display.
With other options, like display > DXVA2, the video preview is somewhat pixelated / aliased, as if 4 pixels are kind of merged together.
To be clear, that's with default Avidemux window size and default scaling, which is 1:4. And this happens regardless of the file resolution.
Another thing is I get slightly washed out color & gamma, as if it's outputting the colorspace wrong (limited instead of full range).
Finally, with DXVA2 display it often kind of lags and becomes unresponsive to frequent play/pause commands, or gets audio desync.

Logic suggests that I ought to use DXVA2 display, but it's OpenGL that works for me best (and only if openGL for GUI is also enabled).
I do get about 35-40% GPU usage though, which is pretty high as compared to video players like MPC-HC / VLC which cause like 5% usage (yes I understand those use native DXVA2).
Makes me think less powerful cards might struggle.
By the way, when I tried a recent nightly build, it actually performed worse, so I went back to the public release.

There's not much of a question here, I just wanted to share. Maybe someone can weigh in with what they think.
I've searched the forum on the topic already, and I fully understand that full on-card HW acceleration isn't implemented yet.

eumagga0x2a

Thank you for the insight, which greatly contradicts what I experienced during testing on Windows (which doesn't happen too often). On my hardware, DXVA2 display beats OpenGL by a large margin. It also looks like OpenGL performance has become even worse with newer Qt. (BTW, the wording in the Preferences is misleading: OpenGL is either enabled or disabled. It is tied to a hidden Qt widget where all the probing happens, but apart from that, no GUI elements use OpenGL.)

However, I have very few 4k 60fps samples and I can't play them on my hw in realtime anyway.

Regarding washed-out colors, what colorspace and bit depth do your source videos have?

The comparatively high CPU load with DXVA2 decoding originates from downloading large amounts of data from the graphics card's hw video decoder to main memory and re-uploading it for display.
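
A rough sense of the data volume involved in that round trip can be sketched as follows. This is only back-of-the-envelope arithmetic, assuming 8-bit 4:2:0 frames at 1.5 bytes per pixel (e.g. NV12) and assuming the re-upload is the same size as the download; the function name is made up for illustration, and nothing here is measured from Avidemux.

```python
def readback_bandwidth_mb_s(width, height, fps, bytes_per_pixel=1.5):
    """Data moved per second in one direction, in MB/s (10**6 bytes).

    bytes_per_pixel=1.5 is an assumption: 8-bit 4:2:0 (one luma byte per
    pixel plus two chroma planes at quarter resolution each).
    """
    return width * height * bytes_per_pixel * fps / 1e6


# 4k 60p footage as discussed in this thread
one_way = readback_bandwidth_mb_s(3840, 2160, 60)
round_trip = 2 * one_way  # download from the decoder plus re-upload for display

print(f"download only:        {one_way:.0f} MB/s")
print(f"download + re-upload: {round_trip:.0f} MB/s")
```

For comparison, the same call with 1920x1080 at 60 fps gives a quarter of the 4k figure, since the pixel count is a quarter.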

Quote
By the way, when I tried recent nightly build, it actually performed worse, so I went back to the public release.

A VC++ (vsWin64) or a MinGW (win64) one?

Urik

Quote from: eumagga0x2a on December 01, 2019, 05:43:20 PM
the wording in the Preferences is misleading: OpenGL is either enabled or disabled
Now that makes sense!

Quote
Regarding washed-out colors, what colorspace and bit depth do your source videos have?
It happens with all sorts of videos, but all of them are basic yuv420 8bit bt709.
The 4k 60p is Nvidia Shadowplay/Share footage, which is actually tagged as bt601, but that's just one of the weird quirks of the Nvidia Experience file writer, so it's unrelated.

Here's a few screens:
https://imgur.com/a/3bowube

Basically, there's a few examples from different sources. DXVA2 output looks pixelated + gamma lift. And with Nvidia's 4k files, it stops responding to frequent spacebar play/pause commands.

Quote
QuoteBy the way, when I tried recent nightly build, it actually performed worse, so I went back to the public release.
A VC++ (vsWin64) or a MinGW (win64) one?
Ah yes, (I went back to) the official Avidemux_2.7.5VC++64bits.exe installer from sourceforge / fosshub link. The nightly that I briefly tried (and that performed worse for me) was Avidemux_2.7.5 VC++ 64bits 191129.exe from nightly/vsWin64/

eumagga0x2a

Quote from: Urik on December 01, 2019, 10:41:28 PM
Here's a few screens:
https://imgur.com/a/3bowube

Looks as if MPEG color range were not stretched in Avidemux (it *is* stretched with VDPAU on Linux).
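
For illustration, stretching MPEG (limited) range to full range for 8-bit luma can be sketched as below. The 16..235 luma bounds are the standard limited-range values; the function name is hypothetical, chroma (16..240) is left out for brevity, and this is not Avidemux's actual code path.

```python
def limited_to_full_luma(y):
    """Expand an 8-bit limited-range luma value (16..235) to full range (0..255).

    Values outside the nominal range are clamped after scaling.
    """
    stretched = (y - 16) * 255 / 219  # 219 = 235 - 16 nominal code values
    return max(0, min(255, round(stretched)))


# Nominal black and white land on the ends of the full range.
for value in (16, 126, 235):
    print(value, "->", limited_to_full_luma(value))
```

If this step is skipped, black is displayed at level 16 and white at 235, which would show up as lifted blacks and reduced contrast.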

Quote
And with Nvidia's 4k files, it stops responding to frequent spacebar play/pause commands.

I wonder whether this was fixed post-release. Maybe false memories. If it stops responding, most likely the focus has moved to an input field in the GUI; hitting the Tab key might help.

Could you please for the sake of completeness try the MinGW build once? The last VC++ nightlies were built with Visual Studio 2019, the release with 2015. The whole MSVC build environment has been refreshed. This is the only reason for performance gains or losses with VC++ builds I can think of.

Urik

So I've installed avidemux_2.7.5 r191130_win64.exe from nightly/win64/ and the performance is fine with it (as good as the VC++ public one). The following applies to this build as well.

Regarding play/pause on 4K files with DXVA2 display, it's genuine lag, and it gets worse on frequent play/pause commands.

I think I figured out what I referred to as "aliasing" with DXVA2 earlier: it just renders at quarter resolution inside the preview window. So with the default Avidemux window being a quarter the size of the desktop, on my 3840x2160 screen it basically displays 540p in a 1080p preview window.
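
The numbers work out as follows, assuming "quarter" means a quarter of the pixel count, i.e. half the width and half the height. The helper name is made up for illustration.

```python
def halve(size, times=1):
    """Halve width and height `times` times (each halving quarters the pixel count)."""
    width, height = size
    return (width // 2 ** times, height // 2 ** times)


screen = (3840, 2160)
preview = halve(screen)    # window at a quarter of the desktop area
rendered = halve(preview)  # picture rendered at a quarter of the preview area

print("preview window:", preview)
print("rendered picture:", rendered)
```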

I don't think it's exclusive to 4K screen though, I've tested on my laptop, and there's lower detail with DXVA2 output vs OpenGL too.
But then, the laptop can't really play back 4K that well even in MPC-HC (it's a dual-core i5 with GTX 850m), so I didn't test much there.

Another issue is audio sync (on 4K + DXVA2 or SDL), it progressively runs ahead the longer I play the video.
It doesn't happen on 1080p files though, and I've tried multiple types of videos to verify.
Maybe it has to do with playback: maybe instead of dropping frames when behind, it tries to play them all, and that lets the audio run ahead.

I also tried the SDL option (new to me with the MinGW build), with different SDL drivers, all behaving the same. The performance seems to be ok, but the audio run-ahead happens like on DXVA2.
The preview is in the correct resolution, but there's noticeable aliasing.
I think what it does is, if there's downscaling involved, it throws away unneeded pixels instead of interpolating/downsampling, and that causes aliasing.
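
A minimal sketch of the difference between the two approaches, on a single row of pixels. Whether the SDL renderer actually decimates this way is only the guess stated above; the function names are hypothetical, and the averaging shown is a plain box filter, the simplest form of downsampling.

```python
def decimate(row, factor):
    """Downscale by keeping every `factor`-th pixel and discarding the rest."""
    return row[::factor]


def box_downsample(row, factor):
    """Downscale by averaging each group of `factor` neighbouring pixels."""
    return [
        sum(row[i:i + factor]) / factor
        for i in range(0, len(row) - factor + 1, factor)
    ]


# Fine detail: alternating black and white pixels, like thin HUD lines.
row = [0, 255] * 4

print("decimated:", decimate(row, 2))        # the white pixels vanish entirely
print("averaged: ", box_downsample(row, 2))  # detail blends into mid grey
```

With decimation, which pixels survive depends on where the pattern happens to fall, so thin lines flicker in and out or turn jagged. Averaging keeps a trace of every source pixel.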

Here's comparison https://imgur.com/a/QzYKhqc
It's a HUD element of videogame, which shows the difference quite clearly.

In the end, I'm sticking to OpenGL for now, mostly because of accurate gamma, pixel sampling & audio sync. It's still not full 60fps playback on 4K, more like 30-45 (it's hard to verify for sure), but it's enough for a quick edit.


I guess the performance/results of these settings differ a lot depending on hardware people use.


I've used VirtualDub2 (aka VirtualDubFilterMod, a fork of VirtualDub) before, and there I had to check the "Use Direct3D 11" checkbox to be able to play 4K smoothly. It was probably somewhere around 50-55fps at full res and 55-60fps at 50% scaling. But in its current state it's limited: it can't even stream-copy export mp4 edits (it can only export a selection). And it doesn't have intelligent I-frame cutting like Avidemux, so I always had to specifically select the last P-frame for a cut. Way less convenient.

eumagga0x2a

Quote from: Urik on December 02, 2019, 02:36:33 PM
So I've installed avidemux_2.7.5 r191130_win64.exe from nightly/win64/ and the performance is fine with it (as good as the VC++ public one).

Thanks for testing.

Quote
I think I figured out what I referred to as "aliasing" with DXVA2 earlier: it just renders at quarter resolution inside the preview window. So with the default Avidemux window being a quarter the size of the desktop, on my 3840x2160 screen it basically displays 540p in a 1080p preview window.

Probably due to automatic HiDPI scaling by Qt (setting the env var QT_AUTO_SCREEN_SCALE_FACTOR to 0 would disable it, but the Avidemux GUI might be barely usable then).
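
One way to try this without changing system-wide settings is to start Avidemux from a small launcher script that sets the variable only for that process. This is a sketch: the install path below is a hypothetical placeholder that must be adjusted, and the helper names are made up.

```python
import os
import subprocess

# Hypothetical install location; adjust to the actual Avidemux executable.
AVIDEMUX_EXE = r"C:\Program Files\Avidemux\avidemux.exe"


def env_without_auto_scaling(base_env=None):
    """Return a copy of the environment with Qt's automatic HiDPI scaling disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["QT_AUTO_SCREEN_SCALE_FACTOR"] = "0"
    return env


def launch_avidemux(exe=AVIDEMUX_EXE):
    """Start Avidemux with the modified environment; other programs are unaffected."""
    return subprocess.Popen([exe], env=env_without_auto_scaling())


if __name__ == "__main__":
    launch_avidemux()
```

Because the variable is set per process, closing Avidemux and starting it normally restores the default scaling.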

Quote
Another issue is audio sync (on 4K + DXVA2 or SDL), it progressively runs ahead the longer I play the video.
It doesn't happen on 1080p files though, and I've tried multiple types of videos to verify.
Maybe it has to do with playback: maybe instead of dropping frames when behind, it tries to play them all, and that lets the audio run ahead.

Yes, exactly that.
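
A toy model of that effect, assuming audio follows the wall clock and each displayed frame costs at least `render_time` seconds. All names and numbers are illustrative assumptions; this is not Avidemux's playback loop.

```python
def av_drift(duration, fps, render_time, drop_late_frames):
    """Seconds by which audio leads video after `duration` seconds of playback.

    Audio is assumed to play in real time. Showing one frame takes
    max(render_time, 1/fps) seconds of wall-clock time.
    """
    interval = 1.0 / fps
    clock = 0.0  # wall clock, which is also the audio position
    frame = 0    # index of the next frame to show
    while clock < duration:
        if drop_late_frames:
            # Skip ahead to the frame that should be on screen right now.
            frame = max(frame, int(clock / interval))
        clock += max(render_time, interval)
        frame += 1
    return clock - frame * interval


# A renderer that manages only 40 fps on 60 fps material:
print(f"play every frame: {av_drift(10.0, 60, 1 / 40, False):.2f} s behind")
print(f"drop late frames: {av_drift(10.0, 60, 1 / 40, True):.2f} s behind")
```

When every frame is shown, the lag grows in proportion to playing time; with frame dropping it stays bounded by roughly one render period. A renderer that keeps up (as with 1080p here) never falls behind in either mode.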

blob2500

For me the defaults seem to work best, for general use:

Display:  DXVA2
HW acceleration: no settings

However, if I switch the display to Qt (and restart Avidemux, of course), I get a slightly faster encoding speed with x264.

Example:
Sample avisynth video in input (I use vsWin64 version).

-> DXVA2
Encoding time: 4' 15''

-> Qt (and OpenGL display in filter preview: disabled)
Encoding time: 4' 08''

But with Qt, the audio in the Avidemux video preview is no longer in sync (off by a few seconds). :(

eumagga0x2a

blob2500, please don't dig out almost four-year-old topics without substantial reasons.

Quote from: blob2500 on October 05, 2023, 11:50:34 AM
However, if I switch the display to Qt (and restart Avidemux, of course), I get a slightly faster encoding speed with x264.

Fact check: the video renderer in Avidemux has no effect on encoder performance whatsoever, as the renderer is not active (not redrawing the picture) while encoding is in progress. I have created local builds (Qt5) with additional debug messages specifically to verify this with different renderers on Linux and on Windows.

Quote from: blob2500 on October 05, 2023, 11:50:34 AM
But with Qt, the audio in the Avidemux video preview is no longer in sync (off by a few seconds).

As the unaccelerated "Qt" renderer doesn't invoke any hardware acceleration, depending on video and display resolutions and on fps, even very fast CPUs can be unable to perform the necessary scaling and colorspace conversion computations in real time. This is also why the "Qt" renderer should be used only if everything else crashes.

Quote from: blob2500 on October 05, 2023, 11:50:34 AM
HW acceleration: no settings

Well, when handling an 8k AV1 source, a hw accelerated decoder capable of decoding AV1 is quite handy. It allows me to play such a video (30 fps) in real time using the official VC++ build at below 10% CPU load.

Of course, when using AviSynth, Avidemux doesn't perform any decoding.

blob2500

Quote from: eumagga0x2a on October 05, 2023, 01:48:26 PM
blob2500, please don't dig out almost four-year-old topics without substantial reasons.
Sorry, I'll be more careful.

Quote
Fact check: the video renderer in Avidemux has no effect on encoder performance whatsoever, as the renderer is not active (not redrawing the picture) while encoding is in progress. I have created local builds (Qt5) with additional debug messages specifically to verify this with different renderers on Linux and on Windows.

I understand, it's strange. I tested it on 2 PCs several times, with the same Avisynth version and the same dlls.
On the other (better performing) PC, the difference is smaller, but it is still present:

DXVA2 -> 3' 16''
Qt -> 3' 13''
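
To put the reported timings in perspective, the relative differences can be worked out as below. Only the four encoding times come from this thread; the helper names and the "PC 1" / "PC 2" labels are made up for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a m' ss'' timing to seconds."""
    return minutes * 60 + seconds


def relative_gain(slow, fast):
    """Percentage by which `fast` is shorter than `slow`."""
    return (slow - fast) / slow * 100


# (DXVA2 time, Qt time) as posted above
runs = {
    "PC 1": (to_seconds(4, 15), to_seconds(4, 8)),
    "PC 2": (to_seconds(3, 16), to_seconds(3, 13)),
}

for name, (dxva2, qt) in runs.items():
    print(f"{name}: {dxva2 - qt} s faster with Qt ({relative_gain(dxva2, qt):.1f}%)")
```

Differences of a few seconds on a run of three to four minutes are small enough that repeating each configuration several times and comparing the spread would help show whether the renderer setting is really the cause.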


blob2500

If it helps, these are the logs:

Encoding time: 4'07'' (Qt)

Encoding time: 4'13'' (DXVA2)