Avidemux Forum

Avidemux => Windows => Topic started by: mean on November 06, 2016, 10:41:54 AM

Title: DXVA2 experimental hw decoding
Post by: mean on November 06, 2016, 10:41:54 AM
The latest nightlies contain experimental dxva2 hw decoding through libavcodec
Only **8**-bit h264/h265 is supported

It is only partial, i.e. decoding is done on the video card but data are copied back to main memory for display, which is slow
As a result for seeking, it is way slower than VDPAU or LIBVA where everything happens on the video card

Preferences=> HW decoding => DXVA2

If your PC is fast enough, and the resolution is low, it might not be faster than doing it in software
For 2k H265, it is ~ 1.5 times faster than doing it in software
Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 13, 2016, 10:41:03 AM
2nd part (still experimental) is also done : D3D/DXVA display driver
It still has issues: refresh is not working correctly, it does not support cards with restrictions on surface size ...
And it is not very fast

It is due to the fact that decoded video frames are copied back and forth between video card and main memory

The next step is to bridge them, like vdpau does, so the video stays as long as possible in the video card memory
That should speed up seeking a lot, and could later allow DirectX resizing
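
To get a feel for how much data the copies involve, here is a rough sketch. It assumes the decoder outputs NV12 (8-bit 4:2:0, 1.5 bytes per pixel), a 1920x1080 frame and 25 fps; none of these figures come from the posts above, and the function names are made up for illustration.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Size of one 8-bit NV12 frame: full-size luma plus interleaved half-size chroma."""
    luma = width * height
    # U and V are each subsampled by 2 in both directions, then interleaved
    chroma = (width // 2) * (height // 2) * 2
    return luma + chroma


def copy_traffic_bytes_per_second(width: int, height: int, fps: int,
                                  copies_per_frame: int) -> int:
    """Bytes moved per second if every frame is copied `copies_per_frame` times."""
    return nv12_frame_bytes(width, height) * fps * copies_per_frame


# Assumed example: full HD at 25 fps, one copy down and one copy back up
round_trip = copy_traffic_bytes_per_second(1920, 1080, 25, copies_per_frame=2)
```

With these assumed numbers a single readback per frame is about 78 MB/s and a round trip about 156 MB/s. The real cost depends on the card and driver and was not measured here.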

The naming is a bit confusing, it should be D3D
But to keep it the same as decoding (i.e. dxva2) it is set to DXVA2 too

Zooming is done in hw when it supports it

Rebuild in progress

Title: Re: DXVA2 experimental hw decoding
Post by: EEMcGee on November 13, 2016, 06:20:59 PM
I don't know much about this but would it be good to have Avidemux look at the loaded video resolution and video card memory size then set this feature on or off accordingly?  I don't know if that would keep this feature off if it's going to be slower with it on.
Title: Re: DXVA2 experimental hw decoding
Post by: dosdan on November 13, 2016, 07:37:27 PM
Perhaps on the Display tab, it would be worthwhile to include a simple benchmark Test button?

Dan.
Title: Re: DXVA2 experimental hw decoding
Post by: mm0359 on November 14, 2016, 12:51:33 PM
@ mean,
Dxva2, init: I'm submitting a few PRs, step by step...

NB: Please, leave debug output on for the time being...
Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 14, 2016, 02:38:52 PM
Normally, it will first try the hw accel path, and fall back to software if that fails
If the failure happens later, you might need to disable it manually (and restart avidemux)
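
The try-hardware-then-fall-back behaviour described above can be sketched roughly like this. It is only an illustration in Python, not Avidemux's actual code: `HwInitError`, `create_decoder` and the factory callables are hypothetical names.

```python
class HwInitError(Exception):
    """Raised by a (hypothetical) hardware decoder factory when setup fails."""


def failing_hw_factory():
    # Stand-in for a hardware setup that fails during initialisation
    raise HwInitError("no usable device")


def create_decoder(hw_factory, sw_factory, hw_enabled=True):
    """Return (decoder, path): try hw first, fall back to software if hw init fails."""
    if hw_enabled:
        try:
            return hw_factory(), "hw"
        except HwInitError:
            pass  # init failed: fall through to the software decoder
    return sw_factory(), "sw"


decoder, path = create_decoder(failing_hw_factory, lambda: "software decoder")
```

A check like this only catches failures during initialisation. A failure that shows up later, once decoding has started, is not covered, which is why disabling it manually may still be needed.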

The build in progress is mostly ok, except :
* refresh is not working correctly. If you get a black frame, go left/right to force a new display
* The bridge is not working, there is an extra copy

Meanwhile, I've spotted that the audio part is consuming a lot of CPU on Windows

i.e.
playing a 720p h264 video with all DXVA/DXVA, no sound => 4% cpu
playing a 720p h264 video with all DXVA/DXVA, with sound => 30% cpu => ????

Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 14, 2016, 06:38:13 PM
Stupid mistake, us vs ms
Now with dxva2 + fixed sound, the cpu consumption is 5% playing a small h264 mkv instead of 30%

Much better
Win32 available, win64 in progress
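
The post does not say where exactly the us/ms mix-up was, so the following is only a guess at the general kind of bug: a wait meant to be in milliseconds but interpreted as microseconds makes a polling loop wake up 1000 times more often. All names and the 10 ms figure are hypothetical.

```python
US_PER_MS = 1000
US_PER_S = 1_000_000


def ms_to_us(ms: int) -> int:
    """Convert milliseconds to microseconds."""
    return ms * US_PER_MS


def wakeups_per_second(sleep_us: int) -> float:
    """How often a loop that sleeps `sleep_us` microseconds per iteration wakes up."""
    return US_PER_S / sleep_us


intended_wait_ms = 10  # assumed polling interval, for illustration only

# Bug: the millisecond value is handed to something expecting microseconds
buggy = wakeups_per_second(intended_wait_ms)
# Fix: convert to microseconds first
fixed = wakeups_per_second(ms_to_us(intended_wait_ms))
```

In this sketch the loop wakes up 100000 times per second with the bug and 100 times per second after the fix.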
Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 14, 2016, 07:53:20 PM
With CPU consumption fixed, some figures :

All done with a 2k H265 video on a Core i5. The video is simple to decode.

* Software display, software decoding : ~ 18% cpu
* Dxva2 display , Dxva2 decoding  : ~ 4% cpu

Not bad :)
Title: Re: DXVA2 experimental hw decoding
Post by: EEMcGee on November 15, 2016, 01:49:50 AM
Thank You for all of the hard work you put in.
Title: Re: DXVA2 experimental hw decoding
Post by: eumagga0x2a on November 15, 2016, 10:04:09 AM
Quote from: mean on November 14, 2016, 07:53:20 PM
* Software display, software decoding : ~ 18% cpu
* Dxva2 display , Dxva2 decoding  : ~ 4% cpu

These values are amazing, they are similar to the CPU load while playing a 720p h264 video in mpv with VDPAU on Linux in comparison with ~30% CPU load playing the same video with VDPAU in Avidemux on my hardware. Does this happen because Avidemux copies decoded images back and forth from the graphics card even if none of post-processing options or filters is enabled? Is keeping all the data for decoding and displaying with VDPAU in the graphics card memory off-limits?

Thank you for your hard work on Avidemux too.
Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 15, 2016, 10:13:27 AM
Normally no, vdpau keeps the video on the video card as long as possible (which dxva does not, i think i'll need to go D3D11 to do that)

It could be something else.
For example it was really bad on windows, due to the audio plugin that was gobbling all the resources it could.

Try to play without sound to see if the cpu consumption goes down
(i.e. remove the audio track)

The 5.1 => dolby filter is very demanding cpu wise
Title: Re: DXVA2 experimental hw decoding
Post by: Jan Gruuthuse on November 15, 2016, 10:21:35 AM
Would suffice to just disable ac3 in Avidemux menu: Audio Select track. [ ] for ac3 track
Playing with AC3 one thread is showing 30%, without ac3 selected it would be around 11%
AC3 is indeed heavy on CPU with no dedicated hardware decoding it.

(just back, still need to catch up a lot of other stuff)
Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 15, 2016, 10:22:44 AM
Just did a quick test with full HD H264 video + libva on a core i7

Dolby + pulse audio => 23% CPU
Stereo + pulse audio => 12 % cpu
stereo + dummy audio => 6 % cpu

Points to an audio problem (difference between 2nd and 3rd should be very very small)
Title: Re: DXVA2 experimental hw decoding
Post by: Jan Gruuthuse on November 15, 2016, 10:25:11 AM
Is it related to the downmixing setting? Disabling downmixing improves things here
Title: Re: DXVA2 experimental hw decoding
Post by: eumagga0x2a on November 15, 2016, 10:31:03 AM
Quote from: mean on November 15, 2016, 10:22:44 AM
Just did a quick test with full HD H264 video + libva on a core i7

Dolby + pulse audio => 23% CPU
Stereo + pulse audio => 12 % cpu
stereo + dummy audio => 6 % cpu

Points to an audio problem (difference between 2nd and 3rd should be very very small)

Not on my hardware (AMD CPU + NVIDIA graphics card) with VDPAU. The CPU load is the same no matter which audio device is selected, with and without downmixing.
Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 15, 2016, 11:00:05 AM
and with libva ?
(you need the libva / vdpau wrapper installed)
Title: Re: DXVA2 experimental hw decoding
Post by: eumagga0x2a on November 15, 2016, 11:28:11 AM
Done: the same CPU load (~28% – ~30%) for 720p h264 25fps videos. Both decoder and display driver are set to LIBVA, which requires disabling VDPAU in the preference settings to prevent it from taking over.
Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 15, 2016, 01:45:51 PM
Could it be the window manager messing with performance again?
Title: Re: DXVA2 experimental hw decoding
Post by: Jan Gruuthuse on November 15, 2016, 01:58:34 PM
Hard to tell: libva seems to spread the load more evenly over the threads.
Title: Re: DXVA2 experimental hw decoding
Post by: Jan Gruuthuse on November 15, 2016, 02:05:30 PM
Website is very slow responding.
"Service Temporarily Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later."
Under attack?
Title: Re: DXVA2 experimental hw decoding
Post by: eumagga0x2a on November 15, 2016, 07:49:47 PM
Quote from: mean on November 15, 2016, 01:45:51 PM
Could it be the window manager messing with performance again?

Unlikely, because turning hwaccel off adds only ~10% to the CPU load (it grows to ~40%) and dedicated media players like MPlayer and mpv consume only ~5% of CPU when playing 720p h264 videos (in a window, not fullscreen) via VDPAU in the same gnome-shell setup.
Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 16, 2016, 06:49:24 PM
Redraw problem should be fixed
Just the mm patch to look at and it is ready for testing
Title: Re: DXVA2 experimental hw decoding
Post by: mm0359 on November 17, 2016, 03:45:22 PM
Quote from: mean on November 16, 2016, 06:49:24 PM
Just the mm patch to look at and it is ready for testing

After the current PR, I'll do at least 1 more patch wrt Dxva init...

Fttb, (init succeeds then) decoding doesn't seem to succeed on my computer:
Quote
  Avidemux v2.6.14 (161115_eb280212ed3) .

Operating System: Microsoft Windows Vista Home Premium Service Pack 2 (6.0.6002; 32-bit)

[uncompress] [DXVA] --No picture 
Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 17, 2016, 04:41:10 PM
It's maybe normal
There is a delay of a few frames at the beginning
Title: Re: DXVA2 experimental hw decoding
Post by: mm0359 on November 18, 2016, 01:56:56 PM
Quote from: mean on November 17, 2016, 04:41:10 PM
It's maybe normal
There is a delay of a few frames at the beginning

Agreed: on second look, decoder seems to (then) succeed.
Quote
Surface to admImage = 095232C0
Retrieving image pitch=768 width=720 height=576
Align 576,16 => 576
Paint event

Yet, while Audio was fine, Video seemed as if (very) slowed down, and cpu may have been (much) higher than without Dxva.
But I'll try it again, with a newer build...

Decoder and Renderer can be used independently, can't they?
(Maybe we could add a few (raw) statistics: decoded frames, displayed frames, ...?)
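
In the log quoted above, pitch=768 is larger than width=720. Assuming the pitch is the number of bytes between the starts of consecutive rows in the surface, and that the frame is 8-bit (one byte per luma sample), each row carries 48 bytes of padding that has to be skipped when the plane is copied out. A minimal sketch (hypothetical function name, not Avidemux's code):

```python
def copy_plane(src: bytes, pitch: int, width: int, height: int) -> bytes:
    """Copy `height` rows of `width` bytes from a buffer whose rows are `pitch` bytes apart."""
    if pitch < width:
        raise ValueError("pitch must be at least the row width")
    if height > 0 and len(src) < pitch * (height - 1) + width:
        raise ValueError("source buffer is too small for the given geometry")
    out = bytearray(width * height)
    for row in range(height):
        start = row * pitch  # rows start every `pitch` bytes in the source
        out[row * width:(row + 1) * width] = src[start:start + width]
    return bytes(out)


# Geometry from the quoted log: 768 - 720 = 48 padding bytes per row
padding_per_row = 768 - 720
packed = copy_plane(bytes(768 * 576), pitch=768, width=720, height=576)
```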
Title: Re: DXVA2 experimental hw decoding
Post by: mean on November 18, 2016, 05:51:00 PM
With the current D3D9 they are always independent, with an extra copy as a consequence
Title: Re: DXVA2 experimental hw decoding
Post by: harrym on December 21, 2016, 05:06:04 PM
And what about NVRESIZE? Is it supported?
Title: Re: DXVA2 experimental hw decoding
Post by: mean on December 21, 2016, 07:46:22 PM
On Linux, with vdpau