DXVA2 experimental hw decoding

Started by mean, November 06, 2016, 10:41:54 AM

mean

The latest nightlies contain experimental dxva2 hw decoding through libavcodec
Only **8**-bit h264/h265 is supported

It is only partial, i.e. decoding is done on the video card but the data is copied back to main memory for display, which is slow
As a result, seeking is much slower than with VDPAU or LIBVA, where everything happens on the video card

Preferences=> HW decoding => DXVA2

If your PC is fast enough and the resolution is low, it might not be faster than decoding in software
For 2k H265, it is ~1.5 times faster than decoding in software

mean

2nd part (still experimental) is also done: D3D/DXVA display driver
It still has issues: refresh is not working correctly, it does not support cards with restrictions on surface size ...
And it is not very fast

This is because decoded video frames are copied back and forth between the video card and main memory
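
To get a rough feel for what that round trip costs, here is a back-of-the-envelope sketch. It assumes 8-bit 4:2:0 frames (1.5 bytes per pixel, as in NV12) and, purely for illustration, a 2560x1440 frame at 25 fps. Neither the resolution nor the frame rate comes from this thread, and the function names are made up for the example.

```python
def frame_bytes(width: int, height: int) -> int:
    """Size of one 8-bit 4:2:0 frame: a full-size luma plane plus
    two chroma planes subsampled by 2 in each direction."""
    luma = width * height
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return luma + chroma


def copy_traffic_mib_per_s(width: int, height: int, fps: float,
                           copies_per_frame: int) -> float:
    """Bus traffic in MiB/s if every frame crosses the bus
    copies_per_frame times (1 = download only, 2 = download + re-upload)."""
    return frame_bytes(width, height) * fps * copies_per_frame / (1024 * 1024)


if __name__ == "__main__":
    # Assumed values, not taken from the thread.
    w, h, fps = 2560, 1440, 25
    print(f"one frame: {frame_bytes(w, h)} bytes")
    print(f"download only: {copy_traffic_mib_per_s(w, h, fps, 1):.1f} MiB/s")
    print(f"download + re-upload: {copy_traffic_mib_per_s(w, h, fps, 2):.1f} MiB/s")
```

The numbers only cover raw transfer volume; synchronisation stalls between the card and the CPU are likely to matter as well and are not modelled here.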

The next step is to bridge them, like vdpau does, so the video stays in video card memory as long as possible
That speeds up seeking a lot, and potentially allows DirectX resizing later on

The naming is a bit confusing: it should be D3D
But to keep it the same as decoding (i.e. dxva2), it is set to DXVA2 too

Zooming is done in hw when the card supports it

Rebuild in progress


EEMcGee

I don't know much about this, but would it be good to have Avidemux look at the loaded video's resolution and the video card's memory size, then set this feature on or off accordingly? I don't know if that would keep this feature off when it's going to be slower with it on.

dosdan

Perhaps on the Display tab, it would be worthwhile to include a simple benchmark Test button?

Dan.
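
A minimal sketch of what such a benchmark could do, assuming each decode path can be wrapped in a callable that decodes one frame. The names `time_decoder` and `pick_fastest` are hypothetical and the two "decoders" in the demo are just sleeps standing in for real work; nothing here is Avidemux code.

```python
import time
from typing import Callable, Dict


def time_decoder(decode_frame: Callable[[], None], frames: int,
                 clock: Callable[[], float] = time.perf_counter) -> float:
    """Return the average seconds per frame over `frames` calls."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    start = clock()
    for _ in range(frames):
        decode_frame()
    return (clock() - start) / frames


def pick_fastest(candidates: Dict[str, Callable[[], None]], frames: int = 50,
                 clock: Callable[[], float] = time.perf_counter) -> str:
    """Benchmark each candidate and return the name of the fastest one."""
    timings = {name: time_decoder(fn, frames, clock)
               for name, fn in candidates.items()}
    return min(timings, key=timings.get)


if __name__ == "__main__":
    # Stand-ins for real decode paths: the sleeps simulate per-frame cost.
    candidates = {
        "software": lambda: time.sleep(0.002),
        "dxva2": lambda: time.sleep(0.001),
    }
    print("fastest path:", pick_fastest(candidates, frames=20))
```

The clock is passed in as a parameter so the selection logic can be checked with a fake clock instead of real timings.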

mm0359

@ mean,
Dxva2, init: I'm submitting a few PRs, step by step...

NB: Please leave debug output on for the time being...

mean

Normally, it will first try the hw accel path and fall back to software if that fails
If the failure happens later, you might need to disable it manually (and restart avidemux)
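
The try-hardware-first behaviour described above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual Avidemux code: `DecoderInitError`, `open_decoder` and the two factory functions are hypothetical names.

```python
from typing import Callable, Sequence, Tuple


class DecoderInitError(Exception):
    """Raised by a (hypothetical) decoder factory that cannot initialise."""


def open_decoder(factories: Sequence[Tuple[str, Callable[[], object]]]):
    """Try each factory in order and return (name, decoder) for the first
    one that initialises. Raise DecoderInitError if all of them fail."""
    errors = []
    for name, factory in factories:
        try:
            return name, factory()
        except DecoderInitError as exc:
            errors.append(f"{name}: {exc}")
    raise DecoderInitError("no decoder available (" + "; ".join(errors) + ")")


def make_dxva2():
    # Simulated init failure, e.g. an unsupported surface size.
    raise DecoderInitError("surface size not supported")


def make_software():
    # Placeholder object standing in for a software decoder.
    return "software-decoder"


if __name__ == "__main__":
    name, decoder = open_decoder([("dxva2", make_dxva2),
                                  ("software", make_software)])
    print("using", name)  # prints "using software"
```

Note that this only covers a failure at initialisation. A failure that shows up later, once frames are already flowing, is not caught by such a loop, which matches the need to disable the option manually in that case.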

The build in progress is mostly ok, except:
* refresh is not working correctly. If you get a black frame, go left/right to force a new display
* The bridge is not working, so there is an extra copy

Meanwhile, I've spotted that the audio part is consuming a lot of CPU on Windows

i.e.
playing a 720p h264 video with all DXVA/DXVA, no sound => 4% cpu
playing a 720p h264 video with all DXVA/DXVA, with sound => 30% cpu => ????


mean

Stupid mistake, us vs ms (microseconds vs milliseconds)
Now with dxva2 + fixed sound, the cpu consumption is 5% playing a small h264 mkv instead of 30%

Much better
Win32 available, win64 in progress
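
The thread does not show the actual fix, so the following is only a hypothetical illustration of how mixing up microseconds and milliseconds can burn CPU: a polling loop that is meant to sleep N milliseconds but passes N to a microsecond-based sleep wakes up 1000 times more often. The 10 ms period and all names are assumptions.

```python
US_PER_MS = 1_000
US_PER_S = 1_000_000


def ms_to_us(milliseconds: int) -> int:
    """Convert milliseconds to microseconds."""
    return milliseconds * US_PER_MS


def wakeups_per_second(sleep_us: int) -> float:
    """Upper bound on iterations per second for a loop that sleeps
    sleep_us microseconds per iteration (ignoring the work itself)."""
    if sleep_us <= 0:
        raise ValueError("sleep_us must be positive")
    return US_PER_S / sleep_us


if __name__ == "__main__":
    intended_ms = 10  # assumed polling period, not from the thread
    buggy = wakeups_per_second(intended_ms)            # ms value used as if it were us
    fixed = wakeups_per_second(ms_to_us(intended_ms))  # converted properly
    print(f"buggy: {buggy:.0f} wakeups/s, fixed: {fixed:.0f} wakeups/s")
```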

mean

With CPU consumption fixed, some figures:

All done with a 2k H265 video on a Core i5. It is a simple-to-decode video.

* Software display, software decoding : ~ 18% cpu
* Dxva2 display , Dxva2 decoding  : ~ 4% cpu

Not bad :)

EEMcGee

Thank you for all of the hard work you put in.

eumagga0x2a

Quote from: mean on November 14, 2016, 07:53:20 PM
* Software display, software decoding : ~ 18% cpu
* Dxva2 display , Dxva2 decoding  : ~ 4% cpu

These values are amazing. They are similar to the CPU load while playing a 720p h264 video in mpv with VDPAU on Linux, compared with ~30% CPU load playing the same video with VDPAU in Avidemux on my hardware. Does this happen because Avidemux copies decoded images back and forth from the graphics card even if none of the post-processing options or filters is enabled? Is keeping all the data for decoding and displaying with VDPAU in the graphics card memory off-limits?

Thank you for your hard work on Avidemux too.

mean

Normally no, vdpau keeps the video on the video card as long as possible (which dxva does not; I think I'll need to go D3D11 to do that)

It could be something else.
For example, it was really bad on Windows due to the audio plugin, which was gobbling all the resources it could.

Try playing without sound to see if the cpu consumption goes down
(i.e. remove the audio track)

The 5.1 => dolby filter is very demanding cpu-wise

Jan Gruuthuse

It would suffice to just disable ac3 in the Avidemux menu: Audio => Select track, [ ] for the ac3 track
Playing with AC3, one thread is showing 30%; without ac3 selected it would be around 11%
AC3 is indeed heavy on the CPU with no dedicated hardware decoding it.

(just back, still need to catch up a lot of other stuff)

mean

Just did a quick test with full HD H264 video + libva on a core i17

Dolby + pulse audio => 23% CPU
Stereo + pulse audio => 12 % cpu
stereo + dummy audio => 6 % cpu

Points to an audio problem (difference between 2nd and 3rd should be very very small)

Jan Gruuthuse

Downmixing setting related? Disabling downmixing improves things here

eumagga0x2a

Quote from: mean on November 15, 2016, 10:22:44 AM
Just did a quick test with full HD H264 video + libva on a core i17

Dolby + pulse audio => 23% CPU
Stereo + pulse audio => 12 % cpu
stereo + dummy audio => 6 % cpu

Points to an audio problem (difference between 2nd and 3rd should be very very small)

Not on my hardware (AMD CPU + NVIDIA graphics card) with VDPAU. The CPU load is the same no matter which audio device is selected, with and without downmixing.