NVIDIA move towards NVENC, away from the CUDA software-based NVCUVENC

Started by Jan Gruuthuse, March 29, 2015, 12:29:14 PM


Jan Gruuthuse

#15
build fc163e9 not working: No NVENC capable devices found
[HandleAction]  ************ VIDEO_CODEC_CHANGED **************
  [HandleAction]  ************ SAVE_VIDEO **************
  [refresh]  [Vdpau]Rrefresh
  [renderCompleteRedrawRequest]  RedrawRequest
  [admSaver]  [Save] Encoder index=4
  [save]  Audio starting time 01:13:44,640
  [save]  [A_Save] Saving..
[VideoFilterBridge] Creating bridge from 4424 s to 4449 s
[seektoTime]  First frame of the new segment is a keyframe at 4424640ms
  [DecodePictureUpToIntra]   DecodeUpToInta 110798 ref:0
[edCache] Flush
[goToTimeVideo]  Seek done, in reference, gone to 4424640000 with segment start at 0
  [ADM_coreVideoEncoderFFmpeg]  It is probably field encoded, doubling increment
  [ADM_coreVideoEncoderFFmpeg]  [Lavcodec] Using a video encoder delay of 0 ms
  [ADM_ffNvEncEncoder]  [ffNvEncEncoder] Creating.
[ff] Time base 1/50
[adm_lavLogCallback]  [lavc] No NVENC capable devices found
[ff] Cannot open codec
[setup]  [ffMpeg] Setup failed

mean


Jan Gruuthuse

H264 (ff/nvidia)
----------------
display and HWaccel: vdpau:
Output Format on avi, dummy, flv, mmp4, mp4v2, mkv, mpeg-ps, video only
mpeg-ts: hangs
[HandleAction]  ************ SAVE_VIDEO **************
  [refresh]  [Vdpau]Rrefresh
  [renderCompleteRedrawRequest]  RedrawRequest
  [admSaver]  [Save] Encoder index=4
  [save]  Audio starting time 00:00:00,000
  [save]  [A_Save] Saving..
[VideoFilterBridge] Creating bridge from 0 s to 10 s
[convertLinearTimeToSeg]  Frame time=0, taking first segment
  [goToTimeVideo]  Fixating start time to 400 ms
  [seektoTime]  First frame of the new segment is a keyframe at 400ms
  [DecodePictureUpToIntra]   DecodeUpToInta 0 ref:0
[edCache] Flush


Display and HWaccel: LIBVA
Output Format on avi, dummy, flv, mmp4, mp4v2, mkv, mpeg-ps, video only
mpeg-ts: signal 6
[HandleAction]  ************ SAVE_VIDEO **************
  [refresh]  [libva]Rrefresh
  [renderCompleteRedrawRequest]  RedrawRequest
  [admSaver]  [Save] Encoder index=4
  [save]  Audio starting time 00:00:00,000
  [save]  [A_Save] Saving..
[VideoFilterBridge] Creating bridge from 0 s to 10 s
[convertLinearTimeToSeg]  Frame time=0, taking first segment
  [goToTimeVideo]  Fixating start time to 400 ms
  [seektoTime]  First frame of the new segment is a keyframe at 400ms
  [DecodePictureUpToIntra]   DecodeUpToInta 0 ref:0
[edCache] Flush
[adm_lavLogCallback]  [lavc] mmco: unref short failure
  [goToTimeVideo]  Seek done, in reference, gone to 400000 with segment start at 0
  [ADM_coreVideoEncoderFFmpeg]  It is probably field encoded, doubling increment
  [ADM_coreVideoEncoderFFmpeg]  [Lavcodec] Using a video encoder delay of 0 ms
  [ADM_ffNvEncEncoder]  [ffNvEncEncoder] Creating.
[ff] Time base 1/50
[adm_lavLogCallback]  [lavc] No NVENC capable devices found
[ff] Cannot open codec
[setup]  [ffMpeg] Setup failed
*** Error in `/usr/bin/avidemux3_qt4': free(): invalid pointer: 0x0000000003cbbd70 ***


Display: XVideo HWaccel: none
Output Format on avi, dummy, flv, mmp4, mp4v2, mpeg-ps, video only
mkv: freezes:
[HandleAction]  ************ SAVE_VIDEO **************
  [refresh]  XV:refresh
  [admSaver]  [Save] Encoder index=4
  [save]  Audio starting time 00:00:00,000
  [save]  [A_Save] Saving..
[VideoFilterBridge] Creating bridge from 0 s to 10 s
[convertLinearTimeToSeg]  Frame time=0, taking first segment
  [goToTimeVideo]  Fixating start time to 400 ms
  [seektoTime]  First frame of the new segment is a keyframe at 400ms
  [DecodePictureUpToIntra]   DecodeUpToInta 0 ref:0
[edCache] Flush
[adm_lavLogCallback]  [lavc] mmco: unref short failure
  [goToTimeVideo]  Seek done, in reference, gone to 400000 with segment start at 0
  [ADM_coreVideoEncoderFFmpeg]  It is probably field encoded, doubling increment
  [ADM_coreVideoEncoderFFmpeg]  [Lavcodec] Using a video encoder delay of 0 ms
  [ADM_ffNvEncEncoder]  [ffNvEncEncoder] Creating.
  [setupInternal]  Codec configured to use global header
[ff] Time base 1/50
[adm_lavLogCallback]  [lavc] No NVENC capable devices found
[ff] Cannot open codec
[setup]  [ffMpeg] Setup failed

mpeg-ts: signal 6
[HandleAction]  ************ SAVE_VIDEO **************
  [refresh]  XV:refresh
  [admSaver]  [Save] Encoder index=4
  [save]  Audio starting time 00:00:00,000
  [save]  [A_Save] Saving..
[VideoFilterBridge] Creating bridge from 0 s to 10 s
[convertLinearTimeToSeg]  Frame time=0, taking first segment
  [goToTimeVideo]  Fixating start time to 400 ms
  [seektoTime]  First frame of the new segment is a keyframe at 400ms
  [DecodePictureUpToIntra]   DecodeUpToInta 0 ref:0
[edCache] Flush
[adm_lavLogCallback]  [lavc] mmco: unref short failure
  [goToTimeVideo]  Seek done, in reference, gone to 400000 with segment start at 0
  [ADM_coreVideoEncoderFFmpeg]  It is probably field encoded, doubling increment
  [ADM_coreVideoEncoderFFmpeg]  [Lavcodec] Using a video encoder delay of 0 ms
  [ADM_ffNvEncEncoder]  [ffNvEncEncoder] Creating.
[ff] Time base 1/50
[adm_lavLogCallback]  [lavc] No NVENC capable devices found
[ff] Cannot open codec
[setup]  [ffMpeg] Setup failed
  [~ADM_ffNvEncEncoder]  [ffNvEncEncoder] Destroying.
  [stopThread]  Destroying threadQueue
  [~ADM_threadQueue]  Killing audio thread and son
  [refresh]  XV:refresh
  [refresh]  XV:refresh
*** Error in `/usr/bin/avidemux3_qt4': free(): invalid next size (fast): 0x00000000054ae910 ***


Could there still be an installation issue?
I manually copied nvEncodeAPI.h into /usr/include/x86_64-linux-gnu.
Is there a procedure to install NVENC correctly?

mean

It's trying to load:
libnvidia-encode.so.1
libcuda.so

Make sure you have them.
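
A quick way to test this is to ask the dynamic linker to open both libraries. The sketch below is only an illustration, not how avidemux itself loads them: the helper names `can_load` and `check_libs` are made up for this example, and it assumes the libraries are looked up through the normal linker search path.

```python
import ctypes

# Library names taken from the post above; they are resolved through the
# normal dynamic-linker search path, much like a plain dlopen() call.
REQUIRED_LIBS = ["libnvidia-encode.so.1", "libcuda.so"]


def can_load(name):
    """Return True if the dynamic linker can open the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


def check_libs(names=REQUIRED_LIBS):
    """Map each library name to whether it could be loaded (hypothetical helper)."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    for name, ok in check_libs().items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```

Both libraries loading only shows that the driver's user-space parts are installed. It says nothing about whether the GPU has an NVENC encoder.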

Jan Gruuthuse

libcuda.so and libnvidia-encode.so.1: both are there.

Jan Gruuthuse

After building the samples in /usr/local/cuda/samples with make
found in /usr/local/cuda/samples/bin/x86_64/linux/release these tools: deviceQuery and deviceQueryDrv (in device.zip attached)
./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 520"
  CUDA Driver Version / Runtime Version          7.0 / 6.5
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 1023 MBytes (1072889856 bytes)
  ( 1) Multiprocessors, ( 48) CUDA Cores/MP:     48 CUDA Cores
  GPU Clock rate:                                1620 MHz (1.62 GHz)
  Memory Clock rate:                             535 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 65536 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce GT 520
Result = PASS
./deviceQueryDrv
./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 520"
  CUDA Driver Version:                           7.0
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 1023 MBytes (1072889856 bytes)
  ( 1) Multiprocessors, ( 48) CUDA Cores/MP:     48 CUDA Cores
  GPU Clock rate:                                1620 MHz (1.62 GHz)
  Memory Clock rate:                             535 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 65536 bytes
  Max Texture Dimension Sizes                    1D=(65536) 2D=(65536, 65535) 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):    (65535, 65535, 65535)
  Texture alignment:                             512 bytes
  Maximum memory pitch:                          2147483647 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Result = PASS
Rename device.zip to device.7z (it is actually a 7-Zip archive).

Jan Gruuthuse

nvidia-smi
Tue Apr  7 14:36:07 2015
+------------------------------------------------------+                       
| NVIDIA-SMI 346.47     Driver Version: 346.47         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 520      Off  | 0000:01:00.0     N/A |                  N/A |
| 40%   33C    P0    N/A /  N/A |    762MiB /  1023MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0              C   Not Supported                                         |
+-----------------------------------------------------------------------------+

mean

I don't think your video card supports NVENC.
I had a GT 610, which is similar, and it was not supported.
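
One rough way to check this from the deviceQuery output posted earlier is to read the compute capability line. The sketch below assumes that NVENC hardware first appeared with the Kepler generation (compute capability 3.0). That threshold is an assumption of this example and is not stated anywhere in the thread. The function names are also made up for illustration.

```python
import re

# Assumption (not from the thread): Kepler, compute capability 3.0, is the
# first GPU generation that shipped with an NVENC hardware encoder.
MIN_NVENC_CAPABILITY = (3, 0)


def parse_compute_capability(text):
    """Extract (major, minor) from deviceQuery output, or None if absent."""
    m = re.search(
        r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)", text
    )
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def may_have_nvenc(text):
    """False rules NVENC out; True only means it is not ruled out."""
    cap = parse_compute_capability(text)
    if cap is None:
        return None
    return cap >= MIN_NVENC_CAPABILITY


# Two lines copied from the deviceQuery output in this thread.
SAMPLE = '''Device 0: "GeForce GT 520"
  CUDA Capability Major/Minor version number:    2.1
'''

if __name__ == "__main__":
    print(parse_compute_capability(SAMPLE), may_have_nvenc(SAMPLE))
```

For the GT 520 above the capability is 2.1, so the check comes out negative. A positive result is not proof of support, since some newer low-end chips also lack the encoder.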

Jan Gruuthuse

That would explain this.
So we are back to the minimum requirement being an NVIDIA GPU with a GM10x or GM20x chip.

Jan Gruuthuse

And it is working now:
[ADM_coreVideoEncoderFFmpeg]  [Lavcodec] Using a video encoder delay of 0 ms
  [ADM_ffNvEncEncoder]  [ffNvEncEncoder] Creating.
  [setupInternal]  Codec configured to use global header
[ff] Time base 1/50
[setup]  [ffMpeg] Setup ok
[StreamProcess] Stream 1920x1080, codec : H264
[StreamProcess] Average FPS1000=50000
[StreamProcess] Video Encoder Delay=0ms
[goToTime]   go to time 0,00 secs
  [convertLinearTimeToSeg]  Frame time=0, taking first segment
  [goToTime]  => seg 0, rel time 0,00 secs
I had to do some hardware upgrades and replaced:
- GT 520 by a GTX 960 (required PCIe 3.0)
- motherboard MS-7680 (H67 chipset, PCIe 2.0) by a Z77 Extreme3 (Z77 chipset, PCIe 3.0)
Switching to LGA 2011-3 was too expensive, for now.