Jan Weigner, CTO
The performance secret of the Daniel2 codec is that it was architected and developed from the ground up to be GPU based. All legacy dependencies and old codec architectures were left behind. This of course means that Daniel2 is only compatible with itself, but the benefits of this approach are enormous and far outweigh the drawbacks.
The Daniel2 codec is an acquisition and production codec meant to be used for recording from camera sources, for editing and post-production, and for playout. Daniel2 is aiming for the same space in the production workflow as AVID’s DNxHR, Apple’s ProRes, JPEG2000 or Sony’s XAVC.
A problem one faces when designing 4K, 8K or soon 16K systems that need to handle multiple streams and manipulate them in realtime is that even if you could decode the streams on the CPU, you would probably still want to use the power of the GPU for effects and filters. You then face the system bus as a bottleneck when transferring the decoded streams into the GPU’s memory. This is where Daniel2 shines: streams a fraction of the size of their uncompressed counterparts are read from disk or via the network, passed to the GPU, and decompressed there faster than the uncompressed frames could be copied. So there are three wins at once: less system bus bandwidth is used, less disk space or network bandwidth is consumed, and the CPU is free to do other tasks because it no longer needs to decode the streams.
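To put the bus argument in numbers, here is a minimal Python sketch that computes how many bytes per second have to cross the system bus for a set of streams, once uncompressed and once compressed. It assumes tightly packed samples (real uncompressed formats often pad 10-bit samples into 16-bit words, which makes the raw figure larger). The 6:1 compression ratio is purely hypothetical and chosen for illustration; it is not a measured Daniel2 figure. The helper names are made up for this example.

```python
def raw_frame_bytes(width, height, bit_depth, samples_per_pixel):
    """Size of one tightly packed uncompressed frame in bytes.

    samples_per_pixel: 2 for 4:2:2 YUV, 3 for 4:4:4, 4 for 4:4:4:4 (e.g. RGBA).
    """
    return width * height * samples_per_pixel * bit_depth / 8


def bus_bytes_per_second(streams, fps, frame_bytes, compression_ratio=1.0):
    """Bytes per second crossing the bus; compression_ratio=1.0 means uncompressed."""
    return streams * fps * frame_bytes / compression_ratio


# Four UHD (3840 x 2160) 10-bit 4:2:2 streams at 60 fps
uhd = raw_frame_bytes(3840, 2160, bit_depth=10, samples_per_pixel=2)
raw = bus_bytes_per_second(4, 60, uhd)
compressed = bus_bytes_per_second(4, 60, uhd, compression_ratio=6)  # hypothetical ratio
print(f"uncompressed: {raw / 1e9:.2f} GB/s, compressed: {compressed / 1e9:.2f} GB/s")
```

This prints about 4.98 GB/s uncompressed versus about 0.83 GB/s at the assumed ratio. Whatever the real ratio is, the bus traffic shrinks by that same factor, while the decoding work moves onto the GPU.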
The Daniel2 GPU based codec is inevitable. There is no way around it. The advantages are so compelling that this is a no-brainer. It is a win, win, win scenario!
Need For Speed
The following chart illustrates the speed advantage that Daniel2 has over Apple's ProRes and AVID's DNxHR. Neither of them stands the slightest chance of coming anywhere near the 8K decoding performance of Daniel2.
Of course this is an unfair comparison, as the Core i7-6700 is only a four-core processor. But even when spending an additional six to eight thousand dollars on a very fast dual Intel Xeon processor machine, neither CPU-based codec is going to reach a quarter of the speed of the GPU-based Daniel2 codec, which uses a $500 graphics card to achieve all this and to offload the CPU. In the case of the CPU-based codecs, the processor cores would be fully loaded, leaving no processing power for other tasks. With the Daniel2 codec in the benchmark example above, the quad-core Intel processor was not loaded above 50%.
If we are in the lucky situation of having the GPU itself as the video source, e.g. from 3D rendered output or GPU-processed video, then the encoding speed is unencumbered by PCIe bus bottlenecks. As the numbers above show, the newest NVIDIA RTX 2080 Ti can encode up to 460 fps of 8K 10-bit 4:2:2 video at only 77% GPU load, which shows that there is still room for optimization on the new Turing GPUs. At 16K resolution the same RTX 2080 Ti can currently encode 105 fps of 10-bit 4:2:2. When the GPU is fed with video data via the PCIe 3.0 x16 bus, e.g. from a number of 4K or 8K cameras, the encode rate will not exceed ~150 fps of 8K frames, because the PCIe bus bandwidth limit is reached. This means an RTX 2080 Ti will not be any faster than a GTX 1060 for Daniel2 video encoding when the video data comes via the PCIe bus. On the other hand, the GTX 1060 will be almost fully loaded doing the encoding, while the RTX 2080 Ti hovers at around 35% GPU load, leaving plenty of untapped GPU power for other tasks.
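The ~150 fps ceiling can be sanity-checked with a back-of-the-envelope Python calculation. It uses the PCIe 3.0 signalling rate (8 GT/s per lane with 128b/130b encoding) to get the theoretical one-direction bandwidth of an x16 link, and assumes 8K means 7680 x 4320 with tightly packed 10-bit 4:2:2 samples. Protocol overhead is ignored, so the result is an upper bound; the function names are illustrative only.

```python
PCIE3_TRANSFERS_PER_LANE = 8e9  # PCIe 3.0: 8 GT/s per lane
PCIE3_ENCODING = 128 / 130      # 128b/130b line encoding


def pcie3_bytes_per_second(lanes=16):
    """Theoretical one-direction payload bandwidth, before protocol overhead."""
    return lanes * PCIE3_TRANSFERS_PER_LANE * PCIE3_ENCODING / 8


def frame_bytes_422(width, height, bit_depth):
    """4:2:2: one luma sample plus, on average, one chroma sample per pixel."""
    return width * height * 2 * bit_depth / 8


def max_fps(frame_bytes, bus_bytes_per_second, efficiency=1.0):
    """Frames per second the bus can carry at the given fraction of its bandwidth."""
    return bus_bytes_per_second * efficiency / frame_bytes


frame_8k = frame_bytes_422(7680, 4320, 10)  # assumed 8K raster and packing
bus = pcie3_bytes_per_second()
print(f"8K frame: {frame_8k / 1e6:.1f} MB")
print(f"theoretical ceiling: {max_fps(frame_8k, bus):.0f} fps")
print(f"bus share used at 150 fps: {150 * frame_8k / bus:.0%}")
```

Under these assumptions an 8K frame is about 82.9 MB and the theoretical ceiling is about 190 fps; the ~150 fps quoted above corresponds to roughly 79% of the theoretical bandwidth. Padded 16-bit sample layouts would lower the ceiling further.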
Speed is nice, and so is quality. With Daniel2 one does not have to sacrifice one for the other.
The chart above shows the quality of Daniel2 and other commonly used production codecs such as Apple ProRes HQ, AVID DNxHR HQX, XAVC480 and XAVC300, measured as APSNR (average peak signal-to-noise ratio) in dB. The source, uncompressed 10-bit YUV UHD QuickTime, was encoded to all target formats with Adobe Premiere CC 2018 and Apple FCP X. As one can see, Daniel2 with the encode setting CQ90 (constant quality = 90) outperforms all other codecs by far, averaging 4 dB higher than the best other codec.
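APSNR is not defined further in the text. A common definition, assumed here, is the arithmetic mean of the per-frame PSNR values, with PSNR = 10·log10(peak² / MSE) and a peak of 1023 for 10-bit video; some tools instead compute PSNR from the MSE averaged over all frames, which gives a slightly different number. The following Python sketch implements the assumed definition on frames given as flat lists of sample values. Because PSNR is logarithmic, a 4 dB advantage on a single frame means the mean squared error is about 10^0.4 ≈ 2.5 times lower.

```python
import math


def frame_psnr(ref, test, bit_depth=10):
    """PSNR in dB of one frame, given flat lists of sample values."""
    peak = (1 << bit_depth) - 1  # 1023 for 10-bit video
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def apsnr(ref_frames, test_frames, bit_depth=10):
    """Mean of per-frame PSNR values (the assumed APSNR definition)."""
    values = [frame_psnr(r, t, bit_depth) for r, t in zip(ref_frames, test_frames)]
    return sum(values) / len(values)


reference = [[100, 200, 300, 400], [100, 200, 300, 400]]
decoded = [[101, 201, 301, 401], [102, 202, 302, 402]]  # MSE 1 and MSE 4
print(f"APSNR: {apsnr(reference, decoded):.2f} dB")
```

Identical frames yield infinite PSNR here; some measurement tools clamp such frames to a fixed maximum before averaging.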
GPU-related speed is nice, but Daniel2 is also very fast when it comes to CPU-based encoding. The above chart shows the encode speed of Daniel2 and the other codecs on an entry-level Intel Core i3-8100 quad-core processor costing under $150. Using Daniel2, it performs very well in UHD and is almost fast enough for 8K @ 60 fps encoding as well. The numbers for the other codecs are 7 to 12 times lower. The XAVC300 and XAVC480 fps numbers were achieved using Cinegy's highly optimized Cinecoder SDK. Performance results with other XAVC codecs may vary.
In any case, using Daniel2 one can build a broadcast-quality quad UHD @ 60 fps or 8K @ 60 fps ingest server with a single Core i7 processor. With a 28-core Intel Xeon or a 32-core AMD Epyc processor, how many 8K @ 60 fps streams could one capture? More than four. With a dual Xeon or dual Epyc? Do the math.
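Taking the invitation to do the math, here is a deliberately naive Python sketch that scales the stream count linearly with the core count. Both inputs are assumptions: cores_per_stream=4 reflects the statement that a quad-core i3-8100 is almost fast enough for one 8K @ 60 fps stream (so 4 is, if anything, optimistic), and relative_core_speed lets you discount slower server cores. The function name is hypothetical, and the model ignores memory bandwidth, storage and network I/O and NUMA effects, so treat the result as an upper-bound sanity check, not a benchmark.

```python
import math


def naive_stream_estimate(total_cores, cores_per_stream, relative_core_speed=1.0):
    """Linear-scaling guess of how many streams a CPU can encode in parallel.

    cores_per_stream: cores one stream needs on the reference CPU (assumption).
    relative_core_speed: per-core speed relative to the reference CPU (assumption).
    """
    return math.floor(total_cores * relative_core_speed / cores_per_stream)


# Hypothetical: ~4 reference cores per 8K @ 60 fps stream
for label, cores in [("28-core Xeon", 28), ("32-core Epyc", 32),
                     ("dual 28-core Xeon", 56), ("dual 32-core Epyc", 64)]:
    print(label, naive_stream_estimate(cores, cores_per_stream=4))
```

Lowering relative_core_speed (for example to 0.7 for lower-clocked server cores) shows how quickly the estimate drops when per-core speed differs from the reference machine.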
The above chart shows the different codecs and their bitrates in relation to their image quality, measured in dB using APSNR. The winner with the lowest bitrate per dB is XAVC300, but it also has the lowest average APSNR value, 41 dB, which is too low for many applications. XAVC480 is second best, but it already needs more than 50% more bitrate to gain 3 dB, reaching 44 dB APSNR. The next two codecs, which have almost the same characteristics, are AVID DNxHR HQX and Apple ProRes HQ, with around 48 dB APSNR at almost identical bitrates. Daniel2 with a setting of Constant Quality = 70 also reaches 48 dB APSNR but produces almost 20% higher bitrates to achieve this. With the even higher quality settings CQ=80 and CQ=90, the Daniel2 codec achieves even better APSNR results of 51 dB and 53 dB respectively, but that comes at the price of even higher bitrates.
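The bitrate-per-dB comparison can be reproduced for the two XAVC data points quoted above. This Python sketch assumes the class numbers 300 and 480 reflect the two formats' relative bitrates (only their 1.6:1 ratio matters) and uses the APSNR values from the text; the other codecs are left out because their bitrates are only given relative to each other. Keep in mind that dB is logarithmic: every 3 dB roughly halves the mean squared error. The helper names are illustrative.

```python
def bitrate_per_db(relative_bitrate, apsnr_db):
    """Lower is better: bitrate spent per dB of measured quality."""
    return relative_bitrate / apsnr_db


def marginal_cost(base, better):
    """Each argument is (relative_bitrate, apsnr_db).

    Returns (fractional bitrate increase, dB gained, increase per dB gained).
    """
    extra = better[0] / base[0] - 1
    gain = better[1] - base[1]
    return extra, gain, extra / gain


xavc300 = (300, 41)  # class number used as relative bitrate; APSNR from the chart
xavc480 = (480, 44)

print(f"XAVC300: {bitrate_per_db(*xavc300):.2f}, XAVC480: {bitrate_per_db(*xavc480):.2f}")
extra, gain, per_db = marginal_cost(xavc300, xavc480)
print(f"+{extra:.0%} bitrate for +{gain} dB ({per_db:.0%} per dB)")
```

For XAVC300 to XAVC480 this gives about 60% more bitrate for 3 dB, roughly 20% extra bitrate per dB gained, which is consistent with the "more than 50%" figure above.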
Of course these results can vary with the source material used, but they should still be representative of most TV and post-production work. The beauty of using the Daniel2 codec in Constant Quality (CQ) mode is that it always tries to maintain the selected quality level. This results in variable bitrate output, which consumes less space or network bandwidth when the image is less complex and free of noise. The Daniel2 codec also has a lot more headroom for higher-quality production than all other codecs tested, which of course comes at the price of higher bitrates. Daniel2's encode and decode speed, on the other hand, remains almost unaffected by the quality setting, and it is by far the fastest codec of them all.
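Why constant quality produces a variable bitrate can be shown with a toy model. The text does not describe Daniel2's rate control, so the following Python sketch is not its algorithm: it picks the coarsest uniform quantizer step whose measured PSNR still meets a target, then estimates the coded size with the Shannon entropy of the quantizer indices. Real codecs quantize transform coefficients rather than raw pixels, and all names here are hypothetical. Flat content gets a coarse step and almost no bits, while noisy content needs a finer step and many more bits.

```python
import math
import random
from collections import Counter

PEAK = 1023  # 10-bit samples


def psnr(orig, recon):
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    return math.inf if mse == 0 else 10 * math.log10(PEAK ** 2 / mse)


def quantize(samples, step):
    """Uniform scalar quantizer: returns indices and reconstructed samples."""
    indices = [round(s / step) for s in samples]
    recon = [min(PEAK, q * step) for q in indices]
    return indices, recon


def entropy_bits(symbols):
    """Ideal entropy-coded size of the indices (Shannon bound), in bits."""
    n = len(symbols)
    return sum(c * math.log2(n / c) for c in Counter(symbols).values())


def encode_constant_quality(frame, target_db, max_step=64):
    """Pick the coarsest step that still meets target_db; bits follow the content."""
    for step in range(max_step, 0, -1):
        indices, recon = quantize(frame, step)
        if psnr(frame, recon) >= target_db:
            return step, entropy_bits(indices)
    # Not reached for 10-bit integer input: step 1 reconstructs it losslessly
    raise ValueError("target quality not reachable")


random.seed(1)
flat = [512] * 1000                                     # simple, noise-free content
noisy = [random.randint(0, PEAK) for _ in range(1000)]  # complex, noisy content
for name, frame in (("flat", flat), ("noisy", noisy)):
    step, bits = encode_constant_quality(frame, target_db=53.0)
    print(f"{name}: step={step}, ~{bits:.0f} bits")
```

The quality target stays fixed while the bit count swings with content complexity, which is the behaviour described for CQ mode above.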
Daniel2 Codec Tech Specs
- From 4:2:2 to 4:4:4:4 chroma sampling - YUV to RGBA
- From 8 to 12 bit as well as 16 bit color depth
- Extremely low decoding latency (< 1 ms for 8K)
- Multi-generation re-compression without artefacts
- Freely selectable compression factor
- Variable bit rate, constant bit rate or constant quality encoding
- Lossy or lossless encoding modes
- Nvidia CUDA acceleration and CPU fallback for VMs or cloud instances without a GPU
- Available for Windows, Linux and Mac OS. 64-bit version only.
- Intelligent alpha channel support for small file sizes
- IP video - perfect codec for low-latency, high-quality compressed IP video transmission
- Very high speed GPU decoder - faster than the PCIe bus and GPU RAM.