Daniel2 - world's fastest video codec

Daniel2 Codec

Jan Weigner, CTO

The performance secret of the Daniel2 codec is that it was architected and developed from the ground up to be GPU based. All legacy dependencies and old codec architectures were left behind. This of course means that Daniel2 is only compatible with itself. But the benefits of this approach are enormous and outweigh the drawbacks by far.

The Daniel2 codec is an acquisition and production codec meant to be used for recording from camera sources, editing and post-production as well as playout. Daniel2 is aiming for the same space in the production workflow as AVID’s DNxHR, Apple’s ProRes, JPEG2000 or Sony’s XAVC.

Bottleneck Circumvention

A problem one faces when designing 4K, 8K or soon 16K systems that need to handle multiple streams and manipulate them in realtime is that, even if you could decode the streams using the CPU, you would probably still want to use the power of the GPU for effects and filters. You then face the bottleneck of the system bus when transferring the decoded streams into the GPU’s memory. This is where Daniel2 shines: streams that are a fraction of the size of their uncompressed counterparts are read from disk or via the network and passed to the GPU, where they are decompressed faster than the uncompressed frames could be copied. So there are three wins at once: less system bus bandwidth is used, less space or bandwidth is consumed on disk or network, and the CPU is free to do other tasks as it no longer needs to decode the streams.

The Daniel2 GPU based codec is inevitable. There is no way around it. The advantages are so compelling that this is a no-brainer. It is a win, win, win scenario!

Need For Speed

The following chart illustrates the speed advantage that Daniel2 has over Apple's ProRes and AVID's DNxHR. Neither of them has the smallest fighting chance to come anywhere near the 8K decoding performance of Daniel2.

Of course this is an unfair comparison as the Core i7-6700 is only a four core processor. But even when spending an additional six to eight thousand dollars on a very fast dual Intel Xeon processor machine, the speed of either CPU based codec is not going to reach a quarter of that of the GPU based Daniel2 codec, which utilizes a $500 graphics card to achieve all this and to offload the CPU. With the CPU based codecs the processor cores would be fully loaded, leaving no processing power for other tasks. With the Daniel2 codec in the above benchmark example the quad core Intel processor did not get loaded above 50%.

If we are in the lucky situation of having the GPU itself as the video source, e.g. from 3D rendered output or GPU processed video, then the encoding speed is unencumbered by PCIe bus bottlenecks. As the numbers above show, with the newest NVIDIA RTX 2080 Ti one can encode up to 460 fps of 8K 10-bit 4:2:2 video at only 77% GPU load, which shows that there is room for optimization for the new Turing GPUs. At 16K resolution the same RTX 2080 Ti can currently encode 105 fps in 4:2:2 10-bit. When feeding the GPU with video data via the PCIe 3.0 x16 bus, e.g. from a number of 4K or 8K cameras, the number of 8K frames that can be encoded will not exceed ~150 fps as the PCIe bus bandwidth limit is reached. This means an RTX 2080 Ti will not be any faster than a GTX 1060 when used for Daniel2 video encoding with the video data coming via the PCIe bus. On the other hand, the GTX 1060 will be almost fully loaded doing the encoding while the RTX 2080 Ti hovers at around 35% GPU load, leaving plenty of untapped GPU power to perform other tasks as well.
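The PCIe ceiling can be sanity-checked with a rough calculation. The sketch below is not taken from the Daniel2 documentation: it assumes an 8K raster of 7680x4320, 4:2:2 sampling (two samples per pixel on average), and the theoretical PCIe 3.0 x16 rate of 8 GT/s per lane with 128b/130b encoding. How the 10-bit samples are packed in memory is also an assumption, so both a tightly packed and a 16-bit-per-sample layout are shown.

```python
# Rough upper bound for uploading uncompressed 8K 4:2:2 frames over PCIe 3.0 x16.
WIDTH, HEIGHT = 7680, 4320   # assumed 8K raster
SAMPLES_PER_PIXEL = 2        # 4:2:2: one luma plus, on average, one chroma sample

# PCIe 3.0: 8 GT/s per lane, 128b/130b line encoding, 16 lanes (theoretical peak).
PCIE3_X16_BYTES_PER_S = 8e9 * (128 / 130) / 8 * 16


def frame_bytes(bits_per_sample: int) -> int:
    """Size of one uncompressed frame for a given storage width per sample."""
    return WIDTH * HEIGHT * SAMPLES_PER_PIXEL * bits_per_sample // 8


def max_fps(bits_per_sample: int, efficiency: float = 1.0) -> float:
    """Frames per second the bus could carry at the given link efficiency."""
    return PCIE3_X16_BYTES_PER_S * efficiency / frame_bytes(bits_per_sample)


if __name__ == "__main__":
    for label, bits in (("10-bit packed", 10), ("10-bit in 16-bit words", 16)):
        print(f"{label}: {frame_bytes(bits) / 1e6:.1f} MB per frame, "
              f"at most {max_fps(bits):.0f} fps")
```

Under these assumptions the theoretical bound lies between roughly 119 and 190 fps depending on sample packing, and real transfers achieve less than the theoretical link rate, so a practical limit of around 150 fps is plausible.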

Daniel2 vs Other Codecs APSNR Comparison

Speed is nice, so is quality. With Daniel2 one does not have to sacrifice one for the other.
The chart above shows the quality, measured by APSNR in dB, of Daniel2 and other commonly used production codecs such as Apple ProRes HQ, AVID DNxHR HQX, XAVC480 and XAVC300. The source was uncompressed UHD QuickTime YUV 10-bit, encoded to all target formats by Adobe Premiere CC 2018 and Apple FCP X. As one can see, Daniel2 with encode setting CQ90 (constant quality = 90) exceeds all other codecs by far, with an average 4 dB higher than the best other codec.

Daniel2 vs Other Codecs CPU UHD 4:2:2 10bit FPS Encode

GPU related speed is nice, but Daniel2 is also very fast when it comes to CPU based encoding. The above chart shows the encode speed of Daniel2 and the other codecs using a < $150 Intel Core i3-8100 entry-level quad-core processor. It performs very well in UHD and is almost fast enough to do 8K @ 60fps encoding as well - using Daniel2. The numbers for the other codecs are 7 - 12 times lower. The XAVC300 and XAVC480 fps performance numbers were achieved using Cinegy's highly optimized Cinecoder SDK. Performance results using other XAVC codecs may vary.
In any case, using Daniel2 one can build a broadcast quality quad UHD @ 60 fps or 8K @ 60 fps ingest server using a single Core i7 processor. With a 28 core Intel Xeon processor or an AMD Epyc 32 core processor one could capture how many 8K @ 60 fps streams? More than four. With a dual Xeon or dual Epyc? Do the math.

The above chart shows the different codecs and their bitrates in relation to their image quality measured in dB using APSNR. The winner with the lowest bitrate per dB is XAVC300, but it also has the lowest average APSNR value of 41 dB, which for many applications is too low. XAVC480 is the second best, but already consumes a more than 50% higher bitrate to achieve a 3 dB improvement to 44 dB APSNR. The next two codecs, which have almost the same characteristics, are AVID DNxHR HQX and Apple ProRes HQ with around 48 dB APSNR and almost identical bitrates. Daniel2 with a setting of Constant Quality=70 also reaches 48 dB APSNR but needs an almost 20% higher bitrate to achieve this. With the higher quality settings CQ=80 and CQ=90 the Daniel2 codec achieves even better APSNR results of 51 dB and 53 dB respectively, but that comes at the price of even higher bitrates.
Of course these results can vary with the source material used, but they should still be representative for most TV and post work. The beauty of using the Daniel2 codec in Constant Quality (CQ) mode is that it will always try to maintain the selected quality level, which results in variable bitrate output that consumes less space or network bandwidth when the image is less complex and free of noise. The Daniel2 codec also has a lot more headroom for higher quality production than all other codecs tested, which comes, of course, at the price of higher bitrates. The Daniel2 encode and decode speed, on the other hand, remains almost unaffected by the different quality settings, and Daniel2 is by far the fastest codec of them all.

Daniel2 Codec Tech Specs

From 4:2:2 to 4:4:4:4 color space - YUV to RGBA
From 8 to 12 bit as well as 16 bit color depth
Extremely low decoding latency (< 1 ms for 8K)
Multi-generation re-compression without artefacts
Freely selectable compression factor
Variable bit rate, constant bit rate or constant quality encoding
Lossy or lossless encoding modes
Nvidia CUDA acceleration and CPU fallback for VM or cloud w/o GPU
Available for Windows, Linux and Mac OS. 64 bit version only.
Intelligent alpha channel support for small file sizes
IP video - perfect codec for low-latency, high-quality compressed IP video transmission
Very high speed GPU decoder - faster than the PCIe bus and GPU RAM.

Daniel2 Bit Rates & Depths

With Daniel2 you have no fixed bit rate settings, unlike many other codecs that have fixed presets. The rates may be set at various levels, depending on the acceptable target PSNR (Peak Signal-to-Noise Ratio) value.
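For readers unfamiliar with the metric, PSNR is conventionally computed as 10·log10(peak² / MSE), where peak is the largest representable sample value and MSE is the mean squared error between source and decoded samples. The sketch below illustrates that standard definition only. It is not the measurement tool used for the charts in this article, and reading "APSNR" as the arithmetic mean of per-frame PSNR values is an assumption.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int], bit_depth: int = 10) -> float:
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    peak = (1 << bit_depth) - 1  # e.g. 1023 for 10-bit samples
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals, e.g. lossless coding
    return 10 * math.log10(peak * peak / mse)


def average_psnr(per_frame_db: Sequence[float]) -> float:
    """Arithmetic mean of per-frame PSNR values (assumed meaning of APSNR)."""
    return sum(per_frame_db) / len(per_frame_db)
```

Because the scale is logarithmic, a gain of about 6 dB corresponds to halving the root mean squared error, which is why differences of a few dB between codecs are significant.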

The easiest way to express it is that we are broadly comparable, bit-for-bit, with the Avid DNxHR codec at 8-bit and 10-bit settings and equivalent TV raster formats and rates.

Apple’s ProRes codec at HQ settings can achieve approximately a 20% bitrate saving for the same PSNR, but of course at much slower encoding / decoding speeds. Daniel2 is being improved and we are working on closing that gap while maintaining the massive speed advantage.

The other codecs in this space are from the H.264 ‘intra’ family, such as Panasonic AVC-Intra and Sony XAVC, both being I-frame-only. These generally beat Daniel2, as well as DNxHR and ProRes, achieving yet another 20-30% bitrate saving at the same PSNR quality.

At higher PSNR quality ranges the bitrate advantage of AVC-Intra or XAVC compared to Daniel2 vanishes and the Daniel2 codec becomes more efficient in terms of bitrate to dB ratio.

This operational "sweet spot" is specific to each codec's algorithm and is also the reason why the respective codec vendors specify bitrate profiles for their codecs that give the best PSNR to bitrate ratio.

If quality is the main factor and not bitrate, then Daniel2 offers the far preferable approach of using the Constant Quality encode setting, which will vary the bitrate to maintain the same picture quality independent of the respective picture complexity. In the day and age of SSD storage this is the better approach, but it makes storage calculations more difficult as the resulting file sizes depend on the nature of the video material to be encoded.

Daniel2 - Not for Internet Streaming

It is completely impossible to compare Daniel2 bitrates to any ‘delivery’ formats such as H.264 and HEVC. For H.264 or the newer H.265/HEVC, the more complex long GOP (long Group of Pictures) encoding is used to massively reduce the data rates. The first picture of a group is used as a reference from which the delta to the following pictures is calculated, along with motion vectors etc. to compensate for picture shifts. Other perceptual tricks like noise reduction and short-cuts like reducing the color precision to 4:2:0 help save further bandwidth. The resulting low bitrate files or streams are perfect for Internet distribution or remote contribution, but generally do not lend themselves to high-end post-production or edit workflows.

Comparing HEVC with Daniel2 is comparing apples to oranges. It is best to consider Daniel2 as part of the family of production codecs containing XAVC, DNxHR, ProRes, and others which all have comparable bitrates.

VBR vs. CBR

Using VBR (Variable Bit Rate) mode instead of CBR (Constant Bit Rate) can help create significantly smaller files, again depending on the video source material. Most other production codecs are CBR codecs, which produce predictable file sizes but, when inspected in detail, often contain a lot of “padding”: the frames are filled up with e.g. zeros to reach the defined target bitrate. This leads to laughable situations where a tiny little logo in the corner of a picture stored with alpha channel can be as big as a complex full frame image. It is advisable to use VBR when dealing with motion graphics and/or images or video with alpha channel. The bitrate savings can be very significant without any quality reduction.
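The padding effect can be illustrated with a small calculation. This is a simplified model rather than a description of how any particular codec lays out its frames: it assumes a CBR stream stores every frame at the same fixed byte budget, while a VBR stream stores only the bytes each frame actually needs. The frame sizes in the usage note are made-up illustrative values.

```python
from typing import Sequence


def cbr_stream_bytes(frame_sizes: Sequence[int], frame_budget: int) -> int:
    """Total size when every frame is padded up to a fixed per-frame budget."""
    if any(size > frame_budget for size in frame_sizes):
        raise ValueError("a frame exceeds the CBR budget")
    return frame_budget * len(frame_sizes)


def vbr_stream_bytes(frame_sizes: Sequence[int]) -> int:
    """Total size when each frame occupies only the bytes it needs."""
    return sum(frame_sizes)


def padding_ratio(frame_sizes: Sequence[int], frame_budget: int) -> float:
    """Fraction of the CBR stream that consists of padding."""
    total = cbr_stream_bytes(frame_sizes, frame_budget)
    return (total - vbr_stream_bytes(frame_sizes)) / total
```

For example, with three hypothetical logo frames of 20,000 bytes, one complex frame of 800,000 bytes and a budget of 800,000 bytes per frame, `padding_ratio([20_000, 20_000, 20_000, 800_000], 800_000)` shows that about 73% of the CBR stream would be padding.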

Daniel2 Recommended Bitrates for 4:2:2 at 10-bit

1920x1080p29.97 @ 10-bit 4:2:2

• 120 Mbps, visually lossless, not good enough for post work

• 160 Mbps, visually lossless, some grain is lost but light post work is possible

• 200 Mbps, visually lossless, most grain preserved, heavy post work is possible

3840x2160p59.94 @ 10-bit 4:2:2

• 720 Mbps, visually lossless, not good enough for post work

• 960 Mbps, visually lossless, some grain is lost but light post work is possible

• 1200 Mbps, visually lossless, most grain preserved, heavy post work is possible

These are recommendations and results may vary based on video material used. As mentioned before another approach is choosing Constant Quality, but the resulting bit rate is highly dependent on the complexity of the source video material.
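For storage planning, the recommended bitrates can be converted into storage per hour and into bits per pixel. The sketch below assumes the figures are megabits per second, that the bitrate is held constant for the whole hour, and that gigabytes are decimal (10^9 bytes). Container and audio overhead are ignored.

```python
def gigabytes_per_hour(mbps: float) -> float:
    """Decimal gigabytes needed for one hour of video at a constant bitrate."""
    return mbps * 1e6 / 8 * 3600 / 1e9


def bits_per_pixel(mbps: float, width: int, height: int, fps: float) -> float:
    """Average number of compressed bits spent on each pixel."""
    return mbps * 1e6 / (width * height * fps)


if __name__ == "__main__":
    for mbps in (120, 160, 200):
        print(f"1080p29.97 @ {mbps} Mbps: {gigabytes_per_hour(mbps):.0f} GB/h, "
              f"{bits_per_pixel(mbps, 1920, 1080, 29.97):.2f} bits/pixel")
    for mbps in (720, 960, 1200):
        print(f"2160p59.94 @ {mbps} Mbps: {gigabytes_per_hour(mbps):.0f} GB/h, "
              f"{bits_per_pixel(mbps, 3840, 2160, 59.94):.2f} bits/pixel")
```

Under these assumptions the HD settings need 54 to 90 GB per hour and the UHD settings 324 to 540 GB per hour. In Constant Quality mode the same arithmetic only gives an estimate, since the actual bitrate follows the complexity of the material.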

Daniel2 Recommended Constant Quality Settings e.g. for use with Adobe Premiere

• CQ set at 65 - visually lossless, not good enough for post work

• CQ set at 75 - visually lossless, some grain is lost but light post work is possible

• CQ set at 85 - visually lossless, most grain preserved, heavy post work is possible