Benefits of AV1 decoding with Ampere GPUs

It’s been a long time since my last blog post. I was busy with a private project that took up all of my spare time over the last 1.5 years, but now it’s time to be more active here again.
I’d like to take the opportunity to show you a great benefit of our latest GPUs. We added AV1 decoding to our hardware decoder, which is especially useful for YouTube, as they have started to move high-resolution videos and videos with many views from VP9 to AV1. This means you either need a lot of CPU resources to decode the video, or a capable GPU or SoC that can do the job.

So let’s start 🙂

Requirements

- Ampere GPU (e.g. NVIDIA A6000 or A40)
- Windows 10 with the AV1 Video Extension installed
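
If you want to check both requirements from a script, here is a minimal Python sketch. It rests on a few assumptions that are not from my test setup: that `nvidia-smi` is on the PATH inside the VM, that PowerShell is available, and that the Store package of the AV1 extension is called `Microsoft.AV1VideoExtension`. It only prints the GPU name so you can compare it yourself; it does not try to detect the GPU architecture.

```python
import shutil
import subprocess
from typing import List, Optional

# Assumption: Store package name of the AV1 Video Extension.
AV1_PACKAGE = "Microsoft.AV1VideoExtension"


def gpu_names_from_csv(output: str) -> List[str]:
    """Parse the output of `nvidia-smi --query-gpu=name --format=csv,noheader`."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def appx_query_command(package: str = AV1_PACKAGE) -> List[str]:
    """Build a PowerShell call that prints the package name if it is installed."""
    return ["powershell", "-NoProfile", "-Command",
            f"(Get-AppxPackage -Name {package}).Name"]


def package_present(output: str, package: str = AV1_PACKAGE) -> bool:
    """True if the PowerShell output mentions the package name."""
    return package.lower() in output.lower()


def run_command(cmd: List[str]) -> Optional[str]:
    """Run a command and return its stdout, or None if it cannot be run."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=30, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout


if __name__ == "__main__":
    gpu_out = run_command(["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"])
    if gpu_out is None:
        print("nvidia-smi not found")
    else:
        print("GPUs:", gpu_names_from_csv(gpu_out))

    appx_out = run_command(appx_query_command())
    if appx_out is None:
        print("PowerShell not found")
    else:
        print("AV1 extension installed:", package_present(appx_out))
```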

Test Environment

I did my testing on a Windows 10 20H2 VM with Citrix VDA and an A40-4Q vGPU profile. Details can be seen in the screenshots below. It should work the same way on Server 2019, for example, but I haven’t tested it on a server OS yet.

Test scenario

I played an AV1-encoded YouTube video in 4K and 8K to see the impact on CPU load, frame rate and smoothness, once with the hardware decoder in action and once without it.
I always used the same video and let it play for 1 or 2 minutes to see how it behaves.

4K testing

I started with 4K resolution. I know this works even without a capable GPU, but I wanted to show the real difference here, which is CPU load.

So what do we see here? The left screenshot was taken with hardware decode in place. YouTube’s “Stats for nerds” popup confirms that AV1 decoding is used, and the GPUProfiler log on the right shows CPU, GPU, FB, encode and decode utilization. We see moderate decode utilization of ~20% and moderate CPU load of ~40%. Video playback is very smooth.
In the right screenshot we see no decode activity and a much higher CPU load of ~80% (with 4 vCPUs!). Playback is still smooth even with software decode, although the “Stats for nerds” window already shows many more dropped frames.
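
I used GPUProfiler for the numbers above. If you prefer a script, decoder utilization can also be sampled with `nvidia-smi dmon -s u`, which prints one row per second with a `dec` column. The sketch below is only an illustration under assumptions: that `nvidia-smi dmon` is available and supported in your guest (I have not verified this for every vGPU setup), and that the column header contains `dec`. The function names are mine.

```python
import shutil
import subprocess
from typing import List, Optional


def parse_dmon_decode(text: str) -> List[Optional[int]]:
    """Extract the 'dec' column from `nvidia-smi dmon -s u` output.

    Header lines start with '#'. Values that are not numbers (e.g. '-')
    are returned as None.
    """
    columns = None
    samples: List[Optional[int]] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            names = stripped.lstrip("#").split()
            if "dec" in names:
                columns = names  # remember the header that names the columns
            continue
        if columns is None:
            continue  # data before any usable header, skip it
        fields = stripped.split()
        idx = columns.index("dec")
        if idx >= len(fields):
            continue
        value = fields[idx]
        samples.append(int(value) if value.isdigit() else None)
    return samples


def average_decode(samples: List[Optional[int]]) -> Optional[float]:
    """Average of the numeric samples, or None if there are none."""
    numeric = [s for s in samples if s is not None]
    return sum(numeric) / len(numeric) if numeric else None


def capture_dmon(count: int = 10) -> Optional[str]:
    """Collect `count` one-second samples; None if nvidia-smi cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "dmon", "-s", "u", "-c", str(count)],
            capture_output=True, text=True, timeout=count + 30, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout


if __name__ == "__main__":
    output = capture_dmon()
    if output is None:
        print("nvidia-smi not available")
    else:
        print("Average decode utilization (%):",
              average_decode(parse_dmon_decode(output)))
```

Run it while the video is playing. With hardware decode active, the `dec` values should be clearly above zero; with software decode they should stay at 0.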

8K testing

Now I wanted to see how it behaves when we switch the video resolution to 8K.

Again, the left screenshot shows video playback with AV1 hardware decode: still a moderate CPU load of ~40% and very smooth playback with 0 dropped frames.
In comparison, the right screenshot with software decode shows very high (100%) CPU load, and the video stutters extremely, with more dropped frames than played frames.

Conclusion

What can we take away from this? Especially in a VDI environment, where we want a decent user density, we can quickly hit a CPU bottleneck when only a few users play a high-resolution YouTube video without a hardware decoder in place. As YouTube moves more and more videos from VP9 to AV1, having the right decode capability for such VDI use cases will become even more important in the near future. But even on a physical workstation it is almost impossible to play such high-resolution videos (8K) smoothly without working hardware decode, as we run out of CPU resources very quickly.
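
To put a rough number on the density argument, here is a back-of-the-envelope sketch in Python. The CPU loads (~40%, ~80%, 100%) and the 4 vCPUs per VM come from my tests above. Everything else is an assumption for illustration: the host size of 64 hardware threads is hypothetical, there is no CPU overcommit, and the whole measured CPU load is attributed to video playback.

```python
# Assumption: hypothetical host with 64 hardware threads, no overcommit.
HOST_THREADS = 64
VCPUS_PER_VM = 4  # from the test VM


def max_concurrent_players(host_threads: int, vcpus_per_vm: int,
                           cpu_load_percent: int) -> int:
    """How many VMs can play the video before the host CPU is fully used.

    Each VM consumes vcpus_per_vm * cpu_load_percent / 100 threads.
    Integer math avoids floating-point rounding surprises.
    """
    if host_threads <= 0 or vcpus_per_vm <= 0 or cpu_load_percent <= 0:
        raise ValueError("all inputs must be positive")
    return (host_threads * 100) // (vcpus_per_vm * cpu_load_percent)


# CPU load per scenario, taken from the measurements in this post.
SCENARIOS = {
    "4K hardware decode": 40,
    "4K software decode": 80,
    "8K hardware decode": 40,
    "8K software decode": 100,
}

if __name__ == "__main__":
    for name, load in SCENARIOS.items():
        users = max_concurrent_players(HOST_THREADS, VCPUS_PER_VM, load)
        print(f"{name}: about {users} concurrent players")
```

Under these assumptions the host CPU is saturated by about 20 users playing 4K with software decode, compared to about 40 with hardware decode. For 8K software decode the real limit is lower than the calculated 16, because the VM was already pinned at 100% and still dropping frames.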



About the Author

Simon

GRID Solution Architect and owner of this Blog site.
