Anime4k-WebGPU

Demo Link | GitHub Repo

WebGPU Accelerated Anime4K harnesses WebGPU and GPU compute shaders to rapidly enhance anime-style graphics. Leveraging the power of modern GPUs, it executes complex deblurring, CNN-based upscaling, and denoising algorithms in real time. As a result, each video frame is processed instantaneously, improving clarity and sharpness while reducing noise as the video streams. This ensures a superior viewing experience with high-quality visuals delivered without interruption, ideal for anime fans and professionals seeking top-notch, efficient image fidelity.

Performance Analysis


Visualization Comparisons

The following comparisons are done with a 360p image as input. The image is from the Anime4K repo.

1. Compare with Denoise & Deblur effects

After applying the denoise effect, the image becomes smoother. The intensity sigma is set to 0.2 and the spatial sigma is set to 2. Increasing the intensity sigma makes the bilateral filter approximate a Gaussian convolution, while increasing the spatial sigma smooths the colors further.
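The weighting behind this can be sketched as a standard bilateral filter weight. This is a minimal illustrative sketch, not the project's actual WGSL shader; the function name and parameters (`bilateralWeight`, `spatialSigma`, `intensitySigma`) are hypothetical:

```typescript
// Sketch of a bilateral filter weight for one neighbor pixel (illustrative,
// not the project's WGSL code). spatialSigma controls fall-off with pixel
// distance; intensitySigma controls fall-off with color difference.
function bilateralWeight(
  dx: number, dy: number,             // offset from the center pixel
  centerI: number, neighborI: number, // intensities in [0, 1]
  spatialSigma: number, intensitySigma: number,
): number {
  const spatial = Math.exp(-(dx * dx + dy * dy) / (2 * spatialSigma * spatialSigma));
  const range = Math.exp(-((centerI - neighborI) ** 2) / (2 * intensitySigma * intensitySigma));
  return spatial * range;
}

// With a very large intensity sigma the range term tends to 1, so the filter
// approximates a plain Gaussian convolution, as noted above:
const wLargeSigma = bilateralWeight(1, 0, 0.5, 0.9, 2, 100);
const wGaussian = Math.exp(-1 / (2 * 2 * 2)); // pure spatial Gaussian term
```

With a small intensity sigma (such as the 0.2 used here), neighbors with very different colors are down-weighted, which is what preserves edges while smoothing noise.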

For the deblur effect, the deblurring strength has been calibrated to a level of 7. As the deblurring strength increases, the deblur effect becomes more distinct. This deblurring process enhances the video's clarity by sharpening the image's edges. However, it's important to note that this enhancement may also inadvertently amplify aliasing artifacts.
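The edge-sharpening idea can be sketched as an unsharp-mask style operation. This is a simplification for illustration only; the actual Anime4K deblur shader is more involved, and the `sharpen` function and its parameters here are assumptions:

```typescript
// Illustrative unsharp-mask style sharpening (a simplification of edge-aware
// deblurring; not the actual Anime4K deblur shader). "strength" plays the
// role of the deblurring strength knob described above.
function sharpen(center: number, blurred: number, strength: number): number {
  // Amplify the difference between the pixel and its local average,
  // then clamp back to the valid intensity range [0, 1].
  const out = center + strength * (center - blurred);
  return Math.min(1, Math.max(0, out));
}
```

A large strength (such as 7) aggressively amplifies local contrast at edges, which explains both the sharper look and the amplified aliasing.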

Figure 1. Compare with Denoise & Deblur effect

2. Compare with restore CNN and restore GAN effects

Utilizing restoration models on the image enhances its clarity and sharpens its edges, yet it may also result in the introduction of some aliasing. The restore GAN model, being larger in size compared to the restore CNN model, excels in enhancing edge details. However, this comes with a trade-off as it tends to introduce a greater degree of aliasing. The restoration process will enhance the image’s clarity without altering its dimensions.

Figure 2. Compare with restore CNN and restore GAN

3. Compare with upscale CNN (x2), upscale GAN (x3 & x4), and Real-ESRGAN effects

Utilizing upscale CNN and upscale GAN techniques increases the image size, as these models upscale the image to enhance its resolution. Among these models, upscale GAN x4 is the most substantial, producing an output image four times larger in both width and height and yielding the most impressive results of the three upscale models. The Real-ESRGAN result is generated by webgpu-super-resolution. We have upgraded that project for compatibility with the latest WebGPU standards, allowing the original image to be processed correctly. Although Real-ESRGAN delivers the sharpest image, it is significantly slower—approximately 1000 times slower. Our project is focused on achieving real-time video upscaling, which requires rapid processing. Fortunately, the upscale quality is satisfactory and well-suited for real-time video applications.
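The output dimensions for each model follow directly from the upscale factor, which can be sketched as (the `upscaledSize` helper is hypothetical):

```typescript
// Output size for a given upscale factor: each dimension is multiplied by
// the factor, so pixel count grows by factor squared (x4 => 16x the pixels).
function upscaledSize(w: number, h: number, factor: number): [number, number] {
  return [w * factor, h * factor];
}

// The 200x133 input used in the comparisons below, upscaled by x4:
const [w4, h4] = upscaledSize(200, 133, 4);
```

This quadratic growth in pixel count is why larger upscale factors are so much more expensive per frame.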

Figure 3. Compare with upscale CNN (x2), upscale GAN (x3 & x4), and Real-ESRGAN from webgpu-super-resolution

4. Compare with upscale GAN (x4) + Restore GAN and the Real-ESRGAN effects

Input (1x):
The original image at 200x133 pixels, serving as the starting point for upscaling.

Upscale GAN x4 + Restore GAN: Upscaling is achieved by a factor of 4 using a generative adversarial network (GAN), followed by a restoration process with another GAN. This method prioritizes speed and memory efficiency, operating 1000 times faster than Real-ESRGAN and with substantially lower memory requirements, at the cost of some visual quality.

Real-ESRGAN x4:
The Real-ESRGAN result is generated by webgpu-super-resolution. This image has been upscaled by a factor of 4 using the Real-ESRGAN technique. It offers enhanced visual quality that more closely approximates the ground truth but at a cost of slower runtime and higher memory usage, which may not be feasible for real-time applications.

Ground Truth (x5):
The reference standard with a resolution five times that of the original input, against which the other images are compared.

Conclusion:
The comparison highlights significant trade-offs between rendering speed, memory usage, and the visual quality of upscaled images. While the upscale GAN x4 + restore GAN method provides a rapid and memory-efficient solution suitable for real-time applications, the Real-ESRGAN approach delivers superior image quality, which may be necessary for applications where visual fidelity is paramount, albeit with greater resource requirements.

Figure 4. Compare with Upscale GAN x4 + Restore GAN & Real-ESRGAN effect

Run Time Comparisons (GPU)

We conducted comparisons using a video (Demo Video: Miss Kobayashi's Dragon Maid) as our test input. The GPU processing time, measured in milliseconds per frame, was tracked using the GPU performance analysis tool in Chrome. We recorded a 10-second segment and then calculated the average GPU time per frame by dividing the total GPU time over that window by the number of frames rendered (10 seconds multiplied by the video's frame rate).
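The averaging step above can be sketched as a one-line calculation (the function name is hypothetical):

```typescript
// Average GPU time per frame, as described above: total GPU time over the
// capture window divided by the number of frames (seconds * fps).
function avgGpuTimePerFrameMs(
  totalGpuTimeMs: number,
  captureSeconds: number,
  fps: number,
): number {
  return totalGpuTimeMs / (captureSeconds * fps);
}
```

For example, 720 ms of total GPU time over a 10-second capture of a 24 fps video works out to 3 ms per frame.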

1. GPU time for different effects on different graphics cards

The subsequent comparisons reveal that the GPU processing time for rendering a frame of texture is sufficiently swift for real-time video upscaling across all models with a 720p video input. While the RTX 4090 GPU exhibits a slightly quicker frame rendering time, the performance on the RTX 3070Ti is nearly on par, with both completing the task within 3 milliseconds. Looking ahead, as hardware technology continues to evolve, we anticipate that this project will greatly benefit from utilizing the cross-platform capabilities of WebGPU.

Figure 5. GPU time for different effects (720P)

2. GPU time for different effects with different resolution video inputs

The following comparisons are done with an RTX 3070 Ti graphics card.

The following plot makes it evident that each effect/rendering pipeline displays remarkable consistency in frame rendering time. This holds true irrespective of the input video size, indicating a well-optimized rendering process. Notably, the time required to render a single frame with these various effects is brief enough that all of them are viable for real-time video without significant delay.

However, an exception is observed with upscale GAN x4 when applied to a 1080p video input: the rendering time increases significantly. This outlier can be attributed to upscale GAN x4's ambitious task of upscaling the video to an 8K resolution. Such a substantial increase in resolution demands intense computation, explaining the prolonged rendering time in this scenario. This suggests that while the system is generally efficient for real-time applications, certain high-intensity tasks like extreme upscaling to 8K can still pose challenges in maintaining the same level of performance.
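The outlier follows from a simple pixel-count argument, which can be sketched as:

```typescript
// Why upscale GAN x4 on 1080p input is an outlier: the output is 8K, and
// per-frame shader work scales with the output pixel count.
function pixelCount(w: number, h: number): number {
  return w * h;
}

// 1080p upscaled x4 => 7680 x 4320, i.e. 8K:
const out8k = pixelCount(1920 * 4, 1080 * 4);

// Relative to 720p upscaled x4, the 1080p case must shade 2.25x the pixels:
const ratioVs720p = out8k / pixelCount(1280 * 4, 720 * 4);
```

So even with per-pixel cost held constant, the 1080p input incurs over twice the work of the 720p input, on top of the already-quadratic growth from the x4 factor itself.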

Figure 6. GPU time for different effects with different resolution videos

Project Info


This project is a UPenn CIS 5650 final project. The following are related resources:

Authors


(alphabetical order with equal contribution)

Reference


Multimedia Demonstrations