The advent of high-speed internet connections, advanced video coding algorithms, and consumer-grade computers with high computational capabilities has made video streaming over the internet the majority of network traffic. This growth has fuelled a continuously expanding video streaming industry that seeks to offer enhanced quality of experience (QoE) to its users at the lowest possible cost. Video streaming services can now adapt to the hardware and network restrictions that each user faces and thus provide the best experience possible under those restrictions. The most common way to adapt to network bandwidth restrictions is to offer the video stream at the highest visual quality achievable for the maximum bitrate that the network connection in use can sustain. This is achieved by storing multiple pre-encoded versions of the video content with different bitrate and visual quality settings.

Visual quality is measured by means of objective quality metrics, such as the Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Visual Information Fidelity (VIF), and others, which can be easily computed analytically. Nevertheless, it is widely accepted that although these metrics provide an accurate estimate of statistical quality degradation, they do not accurately reflect the viewer's perception of visual quality. As a result, the acquisition of user ratings in the form of Mean Opinion Scores (MOS) remains the most accurate depiction of human-perceived video quality. However, collecting such ratings is very costly and time-consuming, and thus cannot be practically employed by video streaming providers with hundreds or thousands of videos in their catalogues. A recent and very promising approach to addressing this limitation is the use of machine learning techniques to train models that represent human video quality perception more accurately.
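To illustrate why metrics like MSE and PSNR are "easily computed analytically", here is a minimal sketch of PSNR derived from MSE between two frames. The toy frames and the choice of a 64x64 single-channel image are illustrative assumptions, not data from the chapter:

```python
import numpy as np

def psnr(reference, distorted, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two frames (higher is better).

    PSNR = 10 * log10(MAX^2 / MSE), where MAX is the peak pixel value.
    """
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames: no distortion
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy 8-bit "frames": a random reference and a copy with mild uniform noise.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noise = rng.integers(-5, 6, size=ref.shape)
noisy = np.clip(ref.astype(np.int16) + noise, 0, 255).astype(np.uint8)

print(round(psnr(ref, noisy), 2))  # mild noise typically lands in the 35-40 dB range
```

Because PSNR is a closed-form function of pixel differences, it can be evaluated per frame at encoding time with negligible cost, which is exactly why it is so widely used despite its weak correlation with perceived quality.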
To this end, regression techniques are used to map objective quality metrics to human video quality ratings acquired for a large number of diverse video sequences. Results have been very promising, with approaches such as the Video Multimethod Assessment Fusion (VMAF) metric achieving higher correlation with user-acquired MOS ratings than traditional, widely used objective quality metrics. In this chapter, we examine the performance of VMAF and its potential as a replacement for common objective video quality metrics.
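The fusion idea described above can be sketched with a simple linear regression that maps a vector of objective metric scores to MOS. Note that the feature values and MOS labels below are purely illustrative, and VMAF itself uses different elementary metrics and a more sophisticated (SVM-based) regressor; this is only a minimal sketch of the metric-to-MOS mapping:

```python
import numpy as np

# Hypothetical per-clip objective scores (columns: e.g. PSNR, SSIM, VIF)
# and their subjective MOS labels. All numbers are made up for illustration.
features = np.array([
    [30.1, 0.85, 0.60],
    [35.4, 0.92, 0.74],
    [40.2, 0.96, 0.85],
    [28.0, 0.80, 0.55],
    [38.7, 0.94, 0.80],
])
mos = np.array([2.5, 3.6, 4.4, 2.1, 4.0])

# Fit a linear model MOS ~ w . features + b by ordinary least squares.
X = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias column
w, *_ = np.linalg.lstsq(X, mos, rcond=None)

predicted = X @ w
print(np.round(predicted, 2))
```

Once such a model is trained on a large, diverse corpus of rated sequences, the learned mapping can be applied to new content for which no subjective ratings exist, which is what makes the approach attractive to providers with large catalogues.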
Title of host publication: AI for Emerging Verticals
Subtitle of host publication: Human-Robot Computing, Sensing and Networking
Editors: Muhammad Zeeshan Shakir, Naeem Ramzan
Publication status: Accepted/In press - 9 Feb 2020
- machine learning
- perceptual video quality metrics
Katsigiannis, S., Rabah, H., & Ramzan, N. (Accepted/In press). A machine learning driven solution to the problem of perceptual video quality metrics. In M. Z. Shakir & N. Ramzan (Eds.), AI for Emerging Verticals: Human-Robot Computing, Sensing and Networking. IET.