
GETTING USED TO OR GROWING ANNOYED: HOW PERCEPTION THRESHOLDS AND
ACCEPTANCE OF FRAME FREEZING VARY OVER TIME IN 3D VIDEO STREAMING


Péter A. Kara¹, Werner Robitza², Maria G. Martini¹, Chaminda T.E.R. Hewage¹, Fatima M. Felisberti¹

¹ Wireless Multimedia & Networking (WMN) Research Group, Kingston University, London, UK
{p.kara, m.martini, c.hewage, f.felisberti}@kingston.ac.uk

² Telekom Innovation Laboratories, Deutsche Telekom AG, Berlin, Germany
werner.robitza@telekom.de

ABSTRACT
In recent years, HTTP video streaming has become a
dominant technology for multimedia content consumption.
Frame freezing occurring in the video stream is considered
one of the key factors affecting the perceptual visual quality
at the client side. In this paper we provide an analysis of
how frame freezing in 3D video streams is observed over
time, addressing the question whether the user starts
tolerating freezes or gets irritated by them. The amount of
Temporal Information of the video sequences is taken into
consideration, since significant differences between adjacent
frames can increase the detectability of freezing.
Index Terms: Quality of Experience, perception of
quality, QoE over time, 3D video streaming, frame freezing
1. INTRODUCTION
Video streaming today is possibly the most popular form of
home entertainment. High-bandwidth mobile data services
allow the user to be mobile, in the sense that moving from one
place to another does not prevent the consumption of such
services. HTTP streaming is commonly used by over-the-top
service providers.
While with a UDP-like transport protocol the video
quality can be deteriorated by visual artifacts during
service-level degradation, the greatest foe of the consumer of an
HTTP service is frame freezing or rebuffering events. Here
it is important to distinguish between indicated rebuffering
(e.g., with a spinning circle indicator, simply "rebuffering"
in the following) and frame freezing (i.e., one frame is
repeated until video playback starts again). Frame freezing is
the main focus of this study, since some playback systems
may indicate the rebuffering to the user, while others may
just freeze the frames. Frame freezing could also be an
indicator of performance issues with the playback system.
Initial loading times must be taken into consideration as
well, but, unsurprisingly, rebuffering during a high-motion
scene has been found to decrease the Quality of Experience
(QoE) more [1].
An important characteristic in video streaming is
the duration of the viewing session. While many forms of
streaming take a couple of minutes or fewer (e.g., music
videos), some typically take around 20–30 minutes (e.g.,
episodes of a series), or can be an hour or even longer (e.g.,
movies). Regardless of the actual content and duration,
frame freezing can be irritating to most users. However, time
has its own effect on QoE (e.g., the memory effect [2] or
how perceived quality itself changes over time [3]); thus
tolerance towards rebuffering or frame freezing may change
as well.
While 2D usage scenarios have already been addressed
in the literature, to the knowledge of the authors, no
exhaustive study has focused on the subjective evaluation of
frame freezing for 3D multimedia services. Therefore we
conducted the experiments described in this contribution on
an autostereoscopic 3D display, and this study also considers
the impact of motion in the z direction (depth).
We address several research questions in this paper:
How does the frame freezing tolerance vary over time for
frames with different Temporal Information (TI) values?
What is the significance of motion in the scene on the 2D
plane and in the third dimension (back-forth movement)
during frame freezing events? The paper also contains the
investigation of the perceptibility of frame freezing,
assessing whether test participants can detect video freezing
or not, regardless of their level of perceptual tolerance. The
results are analyzed and presented not only from the angle of
mean scores, but also from scoring distributions of
individual participants.
The paper is structured as follows: Section 2 introduces
relevant works, followed by the measurement setup in
Section 3. Results are detailed in Section 4. We conclude
the paper in Section 5, pointing out possible continuations
and further investigations of the area.

2. RELATED WORK
The variation of QoE over time is a relevant topic not only
in the consumption of everyday entertaining multimedia, but
also in fields related to medical media. The EU FP7
CONCERTO project proposed a novel cross-layer
architecture [4] to enable prompt diagnosis and operation in
medical emergency scenarios. The solution takes into
consideration the factor of time in perceived quality [3],
addressing the question of how QoE perception varies over a
temporal period.
De Pessemier et al. [5] give valuable insight into service
acceptability, based on rebuffering interruptions. In the
study, the number and the duration of rebuffering events
plus the initial loading time were taken into account. While
loading time was considered less critical by the test
participants, fluent playback was even more important than
video resolution or frame rate: users can tolerate up to 20
seconds total waiting time for stimuli lasting around 25–30
minutes (14 videos, about 2 minutes each), but a waiting
duration beyond 1 minute is unacceptable.
Indeed, the initial loading time and interruptions caused
by rebuffering are both sources of dissatisfaction; choosing
between them would be like being caught between the devil
and the deep blue sea. The research questions of Hossfeld
et al. [1] address this dilemma. The results show that, even
though both phenomena depend on the service, users
develop expectations regarding initial waiting time based on
the application, whereas stalling is considered unexpected
and sudden.
Along with initial buffering time and stalling duration,
the frequency of rebuffering events affects QoE. The
findings of Mok et al. [6] provide a mapping between QoS
and QoE for HTTP video streaming and also point out that
instead of the spatial artefacts, the temporal structure plays a
key role in modifying the quality of user experience of a
streaming service suffering frequent rebuffering due to
network conditions. The subjective results display a
significant drop in MOS during certain levels of rebuffering
frequency, depending on content type (sports game, news,
TV comedy show and music video) and thus temporal
information.
TI [7] is a measure of difference between adjacent
frames, and thus provides an estimation of the level of
motion in video contents. This estimation can be used to
further estimate the perceived video quality during network
performance degradation [8]. This method provides
real-time QoE monitoring based on the data measured and
computed on the receiver side, but without relying on video
sequence reference as input. The paper emphasizes the
importance of TI (and also Spatial Information) in
perceptual quality estimation. In our paper, we focus only on
TI, and put a greater highlight on QoE over time.
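
As an illustration of the TI measure referred to above, the following minimal sketch follows the definition in ITU-T P.910 [7]: for each pair of adjacent frames, take the pixel-wise luminance difference and compute its standard deviation over space; the TI of a sequence is the maximum of these values over time. The function name, the nested-list frame layout and the use of the population standard deviation are assumptions made for this example, not details taken from the measurement setup.

```python
from statistics import pstdev


def temporal_information(frames):
    """Return (per-frame TI values, sequence-level TI).

    frames: list of luminance frames; each frame is a list of rows,
    each row a list of pixel values (hypothetical layout for this sketch).
    """
    per_frame = []
    for prev, curr in zip(frames, frames[1:]):
        # Pixel-wise difference between adjacent frames, flattened.
        diff = [c - p
                for row_p, row_c in zip(prev, curr)
                for p, c in zip(row_p, row_c)]
        # Standard deviation over space of the difference image.
        per_frame.append(pstdev(diff))
    # Sequence-level TI is the maximum over time.
    return per_frame, max(per_frame)


# Tiny 2x2 example: the second frame repeats the first (a frozen frame),
# so the first difference image is all zeros.
f0 = [[0, 0], [0, 0]]
f1 = [[0, 0], [0, 0]]
f2 = [[10, 0], [0, 10]]
per_frame_ti, sequence_ti = temporal_information([f0, f1, f2])
```

Note that a repeated frame yields a per-frame value of zero, which is why a freeze placed at a point with high TI produces a larger jump once playback resumes.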
It also needs to be noted that when frames freeze in a
stalling event, it can happen with or without frame skipping.

According to van Kester et al. [9], there is no significant
difference in the effect on QoE between the two cases. In
this study, frame freezing refers to frame halt, so no frames
are skipped.
3. RESEARCH MEASUREMENT SETUP
3.1. Environment and display
Our subjective evaluation of video quality and frame
freezing detection took place in a laboratory environment.
The stimuli were displayed on a 55-inch Philips glasses-free 3D
HD TV, running the Dimenco player [10]. Participants were
viewing the display from a distance of 6H (6 times the
height of the display, which in this case was approximately 4
meters). No audio was present; the test measurements solely
focused on the assessment of video quality.
3.2. Stimuli
Two 3D HD source videos provided by Dimenco
were selected as stimuli: "Pinocchio" and "Motorcycle" (18
and 13 seconds long, respectively, referred to as S1 and S2
later on). In both 30-FPS videos, three specific points were
selected for frame freezing based on TI, and all three points
had the same extent of freezing in the test cases. For
instance, we define a condition with 500 ms frame freezing
(which is equivalent to 15 repeated frames) and then
apply it to S1 with three freezings (500 ms each) and S2
with three freezings (again 500 ms each). This means that
each condition had six specific points with the same amount
of frame freezing.
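
The relation between freezing duration and the number of repeated frames can be written out as a short sketch. It only assumes the 30 FPS frame rate stated above; the function names are chosen for this example.

```python
def freeze_frames(duration_ms, fps=30):
    """Number of repeated frames corresponding to a freeze duration."""
    return round(duration_ms * fps / 1000)


def freeze_duration_ms(frames, fps=30):
    """Freeze duration in milliseconds for a number of repeated frames."""
    return frames * 1000 / fps


# The 500 ms condition from the text corresponds to 15 repeated frames.
frames_500 = freeze_frames(500)
# A single repeated frame at 30 FPS lasts one frame period.
one_frame_ms = freeze_duration_ms(1)
```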
3.3. Rating scale
We wanted to obtain data on how the participants perceive
these frame freezing events. For this, we utilized the
five-point Degradation Category Rating (DCR) scale [7], which
allows users to rate whether they find a degradation
"imperceptible", "perceptible, but not annoying", "slightly
annoying", "annoying" and "very annoying". The
participants had to verbally assign a value to each freezing
event. If frame freezing was not perceived to any extent, the
participants had to say "5". If it was detected but was not
considered annoying, it was a "4", and so on.
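
For reference, the scale described above can be expressed as a simple lookup, together with the usual way of averaging ratings into a mean score. This is a minimal sketch; the names `DCR_SCALE` and `mean_opinion_score` are hypothetical and the example ratings are made up.

```python
# Five-point DCR scale: spoken value -> degradation category.
DCR_SCALE = {
    5: "imperceptible",
    4: "perceptible, but not annoying",
    3: "slightly annoying",
    2: "annoying",
    1: "very annoying",
}


def mean_opinion_score(scores):
    """Average a list of DCR ratings, rejecting values outside the scale."""
    for s in scores:
        if s not in DCR_SCALE:
            raise ValueError(f"not a valid DCR score: {s!r}")
    return sum(scores) / len(scores)


# Made-up ratings for one freezing point, for illustration only.
example_mos = mean_opinion_score([5, 4, 4, 3])
```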
3.4. Perceptual threshold measurement
Before the actual test, we measured the frame freezing
detection threshold for S1 for each participant. The
participant was repeatedly shown a 5-second section of S1,
first containing no freezing at all, then including a single
point of frame freezing. The task was to identify the moment
of freezing. In case the participant could not notice anything,
the extent of frame repetition was increased by one frame
and the section was repeated until the point was clearly
detected. This was performed for all three points of S1.
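
The ascending procedure described above can be sketched as a simple loop. The callable `detects` is a hypothetical stand-in for the participant's verbal response, and starting at one repeated frame with an upper bound of 30 frames are assumptions of this example; the text only states that the extent was increased by one frame per repetition.

```python
def detection_threshold(detects, max_frames=30):
    """Return the smallest number of repeated frames that is detected.

    detects: callable taking the number of repeated frames and returning
             True if the freeze was noticed (hypothetical participant model).
    max_frames: assumed upper bound to keep the loop finite.
    """
    for n in range(1, max_frames + 1):
        if detects(n):
            return n
    return None  # never detected within the assumed bound


def threshold_ms(frames, fps=30):
    """Convert a threshold in repeated frames to milliseconds."""
    return frames * 1000 / fps


# Illustrative participant who notices freezes of four frames or more.
frames_needed = detection_threshold(lambda n: n >= 4)
```

Repeating this for each of the three points of S1 gives one threshold per point and participant, which is the kind of data plotted in Figure 3.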
3.5. Conditions
In order to perform the quality rating later on, the
participants had to be aware of the six points of frame
freezing. We indicated those positions before the assessment
task. One could argue that a randomized placement of
freezing points without subject knowledge would have
resulted in a more realistic test, but if the study had been
done this way, we could not have determined the impact of
the freezing location itself, at least not without introducing
many more conditions, requiring more tests and a larger
subject pool.
The order of conditions (see Figure 1) included a
repeated pattern of five conditions (reported in white in
Figure 1). We primarily used this pattern to observe the
alteration of the frame freezing tolerance over time, since
subjects would see it twice during the test.

Fig. 1. Conditions in order of display. The white column groups
indicate the repeated pattern.

3.6. Selected rebuffering points
While in S2 all three points of frame freezing had different
TI values (3.44, 6.98 and 15.22), we chose the first two
points of S1 to be the same in terms of TI (see Figure 2);
however, the prevailing direction of motion in the scene was
different: the second one had an up-down vertical movement
during the selected point, while the first one had a back-forth
movement in the 3D plane.

Fig. 2. Temporal Information over time for sequence S1. The white
lines indicate the points of frame freezing.

3.7. Test protocol
The total duration of the test per participant was around
twenty minutes. Since the quality scores were provided
orally, the participant did not need to look away from the
screen and there was no pause between videos. The scores
were recorded by laboratory personnel in real time on a
separate computer. No additional information was provided
to the participants on the video streams themselves, which
were played from a local drive. The term "rebuffering" was
used in the learning phase, suggesting actual video
streaming.
4. RESULTS
In total, 20 subjects participated in the research
measurements. The mean age of the participants was 24.
They were mostly university students and staff. 17 of them
had previously experienced 3D video services; for three of
them this was the first time.
4.1. Perceptual threshold measurement results
As specified in the research setup, the measurement
began by determining the capability of the participant to
detect frame freezing at the given points (see Figure 3),
using 5-second clips of S1. However, instead of immediately
increasing the frame repetition by one, the reference was
played three more times, but the participant was not aware of
this fact. We believe that due to cognitive dissonance [11]
(during which the preconception that there should be
rebuffering dominates actual perception), 6 out of 20
participants claimed to have clearly identified frame freezing
events despite there not being any.

Fig. 3. Perceptual detection of frame freezing. Each point marks
the threshold for a test participant. Points are displayed in
increasing order.

The thresholds determined (see Figure 3) show an
apparent difference between how frame freezing was
detected in the first two points of S1; a greater sensitivity
can be seen for the back-forth movement. The correlations
between the threshold of the individual and the scores
provided are not detailed in this paper; the analysis is
provided in an article yet to be published. Such correlations
are suitable for modeling, e.g. as done in the metric of
Huynh-Thu et al. [12], which uses a fixed detection
threshold.

4.2. Mean scores
The Mean Opinion Scores (MOS) of the measurements
(see Figure 4) suggest an increasing level of tolerance
towards frame freezing. Before we attend to that topic, let us
observe the perceptibility of frame freezing in the individual
conditions.

Fig. 4. Mean Opinion Scores for different extents of frame
repetition.

Compared to the findings of van Kester et al. [9], who
claim 360 ms to be the threshold of acceptance, our results
indicate that the perceptual effect of frame freezing varies
between "perceptible but not annoying" and "slightly
annoying" for the interval running from 150 ms to 450 ms.

4.3. Scoring distribution
If we just take a look at the distribution of quality
ratings (see Figure 5), we can see that 687 out of 3,000 DCR
scores (22.9%) were deemed to have undetectable stalling.
This means that on average a participant found 34 out of 150
points to be unnoticeable. Actually, the first 6 points were
indeed unnoticeable, since the first test case was a hidden
reference.

Fig. 5. Distribution of DCR scores.

As can be seen in the distribution of the scores
denoting imperceptible frame freezings among the different
extents of frame repetition (see Figure 6), the hidden
reference at the very beginning of the assessment task was
evaluated to have perceptible frame freezing. In fact, one of
these was "slightly annoying" and one was "annoying". It is
also interesting to see that by a frame repetition of two the
number of imperceptible frame freezings drops to 75 out of
120, even though the data obtained on perceptual detections
of frame freezing suggests otherwise. Of course it needs to
be noted that in this case the participants knew where to look
for the freezing events.

Fig. 6. Imperceptible frame freezings for different extents of frame
repetition.

The distribution of these highest scores among the
different points of frame freezing (see Figure 7) also
reinforces what we have seen during the analysis of
thresholds (see Figure 3). Stalling at the second point of S1
was more difficult to detect than at the first one. Notably, the
first and second points of frame freezing in S2 show no
apparent difference, even though their TI values are quite
different. We find an explanation for this in the content: both
points have the same type and level of motion (a motorcycle
racer jumps through the air in uniform slow motion close to
the camera, with its wheels spinning), but at the second
point, the actor fills a bigger part of the screen. In this case,
the size of the actor did not play a role in the perception of
frame freezing.

Fig. 7. Distribution of imperceptible frame freezings.

Let us finally compare the scores of the two patterns
(see Figure 8), which are identical series of five conditions
(see Figure 1). The most apparent difference is the
significant rise in the number of "perceptible but not
annoying" scores. This does not only originate from the
reduction of annoyance over time ("annoying" scores drop
by more than half and "very annoying" scores almost
completely disappear), but also because freezing in the
selected points became easier to detect after observing the
same points several times. This alteration of assessment is
also present on the level of mean scores (see Figure 4).

Fig. 8. Distribution of DCR scores for the two patterns.
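
A comparison of this kind can be sketched by counting how often each DCR value occurs in the two patterns and taking the difference per category. The score lists below are made up purely for illustration and are not the measured data; the function names are hypothetical.

```python
from collections import Counter


def score_distribution(scores):
    """Count occurrences of each DCR value, from 5 down to 1."""
    counts = Counter(scores)
    return {value: counts.get(value, 0) for value in range(5, 0, -1)}


def distribution_shift(first_pattern, second_pattern):
    """Per-category change in count from the first to the second pattern."""
    first = score_distribution(first_pattern)
    second = score_distribution(second_pattern)
    return {value: second[value] - first[value] for value in first}


# Illustrative (not measured) ratings for the two occurrences of a pattern.
first_scores = [5, 4, 3, 2, 2, 1]
second_scores = [5, 4, 4, 4, 3, 2]
shift = distribution_shift(first_scores, second_scores)
```

In this made-up example the count of 4s rises while the counts of 2s and 1s fall, which mirrors the direction of the change described in the text.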

However, even though the changes in distribution
suggest acceptance of freezing, this does not apply to every
participant. The following section of the analysis introduces
the individual assessment of freezing events. Similarly to the
results on the perceptual detection of frame freezing (see
Figure 3), the upcoming figures contain the measurement
data for each test participant separately.
4.4. Individual assessments of frame freezing
According to the individual analysis of pattern
comparison (see Figure 9), 7 out of 20 test participants
actually gave lower scores to the second pattern. In terms of
difference magnitude, one participant had an average rise of
nearly 1.5, which is exceptionally high on a 5-point DCR
scale of evaluation, because it means that a pattern that was
first assessed to be somewhere between "annoying" and
"slightly annoying" was later considered not to be annoying
at all.

Fig. 9. Comparison of the evaluation of the first and second
pattern. Positive values indicate higher scores for the second
pattern, while negative values favor the first pattern.

The results on perceptual detection of frame freezing
(see Figure 3) have shown that participants were more
sensitive to the first point of frame freezing in S1, which
should also be reflected in their ratings (see Figure 10). Yet
to six participants the stalling in the second point was easier
to tolerate on average, all test cases considered.

Fig. 10. Comparison of the evaluation of points 1 and 2 in S1.
Positive values indicate higher scores for point 2, while
negative values favor point 1.

The most interesting phenomenon that we have come
across in the research was the alteration of the assessment of
the first and second point of frame freezing in S1 between
the two patterns (see Figure 11). In several cases, the
evaluation changes to favor the first point more over time.
The extents of these changes are 1 for three participants and
0.8 for two.

Fig. 11. Comparison of the evaluation of points 1 and 2 in S1 for
the two patterns. Positive values indicate higher scores for
point 2, while negative values favor point 1.

We explain this phenomenon with the increase in the
perceptibility of the second point of frame freezing; as the
test measurement went on, the stalling of that point became
easier to detect. Due to this, the difference between its
scores and the mean decreased (see Figure 12).

Fig. 12. Comparison of the evaluation difference between point 2
of frame freezing in S1 and the mean score of S1 for the first (left)
and the second (right) pattern for each participant (top) and the
mean difference (bottom). Positive values indicate how much higher
the score of point 2 was compared to the mean.

The change in the perceptibility of frame freezing over
time is also well portrayed by the fact that during the first
pattern a total of 49 out of 100 (5 conditions and 20
participants) scores stated that the second point in S1 was
imperceptible, but for the second pattern, there were only
38.

The results show that even though frame freezing
tolerance may rise over time, the awareness towards frame
freezing might rise as well. This, however, may also have
been a result of the test design, in which subjects were aware
of the points of freezing. The obtained data is diverse in the
aspect of how people rate freezing events over time
depending on the source content, and further investigation of
the topic may enable the modeling of perceptibility for
occurrences of service waiting times for different real-time
applications. Modeling requires additional studies with a
variation of test factors (source content, conditions, etc.) and
a larger pool of test participants.
5. CONCLUSIONS
The paper has shown an analysis of the perception of frame
freezing in 3D video content and its variation over time.
Results indicate that there is no obvious direction of
behavior; while some people start tolerating freezing over
time, others grow annoyed by it. Roughly two thirds of the
test participants displayed increasing tolerance towards
frame freezing. We identified the average threshold for
frame freezing detection to be between 102 and 312
milliseconds, depending on content. It was also revealed that
in this kind of service consumption, freezing during
back-and-forth motion in the third dimension is more noticeable
and irritating than during vertical movement on the 2D
plane.
Further analysis will focus on the correlations between
perceptual detection of frame freezing and DCR assessment.
Since the subjective threshold measurement and one of the
major research questions were addressed by the first
stimulus, the results of the second stimulus were not
investigated individually in this paper, but shall also be
included in further analysis.
As a continuation of the topic, we will investigate
different actor sizes and motion levels in scenes, and their
effects on the perception of frame freezing. Temporal
Information can provide information on how much adjacent
frames change but, as shown in this paper, frame freezings
with the same TI can lead to different assessments while
others with different TI values can be perceived similarly.
ACKNOWLEDGMENT
The work in this paper was funded by the European
Union's Horizon 2020 research and innovation program
under the Marie Skłodowska-Curie grant agreement No
643072, Network QoE-Net.
REFERENCES
[1] T. Hossfeld, S. Egger, R. Schatz, M. Fiedler, K.
Masuch, C. Lorentzen, "Initial delay vs. interruptions:
Between the devil and the deep blue sea", Fourth
International Workshop on Quality of Multimedia
Experience (QoMEX), Yarra Valley, Australia, 2012,
pp. 1–6.
[2] T. Hossfeld, S. Biedermann, R. Schatz, A. Platzer, S.
Egger, M. Fiedler, "The memory effect and its
implications on Web QoE modeling", 23rd
International Teletraffic Congress (ITC), San Francisco,
2011, pp. 103–110.
[3] M.G. Martini, C.T.E.R. Hewage, M.M. Nasralla, O.
Ognenoski, Chapter "QoE Control, Monitoring and
Management Strategies", in Multimedia Quality of
Experience (QoE): Current Status and Future
Requirements, Wiley, 2014.
[4] M.G. Martini, L. Iacobelli, C. Bergeron, C.T.E.R.
Hewage, G. Panza, E. Piri, J. Vehkapera, P. Amon, M.
Mazzotti, K. Savino, L. Bokor, "Real-time multimedia
communications in medical emergency - the
CONCERTO project solution", 37th Annual
International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC), Milan, Italy,
2015, pp. 7324–7327.
[5] T. De Pessemier, K. De Moor, W. Joseph, L. De Marez,
"Quantifying the Influence of Rebuffering Interruptions
on the User's Quality of Experience During Mobile
Video Watching", IEEE Transactions on Broadcasting,
2013, pp. 47–61.
[6] R.K.P. Mok, E.W.W. Chan, R.K.C. Chang, "Measuring
the quality of experience of HTTP video streaming",
IFIP/IEEE International Symposium on Integrated
Network Management (IM), Dublin, 2011, pp. 485–492.
[7] ITU-T Rec. P.910: "Subjective video quality
assessment methods for multimedia applications", 2008.
[8] P. de la Cruz Ramos, F.G. Vidal, R.P. Leal, "Perceived
Video Quality Estimation from Spatial and Temporal
Information Contents and Network Performance
Parameters in IPTV", Fifth International Conference on
Digital Telecommunications (ICDT), Athens, Greece,
2010, pp. 128–131.
[9] S. van Kester, T. Xiao, R.E. Kooij, K. Brunnström,
O.K. Ahmed, "Estimating the impact of single and
multiple freezes on video quality", IS&T/SPIE
Electronic Imaging, International Society for Optics
and Photonics, 2011.
[10] Official website of Dimenco,
http://www.dimenco.eu/
[11] L. Festinger, "A theory of cognitive dissonance",
Stanford, CA: Stanford University Press, 1957.
[12] Q. Huynh-Thu, M. Ghanbari, "No-reference temporal
quality metric for video impaired by frame freezing
artefacts", 16th IEEE International Conference on
Image Processing (ICIP), 2009, pp. 2221–2224.
