
EFFICIENT CONTEXT ADAPTIVE ENTROPY CODING FOR REAL-TIME APPLICATIONS

Guillaume Fuchs, Vignesh Subbaraman and Markus Multrus

Fraunhofer Institut Integrierte Schaltungen, Am Wolfsmantel 33, 91058 Erlangen, Germany
email: amm-info@iis.fraunhofer.de
ABSTRACT

Context based entropy coding has the potential to provide higher gain than memoryless entropy coding. However, serious difficulties arise regarding its practical implementation in real-time applications due to its very high memory requirements. This paper presents an efficient method for designing context adaptive entropy coding while fulfilling low memory requirements. From a study of coding gain scalability as a function of context size, new context design and validation procedures are derived. Further, supervised clustering and mapping optimization are introduced to model the context efficiently. The resulting context modelling, associated with an arithmetic coder, was successfully implemented in a transform-based audio coder for real-time processing. It shows significant improvement over the entropy coding used in MPEG-4 AAC.

Index Terms: Entropy coding, audio coding, context modelling

1. INTRODUCTION

Entropy coding is a lossless data compression scheme which plays an essential role in the processing chain of many signal coding schemes. For example, the audio compression standard MPEG-4 AAC [1] applies this principle to its quantized spectral coefficients in the form of Huffman coding. As entropy coding is the final stage of the encoder, its improvements translate directly into a reduction of the total bit-rate. However, the upper performance bound of entropy coding is well defined for fixed tables of variable length codes. It is given by the entropy of the source and can only be surpassed by increasing the order of the entropy coding [2]. A higher order entropy can be exploited by going from a memoryless source model to a model including memory. This can be done by taking the context of every random event into account. Such a principle was already exploited in combination with arithmetic coding in modern video coders [3]. In audio coding, a similar approach was proposed in [4], and the upcoming MPEG standard USAC employs such a technology as well [5].
Although context adaptive entropy coding can reach a high coding gain, its implementation usually imposes heavy computational and memory requirements. This makes its usage in real-time applications difficult and sometimes unfeasible [6]. In the approach introduced in [4], the problem of high memory requirements was already outlined and a solution based on context state reduction was proposed. However, after implementing such a solution in a transform-based audio coder, we observed that the system requires approximately 68 kbytes of memory to reach the desirable coding gain. Such a requirement prohibits the technology especially in embedded systems.

In the present work, advanced methods and optimizations are proposed for designing a context adaptive entropy coder suitable for real-time applications. Starting from the method proposed in [4], we first present a study of the coding performance scalability as a function of context size. From this study, we demonstrate that the context should be carefully designed taking into consideration the application and the target memory size. After validation of the choice of a context, we propose a new mapping algorithm for drastically reducing the size of the context modelling. In the final step of the design, supervised clustering is proposed as an optimal way of capturing the relevant structures of the context. The whole procedure permits the design of an efficient context adaptive arithmetic coder requiring as little as 4 kbytes of table storage. Evaluation shows that it significantly outperforms the Huffman coding of MPEG-4 AAC.

2. CONTEXT ADAPTIVE ENTROPY CODING

2.1. Principle

The entropy of a discrete memoryless source X having as realizations I different values (x_0, ..., x_{I-1}) is given by
H(X) = -\sum_{i=0}^{I-1} p(x_i) \log_2 p(x_i).    (1)
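As a concrete illustration of Eq. (1), and of the conditional entropy introduced just below, the following sketch estimates both quantities from observed symbol frequencies, using the previous symbol as the state; the helper names are ours, not from the paper.

```python
import math
from collections import Counter, defaultdict

def entropy(symbols):
    """Memoryless entropy H(X) in bits per symbol, Eq. (1),
    estimated from observed relative frequencies."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())

def conditional_entropy(symbols):
    """H(X|S) where the state S is simply the previous symbol: for a
    correlated source this falls below the memoryless entropy H(X)."""
    n = len(symbols) - 1
    joint = defaultdict(Counter)
    for s, x in zip(symbols, symbols[1:]):
        joint[s][x] += 1
    h = 0.0
    for s, cnt in joint.items():
        total = sum(cnt.values())
        for c in cnt.values():
            h -= (total / n) * (c / total) * math.log2(c / total)
    return h

# A source with long runs: knowing the previous symbol saves bits.
seq = [0, 0, 0, 0, 1, 1, 1, 1] * 3
print(entropy(seq))              # 1.0 bit/symbol
print(conditional_entropy(seq))  # below 1.0 bit/symbol
```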

The entropy establishes the lower bound of the average bit-rate achievable by source coding. However, when the source is correlated, this bound can be further lowered by taking into account a higher order of the entropy, such as the conditional entropy
H(X|S) = -\sum_{j=0}^{J-1} p(s_j) \sum_{i=0}^{I-1} p(x_i|s_j) \log_2 p(x_i|s_j),    (2)

where s_j is a specific state of the source and J represents the total number of considered states. One elegant solution is to derive such states from the already received values available at both encoder and decoder. The state information then comes at no cost by exploiting the history of the source. This is the paradigm of context adaptive entropy coding.

The main problem in context adaptive entropy coding is the excessive memory space required for storing the different states and codebook tables. Indeed, the memory space grows exponentially with J and can exceed the application constraints by far. The different tables needed can be estimated to occupy H(X|S) · I · J bits. Consequently, [4] proposed a method for drastically reducing the number of states while maintaining the size of the history. It was shown that the states of the source can be reduced to a few probability models. The concept can be summarized by the following procedure, called state reduction:

1. Design a relevant context and select a limited number of S significant states.

2. Map the significant states into L probability models, where L ≪ S.

In the next sections we propose to improve and enhance each of the two stages by introducing new analyses and optimizations.

2.2. Coding scheme

In the present work, we address audio coding applications and especially the standard MPEG-4 AAC [1]. The aim is to replace the Huffman coding of the quantized spectral coefficients. Such coefficients are obtained by passing the input signal through an MDCT and a quantizer controlled by psychoacoustic considerations. The spectral coefficients are further considered in blocks of 2, called 2-tuples. Combining two consecutive values permits exploitation of the inter-coefficient dependencies. Dimension 2 was selected as a compromise between coding gain and codebook size. Higher dimensions increase the alphabet size, and therefore the codebook size, exponentially, which contradicts the main goal of the present work.

The coding considers separately the sign, the 2 most significant bits and the remaining least significant bits. The context adaptation is applied only to the 2 MSBs of the unsigned spectral values. The sign and the least significant bits are assumed to be uniformly distributed. Along with the 16 combinations of the MSBs of a 2-tuple, an escape symbol, ESC, is added to the alphabet for indicating that one additional LSB has to be expected by the decoder. As many ESC symbols are transmitted as additional LSBs are needed. In total, 17 symbols form the alphabet of the code. A time-frequency context is associated with the presently coded 2-tuple as shown in Figure 1. The context encompasses 2-tuples prior in time or in frequency. From the context, a state is derived and associated with a probability model, which is in turn input to an arithmetic coder for generating a variable length code.
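The decomposition described above can be sketched as follows. This is an illustrative reading of the scheme, not the exact bitstream layout; the function name and the ordering of the emitted pieces are our own assumptions.

```python
def split_tuple(a, b):
    """Split a quantized 2-tuple into (signs, MSB symbol, LSB planes).

    Illustrative sketch only: signs and LSBs are coded as uniformly
    distributed side information, while the 2 MSBs of each value form
    one of 16 joint symbols; each stripped LSB plane would be announced
    to the decoder by one ESC symbol."""
    signs = (a < 0, b < 0)
    a, b = abs(a), abs(b)
    lsb_planes = []
    # Strip least significant bits until both magnitudes fit in 2 bits.
    while a > 3 or b > 3:
        lsb_planes.append((a & 1, b & 1))
        a >>= 1
        b >>= 1
    msb_symbol = (a << 2) | b  # one of the 16 MSB combinations
    return signs, msb_symbol, lsb_planes

signs, msb, planes = split_tuple(-9, 2)
# |-9| = 0b1001 needs two plane removals to fit in 2 bits: 9 -> 4 -> 2,
# so two ESC symbols would precede the MSB symbol.
print(signs, msb, planes)  # (True, False) 8 [(1, 0), (0, 1)]
```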

In this section, we present a method for validating the choice of a context. More precisely, the performance scalability of a family of contexts was evaluated as a function of memory requirement. For this purpose, we define a family of contexts by means of the function C(N, M). The parameter M represents the number of 2-tuples lying in the vicinity of the presently coded 2-tuple which are considered for the context derivation. Different shapes of contexts are studied. They are illustrated in Figure 1, where the square shaped cells represent the already decoded 2-tuples available at both encoder and decoder for the context derivation. The gray shaded cells represent the 2-tuples considered in the studied contexts. The context C(N, M) considers the cells c_i with 0 ≤ i < M. N is the number of bits on which each 2-tuple c_i of C is represented. The past coded 2-tuples are re-quantized on a coarser and non-uniform scale. This greatly limits the size of the context by combining values expected to share a similar probability distribution. The re-quantized values c_i^N are formed by concatenating the L1 norm |a| + |b| of the 2-tuple {a, b} with the number of ESC symbols needed for indicating the number of remaining least significant bits. The L1 norm was chosen as relevant information for the context derivation since the distribution of the spectral coefficients is expected to be Laplacian, for which iso-probability contours are defined by the L1 distance. The context C is then built as an ordered set of 2^{NM} states, where the elements can be enumerated by the following indexing:
s = \sum_{i=0}^{M-1} c_i^N \, 2^{N i}.    (3)
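Eq. (3) simply treats the M re-quantized neighbour values as digits of a base-2^N number; a minimal sketch (function name ours):

```python
def context_state(c, N):
    """Eq. (3): enumerate the context state from the M re-quantized
    neighbour values c_i, each represented on N bits."""
    s = 0
    for i, ci in enumerate(c):
        assert 0 <= ci < (1 << N), "each c_i must fit in N bits"
        s += ci << (N * i)
    return s

# C(3, 4): four neighbours on 3 bits each -> 2^(3*4) = 4096 raw states
print(context_state([5, 0, 7, 2], N=3))   # 1477
print(context_state([7, 7, 7, 7], N=3))   # 4095
```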

The second stage of the context design is the state reduction of the original context. A relevant subset C̃ of size S is extracted from C(N, M), where S is much lower than 2^{NM}. The elements of C̃ are selected by evaluating for each state of C the distance
d(p(X), p(X|s)) = p(s) \left( -\sum_{i=0}^{I-1} p(x_i|s) \log_2 p(x_i) - H(X|s) \right)
                = p(s) \sum_{i=0}^{I-1} p(x_i|s) \log_2 \frac{p(x_i|s)}{p(x_i)}.    (4)
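Eq. (4) is p(s) times the relative entropy between the state-conditioned distribution and the global one; a direct transcription (function name ours):

```python
import math

def state_distance(p_s, p_x, p_x_given_s):
    """Eq. (4): p(s) times the KL divergence between the distribution
    conditioned on state s and the global symbol distribution."""
    return p_s * sum(
        q * math.log2(q / p) for q, p in zip(p_x_given_s, p_x) if q > 0.0)

p_x = [0.7, 0.2, 0.1]      # global distribution p(X)
p_same = [0.7, 0.2, 0.1]   # a state that carries no extra information
p_diff = [0.1, 0.2, 0.7]   # a state with a very different distribution
print(state_distance(0.05, p_x, p_same))  # 0.0: nothing to gain
print(state_distance(0.05, p_x, p_diff))  # positive: worth its own model
```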

The distance d represents the overall entropy reduction of the system when the state s is given its own distribution. After the selection phase, the S most significant states are grouped in the subset C̃:

C̃ = {s_0, ..., s_{S-1}}.    (5)

A description of this subset needs to be stored in a table of the coder. Such a table occupies S · N · M / 8 bytes. Since each significant state needs a description of its own distribution function, the coder has to additionally store 32 · S bytes of tables, considering that the cumulative frequencies of the alphabet of 17 symbols are stored as 16 values of 16 bits. In total, the size of the tables at the end of this stage is equal to S · (N · M / 8 + 32) bytes.

Figure 2 shows the performance of different contexts by plotting the entropy per symbol as a function of the required overall memory space. As expected, increasing the number of significant states systematically decreases the entropy of the system. The gain over a memoryless entropy is already important when taking into account only a few significant states. However, further gain can only be achieved by increasing the size of C̃ to a great extent. Starting with a context with a small original size will constrain the performance to

Fig. 1. Time-frequency representation of the context

3. CONTEXT DESIGN

Context design is the first stage in the conception of a context adaptive entropy coding, and probably the most important, as it influences the subsequent stages. Since it is a very complex problem, one typically uses empirical methods and prior knowledge of the application.
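The table-size budget studied in this section can be made concrete with a small calculation following the size expressions given in the text, as we read them: S · (N · M / 8 + 32) bytes before mapping, and S · (N · M + log2(L)) / 8 + 32 · L bytes once the S states are mapped onto L models as in Section 4 (function name ours).

```python
import math

def table_bytes(S, N, M, L=None):
    """Approximate table storage: N*M-bit state descriptions, 32-byte
    cumulative-frequency tables, and (when mapping onto L models) a
    log2(L)-bit model index per state."""
    if L is None:
        return S * (N * M / 8 + 32)                  # one table per state
    return S * (N * M + math.log2(L)) / 8 + 32 * L   # shared models

# Context C(3, 4) with 64 significant states, without and with mapping:
print(table_bytes(64, 3, 4))        # 2144.0 bytes
print(table_bytes(64, 3, 4, L=8))   # 376.0 bytes
```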

Fig. 2. Entropy as a function of memory requirement

Fig. 3. Entropy as a function of number of L probability models

rapidly saturate towards a hard limit. As a consequence, designing a too small context will produce a system with limited performance. On the other hand, starting from a too large context may require a high memory demand for reaching a desirable coding gain. The study clearly shows that the available memory space should constrain the size of the original context. It also permits validation of the choice of a context. As an example, the context C(4, 4) can be considered suboptimal, since the contexts C(3, 4) and C(3, 6) outperform it for every memory size. In our case, the context C(3, 4) is chosen as the best trade-off for our application requirements. It performs best in the targeted operating range of about 4 kbytes. In the rest of the paper, only this particular context will be considered.

4. CONTEXT MAPPING

Once a context is chosen, each of the S selected states of C̃ is mapped into one of L probability models, where L is much lower than S. Once again, the number of probability models has to be determined as a function of the expected coding gain and the memory constraint. The context mapping is an optimization problem which is NP-complete. The goal is to find the best mapping from S + 1 distribution functions to L distribution functions by minimizing the impact of combining different statistics. In this section, we propose a new optimization algorithm that finds an approximate solution to the problem.

The S significant states, together with the group g of remaining unselected states, g = C \ C̃, are combined into a set called T of S + 1 elements covering the whole space of C:

T = {C̃, g}.    (6)

Each element T_j of T has its own distribution p_{T_j}(X), its respective histogram h_{T_j}(X), and its probability of occurrence p_T(j). Besides, each probability model m_l is also characterized by a distribution p_{m_l}(X) and a respective histogram h_{m_l}(X). The algorithm begins with one single probability model initialized with the most probable element of T, and can be described by the following steps:

1. l = 0; p_{m_l}(X) = p_{T_j}(X) with j = arg max_j p_T(j)

2. j = arg max_j min_k ( d(p_{m_k}(X), p_{T_j}(X)) ), with 0 ≤ j < S and 0 ≤ k ≤ l

3. l = l + 1; p_{m_l}(X) = p_{T_j}(X); T = T − {T_j}; S = S − 1

4. Is l < L? If yes, go to step 2; otherwise go to step 5.

5. j, k = arg min_j min_k ( d(p_{m_k}(X), p_{T_j}(X)) ), with 0 ≤ j < S and 0 ≤ k < L

6. h_{m_k}(X) = h_{m_k}(X) + h_{T_j}(X); T = T − {T_j}; S = S − 1

7. Is S > 0? If yes, go to step 5; otherwise finish.

Figure 3 illustrates the coding efficiency as a function of table size. As can be observed, the size of the tables is significantly reduced compared to Section 3. One needs to store the description of the significant states, the probability models, and the mapping rule, for a total memory requirement of S · (N · M + log_2(L)) / 8 + 32 · L bytes. It is worth noting that the performance saturates when the number of significant states becomes too large compared to the number of models. The significant state selection is then suboptimal, as it does not consider the forthcoming context mapping. For the same reason, one can observe some instabilities, especially for small values of L. However, a determined number of L models can easily be admitted as optimal for a given target memory requirement. The parameter L is then deduced once both the context and the available memory space are fixed.

The two previous studies show that the context design is the most critical step of the context modelling. It determines the overall performance, as it is the starting point for the context mapping. However, we cannot avoid a trade-off between memory requirement and performance. For minimizing the impact of this compromise, the next section introduces an enhanced method for modelling the context.

5. CONTEXT CLUSTERING

Further improvement of the context modelling is achieved by considering groups of states in addition to the significant states. The problem is translated into clustering the context space to obtain a more relevant and compact representation. Accordingly, significant states are also used as bounds of clusters. The set T defined in Section 4 is now formed by state indexes and groups of states:

T = {[0, s_0[, s_0, ]s_0, s_1[, ..., s_{S-1}, ]s_{S-1}, 2^{NM}[}.    (7)
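The greedy mapping procedure of Section 4 can be sketched in Python as follows. This is our own simplified reading: the p(s)-weighted distance of Eq. (4) is replaced by a plain KL divergence, histograms are smoothed to avoid zero probabilities, and step 5's joint arg min is relaxed to a per-element pass.

```python
import math

def kl_bits(p, q):
    """Relative entropy in bits (Eq. (4) without the p(s) factor)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def normalize(hist):
    # add-one smoothing keeps every model probability non-zero
    total = sum(hist) + len(hist)
    return [(c + 1) / total for c in hist]

def map_states(histograms, probs, L):
    remaining = list(range(len(histograms)))
    models, assignment = [], {}
    # step 1: seed with the most probable element of T
    j = max(remaining, key=lambda i: probs[i])
    models.append(list(histograms[j])); assignment[j] = 0; remaining.remove(j)
    # steps 2-4: promote the element farthest from every existing model
    # until L models exist
    while len(models) < L and remaining:
        j = max(remaining, key=lambda i: min(
            kl_bits(normalize(histograms[i]), normalize(m)) for m in models))
        assignment[j] = len(models)
        models.append(list(histograms[j])); remaining.remove(j)
    # steps 5-7: merge each remaining element into its nearest model,
    # adding histograms as in step 6 (merged in order, not by joint arg min)
    for j in remaining:
        k = min(range(len(models)), key=lambda k: kl_bits(
            normalize(histograms[j]), normalize(models[k])))
        models[k] = [a + b for a, b in zip(models[k], histograms[j])]
        assignment[j] = k
    return models, assignment

# four elements of T over a 3-symbol alphabet, mapped onto L = 2 models
hists = [[8, 1, 1], [9, 2, 1], [1, 1, 8], [1, 2, 9]]
probs = [0.4, 0.3, 0.2, 0.1]
models, assignment = map_states(hists, probs, L=2)
print(assignment)
```

Elements with similar statistics end up sharing a model, so only L frequency tables need to be stored.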

Fig. 4. The context clustering reduces signicantly the memory requirement.

Fig. 5. The new system outperforms Huffman coding.

In order to cluster the context, each state s of C is evaluated with a new distance measure d_2:

d_2(s) = p_T(j) H(X|T_j) − H(X|]s_{j−1}, s[) − H(X|]s, s_j[) − H(X|s),    (8)

where j is the index of the cluster in which s lies, i.e. s_{j−1} < s < s_j is satisfied. The new distance aims to evaluate the benefit of splitting an existing cluster of states into two new clusters. First, the set T is initialized to a single cluster [0, 2^{NM}[ covering the whole context space C in a single interval. The significant states are then selected sequentially by evaluating arg min_{s ∈ C − C̃} d_2(s). Finally, the S selected states are mapped to L probability models as described in Section 4.

The reduction of memory requirement brought by the proposed clustering is illustrated in Figure 4. As an example, for a target entropy of 2.08 bits, the size of the tables can be reduced from 23 kbytes to 7 kbytes by using only the mapping optimization. It can be further reduced to 3 kbytes by adopting the proposed clustering.

6. EVALUATION

For evaluation in real-time applications, a new context adaptive arithmetic coding was designed and compared against MPEG-4 AAC Huffman coding. The design is based on the presented context modelling for a target memory size similar to the Huffman table occupancy, i.e. 4 kbytes. A lossless transcoding of legacy MPEG-4 AAC bitstreams was performed by re-encoding the spectral values with the new scheme. The overall computational complexity of both encoders was observed to be in the same range. The two output bit-rates are compared in Figure 5. In addition, the bit-rate obtained by the context modelling proposed in [4], which required 68 kbytes, is included as well. Compared to Huffman coding, the new approach allows a saving of more than 1 kbps at 16 kbps and of 3.3 kbps at 128 kbps.

7. CONCLUSION

A new context modelling for efficient entropy coding is presented in this paper. First, we explained the choice and the validation of an

appropriate context. This was followed by the presentation of a new algorithm for the context mapping. An efficient way of clustering the context space was also proposed. Finally, a realistic comparison shows that a context adaptive arithmetic coding scheme can significantly improve the performance of a conventional coding scheme while fulfilling the requirements of a real-time application. The result of this study has already been adopted in the course of the development of the upcoming MPEG USAC standard. As future work, we envision investigating more advanced clustering, especially using a 2-dimensional representation.

8. REFERENCES

[1] ISO/IEC International Standard 14496-3, "Coding of Audio-visual Objects, Part 3: Audio, Subpart 4: Time/Frequency Coding," 1999.

[2] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, no. 3, pp. 379-423, 1948.

[3] Detlev Marpe, Heiko Schwarz, and Thomas Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620-636, Jul. 2003.

[4] Nikolaus Meine and Bernd Edler, "Improved quantization and lossless coding for subband audio coding," in 118th AES Convention, Barcelona, Spain, May 2005, Preprint 6468.

[5] Max Neuendorf et al., "Unified speech and audio coding scheme for high quality at low bitrates," in IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2009.

[6] Yongseok Yi and In-Cheol Park, "High-speed H.264/AVC CABAC decoding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 4, pp. 490-494, 2007.
