Wireless Loudspeaker
System With Real-Time
Audio Compression
Author: Ivar Loekken
Employer: Chipcon AS
University: Norwegian University of Science and Technology (NTNU)
Instructor: Robin Osa Hoel, Chipcon
Language: English
Number of pages: 240 including appendices.
This thesis covers the work done developing a system for wireless audio
transmission. The intended application is a wireless loudspeaker system¹ in which a
hi-fi control and playback unit transmits data to remote active speakers using an RF
transceiver.
This concept is not new, but while most such systems use analog FM transfer, which
inevitably compromises audio quality, the transmission in this WLS will be fully
digital, with AD-conversion in the transmitter and DA-conversion in the receiver. A
digital input will also be available. The transmission will be done using a Chipcon
CC2400 RF-transceiver with a 1 Mbps transfer rate. Chipcon, the employer for
this project, intends to use the WLS as a demonstration or reference design for the
CC2400.
The informed reader might notice that the 1 Mbps transfer rate is insufficient for CD-quality audio, which requires about 1.4 Mbps. This is resolved using real-time
compression. The main focus of the thesis work has been developing a low-complexity, high-quality compression algorithm that can run on a simple MCU. The
employer required the design to be low-cost, so separate DSPs or ASICs for
compression were not an option. Both lossless and lossy algorithms have been
explored².
The thesis is divided into two main parts. The first covers audio compression theory
and gives the reader the basic knowledge necessary to understand how the algorithms
work. The second part covers the development itself and provides documentation of
the work done, including both hardware and software design. Finally, the project³
itself is reviewed, and a discussion of the work process, the achievements made and
the academic rewards is presented.

¹ Throughout the thesis, the target application will be referred to as the Wireless Loudspeaker System or
simply the WLS.
² A lossless compression algorithm is one where the output after decompression is identical to the
original data. In lossy algorithms, psychoacoustic models are used to remove audio information that is
not perceptible.
³ This is detailed in the project review, included at the end of the thesis.
⁴ Apple Powerbook G4 running Mac OS X 10.3 ”Panther”.
Finally I’d like to thank the following persons who have been of great help during the
project:
Table of Contents
3.3.3 Configuration of the RF-transceiver.................................................................................... 79
3.3.4 Configuration of the MCU IO ............................................................................................. 80
3.3.5 The finished circuit .............................................................................................................. 83
4 Analysis of Lossy Compression Algorithms.....................................................86
4.1 Reference for comparison; 8-bit and 4-bit LPCM ................................................87
4.2 Analysis of 4-bit DPCM ............................................................................................88
4.3 Analysis of IMA ADPCM .........................................................................................90
4.4 Analysis of µ-law ........................................................................................................91
4.5 Reference for comparison II: MP3..........................................................................93
4.6 iLaw: a low-complexity, low-loss algorithm. ..........................................................96
4.7 Notes about the performance measurements .........................................................99
5 Design of Lossless Compression Algorithm...................................................100
5.1 Coding method .........................................................................................................103
5.1.1 Evaluation of Pod-coding and Rice-coding ...................................................................... 103
5.2 iPod: an attempt at improving the Pod-coding....................................................107
5.3 Prediction scheme ....................................................................................................110
5.4 Channel decorrelation.............................................................................................115
5.5 Final algorithm proposal and benchmark............................................................119
5.6 Lossy mode................................................................................................................121
5.6.1 LSB-removal lossy-mode .................................................................................................. 122
5.6.2 Mono samples lossy-mode................................................................................................. 125
6 WLS Implementation Considerations............................................................127
6.1 MCU implementation considerations....................................................................127
6.1.1 Wrap-around arithmetic ..................................................................................................... 127
6.1.2 Look-up tables.................................................................................................................... 128
6.2 RF-link implementation considerations................................................................129
6.2.1 Packet handling .................................................................................................................. 129
6.2.2 Transmission or calculation of k?...................................................................................... 130
6.2.3 Lost packet handling .......................................................................................................... 130
7 Project Review ...............................................................................................135
8 Summary .......................................................................................................136
9 References .....................................................................................................137
List of Figures
Figure 1 Wireless loudspeaker system.......................................................................................... 12
Figure 2 Digital representation of audio signal............................................................................................ 13
Figure 3 Histogram of samples in Stevie Ray Vaughan, ”Voodoo Chile” wav-file .................................. 15
Figure 4 Basic principles of lossless audio compression............................................................................. 16
Figure 5 Histogram of mutual and side, "Voodoo Chile", 30s excerpt....................................................... 18
Figure 6 Prediction model [reference 2]....................................................................................................... 19
Figure 7 Histogram, prediction error e[n], "Voodoo Chile", 30s excerpt................................................... 20
Figure 8 Signal flow chart, difference prediction ........................................................................................ 21
Figure 9 General filter-based prediction [reference 2] ................................................................................ 21
Figure 10 Entropy vs. predictor order, fixed FIR predictor........................................................................ 23
Figure 11 The four polynomial approximations of x[n] [reference 2] ......................................... 25
Figure 12 Binary tree with prefix property code (code 2 from table 3)...................................................... 28
Figure 13 General depiction of Huffman-tree, seven symbols W1-W7 ..................................................... 29
Figure 14 Algorithm FGK processing the ensemble EX: (a) Tree after processing "aa bb"; 11 will be
transmitted for the next b. (b) After encoding the third b; 101 will be transmitted for the next
space; the tree will not change; 100 will be transmitted for the first c. (c) Tree after update
following first c. [reference 9] ............................................................................................................ 31
Figure 15 Complete Huffman-tree for example EX .................................................................................... 32
Figure 16 The human auditory system ......................................................................................................... 37
Figure 17 Cross-section of the cochlea ........................................................................................................ 38
Figure 18 Cochlea filter response................................................................................................................. 39
Figure 19 Masking threshold ........................................................................................................................ 39
Figure 20 The Fletcher-Munson curves (equal loudness curves)................................................................ 40
Figure 21 Temporal masking........................................................................................................................ 41
Figure 22 MP3 encoding and decoding block diagram ............................................................................... 42
Figure 23 AAC compression block diagram................................................................................................ 43
Figure 24 DPCM-encoder block diagram [reference 17] ............................................................................ 44
Figure 25 DPCM decoder block diagram [reference 17] ............................................................................ 45
Figure 26 ADPCM general block diagram [reference 18] ............................................................ 46
Figure 27 IMA ADPCM stepsize adaptation [reference 18]....................................................................... 47
Figure 28 IMA ADPCM quantization [reference 18].................................................................................. 48
Figure 29 Basic block diagram, wireless audio transceiver ........................................................................ 53
Figure 30 Typical application circuit, Chipcon CC2400 [reference 22]..................................................... 54
Figure 31 Texas Instruments TLV320AIC23B block diagram [reference 24] ........................................... 56
Figure 32 Block diagram, Crystal CS8416 [reference 28] .......................................................................... 58
Figure 33 Communication through a) 2 SPI-ports or b) 1 SPI-port and parallel IO via shift registers.... 61
Figure 34 I2S data transfer timing diagram .................................................................................................. 69
Figure 35 Principle for data transfer between audio device and MCU....................................................... 70
Figure 36 Simplified schematics, 74HC4094N [reference 37] ................................................................... 71
Figure 37 Timing diagram, transfer from audio device to MCU ................................................. 71
Figure 38 Logic diagram, 74HC166N [reference 38].................................................................................. 72
Figure 39 Timing diagram, transfer from MCU to audio device ................................................................ 72
Figure 40 Logic circuit for generation of control signals ............................................................................ 73
Figure 41 Timing diagram for control signals ............................................................................................. 74
Figure 42 Block diagram, wireless loudspeaker system............................................................................. 75
Figure 43 Configuration of SP-dif receiver.................................................................................................. 76
Figure 44 Recommended filter layout [reference 27].................................................................................. 77
Figure 45 220µF, 330µF, 470µF decoupling caps frequency response, 32/16Ω load ............................... 78
Figure 46 Configuration of audio codec....................................................................................................... 78
Figure 47 Connection, Chipcon CC2400 RF-transceiver............................................................................ 79
Figure 48 C8051F00x IO-system functional block diagram [reference 36]............................................... 80
Figure 49 C8051F00x priority decode table [reference 16] ........................................................................ 81
Figure 50 Configuration of MCU IO CrossBar Decoder ............................................................................ 82
Figure 51 Complete circuit diagram............................................................................................................. 83
Figure 52 Jumper settings ............................................................................................................................. 84
Figure 53 Logic analyzer standard connection ............................................................................................ 84
Figure 54 Logic analyzer connections.......................................................................................................... 85
Figure 55 Waveform and spectrum, "littlewing.wav" ................................................................................. 87
Figure 56 Performance measurements, 4-bit and 8-bit LPCM................................................................... 87
Figure 57 4:1 DPCM performance measurement, "Littlewing.wav".......................................................... 89
Figure 58 IMA ADPCM performance measurement, ”Littlewing.wav”.................................................... 90
Figure 59 µ-law performance measurement, ”Littlewing.wav”.................................................................. 92
Figure 60 Measured performance, 128kbps MP3, ”littlewing.wav”.......................................................... 94
Figure 61 Measured performance, 256kbps MP3, ”littlewing.wav”........................................................... 95
Figure 62 10-bit µ-law data format .............................................................................................................. 96
Figure 63 Flowchart, iLaw encoder designed for this thesis....................................................................... 97
Figure 64 Flowchart, iLaw decoder designed for this project..................................................................... 97
Figure 65 Measured performance, custom codec, "littlewing.wav". .......................................................... 98
Figure 66 Waveform of, from top to bottom, "littlewing.wav", "percussion.wav", "rock.wav",
"classical.wav", "jazz.wav" and "pop.wav", Audacity .................................................................... 101
Figure 67 Spectrum of the "littlewing.wav", "percussion.wav", "rock.wav", "classical.wav", "jazz.wav"
and "pop.wav”, Audacity .................................................................................................................. 102
Figure 68 Encoding performance and worst-case word length, all tests averaged................................... 106
Figure 69 Distribution of overflow, "littlewing.wav"................................................................................ 109
Figure 70 Bit-wise polynomial approximation encoder data structure ...................................... 111
Figure 71 Polynomial selection, framewise polynomial appr., 255 sample frames, Excel........ 113
Figure 72 Performance, different tested prediction schemes .................................................................... 114
Figure 73 Entropy of channels, mutual and side signals and filesize reduction, average results of files in
table 14 except ”dualmono.wav”...................................................................................................... 118
Figure 74 Performance evaluation, Shorten vs. suggested algorithm for WLS........................................ 120
Figure 75 Algorithm for LSB-removal lossy mode................................................................................... 123
Figure 76 Lossy-mode performance, "modernlive.wav", 30s excerpt, left channel................................. 124
Figure 77 Spectrum with mono-mode, 64-sample frames, ”modernlive.wav”, 30s excerpt. .................. 126
Figure 78 Chipcon CC2400 packet format [reference 22] ........................................................................ 129
Figure 79 Proposed frame for WLS-implementation with transfer of frame-static k .............................. 130
Figure 80 Left: Audibility of difference between method 1 (silence) and 2 (repetition), 1000 packet
"loose interval", 64 sample packet. .................................................................................. 131
Figure 81 Preferred lost packet handling method ...................................................................................... 132
List of Tables
Table 1 Higher-order FIR-prediction [reference 2] ..................................................................................... 21
Table 2 Entropy with FIR-prediction, first to third order, ”Little Wing”, 30s excerpt .............................. 22
Table 3 Two example binary codes [reference 7]....................................................................................... 27
Table 4 Pod-codes vs. Rice-codes ................................................................................................................ 36
Table 5 DPCM nonlinear quantization code [reference 17]........................................................................ 44
Table 6 First table lookup for IMA ADPCM quantizer adaptation [reference 18] .................................... 47
Table 7 Second table lookup for IMA ADPCM quantizer adaptation [reference 18]................................ 47
Table 8 AKM4550 versus TI TLV320AIC32B comparison [references 23 and 24] ................................. 55
Table 9 Crude MIPS requirement estimation for MCU .............................................................................. 59
Table 10 Comparison between seriously considered MCUs [references 30-36]........................................ 62
Table 11 Performance, 8-bit and 4-bit LPCM ............................................................................................. 88
Table 12 DPCM quantization table .............................................................................................................. 88
Table 13 Performance 4-bit DPCM, ”littlewing.wav” (see text) ................................................................ 89
Table 14 Performance 4-bit ADPCM, ”littlewing.wav” ............................................................... 91
Table 15 Performance 8-bit µ-law, ”littlewing.wav” and ”speedtest.wav”................................................ 92
Table 16 Measured performance, LAME MP3, ”littlewing.wav” .............................................................. 93
Table 17 Performance iLaw codec, ”littlewing.wav”.................................................................................. 98
Table 18 Wav-files used for characterization of lossless algorithms........................................................ 100
Table 19 Performance of Rice- and Pod-coding, A and N reset every 256th sample, no prediction,
”littlewing.wav” ................................................................................................................................ 104
Table 20 Performance of Rice- and Pod-coding, A and N reset every 256th sample, 1st order
prediction, ”littlewing.wav”.............................................................................................................. 105
Table 21 Performance of Pod- and Rice-coding with HF-rich file, no prediction, "percussion.wav". ... 105
Table 22 Performance of Pod and Rice coding with HF-rich file, 1st order prediction,
"percussion.wav"............................................................................................................................... 105
Table 23 Regular Pod-coding vs. iPod-coding .......................................................................................... 107
Table 24 Pod-coding vs. iPod coding, filesize reduction (no prediction)................................................. 108
Table 25 Filesize reduction, no pred., 1st order and 2nd order linear pred. ............................................ 111
Table 26 Filesize reduction, sample-wise polynomial approximation....................................... 111
Table 27 Performance, framewise polynomial approximation, 0th, 1st and 2nd order polynomial selection
............................................................................................................................................................ 112
Table 28 Third and fourth order fixed predictor, new k for every sample ............................................... 114
Table 29 Computational cost per sample for the different prediction schemes........................................ 115
Table 30 Recordings used to test stereo decorrelation .............................................................................. 116
Table 31 Results of inter-channel decorrelation ........................................................................................ 117
Table 32 Lossy-mode performance ............................................................................................................ 124
List of Acronyms and Abbreviations
A list of acronyms and abbreviations that are not explicitly explained in the text.
Part I
- Theory -
1 Wireless Loudspeaker System Description
To date, most wireless loudspeaker systems have used analog FM transfer. This
compromises playback quality: analog transfer inevitably decreases SNR and
increases distortion. More recently, however, fully digital RF-transceivers with
high data bandwidth have become cheap and readily available on the market. The
Norwegian circuit manufacturer Chipcon offers, among other products, the CC2400
RF-transceiver, a 1 Mbps unit operating in the 2.4 GHz ISM band. They wanted to
explore the possibilities of using it in a wireless loudspeaker system and thus
initiated the project resulting in this thesis.
CD-quality audio requires about 1.4 Mbps, which is beyond the transfer capability of
the Chipcon CC2400. Because of this the audio must be compressed, and the
compression must happen in real-time. Since the hardware was required to have very
low cost, the compression algorithm must not require any dedicated hardware.
Irrespective of audio processing, a microcontroller unit (MCU) is necessary to control
the data transfer and the setup of the hardware. If this MCU can do the compression
as well, the system cost will be lowered significantly, but this requires a low-complexity scheme. Besides hardware design, research and development of a suitable
compression algorithm has been the main focus of this project.
⁵ Sony-Philips digital interface format – it, and other formats and protocols relevant for this thesis, are
presented in appendix 1.
Figure 1 shows the intended system. An audio playback unit provides either analog or
digital signals to the transmission module, which performs either AD-conversion or
SP-dif decoding depending on whether the input signal is analog or digital. The data
is then compressed and passed to the RF-transceiver for transmission. The receiver
module sits in the loudspeaker. Data is received and decompressed before being
DA-converted and fed to the loudspeaker’s built-in amplifier. Since the transmission
is digital, it should not result in any loss of audio quality. The only significant loss
factors are the AD- and DA-conversion, and possibly the compression. Both will be
addressed thoroughly.
Audio compression can be divided into two main categories, lossless and lossy
compression. The former has no signal degradation; the decoded output is sample-for-sample identical to the input. Lossy compression models the human auditory
system to remove audio content that is not perceptible. The ratio between input and
output bandwidth, the compression ratio, of lossless algorithms is limited, usually in
the range of 2:1, while good lossy algorithms can provide ten times that ratio and still
maintain decent audio quality. Another advantage of the lossy approach is that the
output bitrate can be set to whatever the user desires. The effectiveness of lossless
algorithms varies with the input’s data redundancy, or in other words its
”compressibility”. In the WLS only a quite small ratio is required, but real-time
operation does add some complications when it comes to variable output bitrate. In
this thesis, lossless, lossy and hybrid⁶ algorithms have been developed and
studied, and suggestions are made for all alternatives.
⁶ What is referred to as a hybrid algorithm is one that is lossless during normal operation, but goes into
a lossy mode if necessary, for instance when the compression ratio does not meet the instantaneous
bitrate requirements given by the transceiver operating in real-time.
2 Audio Compression; Theory and Principles
A digital audio signal is usually represented by uniformly sampled values with a fixed
word length N, which means that each sample can have a value between −2ᴺ⁻¹ and
2ᴺ⁻¹−1. The digital sample value represents the signal amplitude at a specified instant
(the sample instant) as shown in figure 2. The number of samples per second is
specified by the sampling frequency fₛ. This technique is called linear quantization or
LPCM (Linear Pulse Code Modulation).
Since each sample, regardless of its value, is represented with N bits, the bandwidth
requirement for transfer of the LPCM-signal is given by

Eq. 2 B = N × fₛ [bits/sec]
⁷ The Nyquist theorem and the 6 dB per bit rule are explained in appendix 2, ”Data converter
fundamentals”.
For CD-audio the sample frequency is 44.1 kHz, the resolution is 16 bits and there are
two channels to transfer. The total bandwidth requirement B is then

Eq. 3 B = 16 × 44100 × 2 = 1 411 200 bits/sec ≈ 1.4 Mbps

This number does not depend on the actual value of the samples; it depends on the
number of possible values they can have, i.e. the resolution. Thus it is natural to
assume that one could reduce the bandwidth by using a coding scheme where the
code length depends on the actual values rather than the resolution.
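As a quick sanity check, Eq. 2 and the CD-audio figures can be reproduced with a few lines of Python (the function name is illustrative, not part of the thesis):

```python
def lpcm_bandwidth(word_length_bits, sample_rate_hz, channels=1):
    """Bandwidth B = N * fs * channels of an LPCM stream, in bits per second (Eq. 2)."""
    return word_length_bits * sample_rate_hz * channels

# CD-quality audio: 16 bits, 44.1 kHz, stereo.
print(lpcm_bandwidth(16, 44100, channels=2))   # 1411200, i.e. about 1.4 Mbps
```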
Since the signal from an audio source is unknown (not deterministic), it must be
described using information theory. It can be shown that the average binary
information value of a sample S is quantifiable as

Eq. 4 I(S) = log₂(1/p(S))

where p(S) is the probability of the value S occurring. A measure of the binary
information content of a statistically independent source derived from this is its
entropy H(s), given by the equation

Eq. 5 H(s) = Σᵢ₌₁ⁿ pᵢ · log₂(1/pᵢ) [reference 1]
In which pᵢ is the probability that the value i occurs. The entropy is in other words a
probability-weighted average of the information. If we look at a signal uniformly
distributed over all possible values within CD-audio, from i = −2¹⁵ to i = 2¹⁵−1, the
entropy is

Eq. 6 H(s) = −Σᵢ 2⁻¹⁶ · log₂(2⁻¹⁶) = 16 bits (the sum running over all i from −2¹⁵ to 2¹⁵−1)
This is hardly surprising. When you quantize to 16 bits, you effectively assume that
each sample can have any value between −2¹⁵ and 2¹⁵−1, so the probability of any
given value occurring is 2⁻¹⁶. As equation 6 shows, this corresponds to a uniform
distribution between the two limit values.
Knowing that the entropy gives the average information content of a signal, we can
draw some important conclusions:
- The entropy tells us how many bits the data will use when coded ideally (a
coding is ideal if it removes no information and contains no unnecessary
data).
- The difference between the entropy and the coded binary word length tells
us how much redundancy there is in the coding scheme.
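The entropy definition in equation 5, and the 16-bit result of equation 6, can be verified directly; a minimal Python sketch (the function name is illustrative):

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = sum(p_i * log2(1/p_i)) over nonzero p_i (Eq. 5)."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

# Uniform distribution over all 2^16 possible 16-bit sample values (Eq. 6):
uniform = [2.0 ** -16] * (2 ** 16)
print(entropy_bits(uniform))   # 16.0 bits: no redundancy to exploit
```

Any non-uniform distribution over the same values gives an entropy below 16 bits, which is exactly the redundancy a lossless coder can remove.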
When quantizing to LPCM code you assume no knowledge about the signal, except
that it can take any value between a minimum and a maximum; you assume random
values, or in other words a uniform distribution. The question is whether music
actually has such a distribution, or whether the entropy in reality is smaller and we
are coding with redundancy.
In practice the music signal almost always has a probability distribution that is closer
to a Laplacian one than a uniform one. Figure 3 shows a histogram of a 30-second
excerpt from the track ”Voodoo Chile”, a recording by the late guitar legend Stevie
Ray Vaughan. The histogram, made in MatLab, shows that an overwhelming
majority of the samples have quite low values.
The histograms show the left channel (upper) and right channel (lower). As one can
see, they are very similar and much closer to a Laplacian than a uniform distribution.
A script was made in MatLab [appendix 7] which reads a music file and calculates the
entropy using equation 5. For the excerpt of ”Voodoo Chile” it gave the results shown
in equations 7 and 8.
Since practically all music has a distribution similar to the one shown in figure 3, one
can make good assumptions about its probability distribution and therefore code it in
ways that in almost all cases give less redundancy than the uniform LPCM variant.
In addition, one can change the representation of the signal to reduce the entropy
further. These techniques make up the basis for all types of audio compression. If the
compression only removes redundant data, not information, it is said to be lossless.
The other type, lossy coding, tries to find and remove information that is unnecessary.
For audio data, models of the human auditory system are used to find and remove
information that we cannot hear even though it is there.
Entropy coding is based on giving short codes to values with a high probability of
occurrence and longer codes to values with lower probability. Then, if the assumed
probabilities are correct, there will be many short codes and few long ones.
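A concrete example of such a variable-length prefix code is Rice coding, which is evaluated later in this thesis. The sketch below is an illustration, not the thesis implementation, and assumes non-negative values (signed samples are normally zig-zag mapped first); it shows how small, probable values get short codes:

```python
def rice_encode(value, k):
    """Rice code: unary quotient (q zeros then a 1), followed by k remainder bits."""
    q, r = value >> k, value & ((1 << k) - 1)
    return "0" * q + "1" + format(r, f"0{k}b") if k else "0" * q + "1"

def rice_decode(bits, k):
    q = bits.index("1")                            # length of the unary part
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0  # k-bit remainder
    return (q << k) | r

print(rice_encode(9, 2))        # "00101": quotient 2 -> "001", remainder 1 -> "01"
print(rice_decode("00101", 2))  # 9
```

Small values produce codes barely longer than k bits, while rare large values pay with a long unary prefix, matching the Laplacian-like sample statistics discussed above.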
2.2.1 Framing
In most lossless compression algorithms, the data is divided into frames before
compression. If the prediction or encoding is adaptive, information about which
parameters are used has to be sent with the audio data in the form of a header.
Sending this header with every sample would give too much data overhead, so
frames are used instead. Over the duration of a frame the same parameters are used
for compression, and only one information block is needed; for obvious reasons it is
called the frame header.
The application will determine how big each frame is. If the frames are small, the
bandwidth reduction is compromised since the number of headers, which also take up
data space, increases. If the frame is too large, the same parameters have to be used
over many samples for which they might not be ideal, which again reduces the
compression ratio. Determining the frame size is often a question of trial and
evaluation; there is no absolute best framesize, one just has to find a reasonable
tradeoff. It is generally sensible to make the framesize a multiple of the wordlength
so that a fixed number of samples fits within one frame. The most common in
existing algorithms is 576–1152 samples [reference 2], but this can to a large extent
be adjusted to the intended application.
2.2.2 Decorrelation
Eq. 9   M = (L + R)/2
Eq. 10   S = L − R
For the file ”Voodoo Chile” the histograms for M and S are as shown in figure 5.
Figure 5 Histogram of mutual and side, "Voodoo Chile", 30s excerpt
As we can see, S has many more small values than L or R. It should be evident from
equation 5 that the entropy of S is smaller than that of L or R. The script that
calculates entropy gives the following results for M and S:
As we can see, the amount of information has been reduced. Still, it is easy to
calculate L and R in the decoder from M and S. Redundancy due to inter-channel
correlation has been removed without losing any information.
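The mid/side transform of equations 9 and 10 is easy to make exactly invertible in integer arithmetic. The sketch below (Python, not the MATLAB scripts used for the thesis experiments) halves the sum with a floor division; the half-bit this discards is recovered in the decoder from the parity of S, since L + R and L − R always have the same parity:

```python
def ms_encode(left, right):
    """Eq. 9 and 10 in integer form: mid = floor((L+R)/2), side = L-R."""
    return (left + right) >> 1, left - right

def ms_decode(mid, side):
    """Exact inverse: the parity of side restores the bit lost by the
    floor division, because L+R and L-R have the same parity."""
    full = (mid << 1) | (side & 1)      # reconstruct L + R exactly
    return (full + side) >> 1, (full - side) >> 1
```

Both halves stay within the original 16-bit wordlength for mid, at the price of a 17-bit side signal, which is the usual trade-off for this decorrelation.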
As long as the decoder knows what type of approximation is used and also knows the
error, it can calculate its way back to the original values, and the information is
regained without loss. A model for the prediction process is shown in figure 6.
The easiest way to understand this is by looking at the simplest prediction possible: to
assume that the current sample has the same value as the last one. In other words:

Eq. 13   x̂[n] = x[n−1]
Eq. 14   e[n] = x[n] − x[n−1]

The prediction error e[n] is simply the difference between the two adjacent samples.
If there is absolutely no correlation between them, e[n] will have a totally random
value with a uniform probability distribution. However, if there is correlation, it is
likely that the error e[n] will be small, and the entropy is then reduced. It is also
evident that when the decoder knows the difference between one sample and the next,
it just needs an initial value to be able to calculate every sample with no other input
than e[n]. To check that the entropy really is decreased, the simple prediction from
equation 13 was performed on the excerpt of the music file "Voodoo Chile". The
result e[n] is shown in figure 7.
Figure 7 Histogram, prediction error e[n], "Voodoo Chile", 30s excerpt
It’s easy to see that the prediction error in general has much smaller values than the
actual signal shown in figure 2. A calculation of the entropy gives the result shown in
equations 15 and 16.
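The experiment can be sketched as follows (with a hypothetical synthetic signal standing in for the music excerpt): the empirical entropy of a correlated signal is compared with that of its first-order prediction residual, and the lossless round trip is verified.

```python
import math
from collections import Counter

def entropy(samples):
    """Empirical zero-order entropy in bits per sample (cf. equation 5)."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())

def difference_predict(x):
    """e[n] = x[n] - x[n-1]; the first sample is passed through unchanged
    so the decoder has an initial value."""
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

def difference_decode(e):
    x = [e[0]]
    for r in e[1:]:
        x.append(x[-1] + r)
    return x

# A slowly varying (correlated) test signal: the residuals cluster near
# zero, so their entropy drops well below that of the raw samples.
signal = [round(1000 * math.sin(0.01 * n)) for n in range(5000)]
residual = difference_predict(signal)
```

With this signal, `entropy(residual)` comes out several bits below `entropy(signal)`, mirroring the reduction reported for the "Voodoo Chile" excerpt.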
If you take a closer look at the simple prediction given by equation 14, you will see
that a signal flow chart will be like the one in figure 8.
Figure 8 Signal flow chart, difference prediction
It becomes evident by looking at it that the figure actually shows a first-order FIR
high-pass filter, so difference prediction and first-order high-pass filtering are the
same. This is logical when one considers what the prediction actually does. If the
frequency is low, the difference between adjacent samples, which is the output of the
predictor, is small. If the frequency is high, the differences are large. This is clearly
high-pass filtering. It is then obvious that more advanced prediction algorithms must
be based on higher-order filters. First- to third-order FIR prediction is shown in table 1.
In addition to higher-order filtering, past values of the error can also be used for
prediction, in other words IIR prediction. However, since implementing prediction
with very high order FIR or IIR filters is beyond the capability of the hardware used
in the WLS, this thesis will not deal with such filters in any greater detail.
Q denotes quantization of the filter output to the same wordlength as the original
signal. The figure depicts the equation
Eq. 17   e[n] = x[n] − Q{ Σ(k=1..M) â_k·x[n−k] − Σ(k=1..N) b̂_k·e[n−k] }   ;[reference 2]
The quantization operation makes the predictor nonlinear, but since it is done with
16-bit precision, it is reasonable to neglect its effect on the level of compression. This
quantization is necessary in lossless codecs, since we want to be able to reconstruct
x[n] exactly from e[n], possibly on a different machine architecture [reference 2].
Since the same quantization is done in the decoder's inverse filter, the reconstruction
is still exact, i.e. lossless.
Table 2 Entropy with FIR-prediction, first to third order, ”Little Wing”, 30s excerpt
Order Entropy, left channel Entropy, right channel
1. 10.81 bits 10.94 bits
2. 10.38 bits 10.29 bits
3. 10.34 bits 10.34 bits
It is clear that the gain in entropy reduction decreases rapidly as the order increases.
Thus a prediction of very high order is probably not worth the extra computational
complexity. Another MATLAB script was written to examine the effectiveness of
different prediction orders when inter-channel decorrelation is included. The results
are presented in figure 10.
Figure 10 Entropy vs. predictor order, fixed FIR predictor
As we can see, there is a huge gain from no prediction to first order prediction. Also,
there is a clear improvement from first order to second order. After that, the gain is
small, and in some cases, a higher order predictor even gives worse results. This
underlines the conclusion that a very high order fixed predictor is unlikely to produce
results that are worth the extra cost in complexity.
Although a fixed predictor can yield significant reduction in the entropy, it is evident
that it will not be optimal for every combination of input signals. For instance, when
the difference between adjacent samples is large, the difference predictor will provide
a poor result. Many good predictors are adaptive which means that they adjust to the
input signal. To illustrate how this works, a simple example [reference 5] is used:
In this example, a factor m is used to adjust the predictor. The parameter m varies from
0 to 1024, where 0 is no prediction and 1024 is full prediction. After each prediction,
m is adjusted up or down depending on whether the prediction was helpful or not. For
the example we use a second-order predictor (see table 1) and consider an input
sequence x = [2, 8, 24, ?]. Since the predictor is adaptive, it uses the value m to
determine the level of prediction and compares the result p[n] with the real value x[n]
to see if the prediction was good and to update m for the next one. Thus, the output
will be:
Eq. 18   x̂[n] = x[n] − p[n] = x[n] − pF[n]·(m/m_max)

where pF[n] is a second-order fixed predictor, pF[n] = 2x[n−1] − x[n−2]. If, in the
example, ? = 45 and m = 512, then

Eq. 19   x̂[n] = ? − pF[n]·(m/m_max) = 45 − (2·24 − 8)·(512/1024) = 25
Since the prediction underestimated the real value, m will be adjusted upwards for the
next run.
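The scheme can be sketched as below. Note that the sign-based update rule and the step size are assumptions filled in for illustration (the example above only states that m moves up or down): m is nudged toward more prediction whenever a stronger prediction would have shrunk the residual, i.e. when the residual has the same sign as the fixed prediction.

```python
M_MAX = 1024

def adaptive_residuals(x, step=32):
    """Residuals from the m-adaptive second-order predictor of Eq. 18.
    'step' is an assumed adaptation increment; first two samples verbatim."""
    m, out = 512, list(x[:2])
    for n in range(2, len(x)):
        p_fixed = 2 * x[n - 1] - x[n - 2]     # fixed predictor (table 1)
        p = (p_fixed * m) // M_MAX            # scaled prediction
        e = x[n] - p                          # residual, Eq. 18
        out.append(e)
        if e * p_fixed > 0:                   # prediction undershot: more m
            m = min(M_MAX, m + step)
        elif e * p_fixed < 0:                 # prediction overshot: less m
            m = max(0, m - step)
    return out

def adaptive_decode(res, step=32):
    """Mirror of the encoder: replays the same m updates, so no side
    information beyond the residuals is needed."""
    m, x = 512, list(res[:2])
    for n in range(2, len(res)):
        p_fixed = 2 * x[n - 1] - x[n - 2]
        p = (p_fixed * m) // M_MAX
        e = res[n]
        x.append(p + e)
        if e * p_fixed > 0:
            m = min(M_MAX, m + step)
        elif e * p_fixed < 0:
            m = max(0, m - step)
    return x
```

Because the decoder sees the same residuals and reconstructs the same history, it recomputes the identical sequence of m values, which is what makes backward adaptation attractive on a small MCU: nothing extra has to be transmitted.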
One of the best known algorithms is the least mean square, or LMS, algorithm where,
at each iteration, the predictor coefficients are updated in a direction opposite to that
of the instantaneous gradient of the squared prediction error surface [reference 3]. A
less computationally demanding algorithm, the exponential power estimation, or EPE,
is also widely used. Here, the envelope of the magnitude of the input sequence x[n] is
tracked and used to adapt the prediction [reference 4].
" xˆ 0 [n] = 0
$ˆ
$ x1[n] = x[n !1]
Eq. 20 # ;[reference 6]
$ xˆ 2 [n] = 2x[n !1] ! x[n ! 2]
$% xˆ 3 [n] = 3x[n !1] ! 3x[n ! 2] + x[n ! 3]
24
"e0 [n] = x[n]
$
$e1[n] = e0 [n] ! e1[n]
Eq. 21 # ;[reference 6]
$e2 [n] = e1[n] ! e1[n !1]
$%e3 [n] = e2 [n] ! e2 [n !1]
No multiplications are needed and the cost in extra resources is small. For each frame,
the four residuals e0[n], e1[n], e2[n] and e3[n] are computed, as well as the sums of
the absolute values of these residuals over the complete frame. The residual with the
smallest sum magnitude is then defined as the best approximation for this frame and
sent to the entropy encoder. This principle is illustrated in figure 11.
Since the approximator selects the best predictor for each frame, the structure can be
said to be frame-adaptive. It yields a significant improvement over fixed predictors at
a low computational cost. However, since four sets of residuals need to be saved, as
well as variables containing the sums of absolute values, the memory usage increases.
This principle does not have to be locked to the four polynomials used in Shorten;
one can for instance calculate and choose the best between the 0th-order and 1st-order
predictions, or maybe the 1st-order and the 2nd-order. This would have to be decided
depending on the compression ratio requirement and the available resources in the
form of processing power and memory.
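The Shorten-style selection can be sketched as follows (simplified: the first sample of each residual vector is passed through verbatim rather than carried across frame boundaries, as a full implementation would do):

```python
def polynomial_residuals(frame):
    """Residual cascade of Eq. 21: each order is the first difference of the
    previous one, so no multiplications are needed."""
    e = [list(frame)]                     # e0[n] = x[n]
    for _ in range(3):
        prev = e[-1]
        e.append([prev[0]] + [prev[i] - prev[i - 1]
                              for i in range(1, len(prev))])
    return e                              # [e0, e1, e2, e3]

def best_order(frame):
    """Pick the predictor whose residuals have the smallest absolute sum;
    the chosen order number would go in the frame header."""
    residuals = polynomial_residuals(frame)
    sums = [sum(abs(v) for v in r) for r in residuals]
    order = sums.index(min(sums))
    return order, residuals[order]
```

For a linear ramp the second difference is (nearly) all zeros, so `best_order` picks order 2; for a constant frame it picks order 1, matching the intuition behind table 1.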
2.2.3 Entropy-coding
As mentioned, lossless or entropy-based compression ignores the semantics of the
data; it is based purely on the statistics of the data content. These statistics can be the
frequencies of occurrence of different symbols or the existence of repetitive
sequences of symbols (in information theory, "symbol" is often used even if, in the
case of digital audio, the symbols are in reality sampled values). For the former,
statistical compression, which assigns variable-length codes to symbols based on
their frequencies of occurrence, is used. For the latter, repetitive sequence encoding,
such as run-length encoding, is the simplest option.
2.2.3.1 Run-length encoding
The idea of run-length encoding is to replace long sequences of identical values with
a special code that indicates the value to be repeated and the number of times to
repeat it. As an example, a text file with the input string "aaaaaaabbbbbaaaabbaaa"
will be replaced with "7a5b4abb3a". As we can see, the coding is only effective, and
thus only used, on runs of three or more samples.
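A sketch of the scheme in text form (a real bitstream would need an escape mechanism so that digit characters in the data cannot be confused with run counts):

```python
def rle_encode(s, min_run=3):
    """Run-length encode: runs of min_run or more become '<count><char>',
    shorter runs are copied through unchanged."""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                        # scan to the end of the run
        run = j - i
        out.append(f"{run}{s[i]}" if run >= min_run else s[i] * run)
        i = j
    return "".join(out)
```

Applied to the string above it reproduces "7a5b4abb3a": the two-sample run "bb" is cheaper left alone.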
Since relatively few repeating strings are found in audio (in music, long identical
sequences usually appear only in pauses), the effectiveness of RLE coding by itself is
very limited. However, it can be used as a step in more elaborate compression
schemes.
2.2.3.2 Huffman-coding
As shown earlier, a coding based on linear quantization, where every sample with a
possible value between 0 and 2^B−1 (or −2^(B−1) to 2^(B−1)−1) is represented by B
bits, is not the most space-efficient coding scheme, simply because some values are
more common than others. As the histograms have shown, small values are much
more frequent in recorded audio; thus it is inefficient to code using a fixed number of
bits large enough to contain even the largest possible value. Huffman coding uses a
variable-length representation where short codes are assigned to the most frequent
values and longer codes to the ones that appear more rarely. Huffman coding can be
shown to be optimal only if all probabilities are integral powers of 1/2, but it still
yields a significant improvement over normal LPCM code, even in audio applications.
Since the number of bits per symbol is variable, the boundary between codes will in
general not fall on byte boundaries; there is no built-in "demarcation" between
symbols. One could add a special "marker", but this would waste space. Instead, a set
of codes with the prefix property is generated: each symbol is encoded into a sequence
of bits so that no code for a symbol is the prefix of the code for any other. This
property allows decoding of a bit string by repeatedly deleting prefixes of the string
that are codes for symbols. The prefix property can be assured using binary trees. An
example [reference 7] will be used to show how this is done.
Two example codes with the prefix property are given in table 3. Decoding code 1
(standard binary code) is simple, as we can just read three bits at a time (for example,
"001010011" is decoded to 2, 3, 4). For code 2, we must read one bit at a time so that,
for instance, "1101001" would be read as "11" = 2, "01" = 3 and "001" = 4. Clearly,
the average number of bits per symbol is less for code 2 (2.2 vs. 3, for a data
reduction of 27%).
When a set of symbols and their probabilities is known, the Huffman algorithm lets us
find a code with the prefix property such that the average code length per symbol is a
minimum. The basic principle is that we select the two symbols with the lowest
probabilities (in table 3: 1 and 4) and replace them with a symbol s1 that has a
probability equal to the sum of the original two (in the example, 0.20). The optimal
prefix code for this set is the code for s1 with a zero appended for 1 and a one
appended for 4. This process is repeated until all symbols have been merged into one
symbol with probability 1.00. This is equivalent to constructing a binary tree from the
bottom up. To find the code for a symbol, we follow the path from the root to the leaf
that corresponds to it. Along the way, we output a zero every time we follow a left
link and a one for each right link. If only the leaves of the tree are labeled with
symbols, then we are guaranteed that the code will have the prefix property (since we
only encounter one leaf on the path from the root to the symbol). An example code
tree (for the code in table 3) is shown in figure 12.
Figure 12 Binary tree with prefix property code (code 2 from table 3)
1. Initialization: put all nodes in an OPEN list and keep it sorted at all times (e.g.
12345).
2. Repeat until the OPEN list has only one node left:
a. From OPEN, pick the two nodes having the lowest frequencies and
create a parent node for them.
b. Assign the sum of the children's frequencies to the parent node and
insert it into OPEN.
c. Assign codes 0 and 1 to the two branches of the tree and delete the
children from OPEN.
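The OPEN-list procedure above maps directly onto a priority queue. A compact sketch (the symbol probabilities in the test are hypothetical, not those of table 3):

```python
import heapq
from itertools import count

def huffman_code(weights):
    """Build prefix codes from {symbol: probability} by repeatedly merging
    the two lightest nodes, exactly as in the OPEN-list algorithm."""
    tie = count()                         # tie-breaker so heapq never compares dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                    # degenerate single-symbol alphabet
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two lowest-frequency nodes
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}  # left branch
        merged.update({s: "1" + code for s, code in c2.items()})  # right branch
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]
```

Because each merge prepends one bit to every code in the two subtrees, the finished dictionary is exactly the root-to-leaf paths of the bottom-up tree described above.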
Since the probabilities are usually estimates used for weighting the different symbols
(the source is not deterministically known), they are expressed as a list of weights
{w(1), ..., w(n)} where Σw(n) over all n is 1. Huffman coding is then in reality a
merging of weights, and the Huffman tree is usually depicted as shown in figure 13.
Figure 13 General depiction of Huffman-tree, seven symbols W1-W7
As we can see there is a total of seven symbols arranged after weighting with W1 as
the smallest.
2.2.3.3 Adaptive Huffman coding
The basic Huffman algorithm clearly requires statistical knowledge of the data, which
is often unavailable. For audio playback it is definitely not available, although, as the
histogram examinations show, an estimate can be made that makes Huffman coding
quite effective in most cases (the prediction residuals are well modeled by a Laplacian
probability density function: high probability for small values, exponentially
decreasing probability as the values increase). But even if it is available, there could
be heavy overhead, especially when many tables have to be sent because a
non-zero-order model is used (i.e. one taking into account the impact of the previous
symbol on the probability of the current symbol).
The most frequently used adaptive Huffman algorithm is the FGK algorithm
[reference 9], which is based on the sibling property. A binary code tree has the
sibling property if each node (except the root) has a sibling and if the nodes can be
listed in order of nonincreasing weight with each node adjacent to its sibling. It can be
proved that a binary prefix code is a Huffman code if and only if the code tree has the
sibling property.
In the algorithm, both sender and receiver maintain dynamically changing Huffman
code trees. The leaves of the code tree represent the source messages and the weights
of the leaves represent frequency counts for the messages. At any point in time, k of
the n possible source messages have occurred in the message ensemble.
Initially, the code tree consists of a single leaf node, called the 0-node. The 0-node is
a special node used to represent the n−k unused messages. For each message
transmitted, both parties must increment the corresponding weight and recompute the
code tree to maintain the sibling property.
Figure 14 Algorithm FGK processing the ensemble EX: (a) Tree after processing "aa
bb"; 11 will be transmitted for the next b. (b) After encoding the third b; 101 will be
transmitted for the next space; the tree will not change; 100 will be transmitted for
the first c. (c) Tree after update following first c. [reference 9]
At the point in time when t messages have been transmitted, k of them distinct, and
k < n, the tree is a legal Huffman code tree with k+1 leaves, one for each message and
one for the 0-node. If the (t+1)st message is one of the k already seen, the algorithm
transmits a(t+1)'s current code, increments the appropriate counter and recomputes
the tree. If an unused message occurs, the 0-node is split to create a pair of leaves, one
for a(t+1), and a sibling which is the new 0-node. Again the tree is recomputed. In
this case, the code for the 0-node is sent; in addition, the receiver must be told which
of the n−k unused messages has appeared. At each node, a count of occurrences of
the corresponding message is stored. Nodes are numbered to indicate their position in
the sibling-property ordering. The updating of the tree can be done in a single
traversal from the a(t+1) node to the root. This traversal must increment the count for
the a(t+1) node and for each of its ancestors. Nodes may be exchanged to maintain
the sibling property, but all of these exchanges involve a node on the path from
a(t+1) to the root. The final code tree for the example is shown in figure 15.
Adaptive Huffman coding basically updates the Huffman tree for every new
occurrence of a symbol, since the symbol's frequency then increases. It is in many
cases more effective and produces less overhead (n·log(n) compared to 2n for the
static Huffman code). However, it is more demanding computationally. It has been
proved that the time required for each encoding and decoding operation is O(l),
where l is the current length of the codeword.
2.2.3.4 Rice-coding
1. Make a guess as to how many bits a number will take and call that k.
2. Store the rightmost k bits of the number in their original form.
3. Imagine the binary number without these k rightmost bits; this is the
overflow that does not fit in k bits.
4. Encode this value with a corresponding number of zeros followed by a
terminating '1' to indicate the end of the encoded overflow.
As an example, if n = 578 ("1001000010" in binary) and k = 8, then sign = '1',
overflow n/2^k = 578/256 = 2 = "00", terminator = '1', and the k least significant bits
= "01000010", for a total of 12 bits, while, as a comparison, the straight 16-bit LPCM
representation uses 16 bits. As we can see, 4 bits are saved. It is also obvious from
looking at the algorithm that for this to work, absolute values must be used.
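The four steps can be sketched as below (magnitude only; the sign bit of the worked example is handled separately, as the text notes):

```python
def rice_encode(n, k):
    """Rice-code the magnitude n (>= 0): unary-coded overflow, a
    terminating '1', then the k low bits in plain binary (steps 1-4)."""
    overflow = n >> k                                  # step 3
    return "0" * overflow + "1" + format(n & ((1 << k) - 1), f"0{k}b")

def rice_decode(bits, k):
    """Inverse: count leading zeros, skip the '1', read k binary bits."""
    overflow = bits.index("1")
    low = int(bits[overflow + 1 : overflow + 1 + k], 2)
    return (overflow << k) | low
```

For the worked example, `rice_encode(578, 8)` yields "00101000010" (11 bits; 12 with the sign bit), in agreement with the hand calculation above.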
8 The same as for LPCM, but if desired, the opposite sign representation can of course also be used.
It is clearly apparent that a good estimate of k is necessary; otherwise the number of
zeros (n/2^k) will be large and the code will be ineffective. The optimum k is
determined by looking at the average value over a number of past samples (16-128 is
normal; this is a speed vs. efficiency trade-off) and choosing the optimum k for that
average. The optimum k can be calculated as:

Eq. 25   kopt = log(navg)/log(2)   ;[reference 5]
By looking at the algorithm it is evident that the crucial step is the calculation of the
parameter k. The exhaustive method of calculating the average of a large number of
past samples and employing formula 25 is computationally demanding.
Overcompensating by using very few samples will increase the redundancy, since
there is a larger possibility of k being far from optimal. During the development of
the JPEG-LS (JPEG Lossless) image compression standard [reference 10], an
alternative and much simpler method was proposed. However, understanding it
demands a more formal expression of the Rice algorithm.
Eq. 26   m = log(1 + θ)/log(θ^−1)   ;[reference 10]
A special case of the Golomb codes occurs when m = 2^k. If m is a power of two, the
code for n consists of the k least significant bits of n, followed by the number formed
by the remaining higher-order bits of n in unary representation. This is exactly the
representation described above (minus the sign bit, as this derivation assumed n ≥ 0);
thus G(2^k) codes are the same as the Rice codes described, and it also becomes
apparent why they are called GP2 codes. To match the two-sided exponential
(Laplacian) distribution assumed for the prediction residuals to the optimality of
Golomb codes for geometric distributions, the prediction residuals ε in the range
−α/2 ≤ ε ≤ α/2 − 1 are mapped to values M(ε) in the range 0 ≤ M(ε) ≤ α − 1 by:

Eq. 27   M(ε) = 2ε        for ε ≥ 0
         M(ε) = 2|ε| − 1   for ε < 0      ;[reference 10]
If the values ε follow a Laplacian distribution centered at zero, the distribution of
M(ε) will be close to (but not exactly) geometric, and can then be encoded using an
appropriate Golomb-Rice code9.
For a discrete Laplacian distribution P(ε) = p0·θ^|ε| of prediction residuals in the
range −α/2 ≤ ε ≤ α/2 − 1, where 0 < θ < 1 and p0 is such that the distribution sums
to 1, the expected prediction residual magnitude is given by

Eq. 28   a(θ,α) ≜ E[|ε|] = Σ(ε = −α/2 .. α/2−1) p0·θ^|ε|·|ε|   ;[reference 10]
We are interested in the relation between the value of a(θ,α) and the average code
length L(θ,k) resulting from using the Golomb-Rice code Rk on the mapped
prediction residuals M(ε). In particular, we seek the value of k yielding the shortest
code length. It can be shown [reference 11] that a good estimate of the optimal value
of k is obtained as follows. The encoder and decoder maintain two variables per
context: N, a count of the prediction residuals seen so far, and A, the accumulated
sum of the magnitudes of the prediction residuals seen so far. The expectation a(θ,α)
is estimated by the ratio A/N, and k is computed as

Eq. 30   k = min{ k′ : 2^k′ · N ≥ A }   ;[reference 10]
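Equations 27 and 30 translate almost directly into code; a sketch:

```python
def golomb_k(N, A):
    """Eq. 30: k = min{ k' : 2^k' * N >= A }, the JPEG-LS/LOCO-I
    parameter estimate from the running count N and magnitude sum A."""
    k = 0
    while (N << k) < A:
        k += 1
    return k

def map_residual(eps):
    """Eq. 27: fold a two-sided residual into a non-negative value
    (0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...)."""
    return 2 * eps if eps >= 0 else -2 * eps - 1
```

With N = 16 residuals whose magnitudes sum to A = 100 (average 6.25), the loop stops at k = 3, i.e. roughly log2 of the mean magnitude, as equation 25 suggests.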
9 To do this with a two's complement representation is very simple: one left shift for positive values
and inverting the sign bit for negative values.
2.2.3.5 Pod-coding, a better way to code the overflow
Standard Rice coding is very inefficient when the value k is not ideal. Any overflow
Ov that does not fit in the k-bit binary-coded part is unary coded with Ov zeros
followed by a one. If these numbers are large, the code length will be very long and
the efficiency will suffer. An alternative is to use the preprocessing part of the Rice
algorithm (find a value k, store the k rightmost bits unchanged and encode the
overflow), but to use another method to encode the overflow remainder [reference
12]. A code suited for this is the Pod code10. Instead of using Ov zeros, the Pod code
works as follows:
1. For 0, send 1
2. For 1, send 01
3. For 2-bit number 1Z, send 001Z
4. For 3-bit numbers 1YZ, send 0001YZ
5. For 4-bit numbers 1XYZ, send 00001XYZ etc.
It is no problem for the decoder to know how many bits WXYZ… to expect; it is one
less than the number of 0s that precede the 1. Thus, the prefix property is maintained.
An integer of B significant bits encoded using the Pod code is represented in at most
2B bits, while the unary overflow of the standard Rice code can require up to 2^B
bits. A comparison is shown in table 4 (the sign bit is omitted for clarity).
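The five rules reduce to "send as many zeros as the value has significant bits, then the bits themselves"; a sketch with its decoder:

```python
def pod_encode(v):
    """Pod-code a non-negative overflow value: '1' for zero, otherwise
    bit_length(v) zeros followed by v's binary digits (which start with 1)."""
    if v == 0:
        return "1"
    b = v.bit_length()
    return "0" * b + format(v, "b")

def pod_decode(bits):
    """Count z leading zeros; the payload is the '1' plus z-1 more bits."""
    z = bits.index("1")
    if z == 0:
        return 0
    return int(bits[z : 2 * z], 2)
```

A B-bit value thus costs 2B bits at most, versus up to 2^B bits for the unary Rice overflow, which is exactly the robustness argument made above.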
10 The code described is a variant of the Elias γ-code, one of the Elias family of universal codes;
these will not be investigated in any further detail in this report. P. Elias: "Universal Codeword
Sets and Representations of the Integers", IEEE Transactions on Information Theory, is recommended
to the interested reader.
As the table shows, the gain when coding overflow values larger than 5 is positive.
When the parameter k is more than three bits off, Pod coding will give better results
than Rice coding. The potential loss in efficiency is small: just one bit of inferior
performance for the overflow values 2 and 4.
This section contains a quick introduction to the human auditory system, with
emphasis on the aspects relevant to perception-based compression. Then the relevant
compression methods are introduced and explained.
In the outer ear we have the ear itself and an external auditory canal, leading to the
eardrum. The eardrum is a membrane which resonates as air pressure varies. To
maintain pressure equality on the two sides, we have a canal (the eustachian tube)
leading down to the nose. Inside the eardrum, in the middle ear, we have three bones
functioning as a mechanical transformer. These three bones, the hammer, the anvil
and the stirrup, are the smallest bones in the entire human body. They connect the
eardrum to the oval window, the ”entrance” to the cochlea. The cochlea is a fluid-
filled chamber where resonances in the oval window are processed. Inside the
cochlea, the basilar membrane transports the resonances. A cross section is shown in
figure 17.
The basilar membrane is connected to the inner hair cells, which transform
resonances into neural signals, while the outer hair cells provide feedback to increase
sensitivity. An interesting property of the cochlea is that it works as a spectral filter
bank. High frequencies excite resonances in the outer part, close to the oval window,
while lower frequencies excite resonances further inside. Thus different hair cells
transport different frequencies, and the system works like a bank of filters. The
response might look like the one shown in figure 18.
Figure 18 Cochlea filter response
In figure 18, the frequency axis is denoted "Bark". The Bark scale is a standardized
scale where each Bark constitutes one critical bandwidth. The Bark scale is defined as
a table, but good mathematical approximations exist [reference 19]. The critical
bandwidth is defined as the width of a noise band beyond which increasing the
bandwidth does not increase the masking effect imposed by the noise signal upon a
sinusoid placed at the center frequency of the band. This leads to the concept of
masking: a dominating tone will render weaker signals inaudible. The distance in
frequency between the masker and the masked sound decides how loud the inaudible
sounds can be (down to one critical band). This limit is known as the masking
threshold.
We are not able to hear anything below the masking threshold, and this is what
perceptual audio algorithms exploit: if we cannot hear it, it can be removed. The
signal is divided into small frequency bands using a filter bank. Then, within each
band, the signal can be quantized down until the noise level is just below the masking
threshold. As figure 19 shows, high noise levels are allowable within each band, and
very significant data reduction can be achieved. Furthermore, we see that the
sensitivity of the ear is lower in the bass and treble ranges than in the midrange
(1-5 kHz). This frequency-dependent sensitivity of hearing is quantified by the
Fletcher-Munson diagram, first measured in 1933. As a result, lowering the resolution
gives a smaller degradation in sound quality if done in the bass and treble than in the
midrange. The Fletcher-Munson diagram, given in figure 20, also shows that the
sensitivity depends on the loudness. The curves, called equal loudness curves, show
what sound pressure level we perceive as being of a certain loudness. The perceived
loudness is denoted in phon.
Figure 21 Temporal masking
Fascinating as it might be, the human auditory system has flaws that can be exploited
to reduce the amount of data without compromising audio quality. In general, lossy
compression algorithms introduce some degree of sonic degradation; how perceptible
it is depends on the application (high-end hifi system or cheap computer speakers),
the level of compression and, of course, how good the algorithm is.
Recent advances in the processing capability of home computers and digital devices
(like ASICs, DSPs and FPGAs) have, however, pushed the development of much
more sophisticated systems. The spearhead of this development has been the Moving
Picture Experts Group (MPEG), which made the basic framework for the current
standard, MP3, as well as other up-and-coming systems. However, other vendors like
Microsoft and Sony have also made their own systems. In recent times, even
open-source alternatives have become competitive, much due to the development of
the Ogg Vorbis project, now believed to be at least on par with most commercial
systems. Generally, these algorithms allow a reduction in file size to 1:10 of the
original or less with minimal quality loss.
2.3.2.1 MPEG-based algorithms
Recent advances in processing power and the growing demand for online distribution
of high-fidelity music have advanced the need for even more elaborate compression
algorithms. Microsoft's Windows Media Audio [reference 14] and Sony's most
recent ATRAC algorithm [reference 16] use more advanced auditory models than
MP3. Also, the completely free and open-source Ogg Vorbis [reference 21] algorithm
has gained a reputation for being significantly better than MP3. The Fraunhofer
Institute has responded by launching AAC, or Advanced Audio Coding [reference
16], a system utilizing the much more sophisticated MPEG-2 compression scheme.
Figure 23 AAC compression block diagram
As the figure shows, AAC also uses TNS (temporal noise shaping), intensity stereo,
adaptive prediction and more in addition to the MP3 features. Research shows that
AAC allows around 1.4 times better compression ratios than MP3 at the same audio
quality.
It is, however, apparent that none of these algorithms is suitable for implementation
on a simple MCU. Thus they are not applicable in the wireless loudspeaker system
this report documents, and they will not be investigated in any further detail here.
Much simpler algorithms for lossy audio compression existed long before the
introduction of MP3 and related systems. Back then, processor power was very
limited, which forced quite crude models and calculations to be used. The result was
of course vastly inferior to modern systems, but in our application the required
compression ratio is very small (approximately 2:1), which makes high-fidelity
reproduction possible with much simpler schemes. While MP3 and other
internet-audio algorithms must deliver almost CD-quality audio at 128 kbps or even
lower, we can tolerate a system that is inferior at that bitrate, as long as it is
transparent11 at the 1 Mbps (including overhead) the CC2400 RF-transceiver allows.
11 In the digital audio vocabulary, "transparent" usually means "no detectable quality degradation". If
listeners cannot hear the difference between the uncompressed original and the compressed version of
the music in a blind-test environment, the codec is said to be "transparent".
2.3.2.2 Differential Pulse Code Modulation (DPCM)
One of the simplest and fastest methods for lossy audio compression is differential
pulse code modulation, or DPCM. This algorithm exploits the fact that the ear is
sensitive to small differences when the volume is low, while at loud volumes we
cannot perceive subtle details to the same extent. Since there is no subband filtering,
the noise level must be below the lowest masking threshold level (see figure 19) at
any frequency (as opposed to within each subband for algorithms with filter banks)
for the compression to be transparent. Since the threshold is highly dependent on the
level of the signal, a non-linear quantization is performed, where the quantization
steps are fine for small values and coarse for large values. In addition, the signal
being quantized is the difference between adjacent samples, which has a smaller
probability of large values. As explained earlier, this is equivalent to a first-order
predictor where the prediction residuals are the ones being coded. Of course, more
sophisticated predictors can be constructed to decrease the entropy further before
re-quantization. An example [reference 17], showing a 2:1 DPCM compression (from
8-bit PCM to 4-bit DPCM), is given to illustrate the algorithm.
The encoder shown in figure 24 calculates the difference between a predicted sample
and the original sample. To avoid accumulation of errors, the predicted sample is the
previously decoded sample. The residual is then quantized to 4 bits using a non-linear
quantizer and fed to the output. The quantization operation is shown in table 5. By
using 15 values for encoding, the code is made symmetric and one level in the binary
search tree can be omitted.
The decoding is very simple. The 4-bit word is requantized to 8 bits using a quantizer
whose transfer function is the inverse of the one given in table 5. Then the necessary
prediction is done (when the input is the difference between two adjacent samples, the
next output value is obviously the sum of the current output value and the next
difference).
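The encode/decode loop just described can be sketched in C. The 15-level table below is invented for illustration (table 5 is not reproduced here), but the structure — quantizing the residual against the previously decoded sample and adding it back in the decoder — follows the figure 24/25 scheme:

```c
#include <stdlib.h>

/* Illustrative 15-level non-linear quantizer: fine steps near zero,
 * coarse steps for large residuals. These thresholds are invented for
 * this sketch; the thesis uses the table-5 quantizer instead. */
static const int dpcm_levels[15] =
    { -96, -56, -32, -16, -8, -3, -1, 0, 1, 3, 8, 16, 32, 56, 96 };

/* Encode: return the index (0..14) of the level closest to the residual.
 * 'predicted' must be the previously *decoded* sample so that encoder
 * and decoder stay in sync and errors do not accumulate. */
int dpcm_encode(int sample, int predicted)
{
    int residual = sample - predicted;
    int best = 0;
    for (int i = 1; i < 15; i++)
        if (abs(residual - dpcm_levels[i]) < abs(residual - dpcm_levels[best]))
            best = i;
    return best;
}

/* Decode: requantize the 4-bit code and add it to the previous output. */
int dpcm_decode(int code, int previous)
{
    int out = previous + dpcm_levels[code];
    if (out > 127)  out = 127;   /* clip to 8-bit signed range */
    if (out < -128) out = -128;
    return out;
}
```

A real implementation would replace the linear search with the binary search mentioned above; with 15 symmetric levels, the search tree needs one level fewer than for 16.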
Figure 25 DPCM decoder block diagram [reference 17]
One other thing should be noted regarding prediction in combination with
requantization: the predicted values are small when the differences between samples
are small and large when the differences are large. Small differences of course mean
low frequencies, while large differences mean high frequencies. Thus a noise shaping
is performed, where the quantization noise is moved up in frequency. When one looks
at the equal loudness curves or the masking curve in figure 19, it becomes evident that
moving the noise to high frequencies is a good thing. The total noise effect will also
decrease, since less signal energy exists in the high treble range. In fact, prediction is
equivalent to delta modulation, a technique often used in audio converters (delta-sigma
converters) where a low noise level in the baseband is desirable.
Figure 26 ADPCM general block diagram [reference 18]
Table 6 First table lookup for IMA ADPCM quantizer adaptation [reference 18]
Three-bit quantized magnitude    Index adjustment
000                              -1
001                              -1
010                              -1
011                              -1
100                               2
101                               4
110                               6
111                               8
Table 7 Second table lookup for IMA ADPCM quantizer adaptation [reference 18]
Index Stepsize Index Stepsize Index Stepsize Index Stepsize
0 7 22 60 44 494 66 4026
1 8 23 66 45 544 67 4428
2 9 24 73 46 598 68 4871
3 10 25 80 47 658 69 5358
4 11 26 88 48 724 70 5894
5 12 27 97 49 796 71 6484
6 13 28 107 50 876 72 7132
7 14 29 118 51 963 73 7845
8 16 30 130 52 1060 74 8630
9 17 31 143 53 1166 75 9493
10 19 32 157 54 1282 76 10442
11 21 33 173 55 1411 77 11487
12 23 34 190 56 1552 78 12635
13 25 35 209 57 1707 79 13899
14 28 36 230 58 1878 80 15289
15 31 37 253 59 2066 81 16818
16 34 38 279 60 2272 82 18500
17 37 39 307 61 2499 83 20350
18 41 40 337 62 2749 84 22358
19 45 41 371 63 3024 85 24623
20 50 42 408 64 3327 86 27086
21 55 43 449 65 3660 87 29794
88 32767
Figure 27 shows how the step-size adaptation works based on these two look-up
tables.
When the quantizer knows its stepsize, the quantization is done by binary search.
Figure 28 shows a flowchart for the quantizer.
The adaptively quantized value is output from the quantizer. Since the lookup tables
can be stored in both the encoder and the decoder, no overhead in the form of
additional information exists. Thus the compression ratio is constant and exactly 4:1.
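The adaptation described by tables 6 and 7 is straightforward to implement. A sketch in C follows; the function name is mine, while the table data is taken directly from tables 6 and 7:

```c
/* Table 7: the 89 IMA ADPCM stepsizes, as listed in the thesis. */
static const int stepsize_tab[89] = {
        7,     8,     9,    10,    11,    12,    13,    14,    16,    17,
       19,    21,    23,    25,    28,    31,    34,    37,    41,    45,
       50,    55,    60,    66,    73,    80,    88,    97,   107,   118,
      130,   143,   157,   173,   190,   209,   230,   253,   279,   307,
      337,   371,   408,   449,   494,   544,   598,   658,   724,   796,
      876,   963,  1060,  1166,  1282,  1411,  1552,  1707,  1878,  2066,
     2272,  2499,  2749,  3024,  3327,  3660,  4026,  4428,  4871,  5358,
     5894,  6484,  7132,  7845,  8630,  9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22358, 24623, 27086, 29794, 32767
};

/* Table 6: index adjustment selected by the 3-bit quantized magnitude. */
static const int index_adjust[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };

/* Adapt the table index from the magnitude bits of the previous codeword,
 * range-limiting it to the valid 0..88 span of table 7. */
int adapt_index(int index, int magnitude)
{
    index += index_adjust[magnitude & 7];
    if (index < 0)  index = 0;
    if (index > 88) index = 88;
    return index;
}
```

Small magnitudes shrink the stepsize slowly (one table position), while large magnitudes grow it quickly (up to eight positions), so the quantizer tracks sudden level increases without overshooting on quiet passages.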
A fortunate side effect of the ADPCM scheme is that decoder errors caused by
isolated code word errors, or by edits, splices or random access into the compressed
bit stream, generally do not have a disastrous impact on the decoder output. This is
perhaps surprising, since prediction relies on the correct decoding of previous
samples, so errors in the decoder tend to propagate; the following analysis shows why
the impact is nevertheless limited.
The decoder reconstructs the audio sample Xp[n] by adding to the previously decoded
audio sample Xp[n-1] the signed magnitude product of the code word C[n] and the
quantizer stepsize, plus an offset of one-half stepsize; that is, Xp[n] = Xp[n-1] +
stepsize[n]·C'[n], where C'[n] denotes the code word magnitude plus one half, carrying
the sign of C[n].
In the second lookup table, each successive entry is about 1.1 times the previous entry.
As long as range limiting of the second table index does not take place, the value of
stepsize[n] is approximately the product of the previous value stepsize[n-1] and a
function of the codeword, F(C[n-1]). The above two relations can be manipulated to
express the decoded audio sample Xp[n] as a function of the stepsize and the decoded
sample value at time m, and the set of codewords between time m and n.
Eq. 32   $X_p[n] = X_p[m] + \mathrm{stepsize}[m] \cdot \sum_{i=m+1}^{n} \left( \prod_{j=m+1}^{i} F(C[j]) \right) C'[i]$   [reference 18]
Note that the terms in the summation are only a function of the codewords from time
m+1 onwards. An error in the codeword C[m], or a random access entry into the
bitstream at time m, can result in an error in the decoded output Xp[m] and the
quantizer stepsize stepsize[m+1]. The above equation shows that an error in Xp[m]
amounts to a constant offset to future values of Xp[n]. This offset is inaudible unless
the decoded output exceeds its permissible range and is clipped. Clipping results in a
momentary audible distortion, but also serves to correct the offset term. The equation
also shows that an error in stepsize[m+1] amounts to an unwanted gain or attenuation
of future values of Xp[n]. The shape of the output waveform is unchanged unless the
index to the second table is range limited. Range limiting results in a partial or full
correction of the value of the stepsize. The nature of the stepsize adaptation thus
limits the impact of an error in the stepsize.
2.3.2.4 µ-Law
While DPCM and ADPCM are very popular for few-bit representations, they are not
as suitable for lower compression ratios (that is, many bits in the compressed output
values). For instance, an 8-bit DPCM encoder would have to search through a 256-
level quantization code table instead of a 16-level one like table 5. With good search
algorithms, the search time can be limited to twice that of a 4-bit quantizer, but
alternative methods like adaptive quantization will still give the same or a higher
performance increase with a lower data rate and lower computational complexity.
ADPCM has fairly good performance, but the second lookup table would have to be
very large if a fair amount of adaptation were to be attainable with so many output
levels. In memory-critical applications, like MCU systems, this is not a good way to
go. However, while IMA ADPCM is standardized for a 4-bit output, µ-law is an
adaptive algorithm developed and standardized for an 8-bit output (there is also a 12-
bit version). Thus it has become very popular in applications where higher bitrates are
allowed, but where the requirement for simple computation still prohibits algorithms
of large complexity. In digital telephony and in audio DAT recorders with a long-play
option, µ-law is the standard algorithm in use (8-bit for telephony, 12-bit for DAT)12.
Like the DPCM-based algorithms, µ-law is based on fine quantization for low-level
signals and a coarser quantization for loud levels (when the masking threshold is also
higher). But it uses an alternative approach, where it compresses the dynamic range of
the signal during encoding and expands it again when decoding.
12 An alternative standard to µ-law, A-law, is used in some telephone systems. It is similar to µ-law and
has about the same performance. Since it has not been used during the work with this thesis, it will not
be presented in any closer detail.
The standardized µ-law algorithm performs a 16-bit to 8-bit quantization by
employing the formula

Eq. 33   $\hat{x}_{\mu} = Q_{\mu}[\log_2(1 + \mu\,|x(n)|)]$   [reference 20]
As can be seen, the quantization depends on the input value. The roundoff error
consists of the discarded bits to the right of the mantissa, which vary depending on
where the leftmost 1-bit is. Thus an exponential quantization has been applied, but
without using the comprehensive search routines of DPCM and ADPCM. Since the
number of bits saved is the same, the quantization stepsize depends on the input value.
The quantization can therefore, in principle, be said to be adaptive. It can be shown
that an 8-bit µ-law encoding has 13 bits of dynamic range (the smallest value is when
the exponent region is ”00000001”, so 3 bits will always be discarded), which means
its dynamic range is 78dB (ref. the 6dB-per-bit rule) as compared to 48dB for 8-bit
LPCM [reference 20]. However, the noise of any logarithmic quantization will of
course increase when the signal level increases.
Another advantage of µ-law is that the noise levels are more evenly spread throughout
the signal range than with DPCM. The maximum roundoff error (when all discarded
bits are ’1’-s) is 36dB below the sample value at any time, which is more than for
most DPCM tables (for instance, an expansion of table 1 to 8 bits would give a
maximum roundoff error just 6dB below the sample (or difference) value). In
addition, if the available processing power allows it, µ-law can easily be combined
with prediction for even better results.
Part 2
- Practical work –
- Documentation –
Thomas Alva Edison – in his laboratory at Menlo Park, New Jersey, 1883
3 Hardware Design
Designing the hardware for the wireless loudspeaker system proved to be significantly
more work than first anticipated. Finding components that matched all the
requirements for communication capabilities and data handling proved to be quite
difficult, and custom logic circuitry for transfer of the audio data had to be designed.
The process of developing the hardware and the critical choices made in the different
stages are documented in the following sections.
A system for wireless loudspeakers must include some essential basic parts. The
transmitter has to receive audio signals, digitize and process them, and transmit them
over an RF-modulated link. The receiver will receive, decode and convert the signal
back to its original analog form. In addition, if the signal source is a CD drive with a
digital output, a SP-dif receiver might be included in the transmitter.
For this project, the RF transmitter and receiver were both decided in advance to be
implemented with the Chipcon CC2400 RF-transceiver. Control and data transfer to
and from the transceiver has to be done with a microcontroller unit (MCU). Since low
cost was of the essence in this project, a separate digital signal processor (DSP) for
audio processing was not an option. Thus the audio processing also has to be done by
the MCU, which must be reasonably powerful yet cheap.
In addition, some control logic will be necessary to ensure the right timing and data
transfer between the main units. Since we want a single-clock system, it is also
preferable if the MCU or the control logic generates the required clocks and the audio
devices slave off these. The CC2400, however, needs its own clock, but this will not
interfere with the rest of the system.
3.1.2 Audio codec
To make the design flexible, an integrated audio codec was preferred over separate
converters. The system can then have all analog inputs as well as outputs on a single
chip, regardless of whether it is a receiver or a transmitter, and true bidirectional
modules can be designed. To have the reference design ready for duplex
communication, it was also regarded as advantageous if the system could include a
microphone input and a headphone output. Under normal circumstances this would be
implemented with opamps. However, some codecs offer a very high level of
integration and have analog amplifiers for microphones and headphones, some even
for loudspeakers, built in. The criteria for choosing an audio codec were:
Two codecs were under serious consideration, the AKM AK4550 and the Texas
Instruments TLV320AIC23B. They are briefly presented and compared in table 8.
13 Basic ADC and DAC performance parameters are explained in appendix 2.
As the comparison shows, the TLV320 is a bit more advanced than the AK4550. The
specifications are also better, especially for the DAC which features a 100dB dynamic
range. The microphone input includes a low-noise bias supply for electret
microphones (often called phantom power) and the headphone output is compatible
with standard 32Ω and 16Ω loads. The unit outputs audio data according to the I2S-
standard and is configured by a system processor over a three-wire (SPI) or two-wire
(I2C) compatible control interface. The control interface allows for many additional
functions, like volume control, power down and audio path control.
Even though the TLV320 is a bit more expensive, its higher degree of integration
would probably make the total system cost lower, since opamps for the microphone
and headphone amplifiers, the components surrounding these, as well as PCB area
will be saved. Thus, the TLV320 ended up being the preferred audio codec.
3.1.3 SP-dif receiver
When transferring audio digitally, it makes sense to design a system which can receive
digital signals from external sources. The most common digital output on CD players
is the SP-dif interface. In addition to audio data, a SP-dif frame also contains other
information. For details on SP-dif and the other formats used in this thesis, see
appendix 1.
Decoding the information content in the MCU would demand too many resources.
The SP-dif voltage levels are also not compatible with standard digital TTL or CMOS
levels, so an external receiver is necessary. Preferably, it should have a sample rate
converter and be run on an external clock so the data transfer can easily be
synchronized with the MCU. Resamplers are usually quite expensive, but since the
digital input is meant to be optional, one could choose between a version with a
digital input and one without; this is acceptable, at least for the prototype. Thus, the
criteria for choosing a receiver are:
There are currently two sample-rate converters with integrated SP-dif receivers widely
available on the market: the Analog Devices AD1892 [reference 25] and the Crystal
Semiconductors CS8420 [reference 26]. Both units offer state-of-the-art performance
(dynamic range in the 130dB range), and both have an arbitrary sample rate conversion
factor so the clock frequency can be chosen independently of the input sample rate.
The AD1892 has a fixed output sample rate of fCLK/512, while the CS8420 lets the
user choose a factor of 256, 384 or 512.
Although widely available, both chips are aimed at the high-end hifi market and are
therefore quite expensive. But an even bigger problem is that they are both made in a
5V process. Without I/O voltage conversion they can not be used in a 3.3V system.
If the receiver and the sample rate converter are to be integrated on the same chip,
there is only one 3.3V device that does the trick: the AKM AK4122 [reference 27].
Like the AD1892 and the CS8420, this is an integrated asynchronous sample rate
converter and receiver which accepts SP-dif at 32kHz, 44.1kHz, 48kHz or 96kHz and
outputs an I2S-compatible data stream at an arbitrary sample rate between 32kHz and
96kHz. However, the chip is not yet (Q2 2004) in mass production, and only
engineering samples are currently available.
Since suitable sample rate converters are not available, one has to look at stand-alone
receivers. Since the MCU will control the clock generation and the data transfer to it
from either the codec or the receiver, the receiver should have some sort of slave
mode. Usually, receivers regenerate the incoming clock from the SP-dif signal and,
through a PLL, generate an output clock which is used to control the unit they transfer
data to, usually a digital filter or a DAC. In this application, however, the data is to be
transferred to a MCU which has its own clock. The MCU has to work even when the
SP-dif receiver is not connected or does not receive data on its input, and then it
cannot be slaved off the receiver. The MCU needs to be the master, and the receiver
needs to be the slave.
There is a 3.3V SP-dif receiver with such a slave mode: the Crystal Semiconductors
CS8416 [reference 28]. It does not have a sample rate converter, but in slave mode
the LR-clock and bit-clock are inputs which are used to clock data out on the I2S-bus.
If they drift apart from the SP-dif input clock, the circuit will either skip or repeat a
sample to get back on track.
The method for managing slave mode is called ”slip/repeat behavior” [reference 28].
An interrupt bit, OSLIP, in the Interrupt 1 Status register is provided to indicate
whether repeated or dropped samples have occurred. After a fixed delay from the
Z/X-preamble, the circuit will look back in time until the previous Z/X-preamble and
check which of three possibilities occurred:
1. If during that time, the internal data buffer was not updated, a slip has
occurred. Data from the previous frame will be output and OSLIP set to 1.
OSLIP will remain 1 until the register is read. It will then reset until
another slip/repeat occurs.
2. If during that time, the internal data buffer did not update between two
positive and two negative edges of OLRCK, a repeat has occurred. In this
case the buffer data was updated twice, so the part has lost one frame of
data. This event will also trigger OSLIP to be set to 1. It will remain 1 until
the register is read.
3. If during that time, it did see a positive edge on OLRCK, then no slip or
repeat has happened, and OSLIP will remain in its previous state.
If the user reads OSLIP as soon as the event triggers, then over a long period of time
the rate of interrupts will be equal to the difference in frequency between the input
SP-dif data and the master’s serial output LRCK. To avoid excessive slip/repeat
events due to jitter14 on the LR-clock, the CS8416 uses a clock hysteresis window.
14 Jitter is explained in appendix 2.
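The slip/repeat bookkeeping in the three cases above can be sketched as follows. The struct and function names are invented for this illustration and do not correspond to the CS8416's actual internals; 'updates' stands for the number of buffer refreshes between two Z/X-preambles:

```c
/* Behavioral sketch of the slip/repeat decision described above. */
enum frame_event { EVENT_NONE, EVENT_SLIP, EVENT_REPEAT };

struct slip_monitor {
    int oslip;  /* sticky interrupt flag, cleared when the register is read */
};

enum frame_event check_frame(struct slip_monitor *m, int updates)
{
    if (updates == 0) {    /* case 1: buffer never updated -> slip,      */
        m->oslip = 1;      /* the previous frame is output again         */
        return EVENT_SLIP;
    }
    if (updates >= 2) {    /* case 2: buffer updated twice -> one frame  */
        m->oslip = 1;      /* of data was lost (repeat)                  */
        return EVENT_REPEAT;
    }
    return EVENT_NONE;     /* case 3: exactly one update -> in sync,     */
}                          /* OSLIP keeps its previous state             */

/* Reading the status register returns OSLIP and clears it. */
int read_oslip(struct slip_monitor *m)
{
    int v = m->oslip;
    m->oslip = 0;
    return v;
}
```

Counting how often read_oslip() returns 1 over a long interval then gives the frequency difference between the SP-dif source and the master LRCK, as noted above.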
3.1.4 Selection of microcontroller
Finding a suitable microcontroller was definitely the most complicated task when it
came to hardware design. There are many architectures to choose from, and the
requirements cannot be fully established since the software at this stage is not yet
written. So one has to make an estimate of how demanding the application will be,
and then add some headroom to be on the safe side. In addition, the microcontroller
will have to meet the requirements set by the other hardware in the system, like I/O
capabilities and supply voltages. It has to be easy to implement in circuit, compilers
and other development tools must be available, it is preferable if the architecture is
fairly standard, it must run at suitable clock speeds, and, last but not least, it has to be
low-priced and widely available.
The microcontroller will have to transfer data to or from the CC2400 and the codec or
the interface in real-time while doing compression. The sample rate is 44.1kHz, so
the data rate from the codec/interface will on average be 1.41Mbps, while the data rate
to the CC2400 on average will be 1Mbps. How many resources the data compression
will use is unknown, but a study of existing algorithms showed that the fastest ones
need between 25 and 35 instructions per sample when running on a 16-bit processor
[reference 2, 29]. In an 8-bit architecture, arithmetic operations on 16-bit numbers will
be significantly more demanding, so close to 100 instructions per sample is a crude
but fair estimate. A typical serial data transfer requires approximately 8-10
instructions per register transfer [reference 31], which translates to a bit more than one
instruction per bit in an 8-bit architecture and half that in a 16-bit architecture. In
addition, the MCU will have to run control routines, the timers will be used to
generate clocks and so on, so quite some headroom must be included. For stereo
16-bit/44kHz audio this leads to the estimate shown in table 9.
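Using the figures quoted above (roughly 100 instructions per sample and one instruction per transferred bit on an 8-bit architecture), the load estimate can be reproduced as a back-of-the-envelope calculation. These are the text's crude estimates, not measurements, and table 9 itself is not reproduced here:

```c
/* Rough MIPS budget for stereo 16-bit/44.1kHz audio on an 8-bit MCU. */
#define FSAMPLE         44100.0  /* samples per second per channel          */
#define CHANNELS        2
#define INSTR_PER_SAMP  100.0    /* ~100 instructions/sample (8-bit arch)   */
#define AUDIO_RATE_MBPS 1.41     /* codec/interface data rate               */
#define RF_RATE_MBPS    1.0      /* CC2400 data rate                        */
#define INSTR_PER_BIT   1.0      /* ~1 instruction per transferred bit      */

double compression_mips(void)
{
    /* stereo sample stream times the per-sample instruction estimate */
    return FSAMPLE * CHANNELS * INSTR_PER_SAMP / 1e6;
}

double transfer_mips(void)
{
    /* serial traffic in both directions, ~1 instruction per bit */
    return (AUDIO_RATE_MBPS + RF_RATE_MBPS) * INSTR_PER_BIT;
}

double total_mips(void)
{
    /* before control routines, timers and other headroom */
    return compression_mips() + transfer_mips();
}
```

The compression term alone lands near 8.8 MIPS and the transfers add roughly another 2.4 MIPS, which is why a 2 MIPS part is out of the question and even 5.6 MIPS falls short.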
One must keep in mind that this is a very crude estimate and just a guideline. The
algorithm can be simplified or improved, different MCU architectures may use fewer
or more instructions to transfer data and so on, so one cannot discard an MCU just
because it is slightly below the estimate given in table 9. But it cannot be too far off;
a 2 MIPS processor will not do the trick. Also, since the codec and interface need a
clock speed of at least 256fS, the MCU should be able to run at this frequency. If it
can run even at 512fS, this would be advantageous, but it is not a requirement.
3.1.4.2 Memory requirements
A study of existing lossless audio codecs showed that a frame size of between 576
and 1152 samples is commonly used [reference 2]. For a stereo signal this translates
to approximately 2-4 kBytes of memory usage. In a microcontroller this has to be
decreased, but not too much, since the overhead from any frame headers should not
become too significant. An estimated ”least useful frame size” of 64 samples is
defined. This translates to 1.45ms of music, or 256 bytes of memory when
uncompressed. A compressed frame will require an estimated 180 bytes (including
overhead), since the maximum transfer rate is 950kbps. With double-buffering and
some overhead, this should require approximately 400 bytes (double-buffering is
needed because the sample rate in lossless compression will vary, while the
transmission rate through the CC2400 will be constant). In addition, some headroom
must be given to allow for other variables, tables and such in the software.
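The buffer arithmetic above can be checked directly. The 40-byte slack term is an illustrative choice of my own for the header/overhead share; the 180-byte compressed-frame figure is the text's estimate:

```c
/* Frame-size and buffer arithmetic for the 64-sample frame. */
#define FRAME_SAMPLES    64   /* the "least useful frame size"  */
#define CHANNELS         2
#define BYTES_PER_SAMPLE 2    /* 16-bit audio                   */

int uncompressed_frame_bytes(void)
{
    return FRAME_SAMPLES * CHANNELS * BYTES_PER_SAMPLE;  /* 256 bytes */
}

double frame_duration_ms(void)
{
    return FRAME_SAMPLES * 1000.0 / 44100.0;  /* ~1.45 ms of music */
}

int double_buffer_bytes(int compressed_frame_bytes)
{
    /* two compressed frames plus some slack for headers/overhead */
    return 2 * compressed_frame_bytes + 40;
}
```

With the estimated 180-byte compressed frame, the double-buffered requirement comes out at the ~400 bytes quoted above.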
In addition, one must take into consideration the requirement for program memory.
The size of the program itself, as well as possible constants or look-up tables, will
determine the need for program memory. The object code for the smallest algorithm
available, MusiCompress [reference 29], takes up a total of 14.8kBytes for
compression and packing (decompression and unpacking uses 9kBytes). The MCU
algorithm will probably be simpler, but communication with the codec and the
CC2400 will also need some room for implementation.
3.1.4.3 I/O requirements
The microcontroller needs to communicate with both the CC2400 and the audio codec
or receiver simultaneously. The data transfer to both has to be synchronous. This can
be done in two ways: either the MCU can have two SPI (Serial Peripheral Interface)
ports, or the data from one of the units can be parallelized and transferred through a
general I/O-port. Since the data from the audio units consists of fixed-wordlength
samples, it makes most sense to parallelize the data from them.
The conversion between serial and parallel data can be done with parallel-to-serial
and serial-to-parallel shift registers, for instance the 74HC166 and the 74HC4094.
The I/O must also be capable of the necessary transfer rates: 1.41Mbps to or from the
audio unit and 1Mbps over SPI to or from the CC2400. In addition, it will need an
I2C interface (or a free SPI interface) to be able to configure the preferred audio codec.
3.1.4.4 Evaluated microcontrollers
In the process of finding the right microcontroller, several were taken into
consideration. When evaluating microcontrollers, the following factors have been the
most important:
Based on these criteria, numerous alternatives were evaluated. The following sections
will give a brief description and comparison of the microcontrollers that ”made it to
the final round”. The MCUs listed in table 10 were seriously considered and closely
studied before a final decision was made.
15 Prices are found either from the manufacturer's website or www.digikey.com. Prices are in quantities
of 100, except for the Renesas R8C/Tiny, for which the price is given for a quantity of 1000.
3.1.4.4.1 Atmel AVR Mega169L and Mega32L
Early in the hardware design phase, the Atmel AVR was considered the most likely
architecture to use in the WLS. It is a much-used processor series with good
performance at a reasonable price. It is also easy to program, with a small and efficient
instruction set. The AVR architecture has the disadvantage of being 8-bit, but at 1
MIPS per MHz, the performance is still good.
The only AVR series with enough memory is the Mega series, ranging from the
Mega16 with 1kByte RAM and 16kByte flash to the Mega128 with 8k/128k. The
”L”-units are 3.3V-compatible and thus the only ones which can be integrated into the
system with ease. The Mega169L [reference 30] and Mega32L [reference 31] were
considered to be the most suitable.
While the Mega169L has the advantage of two SPI interfaces, it also has the
disadvantage of having only half the memory of the 32L. Even if the communication
was made easier by being able to opt for the scheme shown in figure 33a), there was
always the question of memory. The 32L, on the other hand, has more memory and is
also both cheaper and comes in a smaller package, so the extra cost of some external
logic will probably balance out. Thus, the Mega32L was considered the better of the
two.
However, a more crucial problem became apparent when it came to speed. The ”L”-
versions are both rated at only 8MHz. Although the 169L has a typical performance
of 12MHz at 3.3V [ref 31, figure 138], Atmel will not guarantee stable operation at
this frequency.
If the MCU is to be run at 8MHz, much external logic is needed to generate the 256fS
clock for the audio circuits. In addition, the timing between the MCU and the audio
circuitry would be more complicated. The alternative would be to divide the 256fS
clock and run the controller at 128fS, or 5.64MHz. But with an 8-bit architecture and
5.6 MIPS, the speed requirements are not met. Therefore, other alternatives had to be
taken into consideration. The suitability of the AVR Mega169L/32L is summarized
with advantages and disadvantages.
3.1.4.4.2 Texas Instruments MSP430F1481
The MSP430, however, has the same problem as the Mega L-series: it is not rated
faster than 8MHz. Thus it has to run at 128fS, or 5.6MHz, to avoid excessive external
logic. Although this speed limitation is the same as for the AVR, the TI is still
considered better performance-wise, since it has a 16-bit architecture. Since all
register-to-register instructions are 1T-executable, it will probably exceed five 16-bit
MIPS, which is close to the performance requirement. Another drawback with the
MSP430 is that it has no I2C-compatible interface. Because of this, one SPI interface
must be used for the CC2400 and one for the TLV320, and the audio transfer scheme
would still have to be the one from figure 33b).
+ 16-bit RISC-architecture
+ Fast and efficient instruction set
+ Easily meets memory requirements
+ 2 SPI-interfaces
+ Very low price.
- Cannot be run at frequencies above 8MHz
- Not widespread standard
- No two-wire / I2C interface.
3.1.4.4.3 Motorola DSP56F801
The instruction set [reference 34] is somewhat more complex than for a basic MCU,
since it also includes some signal processing instructions. However, a single-
instruction 16-bit barrel shifter, a single-instruction 16x16 multiplier and two 36-bit
accumulators included in hardware simplify mathematical operations significantly.
The major disadvantage of the DSP56800 is that it is a relatively new processor
family, and code examples and reference designs are not as available as for more
established architectures. Also, the development tools are very expensive.
+ Very powerful
+ Built-in DSP-features
+ 16-bit architecture
+ Meets memory and I/O requirements
+ Efficient C-compiler
+ Small package and single power-supply
+ Competitive price
- Relatively new architecture, not as established as others
- Development tools are very expensive.
3.1.4.4.4 Hitachi/Renesas R8C/10 Tiny
The Hitachi R8C/10 [reference 35] is a powerful 16-bit microcontroller built into a
small 32-pin package. Still, it can be run at up to 16MHz and features a SPI interface
as well as 21 general I/O ports. The price is very low, a quarter of that of the Atmel
or Motorola units, and the small package makes it ideal for portable solutions.
There are three alternative memory configurations of the R8C/10, with the biggest at
16kB/1kB. This is the same as the minimum set in the requirements. In addition to
this, there seems to be another major problem with the R8C/10: availability. It is not
easy to find from distributors other than Renesas themselves, and the selection of
development tools seems small. Compilers and debuggers are not freely available,
and literature, code examples and other practical information is very difficult to come
across. Also, it does not meet the I/O requirements when using the TLV320, since it
has only one SPI interface and no I2C.
The R8C/10 seems to have great potential, but some uncertainties make it a bit risky
to include in a reference design without prior knowledge of the processor family.
3.1.4.4.5 Silicon Laboratories C8051F005
The 8051 architecture is very well established, and compilers and other tools are
widely available. Silicon Laboratories also offers development kits at very reasonable
prices. The only disadvantages of the C8051F005 are that it, like most others, has only
one SPI interface, and that the chip itself is more expensive than the rest. If 16 I/O
ports are sufficient, a slightly cheaper but otherwise identical model, the C8051F006,
is available in a 48-pin package.
3.1.5 Conclusions:
The process of finding the right components was extensive but ultimately rewarding
work, which gave insight into the hardware market as well as experience in evaluating
the possibilities and limitations of different kinds of circuits.
The CC2400 was decided to be one of the components in advance, since the target
application is a demonstration system for just that chip. The decision to use the
TLV320 audio codec was also made at an early stage since it met all the requirements
and also is highly integrated and thus quite easy to implement in circuit.
Finding the right SP-dif interface and microprocessor, however, was a more difficult
task. An SP-dif receiver with an integrated sample rate converter was initially thought
to be the solution, but no such circuits are available for 3.3V supply voltages. The
arrival of the AK4122 may change this in the near future, but for now a receiver
without sample rate conversion must be used. The Crystal CS8416 seems to be the
most suitable one, since it features a slave mode as well as 3.3V operation. However,
when the AK4122 arrives, it is highly likely to be preferable.
As far as microcontrollers go, there are so many models and architectures to choose
from, and so many factors to take into account, that one simply has to cut through to
ever get done. Consequently, a few models were moved on to ”round two” and
evaluated further. They are the ones presented in this document.
The final decision fell on the Silicon Laboratories C8051F005, due to its
performance, availability, low-cost tools and well-known architecture. The great
performance and competitive price also make the 16-bit Motorola DSP56F801 a very
strong contender, especially if software upgradability is taken into consideration. The
Motorola is probably powerful enough to run more advanced audio algorithms, like
subband filtering or even Ogg Vorbis fixed-rate lossy compression, in real-time. But
the unit is less widespread, development tools are much more expensive and
literature is scarce, so opting for an 8051 architecture was considered the safest bet.
It should also be mentioned that although the price given in table 10 seems very high
compared to the others, my instructor at Chipcon informed me that very good deals
could be made with the distributor, which would make it much more competitively
priced. This also had significance for the final decision when it was made.
3.2 Audio transfer to MCU
The preferred MCU, the Silicon Laboratories C8051F005, has only one SPI port,
which will be occupied by the CC2400 RF module. Since the data rate to and from
the audio codec or SP-dif device is more than 1.4Mbps, creating a second SPI in
software would put too much strain on the processor. A different scheme is therefore
proposed, where the data is converted from serial to parallel form and sent word-wise
to the microcontroller. The microcontroller will read or write 8-bit words on its
I/O-port, and appropriate logic will be implemented to convert them to serial form.
The principle for the communication scheme is shown in figure 35. The data is
transformed from serial to parallel form, so the MCU receives or transmits SD[15..8]
in one read/write and SD[7..0] in the next.
Figure 35 Principle for data transfer between audio device and MCU
The control signals will tell the serial-to-parallel interface when to latch data onto the
8-bit bus (the data flow from the I2S interface is continuous) when data goes from the
audio device to the MCU, and when to read data from the bus when the flow is in the
opposite direction. There also have to be control signals to the MCU, so it knows
when to write or read data on its I/O-port, and also so it knows whether it is dealing
with left- or right-channel data.
To make the data transfer possible, appropriate logic devices had to be found. The
74HC4094N 8-stage shift-and-store bus register [reference 37] is ideal for converting
data from serial to parallel form. It has a serial input and a strobe input. For each
clock tick the data on the serial input is shifted one step to the right in the shift
register. When the strobe is set high, the data in the 8-stage shift register is latched to
the 8-bit storage register. Whenever the output enable signal OE is high, the contents
of the storage register are available on the parallel outputs. When OE is low, the
outputs are in tri-state. This is shown in figure 36.
Figure 36 Simplified schematics, 74HC4094N [reference 37]
To use this device to transfer data from the audio device to the MCU, a control signal
is needed for the STR-input. The strobe signal has to be set high when a complete set
of data is shifted to the input. This is shown in figure 37.
As can be seen, a STR pulse is needed every eighth BCLK cycle. Since the
74HC4094N holds its output value constant when STR is low, the MCU can read the
data at any given time before the next STR pulse. A delayed STR signal, for instance
by one BCLK cycle, can thus be used to interrupt the MCU to make it read its IO-port.
The falling edge of STR provides an ideal interrupt source.
To transfer the data from the MCU to the audio codec, the 74HC166N 8-bit parallel-
in/serial-out shift register [reference 38] is used. It latches in an 8-bit word on the
input and shifts it out serially, MSB first. The device is activated with an active-low
/CE signal and the data is latched in using the /PE input. A logic diagram of the
circuit is shown in figure 38.
Figure 38 Logic diagram, 74HC166N [reference 38]
The I2S audio device reads the SDTI input on the positive edge. To assure valid data
with good timing margins on the I2S-interface, the data on the SDTI-input should
change state on the negative clock edge and have a stable, valid value on the positive.
This can be seen from figure 34. Since the 74HC166N shifts data out on its positive
clock edge, it should therefore be run on an inverted clock. The timing diagram
is then as shown in figure 39.
The arrows indicate when the audio device reads the SDTI data. The data is valid at
this instant and there is significant time to and from the next transition on SDTI. Thus
the timing requirements are not very stringent. The requirement for the MCU is that it
has valid data on its outputs before /PE goes low. The falling edge of STR can
therefore provide the interrupt source for the write as well.
3.2.2.2 Design of logic to create necessary control signals
The control signals that need to be generated are STR and /PE, in addition to the
SCLK and LRCK signals. At first I intended to use the PWM outputs from the MCU
to generate these signals, but this proved to be unfeasible. The C8051 has a
programmable counter array (PCA) consisting of 5 separate capture/compare modules
that can provide separate PWM outputs. These are all controlled by a single PCA
counter/timer. The low byte of the counter register is compared to a user-defined
value to provide a PWM output with selectable duty cycle and a frequency of fT/256,
where fT is the timebase frequency of the counter (see reference 36, chapter 20 for
details). Since the maximum fT is SYSCLK/4, the maximum PWM frequency is
SYSCLK/(4·256), or fS/2. SCLK, /PE, STR and LRCK must run at 32fS, 8fS, 8fS and fS
respectively, and it is then impossible to generate them using the PWM outputs of the
C8051.
Both the STR and /PE signals are active only every eighth SCLK cycle. Without the
possibility of using PWM, an external counter is needed to generate them. A gate on
the output can give a high value when the counter has a specific value (e.g. ”000” or
”111”) and a low value otherwise. The /PE is delayed one half clock cycle with
respect to STR and is also inverted. This does not have to be done externally, since the
74HC166N is running on an inverted clock and therefore detects /PE one half clock
cycle later. A fast ripple counter, like the 74LV4040, can also be used to create the
SCLK and LRCK when it is clocked with the master clock. Since the master clock is
512fS, SCLK is 32fS and LRCK runs at fS, the scheme proposed is as shown in figure
40.
The 256fS output provides a master clock signal for the audio device. The bit clock
BCLK and its inverted /BCLK are given by the ripple counter b[5..7] = ’1’. The
output b8 provides a SCLK/16 signal that will be used to tell the MCU whether it is the
MSW (most significant word) or the LSW (least significant word) of a sample that is
being transferred. If LRCK and SCLK/16 are both ’1’, it is a right channel LSW; if
[LRCK, SCLK/16] is [’1’,’0’] it is a right channel MSW, [’0’,’1’] is a left channel
LSW and [’0’,’0’] is a left channel MSW.
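On the MCU side, decoding the [LRCK, SCLK/16] status bits could be sketched as below; the function and enum names are illustrative, not taken from the actual firmware.

```c
/* Illustrative decoding of the [LRCK, SCLK/16] status bits into which
   half-sample is currently on the 8-bit bus. */
enum half_word { LEFT_MSW, LEFT_LSW, RIGHT_MSW, RIGHT_LSW };

enum half_word decode_half(int lrck, int sclk16)    /* levels 0 or 1 */
{
    if (lrck)
        return sclk16 ? RIGHT_LSW : RIGHT_MSW;
    else
        return sclk16 ? LEFT_LSW : LEFT_MSW;
}
```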
The control signals are shown in figure 41. The very high frequency MCLK and 256 fS
are omitted for clarity.
As we can see, STR is high and /PE is low at the critical points, when their respective
circuits are supposed to latch and load data.
These are all low-cost circuits, so compared to an MCU with two SPI-interfaces (which
would also need some logic to be made I2S-compatible), the extra cost in hardware is
not significant. Another alternative is to integrate all of this logic into a small and cheap
CPLD, for instance the Xilinx XC9536. The cost would then be the CPLD plus the
connector needed to program it.
3.3 Circuit design
After deciding which components to use and developing the communications system,
the next step was to design the complete circuit. The system consists of a total of eight
ICs: the C8051F005, the TLV320, the CS8416, the CC2400, the 74LV4040, the
74HC27, the 74HC4094N and the 74HC166N. The block diagram is shown in figure
42.
This diagram is highly simplified, although all major signals and buses are included.
The thickest lines are buses and the thinnest are clock lines; the normal ones are signal
lines. As can be seen, there is a fair amount of routing to be done, especially between
the MCU and the audio units and the logic. The switch indicates the analog/digital input
selectors. Rather than opting for an electronic selector, like a mux, jumpers are used,
since they were necessary to include some other functionality anyway.
3.3.1 Configuration of the SP-dif receiver.
The configuration of the Crystal Semiconductor CS8416 SP-dif receiver was done in
accordance with its datasheet. The unit has 8 SP-dif inputs routed through an 8:2 input
MUX, but only one input was used in our application. To keep the physical
dimensions small and to avoid extra cost, the possibility of using more inputs was not
utilized. To simplify implementation, stand-alone mode is used, so the MCU
does not need to spend resources communicating with the receiver. The input-select pins
are hardwired to choose input 0, while the indicator outputs, with the exception of
/AUDIO, are not used. The /AUDIO output indicates whether valid data is being
received and is connected to a general IO-pin on the MCU so it knows when a
signal is coming. The connection of the chip is as shown in figure 43. The SP-dif
input is terminated with a 75Ω load resistance as specified by the SP-dif standard.
Special care should be taken when routing the PLL filter, which is very sensitive to
stray capacitances. To achieve the correct filter characteristics and thus good jitter
performance, the layout should be as shown in figure 44. The ground connection for
the PLL filter should also be returned directly to AGND, independently of the ground
plane.
Figure 44 Recommended filter layout [reference 27]
If this recommendation is followed, the PLL in the CS8416 should provide very good
jitter attenuation.
The line inputs and the mic input are set up and filtered as recommended in the
datasheet, and the electret biasing output is connected to the mic input so the system
can be used with all kinds of microphones. It is connected through a large (10 kΩ)
resistor to prevent the DC voltage from damaging dynamic microphones.
The headphone output, however, was changed slightly from the recommended layout.
In their reference design, Texas Instruments used 220µF decoupling capacitors.
However, simulations showed that this would compromise bass performance when
used with a low-impedance 32Ω or 16Ω headphone. Since the system is supposed to
have high-fidelity performance, a frequency response covering the entire audible
range from 20 Hz to 20 kHz (-3dB) is desirable. The capacitor size had to be increased.
Figure 45 shows SPICE simulations with two widely available alternatives, into
standard 32Ω and 16Ω headphone loads.
Figure 45 220µF, 330µF, 470µF decoupling caps frequency response, 32/16Ω load
The 220µF capacitor gives a 4dB drop at 20 Hz, which is outside the specification even
with a 32Ω headphone. 330µF gives almost 2dB, while 470µF leads to just a 1dB drop,
well within the demands. With a 16Ω load, only the 470µF cap fulfilled the spec.
However, from our supplier, 470µF capacitors turned out to be much larger physically
than 330µF. Because of this, and also since 16Ω headphones are quite rare, the middle
value was chosen as a compromise. The complete connection of the TLV320AIC23B
is as shown in figure 46.
3.3.3 Configuration of the RF-transceiver
The Chipcon CC2400 RF-transceiver is in this application set up identically to the
CC2400DB demonstration board. The microcontroller interface is connected for
hardware packet handling support. This allows for hardware insertion of preambles,
sync words and CRC in the data stream by the CC2400. If this feature is not
utilized, the relevant pins can simply be ignored by the MCU. The CC2400 uses its own
16 MHz crystal and two voltage levels (1.8V core and 3.3V IO). Data transfer and
communication are, in addition to the pins used for packet handling, done through a
standard SPI-interface connected to the MCU's SPI pins.
3.3.4 Configuration of the MCU IO
The C8051F005 IO-system uses a Priority CrossBar Decoder to assign the internal
digital resources to the IO-pins. This gives the designer complete control over
which functions are assigned, limited only by the number of IO-pins in the
selected package. A block diagram of the system is displayed in figure 48.
The CrossBar assigns the selected internal digital resources to the IO-pins based on
the Priority Decode Table [reference 36], shown in figure 49. It starts at the top with
the SMBus, which means that when the SMBus is selected it will be assigned to P0.0
and P0.1. The decoder always fills IO-bits from LSB to MSB, starting with Port 0,
then Port 1, finishing if necessary with Port 2. If a resource is not used, the next
function in the priority table will fill its slot.
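This fill order can be illustrated with a toy model; the resource widths and the three-entry list below are made up for illustration, while the real assignments follow the decode table in figure 49.

```c
/* Toy model of the priority decode: walk the priority list, give each
   enabled resource the next free pins, and let disabled resources be
   skipped so the next function fills their slot. */
#define NRES 3

/* Writes the start pin of each resource into start[] (-1 if disabled)
   and returns the total number of pins consumed. */
int crossbar_assign(const int width[], const int enabled[], int start[])
{
    int pin = 0;
    for (int i = 0; i < NRES; i++) {
        if (enabled[i]) {
            start[i] = pin;
            pin += width[i];
        } else {
            start[i] = -1;   /* skipped; later resources move up */
        }
    }
    return pin;
}
```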
Figure 49 C8051F00x priority decode table [reference 16]
In the design of the wireless audio system, the SMBus will be used to configure the
audio codec, so it must be assigned. Next, the SPI-interface will be used to send
data to and from the CC2400 RF-transceiver. The UART will not be used, and neither
will the timer outputs, since all control signals are generated by external logic. The
interrupt input /INT0 will be used, however, since the MCU must receive an interrupt
when sending or receiving data. /INT1 is used by the Chipcon CC2400. The
SYSCLK output will also be used to clock external circuits, while the rest will be
unused. This results in a configuration of the CrossBar Decoder as shown in figure 50.
Figure 50 Configuration of MCU IO CrossBar Decoder
The IO-pins P0.0 to P0.7 will be assigned to digital functions as shown in figure 50,
while the rest of the ports will be general IO (GIO) ports used to transmit and receive
necessary data and other signals. The complete circuit schematics (appendix 4) shows
the entire allocation of the IO-pins for the MCU and the complete connections for the
circuit.
3.3.5 The finished circuit
The complete circuit with all connections is shown in figure 51 (a bigger, higher-
resolution version is found in appendix 4). For clarity, some connections are shown as
buses.
In addition to all the circuits, including the logic, the power supply and analog
connections are also shown. Some of the lines are routed through a 10-pin connector to
provide extra flexibility. One jumper chooses between normal mode and digital
loopback. In the latter mode, the audio output from the codec is fed back directly to its
inputs. This gives the user the opportunity to test whether the codec works, whether it
is properly set up and so on, without having to connect or program the entire system.
This should also enhance the circuit's testability significantly, since locating errors will
be much easier. If the digital input is selected, this is routed back to the codec in
loopback mode. The second jumper selects master or slave mode for the MCU, while
the third one is a digital/analog input selector. The jumper settings are shown in figure 52.
Figure 52 Jumper settings
To further enhance testability, several zero-ohm resistors are put on critical lines. In
addition, the circuit has two logic analyzer connections, compatible with the standard
logic analyzer port of figure 53.
The pinout is such that the logic analyzer can be used both to monitor all critical
signals during operation, and to take direct control of the audio codec and the
SP-dif receiver if necessary during testing. This is useful if, for instance, the MCU
for some reason fails to provide the clock or control signals necessary to
operate the other devices and thus test them. The complete logic analyzer port pinout
is shown in figure 54.
Figure 54 Logic analyzer connections
In addition, there are two LEDs in the circuit, indicating power-on and /AUDIO from
the SP-dif receiver respectively. A third LED is connected to an MCU IO-port
and can be used for whatever the user finds desirable.
4 Analysis of Lossy Compression Algorithms
Eq. 34   SER = 10·log10( ((1/N)·Σ_{n=0..N-1} x[n]²) / ((1/N)·Σ_{n=0..N-1} e[n]²) )
             = 10·log10( Σ x[n]² / Σ e[n]² )

where x[n] is the original signal and e[n] is the coding error over a block of N samples.
The analysis was done with a file called ”littlewing.wav”, a recording of myself
playing guitar. The recording has a wide dynamic range, so performance could be
evaluated at both low and high signal levels; it also has a fairly wide spectrum but,
even more importantly, a very clear and unedited sound. When doing subjective
listening-based quality tests it is important to have a reference that sounds both natural
and familiar. Distortion and colouring of the sound can then more easily be identified,
since one knows how it is supposed to sound. The necessity of subjective listening
tests is obvious. Although the SER together with an error plot gives a good indication
of how much loss there is, it tells quite little about the nature of the loss. Lossy
compression algorithms use perception-based models, whose quality can affect the
resulting fidelity significantly, even if the loss is the same in absolute quantity.
Figure 55 shows the waveform and spectrum of the used test file.
Figure 55 Waveform and spectrum, "littlewing.wav"
To put the numbers into perspective, the tests are first presented for 8-bit and 4-bit
LPCM requantization of the audio data. When doing LPCM quantization, the 6 dB-per-
bit rule tells us the maximum achievable SNR, the resolution, is 6·B dB, where B is the
number of bits. Since LPCM quantization does a random roundoff, the noise is almost
white, and its level is thus constant and about 6·B dB below the maximum signal
level. For a maximum-level signal, the SER would then be identical to the resolution,
but for normal music signals it will be significantly lower, as the results for
”littlewing.wav” show.
Table 11 Performance, 8-bit and 4-bit LPCM

                    8-bit LPCM    4-bit LPCM
SER                 28.8 dB       8.3 dB
Maximum error       0.004         0.07
As we can see, the SER is well below the resolution. This is of course because the
signal level, and thus the signal power, is lower than maximum, while there is no
related shaping of the noise. We can clearly see that the quantization noise is white, at
least for the 8-bit version. For 4-bit there actually is some visible correlation between
the signal and the noise. It can be shown that LPCM quantization noise in reality is not
completely white, but does produce some distortion, especially for low-level signals
or very coarse quantizations. Since distortion sounds worse than white noise, this is
often compensated for by adding random noise, also called dithering16.
It should also be noted that the noise is not in any way psychoacoustically shaped.
When the signal level is low, the masking threshold is also low, but the noise is still
high. It is then very audible. Perception-based shaping of the noise can provide
significant improvements in audio fidelity, even when the SER value is the same.
Both 8-bit and 4-bit LPCM are classified as low-fidelity.
The source code is given in appendix 6. As one can see, the quantization steps are
small for low levels and very large for high levels. It is therefore assumed that the
DPCM will perform poorly when the levels (or rather, the differences, since first-
order prediction is used) are high. Since some music recordings are very dynamic, it is
likely that DPCM will be less suitable in a hifi application than for voice coding,
where the levels are usually quite low. The algorithm was tested for performance
using ”littlewing.wav”.
As expected, the 4:1 DPCM compressor was very fast, but did not perform well when
it came to audio quality. Especially for loud signals, the quantization error is huge (as
can be seen by looking at its exponential quantization table) and the distortion is
clearly audible. At low volumes, the noise level is far better than for 4-bit LPCM
quantization, and the music quality is improved somewhat. But the error ”bursts” as
seen in figure 57 are far above the masking threshold and are clearly audible.

16 See appendix 2, ”Data converter fundamentals”, for details.
17 It can be shown that if the variance of the difference between samples (i.e. the
prediction residual), σ²Δx, is larger than the variance of the samples, σ²x, prediction
will give more distortion, since the bit-rate/distortion ratio depends on the variance.
Also, the nonlinear quantization can yield worse results when the signal is in the range
where the quantization steps are larger than the linear ones (i.e. >4096). For most
music signals, however, one is likely to get an improvement with DPCM over LPCM,
and for speech signals even more so.
4.3 Analysis of IMA ADPCM
The algorithm written to test ADPCM was made compliant with the IMA ADPCM
standard. The reader is referred to the IMA ADPCM theory chapter and the source
code for more detailed insight into how it is made. It was tested with the same file as
the DPCM algorithm for a subjective (listening test) and an objective (Matlab)
evaluation of audio quality. The result was a massive improvement over normal
DPCM. There is still some audible distortion on loud or dynamic passages, but
nothing compared to DPCM. Subjectively, IMA ADPCM does provide fairly high-
fidelity music; the quality is reasonable for background music or casual use, but still
not sufficient for critical listening on a high-performance hifi system. Again, the
noise ”bursts” are clearly above the masking threshold, although nowhere near
DPCM, while the average background noise is very low, almost inaudible.
The analysis done with Matlab is given in the figure and table below.
The errors are now much smaller, between –0.05 and 0.05 in amplitude and with a
very low nominal noise level. We can also see that the error clearly follows the signal
level and thus also the hearing threshold. The calculated values are given in table 14.
As we can see, the SER has increased dramatically. 32.5dB is still not true hi-fi
performance, even when the noise is psychoacoustically shaped, but compared to the
8.5dB achieved with the DPCM algorithm the improvement is very significant indeed.
It is still the ”bursts” of distortion at dynamic passages that dominate.
With less dynamic music, the subjective results were, as expected, better. The huge
maximum absolute error in table 14 is not as worrying as it seems; it is just a result of
the very first prediction being way off, since the index variable and the previous-
sample variable must be given start values before the first run (see source code).
The penalty of using ADPCM is increased complexity; it is about 2.5 times slower
than the basic DPCM algorithm. With efficient programming, however, real-time
IMA ADPCM should be possible to implement on a reasonably powerful MCU.
The µ-law algorithm is an algorithm made for 2:1 (16-bit to 8-bit) compression and
frequently used in digital telephony (it is also used in DAT recorders with a long-play
function). It is adaptive, since the quantization depends on the input level, and it
provides a significant improvement in dynamic range over 8-bit LPCM. The reader is
referred to the theory section for details. The algorithm is standardized, fast and easy to
implement. A µ-law codec was developed in C and run on the Powerbook using the
same test setup as for DPCM and ADPCM.
Figure 59 µ-law performance measurement, ”Littlewing.wav”.
The 8-bit µ-law algorithm is clearly better than the 4-bit ADPCM, which was as
expected. In numbers, the performance is as shown in table 15.
During programming and testing, it became evident that µ-law is actually faster than
ADPCM and provides higher audio quality, though at twice the output bitrate. Since
the bitrate is still within the requirements, µ-law is definitely a viable alternative that
provides decent-fidelity music quality and is fast and well-tested.
Another advantage of µ-law that became evident when listening is that the errors
are better spread throughout the signal range. The noise is, as it should be, highest for
loud signal levels, but the ”bursts” found in DPCM and ADPCM are not nearly as
present in µ-law. The error follows the signal level, and thus the masking threshold, in
a much better way. Subjective listening tests confirm and reinforce the superiority
µ-law has over 8-bit LPCM and also IMA ADPCM. If the wireless loudspeaker system
is to use lossy compression with a standard algorithm, µ-law is regarded the most
suitable of the ones tested.
18 Velocity Engine is a special instruction set in the G4, used to increase multimedia
performance. The G4 also has a dedicated maths co-processor and other special
hardware which is not utilized by the compression routines written for this thesis. The
encoding of a 10-minute wav-file takes less than 20s with either MP3 or AAC on the
1 GHz Powerbook, almost as fast as the simple DPCM codec. Writing dedicated
compression programs that utilize the Mac hardware is beyond the scope, and not the
focus, of this thesis.
Figure 60 Measured performance, 128kbps MP3, ”littlewing.wav”
Figure 61 Measured performance, 256kbps MP3, ”littlewing.wav”.
We can see that MP3 is better than any of the above methods, even at a 128kbps bitrate
(12:1 compression ratio). This proves that much can be gained using advanced
algorithms. Unfortunately, dedicated hardware or powerful processors are needed for
real-time implementation. If moderate compression ratios (2:1 to 4:1) are sufficient,
even simple algorithms can give quite good results. However, for ratios below 2:1,
dynamic quantization does not seem to be a good alternative, due to the ”bursts” of
distortion and the fast-rising complexity of the quantizer, as the number of output
levels rises exponentially with the number of bits.
4.6 iLaw: a low-complexity, low-loss algorithm.
For this part of the project, a low-loss compression algorithm was designed especially
to meet the requirements of the wireless loudspeaker system, and to be an alternative
to lossless compression if implementing the latter proved unfeasible. The demands are
as basic as they are fundamental:
Since the DPCM quantizer and ADPCM tables quickly increase in size and
complexity with the number of bits in the compressed stream, they were discarded
from further development. Instead, the coding is based on µ-law coding, whose
complexity is in principle independent of the number of output bits. The minimum
compression is given by:
Eq. 35   WLO = bps / (2·fS) = 1·10^6 / (2·44,100) = 11.3
where WLO is the maximum output word length. Since some headroom is desired, a
10-bit version of the µ-law encoding scheme was developed. This allows for a
15-bit dynamic range using a 3-bit exponent and a 6-bit mantissa, as described in the
µ-law theory section. Thus, the 10-bit output word will be of the form:
Since the exponent can hold a zero count up to 8, the sign bit holds the MSB and the
mantissa the 6 LSBs, the dynamic range is 15 bits. It is just an expansion of the
standard 8-bit µ-law coding, which has a 4-bit mantissa and thus 2 bits lower
performance.
Figure 63 Flowchart, iLaw encoder designed for this thesis.
In the case of second-order prediction, the filter is of the form H(z) = 2z^-1 - z^-2;
however, no multiplications are used, since the residuals are calculated as shown
earlier. The differences are also rounded to 16 bits, while they may actually be 17.
Since the 10-bit µ-law throws away the LSB anyway (it has a 15-bit dynamic range;
the sign bit, 8 zeros and the 6-bit mantissa is the most it can hold), this will not lead to
any further degradation of the signal quality, and the encoder's complexity is reduced.
The decompression is very simple and easy to implement; it uses the same filter
and the same decoder as the compression.
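The multiplication-free predictor and its inverse can be sketched as below; the function names are illustrative, not taken from the thesis source code.

```c
/* Second-order predictor H(z) = 2z^-1 - z^-2, computed with a shift and
   a subtraction only; the encoder forms the residual, the decoder
   inverts it with the same predictor. */
long predict2(long x1, long x2)             /* x[n-1], x[n-2] */
{
    return (x1 << 1) - x2;                  /* 2*x[n-1] - x[n-2] */
}

long residual2(long x, long x1, long x2)    /* encoder side */
{
    return x - predict2(x1, x2);
}

long reconstruct2(long e, long x1, long x2) /* decoder side */
{
    return e + predict2(x1, x2);
}
```

For a linear ramp the residual is exactly zero, which is why this predictor removes so much of the low-frequency signal energy.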
This special iLaw codec was written in C and compiled for Mac OS X to enable a
performance evaluation. The results for the same tests as for the other codecs are
shown in table 17.
Table 17 Performance of the iLaw codec, ”littlewing.wav”

SER                      49.5 dB
Maximum absolute error   0.0055
Complexity estimation    approx. 250 inst./sample
As can be seen, the results are significantly better than for the traditional µ-law codec.
Actually, they exceed the measured numbers achieved with 128kbps MP3, and
subjective listening tests also show very little degradation of signal quality.
This codec provides high-fidelity performance and should also be quite easy to
implement in an MCU. It is thus a viable alternative to lossless coding.
4.7 Notes about the performance measurements
Although measurements for only one reference file are shown in this report, the
codecs were tested on several music tracks to ensure that the results were
representative of the algorithms and not caused by special circumstances. The
”littlewing.wav” file has both a quite large dynamic range and a wide spectrum, so it
will not mask any bad performance. The checks done with other files confirmed this.
Estimations of complexity were done by compiling a single compression run with the
SDCC MCU compiler and counting the instructions in the resulting assembly file. It
should be noted that this is a very rough estimate, since no data retrieval or
sending operations were included; the variables were just given certain values. Also,
the code was not significantly optimized for the MCU. But although these estimations
are not very precise, they give an indication of how demanding the different
algorithms are. To do a full implementation of every codec would be too much work
and at this stage rather pointless, since the estimations were just meant to indicate
whether or not the different encoders are at all feasible to implement in an MCU. And
since there are 512 instructions available per sample, they are.
5 Design of Lossless Compression Algorithm
The goal for the WLS is to use lossless compression to restrict the data rate to within
the 1Mbps capability of the RF-transceiver while maintaining full audio quality. In
addition, since the algorithm must be able to run in real time using only an 8-bit MCU,
it has to be very fast. Different solutions were tested by writing programs in C
performing the necessary functions, and then evaluating them by compiling for
OS X and running them on wave files.
Figure 66 Waveform of, from top to bottom, "littlewing.wav", "percussion.wav",
"rock.wav", "classical.wav", "jazz.wav" and "pop.wav", Audacity
The waveform and FFT give a good indication of compressibility. The louder the
waveform, the higher the entropy. The effectiveness of prediction (how much the
entropy is reduced) is, as explained in the theory chapter, dependent on the high-
frequency content. Since the entropy is related to the signal power, and the entropy
reduction possibility to the HF content, the ”compressibility” can to some degree
be quantified using the mean signal power level and the spectral centroid (the spectral
”center of gravity”), as well as by looking at waveforms and FFTs. The Matlab files in
appendix 7 include calculation of both signal power (the SER calculator) and spectral
centroid for the interested reader to explore.
For simplicity, all files are, as can be seen, mixed down to mono. The gain from using
channel decorrelation was tested separately. To reduce the workload when testing
other parameters, like prediction and coding schemes, only mono codecs were used
during this phase of the development.
Figure 67 Spectrum of the "littlewing.wav", "percussion.wav", "rock.wav",
"classical.wav", "jazz.wav" and "pop.wav”, Audacity
As we can see, the files' characteristics are very different; some have much high-
frequency content and others less, while some are definitely much louder than the
others. Combined, these files should give a good indication of how well the tested
algorithms will perform.
It should be noted that when the ”pop.wav” file in table 18 is described as ”highly
compressed”, this does not refer to data compression, but to amplitude
compression. What is done is that the volume of all the tracks in the recording is
limited and amplified to full level using an amplitude compressor. This technique
is very commonly used in pop recordings to maximise perceptibility over low-fidelity
playback systems like radios, car stereos and TVs. Popular music is sold through
mass media and it is important that the music is ”catchy”, i.e. easy to remember even
when listening to it casually or with low-quality sound. When everything is loud, it is
easy to perceive. Audiophiles will of course argue that this makes the music ”flat” and
lifeless, but they are not the target audience anyway. However, this poses a problem
when it comes to data compression as well. Since the signal amplitude is very high at
all times, the entropy is also high and the music is difficult to compress. Lossless
compression can therefore be expected to perform worse on such recordings, and they
are often used as worst-case benchmarks.
5.1 Coding method
One of the most crucial steps in a lossless compression algorithm is the entropy
coding. It should be fast, memory efficient and at the same time eliminate almost all
redundancy.
Rice-coding has the advantage of being very fast, easy to implement and there is no
need to store tables. The clear disadvantage of Rice-coding, as shown in the theory
chapter, is the huge codelengths produced when there is significant overflow (that is,
when the real sample value is significantly larger than the k-bit estimated value that is
sent uncoded). Thus the estimation of the factor k is very critical. A fast method to
calculate k has been shown, but this calculation is still the most computationally
demanding part. One can trade off effectiveness for speed by using the same k for a
larger number of samples, but then some very long codes will be produced. As
mentioned, this is much more critical in a real-time system than in a computer
application.
Another alternative is the Pod-code. Here, the overflow is also sent uncoded, but it is
preceded by a number of zeros indicating how many bits the overflow occupies.
Consequently, the codelength is reduced from (overflow + 1 + k) for Rice-coding to
(2·⌈log2(overflow)⌉ + k), while the prefix property is preserved.
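The two codelength formulas can be compared directly. Below is a small sketch (the helper names are mine, not from the thesis, and the Pod formula is taken as stated above, so it assumes an overflow of at least 1):

```c
/* Codelengths as stated above, for overflow >= 1. Rice pays one bit per
   unit of overflow; Pod pays two bits per *bit* of overflow. */
unsigned bits_in(unsigned long v)      /* number of bits in the value */
{
    unsigned b = 0;
    while (v) { b++; v >>= 1; }
    return b;
}

unsigned rice_len(unsigned long overflow, unsigned k)
{
    return (unsigned)overflow + 1 + k;
}

unsigned pod_len(unsigned long overflow, unsigned k)
{
    return 2 * bits_in(overflow) + k;
}
```

For a small overflow of 2 with k = 6, Rice wins (9 vs. 10 bits); for an overflow of 100, Pod's 20 bits beat Rice's 107 bits, which is exactly the worst-case behaviour discussed below.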
Eq. 36   k = min{ k′ : 2^k′ · N ≥ A }
Where A is the accumulated sum of previous residual magnitudes and N is a count of
residuals. Programmed in C, this translates to:
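A minimal C version of this calculation might look as follows (a sketch of the linear search implied by Eq. 36, not necessarily identical to the listing used in the project):

```c
/* Smallest k' such that N * 2^k' >= A (Eq. 36). A is the accumulated
   sum of residual magnitudes, N the residual count. */
unsigned estimate_k(unsigned long A, unsigned long N)
{
    unsigned k = 0;
    if (N == 0)
        return 0;          /* nothing accumulated yet */
    while ((N << k) < A)   /* shift instead of multiply: one loop per bit */
        k++;
    return k;
}
```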
The two critical factors in this calculation are how often it is done and how often A
and N should be reset. Ideally one should calculate a new k for every sample.
However, this will slow down the codec, since this calculation is the most complex in
the algorithm. If k is calculated too rarely, the effectiveness will be reduced; the
question is by how much. Also, N and A have to be reset at some interval so they do
not grow too large for their variables. Resetting too often, however, will decrease the
performance, since fewer previous samples are averaged.
It should be noted that using the same k for several samples gives the largest
performance decrease for signals with much high-frequency content. This is expected,
because the sample values then vary more within such a frame, and a larger number
of samples are likely to produce large overflows. Prediction was earlier shown to be
equivalent to high-pass filtering the signal, so the effect of a non-ideal k should differ
when prediction is performed. To see if this has a large effect on the performance of
the two coding methods, they were tested both without prediction and with first-order
prediction. The results are shown in the following tables, without prediction and with
first-order prediction respectively.
Table 19 Performance of Rice- and Pod-coding, A and N reset every 256th sample,
no prediction, "littlewing.wav"

Calculation      Rice: filesize   Rice: max     Pod: filesize   Pod: max
frequency of k   reduction        wordlength    reduction       wordlength
Every sample     25.8%            29 bits       25.3%           20 bits
Every 4th        25.4%            48 bits       25.2%           22 bits
Every 8th        25.3%            48 bits       25.1%           22 bits
Every 16th       25.3%            48 bits       25.0%           22 bits
Every 32nd       25.1%            52 bits       24.9%           22 bits
Every 64th       25.0%            64 bits       24.8%           22 bits
Fixed k = 6      -28.0%           371 bits      14.4%           26 bits
19
Experiments were done with A incrementing over a larger number of samples between each reset,
but the gain in compression ratio was not significant and was therefore not considered worthwhile
to explore further, given the limitations in processing power.
Table 20 Performance of Rice- and Pod-coding, A and N reset every 256th sample,
1st order prediction, "littlewing.wav"

Calculation      Rice: filesize   Rice: max     Pod: filesize   Pod: max
frequency of k   reduction        wordlength    reduction       wordlength
Every sample     42.0%            30 bits       41.8%           20 bits
Every 4th        42.0%            44 bits       41.7%           20 bits
Every 8th        41.9%            52 bits       41.7%           20 bits
Every 16th       41.9%            62 bits       41.6%           20 bits
Every 32nd       41.8%            85 bits       41.6%           20 bits
Every 64th       41.8%            116 bits      41.6%           21 bits
Fixed k = 6      31.8%            172 bits      37.5%           24 bits
As we can see, the Rice codec performs significantly better when k is calculated for
every single sample. This is not unexpected, as table 4 in the theory chapter shows
that Rice is the more effective code for very low overflow values. It was somewhat
surprising to see that the Rice encoder held up even when a new k was calculated only
every 32nd or 64th sample. However, the difference evens out, and with a fixed k
Rice-coding performs very badly. It was also somewhat surprising that, with first-order
prediction too, the Rice codec held up very well even with the same k over frames of
64 samples. This, along with the big gain from prediction, indicates that there is quite
little high-frequency content in the signal. To validate this assumption, as well as the
expected performance decrease for the combination of large frames and much HF
energy, tests were done on "percussion.wav", a recording of percussion instruments
with much high-frequency energy. The simulations were done for the two extremes: a
new k for each sample, and a new k for every 64th sample.
Table 22 Performance of Pod and Rice coding with HF-rich file, 1st order
prediction, "percussion.wav"

Coding   Calc. frequency of k   Filesize reduction   Max wordlength
Rice     Every sample           36.3%                1105 bits
Rice     Every 64th sample      36.0%                2917 bits
Pod      Every sample           36.2%                23 bits
Pod      Every 64th sample      36.1%                26 bits
Again the Rice codec performs surprisingly well even when the same k is held over
64 samples. But the gap to the Pod codec has closed, which shows that k is not as
accurately estimated when there is much high-frequency energy. One should also note
that prediction has much less effect on the percussion track. This is expected, since
high frequencies mean big differences between adjacent samples. That the
compression ratio is as good as it is, is probably because parts of this track are quite
silent, and in those periods the datarate produced by the encoders is quite low.
The average performance of the Rice and Pod encoders in all the tests listed above is
shown in figure 68. The cases with a fixed k are excluded from this average, since a
fixed k would not be considered in any final algorithm and thus has little relevance
when evaluating the practical results.
Figure 68 Encoding performance and worst-case word length, all tests averaged
The conclusion after examining and comparing Pod-coding and Rice-coding is that
the gain of using Pod-coding is most significant in real-time systems, where the
excessive wordlengths Rice-coding produces in some cases can cause serious
problems and would demand a big buffer so as not to interfere with the data
throughput. In computer compression applications, where real-time operation is not
needed, Pod-coding is unlikely to give any performance improvement. As figure 68
shows, the performance is in Rice-coding's favour, although only by 0.2%. Rice-coding
is also the preferred method in almost all commercial lossless audio codecs. But
computer programs are not the target application for this thesis. The codec is to be
used in a low-power, low-memory real-time system, and due to the enormous
difference in worst-case behaviour, Pod-coding is clearly considered the better
alternative of the two.
As the table shows, the encoded part is simply inverted if the value is less than zero;
the sign bit can then be discarded. To retain the prefix property (the code always has
to start with a zero when positive and a one when negative), the code had to be
"shifted up" by one. It is then obvious that this scheme would give no benefit if used
on Rice-coding, since the loss from shifting up is always one bit and the net gain
would always be zero. Used on Pod-coding, however, it gives a one-bit net benefit for
most overflow values and a one-bit loss for a few. The only extra operation is to
invert the n-bit overflow after the n ones if the number is below zero. If the overflow
is frequently large (>3 bits) this scheme should lead to an improvement; if it is not, it
can actually give a net loss.
20
The name iPod is a registered trademark of Apple Computer Corp., and if the suggested scheme is to
be used in any commercial application, the name should be changed.
The proposed scheme actually gives a decrease in performance for all files except
"pop.wav". The loss is also bigger than when calculating k more rarely, as can be
seen by comparing the results for "littlewing.wav" to table 19.
A study of the overflow shows that the calculation of k is very effective; the overflow
is 0 or 1 for most of the samples, which also explains why Rice-coding gave better
compression than Pod-coding. The value 1 gives a 1-bit loss with iPod encoding
compared to ordinary Pod and, as figure 69 shows, it appears much more often than
all the values for which iPod gives a net gain put together. Note the logarithmic y-axis
in the figure; the overflow is 0 or 1 for more than 90% of the samples. Because of
these results, the proposed scheme was discarded.
Figure 69 Distribution of overflow, "littlewing.wav"
5.3 Prediction scheme
For intra-channel decorrelation, different prediction schemes were considered. It is
important that the predictors are simple, but still efficient. Adaptive predictors, very
high-order linear predictors and polynomial approximations with many polynomials
were considered unfeasible due to the hardware constraints, and the options were
narrowed down to a few low-complexity alternatives:
1. First order linear prediction, where the residual is the difference between
two adjacent samples.
2. Second order linear prediction, where the residual is the difference
between two adjacent differences from 1.
3. A simple two-alternative polynomial approximation:
   a. One polynomial being x̂0[n] = 0 (no prediction) and the second being
      x̂1[n] = x[n−1] (first order, as in 1).
   b. Or with one polynomial being x̂1[n] = x[n−1] (same as in 1) and the
      other x̂2[n] = 2x[n−1] − x[n−2] (same as in 2).
   Also, two ways of handling them can be used:
   i. Sample-to-sample adaptivity, where the smaller of the two residuals
      is encoded for each sample, and a dedicated bit tells the decoder
      whether it is residual 1 or 2.
   ii. Frame-to-frame adaptivity, where the residual magnitudes are
      accumulated over a given frame and the smallest sum decides the
      predictor for that particular frame. This saves the dedicated bit for
      each sample, but the frame needs a small header (for instance a
      "10" / "11" after the sign bit, where the first '1' indicates the start of
      a frame and the second bit which residual is encoded).
Alternative 3-a does not demand any extra calculations compared to 1, since the
first-order prediction is the only one done in both cases (the same relation holds for
3-b vs. 2). The extra work is either to find the smaller of the residuals once per sample
(i) or to accumulate and compare over an entire frame (ii). The first of the two is
probably faster, but there is an extra overhead of 1 bit per sample to tell the decoder
which residual is used. However, this can be made up for by the fact that the smallest
residual is chosen each time, and the gain from this might on average be more than 1
bit. The second alternative produces less overhead, but will not choose the right
residual every time, so experiments must be done to find the best one. Statistically, it
is obvious that the third alternative will not work very well if one residual is smaller
than the other almost every single time. For case 3-b-ii, one can just as well choose
from 3 alternatives (zero, 1st or 2nd order), since one extra bit per frame (to indicate
which of the three is chosen) produces minimal overhead.
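As a concrete illustration, the residuals for alternatives 1 and 2 can be computed like this (a sketch; the function names are mine, not from the thesis):

```c
#include <stddef.h>

/* Residual computation for the two fixed linear predictors above.
   1st order:  e[n] = x[n] - x[n-1]
   2nd order:  e[n] = x[n] - (2*x[n-1] - x[n-2]), i.e. the difference of
   two adjacent 1st-order differences. Intermediates are held in longs
   since the true residual can exceed 16 bits. */
void residual_1st(const short *x, long *e, size_t n)
{
    e[0] = x[0];                     /* no history for the first sample */
    for (size_t i = 1; i < n; i++)
        e[i] = (long)x[i] - x[i - 1];
}

void residual_2nd(const short *x, long *e, size_t n)
{
    e[0] = x[0];
    if (n > 1)
        e[1] = (long)x[1] - x[0];
    for (size_t i = 2; i < n; i++)
        e[i] = (long)x[i] - 2L * x[i - 1] + x[i - 2];
}
```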
For testing alternatives 1 and 2, a codec with a selectable predictor was written. Since
Pod-coding has been shown to be the preferred coding scheme, only this was used
during testing of the other parameters. The performance is summarized in the
following pages.
Table 25 Filesize reduction, no pred., 1st order and 2nd order linear pred.

                     No prediction   1st order pred.   2nd order pred.
"Littlewing.wav"     25.3%           41.8%             48.1%
"Percussion.wav"     31.2%           35.8%             32.9%
"Rock.wav"           11.7%           27.1%             31.7%
"Classical.wav"      23.3%           40.4%             47.3%
"Jazz.wav"           12.0%           33.5%             38.7%
"Pop.wav"            1.6%            17.2%             18.5%
As the results show, there is a clear correlation between the mean amplitude of the
signal and the compression level achieved. Also, the effect of increasing the predictor
order is strongly dependent on the spectrum of the signal. This is expected and in
harmony with theoretical assumptions. For the wireless loudspeaker system, the
requirement is about 30% (from 1.4Mbps to <1Mbps); the results show that this is
within reach for most inputs even with quite simple predictors, but that for at least
some music there will have to be quite frequent use of a lossy compression mode as
well.
Where Pi is the Prediction Indicator, which tells the decoder which prediction is used,
S is the sign bit and the rest is normal Pod-encoded data. A codec was written where
the user can select between alternatives 3a and 3b above. The results are presented in
table 26.
however not as great as when moving up one order in prediction (i.e. 0th- and
1st-order polynomial approximation does not perform as well as 2nd-order linear
prediction), which suggests a fixed predictor will give a better performance/complexity
ratio. The polynomial approximation, however, has the advantage that the excessive
wordlengths are largely avoided, since the biggest overflows will be eliminated in the
polynomial selection process.
The extra bit sent with every sample is the major drawback of this approach. If one
chooses between polynomials only every nth sample, the overhead is reduced by a
factor of n. However, it is then no longer certain that the best polynomial is chosen
for each sample. One has to choose the one giving the smallest total magnitude for an
n-sample frame. A major factor, of course, is how big this frame should be. Obviously
it should be several samples, to minimize the overhead, but at the same time, the
larger n is, the more "wrong" selections are made within each frame. Since the codec
should not operate with several frame lengths, it is logical to do this selection at the
same time as the code parameter k is calculated. Then the same variables can be used
for accumulation and counting too. Since two bits per frame is an insignificant
overhead, we can in this scenario choose between the 0th-, 1st- and 2nd-order
residuals and use a two-bit frame header to tell the decoder which is chosen. The
results for two different frame lengths are given in table 27.
Table 27 Performance, framewise polynomial approximation, 0th, 1st and 2nd order
polynomial selection

                     16-sample frame   256-sample frame
"Littlewing.wav"     48.2%             47.8%
"Percussion.wav"     39.5%             36.0%
"Rock.wav"           30.9%             31.5%
"Classical.wav"      50.5%             47.2%
"Jazz.wav"           40.9%             38.6%
"Pop.wav"            21.5%             19.0%
As we can see, performance increased very little. The cause of this can be a
combination of two things. The first is that the long frame length means k is
calculated less often, and also that more wrong polynomial selections are made
within each frame. This degrades performance somewhat; with three polynomials to
choose from, the algorithm should otherwise have performed better than the 1st- or
2nd-order samplewise approximation. However, the differences are small, and the
gain compared to a fixed second-order approximation is also very limited. The reason
for the small performance improvement then probably lies in the fact that the same
polynomial is chosen almost all the time. To examine this, variables were included in
the code to count the number of times each polynomial was used. The result is given
in figure 71.
Figure 71 Polynomial selection, framewise polynomial approximation, 255-sample frames (Excel)
As we can see, the 2nd-order residual is chosen most of the time. The exception is the
very HF-heavy "percussion.wav", where the distribution is very different. This is also
supported by the fact that polynomial approximation clearly gives the most
improvement over fixed prediction for just that file. One can also see that in some
files, the sample value itself is actually chosen more often than the 1st-order residual.
This is most notable for "littlewing.wav" and "jazz.wav", and probably due to the fact
that these pieces have a couple of seconds of silence at the start, for which the sample
values are chosen.
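The counting above corresponds to a frame-wise selection along these lines (a sketch with hypothetical names; the history across frame boundaries is ignored for brevity):

```c
#include <stdlib.h>

/* Pick the polynomial (0th, 1st or 2nd order) whose residual magnitudes
   sum to the least over one frame; a two-bit frame header would carry
   the returned index to the decoder. */
int select_order(const short *x, int n)
{
    long sum[3] = {0, 0, 0};
    for (int i = 2; i < n; i++) {
        sum[0] += labs((long)x[i]);                             /* 0th order */
        sum[1] += labs((long)x[i] - x[i - 1]);                  /* 1st order */
        sum[2] += labs((long)x[i] - 2L * x[i - 1] + x[i - 2]);  /* 2nd order */
    }
    int best = 0;
    if (sum[1] < sum[best]) best = 1;
    if (sum[2] < sum[best]) best = 2;
    return best;
}
```

On silence the 0th-order "predictor" wins, which matches the observation above about the quiet openings of "littlewing.wav" and "jazz.wav".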
Table 28 Third and fourth order fixed predictor, new k for every sample

                     Third order   Fourth order
"Littlewing.wav"     51.1%         52.0%
"Percussion.wav"     28.6%         24.7%
"Rock.wav"           34.6%         36.5%
"Classical.wav"      48.1%         49.7%
"Jazz.wav"           38.8%         36.3%
"Pop.wav"            19.2%         19.8%
As we can clearly see, the gain is decreasing rapidly, even being negative in some
cases. It is obvious that a brute-force method with very high-order fixed predictors
will give a low performance/complexity ratio. This is probably also the reason that
many of the best available applications use some sort of polynomial approximation.
The latter is guaranteed to perform better when more polynomials are used.
Shorten [reference 6], one of the most successful lossless compression programs
available today, uses a four-polynomial approximation. However, this is beyond the
capability of the wireless loudspeaker system. Figure 72 shows a comparison of the
average performance for all test files and all prediction schemes.
Table 29 Computational cost per sample for the different prediction schemes

No prediction:                       0
1st order fixed:                     1 16-bit subtraction
2nd order fixed:                     1 24-bit assertion
                                     1 24-bit subtraction
                                     1 quantization (24-16 bit)
                                     1 16-bit subtraction
3rd order fixed:                     2 24-bit assertions
                                     2 24-bit subtractions
                                     1 quantization (24-16 bit)
                                     1 16-bit subtraction
4th order fixed:                     3 24-bit assertions
                                     3 24-bit subtractions
                                     1 quantization (24-16 bit)
                                     1 16-bit subtraction
Samplewise pol. appr., 0th and 1st:  1 16-bit subtraction
                                     1 16-bit compare
                                     1 16-bit assertion
                                     1 8-bit assertion
Samplewise pol. appr., 1st and 2nd:  1 24-bit subtraction
                                     1 24-bit assertion
                                     1 quantization (24-16 bit)
                                     1 16-bit subtraction
                                     1 16-bit compare
                                     1 16-bit assertion
                                     1 8-bit assertion
Framewise pol. appr., 0th, 1st, 2nd: 1 24-bit subtraction
                                     1 24-bit assertion
                                     1 quantization (24-16 bit)
                                     1 16-bit subtraction
                                     2 24-bit accumulations
                                     (3 24-bit compares, 1 16-bit assertion and
                                     3 24-bit clears at the start of each frame)
As can be seen, there is a big leap in performance from no prediction to 1st order,
then a significant jump to 2nd order. Increasing the fixed predictor order further has
little effect. The polynomial approximations are a few percent better than the
highest-order predictor they consist of, but the 0th- and 1st-order selection, probably
the most likely to be achievable on the WLS MCU, is not as good as a fixed
second-order prediction. Generally, moving to polynomial approximation increases
the number of operations per sample more than moving up an order or two in the
predictor. A fixed predictor also gives a constant processor load, which is much easier
to handle when the operation is real-time. But if resources are available, polynomial
approximation should definitely be considered, as it seems to give the best
compression ratio.
measure performance, normal modern live and studio recordings, a live classical
recording with a big and wide soundstage and also a 60’s recording where different
instruments are located in each of the two channels21. The files are described in table
30.
As described in the theory section, the normal way of decorrelating the channels is to
replace the L (left) and R (right) signals with M (mutual) and S (side), one consisting
of the average between L and R and the other the difference. However, a complication
arises when using only integer arithmetic. The mutual signal is calculated by
Eq. 37   M = (L + R) / 2
21
In the early days of stereo recording, stereo was often utilized by placing some instruments, like drums and rhythm
guitar, in one channel and the rest, for instance lead guitar, bass and vocals, in the other. During the late 60's and
early 70's, recording engineers gradually learned to use stereo to pan the sound between the speakers, which gives
a more natural soundstage and also more signal correlation between the channels.
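One common way to make the M/S pair exactly invertible in integer arithmetic (a general technique, not necessarily the resolution chosen in this thesis) is to transmit S = L − R together with M = R + ⌊S/2⌋: the LSB lost by the integer halving survives in the parity of S, so L and R are recovered exactly. A sketch, with function names of my own choosing:

```c
/* Integer-exact mid/side transform. M equals floor((L+R)/2); since S
   carries the dropped parity bit, decoding is lossless. Arithmetic right
   shift of negative values is assumed, as on virtually all
   two's-complement targets. */
void ms_encode(int L, int R, int *M, int *S)
{
    *S = L - R;
    *M = R + (*S >> 1);    /* equals floor((L + R) / 2) */
}

void ms_decode(int M, int S, int *L, int *R)
{
    *R = M - (S >> 1);
    *L = *R + S;
}
```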
50%. However, performance can suffer in special cases where the channel being sent
uncoded is consistently stronger than the other, or if the channels are often in
opposite phase. A way to overcome this, and also improve performance, is to find out
which channel has the smallest absolute value and make sure that this channel is sent
directly for each sample. However, this would demand quite a bit of resources, and a
1-bit indicator would also have to be added to each sample pair to tell the decoder
which channel is sent directly. Consequently, this is not investigated further in this
thesis, but it could be considered if high-performance compression programs for
home computers are to be developed.
Figure 73 Entropy of channels, mutual and side signals and filesize reduction,
average results of files in table 14 except ”dualmono.wav”
As we can see, the theoretical and practical results are almost identical. The average
performance is lower with inter-channel decorrelation on, since the side signal has
higher entropy than the channel being removed. We can see that the mutual signal is
smaller than the channel signals; this is an obvious consequence of the side signal
being larger, so slightly better performance would be achieved if the mutual signal
had been calculated as well. However, the conclusion is that the sample-to-sample
channel correlation is negligible and that implementing inter-channel decorrelation is
not worthwhile.
Better results could probably have been achieved by exploiting channel correlation
over larger time windows. There is much correlation between left and right, but
because of the time differences it will not be evident when only one or a few sample
instants are compared at a time. By searching for correlation over larger time periods,
much redundancy could probably be removed, but this would require much memory
and processing power and is thus not feasible on the wireless loudspeaker system. It
would probably also produce too much latency for use in any real-time system, but if
compression for personal computers and file storage is developed, it should definitely
be considered.
These results correspond with those found by the aforementioned Mat Hans and Al
Wegener, among others, and implementing inter-channel decorrelation is not
recommended in products like the wireless loudspeaker system.
5.5 Final algorithm proposal and benchmark
As seen in the previous segments, a large number of methods have been tested. The
results found lay the foundation for the final algorithm proposal. Of course one
should always keep in mind that the target application is an embedded real-time
system. Thus, some of the demands include:
The suggested algorithm is finally tested for performance against the compression
application Carbon Shorten 1.1a for Macintosh. Shorten is considered one of the best
compression algorithms and both Carbon Shorten and Shorten for Windows are
amongst the most popular lossless compression utilities for their respective platforms.
It is thus very relevant as a benchmark for comparison. Shorten is a highly developed
22
It might be that using the same k within a frame is the best option when it comes to implementing
the WLS; see the implementation considerations chapter for details.
scheme based on a higher-order polynomial approximation and Rice-encoding and
will therefore presumably give a better compression ratio than the much simpler
algorithm devised for our purpose. The point of the comparison is to see how close
we get to the more sophisticated algorithm in terms of compression ratio with just a
second-order predictor and Pod-coding. The comparison was done using all the files
in table 18 (the six mono test files) and table 30 (the six stereo test files).
5.6 Lossy mode
As mentioned before, the wireless loudspeaker system has to include some sort of
lossy mode if the compression ratio over a period of time does not manage to meet the
requirement set by the 1Mbps transfer rate of the transceiver. The data will need to be
buffered in the MCU memory and if the buffer is about to be filled up (if more data is
being sent from the encoder than the CC2400 is able to transmit) the lossy mode must
be engaged. It must stay ”on” for a short while until the buffer is empty again.
The time period the lossy-mode is on will be very short, a few ms at most, but on files
with low compressabillity it will be used quite often. It is still unlikely to be audible,
but a lossy-mode scheme has to be used which does not compromise performance too
much, to minimize the probability of perceptible degradation.
To be able to realize this, the data must be split into frames. A header is needed to tell
the decoder whether the frame is encoded in lossless or lossy mode. The frame should
be short enough to minimize distortion audibility, but long enough that the header
does not give too much overhead.
Three different schemes for the lossy mode were considered. A model of the system
was written in C so listening tests and measurements could be made. The three
alternatives are:
If we consider the first method, some kind of low-complexity lossy encoding must be
employed, probably µ-law, iLaw or an equivalent. However, as shown earlier, each of
these methods will, unless they are very adaptive, give low noise on low-level signals
and high noise on high-level signals, often higher than an LPCM quantization to the
same number of bits. Of course, the lossless encoding produces the longest output
words when the signal is loud, i.e. at the same time as the lossy compression performs
badly. Thus, alternative 2, removing some LSBs when the bitrate is too high, will
almost certainly give a better result than, and also be much simpler than, moving to
some special lossy encoding scheme. The first option is therefore discarded. The
other two are evaluated in the following subchapters.
5.6.1 LSB-removal lossy-mode
When the data rate from the lossless encoding is too high, it is likely that the signal is
loud. Removing one or a few LSBs when the signal is loud is not very perceptible,
and if it is done over short periods of time, even less so. Unlike for instance µ-law
lossy encoding, a hybrid scheme like this causes loss only when the bitrate is too high
to transfer (as shown earlier, µ-law gives high dynamic range, but the instantaneous
quantization error is just (n−4) bits below the sample value for an n-bit encoding).
The frame header needs to tell the decoder two things. First, it needs to know
whether the frame is encoded in lossy or lossless mode. Secondly, it needs to know
how many bits are removed from the samples in the frame. Since the data input is
16 bits/sample and the decoder output is around 10 bits/sample, the number of LSBs
to be chopped off can be represented with three bits. The value zero tells the decoder
that no LSBs are chopped off, which is the same as lossless transmission, so a
separate indicator for this is not needed. One should note that if k is not calculated for
every sample, the wordlength can increase for some samples where k is way off. So if
k is calculated rarely, some frames can be large, and one might want to increase the
number of bits in the header to accommodate this. Since sample-wise calculation of k
is suggested for this system, a three-bit header was used during testing.
During testing, a frame size of 64 samples was used. The MCU has 2kB RAM and
can thus hold two frames: the uncompressed input frame and the compressed output
frame. Since the application is real-time, it was also important to develop a scheme
which is causal. The result was a low-complexity algorithm for employing the lossy
mode: during compression of frame N, the output data length in words is counted. If
frame N is larger than a threshold corresponding to 1Mbps, LSBs are removed when
reading frame N+1. The number of LSBs removed always corresponds to the
overshoot of frame N's length relative to the threshold. Thus the average output
datarate will always converge towards 1Mbps. There are 64 samples in each frame
and the desired output data rate was set to 10 bits per sample. The number of LSBs to
remove is then:

Eq. 39   LSB_rem = (B_OUT − 640) / 64
Where B_OUT is the number of bits produced when compressing the previous frame.
When mono files are read, the bitrate is already below 1Mbps and no lossy mode is
employed.
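The control loop of Eq. 39 can be sketched as follows (the constants and names are illustrative, not taken from the project code):

```c
/* Causal lossy-mode control (Eq. 39): the bit count of frame N decides
   how many LSBs to strip from the samples of frame N+1. */
#define FRAME_SAMPLES 64
#define TARGET_BITS   640     /* 10 bits/sample * 64 samples per frame */

unsigned lsb_rem_for_next(unsigned long b_out)
{
    if (b_out <= TARGET_BITS)
        return 0;             /* frame fit within budget: stay lossless */
    return (unsigned)((b_out - TARGET_BITS) / FRAME_SAMPLES);
}

short strip_lsbs(short sample, unsigned lsb_rem)
{
    /* mask rather than shift back and forth, so negative samples are
       handled without relying on left-shifting negative values */
    return (short)(sample & ~((1 << lsb_rem) - 1));
}
```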
The lossy mode was tested on several files, and very little if any degradation of audio
quality was detected. Figure 76 shows the performance on a 30s excerpt of
"modernlive.wav", a file of normal loudness. The performance is also compared to
the iLaw codec and the LAME MP3 codec at 192kbps.
Figure 76 Lossy-mode performance, "modernlive.wav", 30s excerpt, left channel
As we can see, the error clearly follows the frames. For many frames no bits are
removed, while for a few others up to four bits are removed. The vast majority,
however, are between zero and two. The measured results in numbers are shown in
table 32.
As we can see, the measured loss is much smaller than for iLaw or MP3. This was
not unexpected, as listening tests showed no audible degradation. Figure 76 also
shows that for most frames zero or one LSB is removed, two for quite a few, while in
some rare instances three to four are removed. But this happens in very loud parts of
the track and also on high-frequency signals (due to the prediction), and does not
appear to be audible. The lossy mode as suggested here works very well.
5.6.2 Mono samples lossy-mode
The mono-mode lossy algorithm developed is very similar to the LSB-removal
algorithm. It checks the output length of frame N. Then, if it is too long, it sends some
samples in frame N+1 in mono to compensate for the overshoot. The threshold is
calculated in the same way as for the LSB removal. Since 16 bits are saved for each
sample sent in mono, the number of mono samples for frame N+1 will be

Eq. 40   S_MONO = (B_OUT − 640) / 16

Where S_MONO is the number of samples in frame N+1 to be sent in mono and
B_OUT is the number of bits used in frame N. Thus, the output bitrate will average
640 bits per frame, or 10 bits per sample. The algorithm is the same as shown in
figure 75, except that S_MONO is calculated instead of LSB_rem, and mono samples
are sent instead of LSBs removed. When in mono mode, only the left channel is sent,
and the decoder copies it to both left and right after decompressing.
During testing, it soon became evident that the mono-samples lossy mode had
problems that could not be solved without significantly compromising performance.
For most frames, the right channel will toggle between being itself and being a copy
of the left. Since this happens twice for each frame (when the lossy mode is engaged)
and the frames are 64 samples, the right channel will toggle its mode about 1,400
times per second. This introduces very audible high-frequency distortion. To confirm
that this was not an implementation issue, a very simple program converting a fixed
number of samples per frame to mono, without any compression or signal processing,
was written. This produced the same result. The only way to avoid this distortion was
to force the mono mode to stay on for quite long periods each time, at least five to ten
thousand samples (so the toggling rate is below any audible frequency). And even
then it was easy to hear the audio going from stereo to mono and back again; the
soundstaging was rendered almost unrecognisable.
Figure 77 Spectrum with mono-mode, 64-sample frames, ”modernlive.wav”, 30s
excerpt.
As figure 77 shows, the HF noise level added to the right channel is significant, and
the result is not by any means of high-fidelity standard. Since the LSB-removal lossy
mode gave excellent results, this is a no-brainer: the mono-samples lossy mode is
discarded.
6 WLS Implementation Considerations
As mentioned in the introduction, delays in design and manufacturing of the hardware
made it impossible to do a full implementation before the thesis deadline. This is
detailed in the project review. But even so, algorithm design has consistently been
done with MCU implementation in mind. As a result of this, some optimization
suggestions and general considerations will be presented as well as the work actually
done with the hardware.
The residual being sent is the difference between the real value and the predicted
value. Since both of these are 16-bit in length, the residual can, although it is higly
unlikely, be a 17-bit value23. In a powerful computer or DSP, which uses 32- or 64-bit
instructions, this is not a problem. In a 16- or 8-bit MCU however, the requirement to
handle 17-bit values instead of 16-bit will give a significant performance reduction.
Every operation will have to use a significantly higher number of instructions.
But this problem can be avoided with wrap-around arithmetic. When the arithmetic
only involves addition and subtraction, and the operations in the decoder are the
inverse of those in the encoder, using a 16-bit variable for the residual is not a
problem even if its value overflows. This is easiest explained with an example.
A 16-bit two’s complement variable has the value range [-32768, 32767]. If you try to
go outside these values, it will wrap around. For instance:
23: For the residual to use 17 bits, the difference between the real and predicted value must exceed
±32,767. This rate of change is very unlikely to occur in music signals. If, for instance, a first-order
predictor is to give such a residual, the signal must be at almost 0dBfs (full level) and close to 20kHz;
no normal recording has such an output level at those frequencies. For a higher-order predictor it is
even more unlikely.
If we use a first order predictor and the last value x[n-1] was 19,000 and the current
value x[n] is –32,000, the residual, x[n]-x[n-1], will be –51,000. This is outside the
value range and the residual will wrap around to 14,536. The decoder now has the last
sample value 19,000 and receives a residual of 14,536. In the decoder the sample
value is of course found by adding the residual or difference to the last value, which
gives 19,000+14,536 = 33,536. This is outside the range and will again wrap around
to –32,000, the correct value. As long as the encoder and decoder do the same
operations, this is not a problem.
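The wrap-around example above can be sketched in C. This is an illustrative fragment, not the WLS source; the casts through uint16_t avoid the undefined behaviour of signed overflow in C while giving the same modulo-2^16 wrap-around:

```c
#include <stdint.h>

/* Encoder side: the residual is computed in 16-bit two's-complement.
   If the true difference overflows, it simply wraps modulo 2^16. */
int16_t encode_residual(int16_t x, int16_t pred)
{
    return (int16_t)(uint16_t)((uint16_t)x - (uint16_t)pred);
}

/* Decoder side: the inverse addition wraps the same way, so the
   reconstructed sample is still bit-exact. */
int16_t decode_sample(int16_t residual, int16_t pred)
{
    return (int16_t)(uint16_t)((uint16_t)pred + (uint16_t)residual);
}
```

With pred = 19,000 and x = -32,000 as in the text, encode_residual() returns 14,536 and decode_sample() recovers -32,000 exactly.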
It should be noted that the wrap-around process will affect the compression, since a
different value is compressed. However, this event is very unlikely; the probability of
the prediction residual falling outside the 16-bit value range is almost non-existent.
Thus the practical compression ratio will not be affected, and by restricting oneself to
16-bit values, significant hardware resources are saved.
Bit-manipulation should also be done efficiently. Checking whether the most
significant bit of an 8-bit variable is set can for instance be done either by
if(variable>>7) {
....;
}
or by
if(variable&0x80) {
….;
}
Clearly, the second alternative is much faster. If the bit is instead to be set, one can
likewise use
variable = variable|0x80;
To check or set arbitrary bits, a static bit-table can be used:
char bittable[8] = {0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01};
Then we can use bittable[i] to either check or assign bit i in any variable. For 16-bit
variables the bit-table should of course start with 0x8000 and end with 0x0001. This is
done throughout the code.
Look-up tables should also, where possible, be used to replace for-loops when
computing variables that depend on other variables. For instance, finding the
exponent in the µ-law encoder (the number of zeros before the leftmost '1' in the
magnitude) is done by
value = sample<<1;
for (exp = 7; exp > 0; exp--) {
    if (value & 0x8000) break;
    value = value << 1;
}
This can easily be replaced with a table, as shown in the source-code in appendix 6.
Static look-up tables like this can be placed in program memory rather than RAM,
and will thus not consume the system's RAM resources (assuming that there is
enough program memory, of course).
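A sketch of such a table-driven exponent finder is shown below (illustrative only, not the appendix 6 code): the top byte of the already left-shifted magnitude indexes a 256-entry table. On an MCU the table would be a precomputed constant in program memory; here it is built at first use for brevity:

```c
#include <stdint.h>

static uint8_t exp_table[256];
static int exp_table_ready = 0;

/* Build the table: entry i holds the index of the highest set bit of
   byte i (0 for bytes 0x00 and 0x01), matching the loop's result. */
static void init_exp_table(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t e = 0;
        for (int b = 7; b > 0; b--)
            if (i & (1 << b)) { e = (uint8_t)b; break; }
        exp_table[i] = e;
    }
    exp_table_ready = 1;
}

/* Loop version from the text, for comparison. */
uint8_t exp_loop(uint16_t value)
{
    uint8_t exp;
    for (exp = 7; exp > 0; exp--) {
        if (value & 0x8000) break;
        value <<= 1;
    }
    return exp;
}

/* Table version: one shift and one lookup instead of up to seven
   loop iterations.  'value' is the magnitude already shifted left by
   one, as in the mu-law encoder. */
uint8_t exp_lut(uint16_t value)
{
    if (!exp_table_ready) init_exp_table();
    return exp_table[value >> 8];
}
```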
The ”data field” would in our case then start with the frame header and also contain
the compressed audio data for the next 64 samples. The optional 8/10 coding in the
figure is an encoding of the data (IBM 8B/10B encoding scheme, see reference 22)
that is in some applications used for its spectral properties and error detection.
However, it reduces the data rate to 80% of the original 1Mbps. In the WLS, 8/10
coding is not considered necessary. However, CRC (cyclic redundancy check) should be included
to avoid noise corrupting the data too much. As seen in figure 78, CRC adds an
overhead of 16 bits per frame.
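As an illustration of what such a 16-bit frame check could look like, here is a bitwise CRC-16/CCITT sketch (polynomial 0x1021, initial value 0xFFFF). This is only one common parameter choice; the CC2400's hardware CRC may use different parameters, so the datasheet should be consulted before assuming compatibility:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT (poly 0x1021, init 0xFFFF, no reflection).
   Appending the returned value to a frame gives a 16-bit overhead
   per frame, as noted in the text. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}
```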
When all the decoder parameters are transferred instead of being calculated, the
consequences of transmission errors are likely to be much smaller. Also, errors will
not be able to accumulate, at least not from one frame to the next.
Even so, packets will be lost. This can be handled either by repeating the last packet
or by inserting silence. 64 samples correspond to 0.73ms of audio for a stereo signal
sampled at fS = 44,100 Hz; the question is whether occasional periods of silence of
this length are audible at all and, if so, whether a repetition of the previous frame
gives better or worse sound than silence.
24: For details on frequency hopping the reader is referred to Chipcon Application Note AN24.
A program that emulated the loss of frames was written to test the audibility and
compare the alternatives. The source-code is given in appendix 6. The program lets
the user select the packet length in samples, how often packets are lost (a fixed
"loss interval", where a value of 1,000 means that 1,000 packets are sent for each
time a loss happens), how many successive packets are lost, and whether silence or a
repetition of the last packet should be used to compensate.
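The two concealment strategies compared by the test program can be sketched as two small routines (the names and buffer layout are illustrative, not the appendix 6 code):

```c
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 64  /* samples per packet, as in the WLS frames */

/* Option 1: replace a lost frame with silence. */
void conceal_silence(int16_t *frame)
{
    memset(frame, 0, FRAME_LEN * sizeof(int16_t));
}

/* Option 2: repeat the last correctly received frame. */
void conceal_repeat(int16_t *frame, const int16_t *last_good)
{
    memcpy(frame, last_good, FRAME_LEN * sizeof(int16_t));
}
```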
It soon became evident that when only one packet was lost at a time, the two methods
of handling it sounded more or less identical. A 64-sample packet is just 0.7ms of
audio, and in both cases the loss of a single packet sounded like a small ”tick”.
Differences were not heard until several successive packets were dropped. A blind
test using five different audio files was set up to see how many packets had to be lost
before a difference between the two methods could be identified, and when it could,
which alternative was preferred25.
[Figure: number of audio files (of the five tested) for which a difference between the
two methods was audible, versus the number of successive packets lost (1-20).]
25: For information on how to set up a scientifically credible blind test, the reader is recommended the
website of The ABX Company, http://www.pcavtech.com/abx/index.htm
As we can see, 5 successive packets (3.6ms) must be lost before the difference
between the two methods is audible on a majority of the files. When many packets
are lost, the two methods sound clearly different, which leads to the question of
which one is preferable.
[Figure 81: preferred alternative when a difference was detected; number of
occasions for silence versus repetition.]
Of the 75 occasions for which a difference was detected, silence was preferred in 72.
It should be noted that when more than 8-10 successive frames were lost the listener
could easily identify whether silence or repetition was used and knew which one he
”voted for”. The evaluation in figure 81 is thus highly subjective. Nevertheless, the
distortion-like effect caused by repeating a packet several times was perceived as
worse than moments of silence.
In listening tests of what rate of packet loss could be tolerated, the loss in audio
quality was characterized as "significant" when the loss interval was less than
1,000-1,500 (i.e. more than one packet loss per second). Below 300-500 it was
characterized as "annoying".
The conclusion from the test is that a packet loss rate of less than one per second can
be tolerated, and that inserting silence is preferable when packet loss happens.
Inserting silence should also be easier to implement in the WLS, as no extra
buffering or calculations are required.
Part 3
- Summary –
7 Project Review
In this part of the thesis a review of the project itself and the work done will be
presented.
From the beginning the goal of the thesis was to evaluate and find a suitable low-
complexity and high-fidelity compression algorithm and implement this using a MCU
demonstration board strapped to the CC2400EWB. However it soon became apparent
that no such demonstration boards had the necessary peripherals. Since Chipcon
wanted a reference design anyway we agreed on including hardware design as part of
the thesis. The WLS was thus designed from scratch.
Designing the WLS proved to be more work than anticipated; as shown earlier in this
thesis, a custom communications system using logic devices had to be developed. As
a consequence, the hardware design phase took almost a month longer than planned,
and the complete and verified circuit design was delivered to Chipcon for
manufacturing in mid-March instead of mid-February as intended.
The plan was then for me to do compression algorithm research using my computer
while waiting for the finished PCB. I was supposed to receive it soon after Easter and
use the last four to six weeks on implementation. However, there were also significant
delays in the manufacturing of the circuit. Since this was beyond my control, I used
the time to do much more extensive research and development on compression than
first planned, and both a custom lossless and a custom lossy algorithm have been
proposed. The finished WLS-hardware would not leave production before the thesis
deadline, and an implementation was thus not possible to achieve before finishing the
thesis.
8 Summary
This thesis covers the work done developing the Wireless Loudspeaker System. It
has been my intention to make it a complete document, and I have therefore presented
a theory section so the reader is able to understand the work and the results even
without consulting the references. Theory that is not directly related to the main
focus of the thesis, but has still been relevant to the development process, is presented
in appendixes 1 and 2. This covers the different formats and protocols used in the
system (appendix 1) as well as general data conversion theory (appendix 2). The other
appendixes include the circuit, the PCB-design, the equipment used and the
source-code.
Regarding the source-code, only the most relevant applications are included. During
the project, over 50 different versions of various compression algorithms were
compiled and tested. To include all of these would make the report much too extensive.
The source-code in the appendix includes DPCM, ADPCM, µ-law, iLaw, a Rice-
/Pod/iPod entropy coder and decoder, a lossless codec with selectable prediction, a
hybrid lossy/lossless codec and the frame drop test algorithm. In addition the MatLab
scripts referred to in the thesis are also given in appendix 7.
The practical part of the thesis documents the work done, from finding the appropriate
components to hardware and software design. The hardware documentation ends in a
finished design, while the software documentation, due to the implementation not
being done, ends with considerations and suggestions. Based on measurements and
subjective listening tests, together with an assessment of computational complexity
and MCU implementation feasibility, conclusions are drawn and algorithms
suggested. For the lossy option, a custom-made iLaw algorithm is proposed, which
features high performance (comparable to 128kbps MP3) and very low complexity
(estimated at around 250 instructions per sample in an 8-bit MCU). The lossless
algorithm suggested uses a second order predictor and Pod-coding. It features
compression ratios within a few percent of the much-recognised home computer
application Shorten, a lossy-mode for constant bitrate and an encoding with very good
worst-case performance to ensure minimum influence from this lossy-mode.
Complexity tests show it should be feasible to implement in an MCU-based system.
To summarize, I think this project, despite not being completely finished within the
thesis deadline, has been an academic success. I have learned a lot about audio
compression, digital signal processing, general programming and embedded systems
and hardware design. These are all important areas for an engineer to master and I
consider the gained knowledge to be extremely valuable. Also, much practical
engineering work has been done, which I find very rewarding since it has made me
better equipped to face the challenges that meet me outside the university.
9 References
1. ”Data Compression Basics”, Slides, EECC694
2. ”Optimization of Digital Audio for Internet Transmission”, Ph.D. thesis,
Mathieu Claude Hans
3. Widrow, B. et al.: ”Stationary and nonstationary learning
characteristics of the LMS adaptive filter.” Proc. IEEE, 1976
4. G. Mathew et al.: ”Computationally simple ADPCM based on exponential
power estimator”, Proc. IEEE, 1992
5. Monkey’s Audio theory
6. T. Robinson: ”Shorten: simple lossless and near-lossless waveform
compression.”, Technical Report 156, Cambridge University, 1994
7. Lesson 8: Compression Basics, Computing and Software Systems lecture
notes, University of Washington Bothell
8. Introduction to multimedia, Cardiff University
9. Debra A. Lelewer , Daniel S. Hirschberg, “Data compression”, ACM
Computing Surveys (CSUR), v.19 n.3, p.261-296, Sept. 1987
10. LOCO-1: Weinberger, M, Seroussi, G, Sapiro, G: “A low-complexity,
Context-based, lossless image compression algorithm.” IEEE Data
Compression Conference, 1996
11. Weinberger, M, Seroussi, G: “Modeling and low-complexity adaptive
coding for image prediction residuals”, IEEE.
12. Robin Whittle: “First Principles, Lossless compression of audio”
13. Fraunhofer institute, ”MPEG-1 Layer 3 overview”.
http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html
14. Microsoft corp.: ”Windows Media Encoder whitepaper”
http://download.microsoft.com/download/winmediatech40/Update/2/W98
NT42KMeXP/EN-US/Encoder_print.exe
15. Sony: ”ATRAC whitepaper” http://www.minidisc.org/aes_atrac.html
16. Fraunhofer Institute, “MPEG-2 AAC overview”.
http://www.iis.fraunhofer.de/amm/techinf/aac/index.html
17. Chipcon Application Note 126 ”Wireless Audio using CC1010”
18. IMA Digital Audio Focus and Technical Working Groups:
”Recommended practices for enhancing digital audio compatibility in
multimedia systems.” rev. 3.00, October 21, 1992
19. W.M. Hartmann ”Signals, sounds and sensations”, AIP Press, 1997
20. R.G. Baldwin: ”Java Sound, Compressing Audio with mu-Law encoding”
21. ”Vorbis-1 specification”, Xiph.org foundation,
http://www.xiph.org/ogg/vorbis/doc/Vorbis_I_spec.html
22. Chipcon SmartRF CC2400 datasheet
http://www.chipcon.com/files/CC2400_Data_Sheet_1_1.pdf
23. AKM AK4550 datasheet
24. Texas Instruments TLV320AIC23B datasheet
25. Analog Devices AD1892 datasheet
http://www.analog.com/UploadedFiles/Data_Sheets/294553517AD1892_0
.pdf
26. Crystal Semiconductors CS8420 datasheet
http://www.cirrus.com/en/pubs/proDatasheet/CS8420-5.pdf
137
27. AKM AK4122 datasheet preliminary
www.akm.com/datasheets/ak4122.pdf
28. Crystal Semiconductors CS8416 datasheet
http://www.cirrus.com/en/pubs/proDatasheet/CS8416-4.pdf
29. Wegener, Albert: ”MUSICompress: Lossless, Low-MIPS Audio
Compression in Software and Hardware.”Soundspace Audio, 1997
30. Atmel AVR Mega169 datasheet
http://www.atmel.com/dyn/resources/prod_documents/doc2514.pdf
31. Atmel AVR Mega32 datasheet
http://www.atmel.com/dyn/resources/prod_documents/doc2503.pdf
32. Texas Instruments MSP430F1481 datasheet.
http://www-s.ti.com/sc/ds/msp430f1481.pdf
33. Motorola DSP56F801 datasheet
http://e-www.motorola.com/files/dsp/doc/data_sheet/DSP56F801.pdf
34. Motorola DSP56800-family reference manual
http://e-www.motorola.com/files/dsp/doc/ref_manual/DSP56800FM.pdf
35. Hitachi/Renesas R8C/10 datasheet
http://www.eu.renesas.com/documents/mpumcu/pdf/r8c10ds.pdf
36. Silicon Laboratories C8051F005 datasheet
http://www.silabs.com/products/pdf/C8051F0xxRev1_7.pdf
37. 74HC4094N datasheet
38. 74HC166N datasheet
39. 74HC4020 datasheet
40. Kernighan, Brian W. & Ritchie, Dennis M. : ”The C Programming
Language”, 2nd edition, Prentice Hall 1989.
APPENDIXES
Appendix 1. Data Formats
The wireless audio system must, in both hardware and software, be compliant with
several standard interfaces used by the various chips. The digital audio input is based
on the SP-dif (Sony/Philips digital interface) format and is decoded with a
dedicated receiver. Both this receiver and the audio codec which manages the analog
inputs and outputs use the I2S (Inter IC Sound) standard for communicating with other
circuits. Thus the communication between the MCU and these units must be
compatible with their I2S interfaces. Finally, the communication between the MCU
and the RF-chip uses the SPI (Serial Peripheral Interface) format.
In addition, the compression algorithms used in this project have been developed and
tested on Mac OS-X and Windows computers. The most widespread uncompressed
audio format for computer use is the WAV-format (Waveform Audio Format), which
has been used during testing and development. The WAV-file format is also
examined in the following sections.
Every sample is transferred in a 32-bit subframe. The left and right channel subframes
together represent one frame. The subframes and frames are separated by a preamble,
a bit-pattern containing a deliberate biphase-coding violation. This is because the
receiver must be able to identify the start of a sample or a data block. Figure A1-1
shows how the subframes and frames are built up.
Figure A1- 1 SP-dif subframes and frames [reference A1-1]
- Preamble X: Tells us the subframe has data for the left channel. The
subframe is not at the start of the data block.
- Preamble Y: Tells us the subframe has data for the right channel. The
subframe is not at the start of a data block.
- Preamble Z: The subframe has data for the left channel and we are at
the start of a new data block.
In a subframe, the first four bits are the preamble. After these, four AUX data bits
follow. They are used to transfer information about tracks, like name, track number
and so forth. Bits 8 to 27 contain the actual audio data, at most 20 bits. If the data
wordlength is 24 bits, the AUX-bits are also used for audio data. After the audio data
come a validity-bit, a user bit, a channel status bit and a parity bit.
Figure A1- 2 The content of SP-dif subframes and data blocks [ref.A1-1].
As seen in figure A1-2, each data block contains 192 frames and will always start
with a left channel sample. In each data block, a total of 384 channel-status and
subcode-information bits are transferred. This information must be decoded by the
SP-dif receiver as shown in figure A1-3.
Figure A1- 3 Channel status block data, SP-dif (left) and AES/EBU (right) [ref.A1-1]
Figure A1-3 also shows the difference between the SP-dif consumer format and the
AES/EBU professional format. The latter does not have copyright information, but it
does contain some other information like reliability, reference, when the data is
recorded etc. It also contains some user configurable bits like channel setup override
and sample frequency. This information is not needed in consumer equipment, which
is meant only to play back the data, not to alter it.
It must also be mentioned that the IEC958 standard was renamed IEC60958 in 1998
and has been expanded to also carry IEC61937 datastreams. IEC61937 data can
contain multichannel sound like MPEG-2, AC3 or DTS [reference A1-2].
are proportionate to the clock frequency, thus higher sample-rates can be allowed in
future applications.
Figure A1-4 shows the I2S data transfer, where SCLK is the serial- or bit-clock while
LRCK is the left-right- or word-clock. SDTI is the data transfer pin. As shown, the
SCLK usually runs at 32fS or 64fS, where fS is the sample frequency. At the latter
frequency the PCM word-length can be 16-bit, 20-bit or, in theory, up to 31-bit (or
32-bit with left- or right-justification, which is explained below). However, no current
audio equipment exceeds 24-bit resolution26. In a system with 16 bits or less, the SCLK
usually runs at 32fS, easing the timing requirements. The LRCK runs at the sample
frequency fS.
One should also notice that the sample MSB comes one SCLK-cycle after a transition
on LRCK. This is how the I2S-standard is specified and is often referred to as I2S-
justification. However, most audio components also allow for left-justification (the
MSB comes when the LRCK toggles, one cycle earlier than I2S-justification) or right-
justification (the LSB comes at the last SCLK cycle before LRCK toggles) of the data
stream. One should notice that right-justification, as mentioned, demands the same
wordlength on transmitter and receiver.
26: Even though the digital resolution or wordlength in modern audio equipment is usually 24 bits, the
effective resolution, given by A/D- and D/A-converter linearity and system noise levels, is currently
only around 20 bits in state-of-the-art systems. However, a seemingly excessive wordlength (true 24-bit
resolution seems impossible with today's technology) allows for more accurate digital signal
processing, with less degradation of signal quality.
transfer between it and the master. The MOSI and MISO are the data lines between the
master and slave, and the SCK is used to clock the transfer. A typical SPI-system with
a master (for instance a microcontroller) and three slave devices is shown in figure
A1-5.
The data on both the MOSI and MISO pins is transferred MSB-first. A SPI slave also
places its MISO pin in tristate (high-impedance) when it is not selected, so its output
does not load the bus.
The WAV-format is very simple. In addition to raw audio data, it consists of a header
which identifies the file as a WAV-file. The header also tells the application whether
the audio is mono or stereo, what the sample rate and resolution (wordlength) are,
the file size and some other information. The 44-byte header is stored at the start of
the file and is followed by the audio data as shown in figure A1-6.
Figure A1- 6 WAV audio file header [reference A1-6]
1. RIFF Chunk
a. Byte 0-3: ”RIFF” (ASCII characters); identifies the file as a RIFF-file.
b. Byte 4-7: Total length of package to follow (binary, little endian).
c. Byte 8-11: ”WAVE” (ASCII characters); identifies the file as a WAV-
file.
2. FORMAT Chunk
a. Byte 0-3: ”fmt_” (ASCII characters); identifies the start of the format
chunk.
b. Byte 4-7: Length of format chunk (binary, always 0x10).
c. Byte 8-9: Audio format tag (0x01 for LPCM).
d. Byte 10-11: Number of channels (1 – mono, 2 – stereo).
e. Byte 12-15: Sample rate (binary, in Hz).
f. Byte 16-19: Bytes per second (sample rate × channels × bits per sample / 8).
g. Byte 20-21: Bytes per sample frame (align: 1 = 8-bit mono, 2 = 8-bit stereo
or 16-bit mono etc.).
h. Byte 22-23: Bits per sample.
3. DATA Chunk
a. Byte 0-3: ”data” (ASCII characters); identifies start of the data chunk.
b. Byte 4-7: Length of data to follow.
c. Byte 8…: Audio data.
The header can in some cases also contain other chunks that specify index marks,
textual descriptions of the sound etc., but these are not relevant for this project and
will not be investigated further in this report. The interested reader is referred to
reference A1-6, ”The File Format Handbook” by Gunter Born.
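The canonical 44-byte header described above maps directly onto a packed C struct. This is a sketch assuming a little-endian host, since all multi-byte WAV fields are little-endian on disk:

```c
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    char     riff_id[4];       /* "RIFF" */
    uint32_t riff_size;        /* total length of package to follow */
    char     wave_id[4];       /* "WAVE" */
    char     fmt_id[4];        /* "fmt " (note the trailing space) */
    uint32_t fmt_size;         /* 16 for plain LPCM */
    uint16_t format_tag;       /* 1 = LPCM */
    uint16_t channels;         /* 1 = mono, 2 = stereo */
    uint32_t sample_rate;      /* in Hz */
    uint32_t bytes_per_sec;    /* sample_rate * channels * bits / 8 */
    uint16_t block_align;      /* bytes per sample frame */
    uint16_t bits_per_sample;
    char     data_id[4];       /* "data" */
    uint32_t data_size;        /* bytes of audio data that follow */
} wav_header;
#pragma pack(pop)
```

Reading a file then reduces to one fread() of sizeof(wav_header) bytes followed by the audio data.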
References:
A1-1: IEC 958 ”Digital Audio Interface” whitepaper, European Broadcasting
Union, 1989.
A1-2: ”About SP-dif”, Tomi Engdahl
A1-3: ”The Inter IC Sound” whitepaper, Philips corp.
A1-4: AKM 4553 datasheet, AK corp.
A1-5: Silicon Laboratories C8051F00x datasheet rev. 1.7
A1-6: ”The File Format Handbook”, Gunter Born, 1995, ITP-Boston
Appendix 2. Data Converter Fundamentals
As mentioned in the theory chapter, digitizing an audio signal involves two processes,
sampling and quantization. When sampling, the amplitude of the signal is measured at
a fixed sampling interval T. The interval is usually described by the sampling
frequency fS = 1/T. Sampling converts the signal from continuous time to discrete
time. When quantizing, the amplitude is assigned to one of 2^B discrete values,
where B is the number of bits in the digital representation. This is, as previously
explained, LPCM code. The result is a discrete-time and discrete-amplitude digital
signal. The illustration from figure 2 is repeated for clarity.
where n is an integer variable (the sample instant) and ω̂ is the digital frequency
given by ωT. ω is the signal's ”analog” frequency in radians per second (ω = 2πf,
where f is the frequency in hertz) and T is the sample period. By definition, a discrete-
time signal x[n] is periodic only if its frequency f̂ = ω̂/2π is a rational number, that
is, f̂ = k/N for integers k and N. The smallest period N for which this is true is called
the fundamental period. It can easily be shown that the discrete-time sinusoid is
periodic in frequency with period 2π, because:
Eq. A2-3: cos[(ω̂ + 2π)n + θ] = cos(ω̂n + 2πn + θ) = cos(ω̂n + θ) ;[reference A2-1]
This means that all discrete sinusoids with frequencies ω̂k = ω̂ + 2kπ are
indistinguishable from the one with ω̂ in [−π, π]. On the other hand, any two
sinusoids with different frequencies within the range [−π, π] are distinct. Frequencies
outside this range are thus described as aliases of the distinct frequencies. Since
ω̂ = ωT = ω/fS, it becomes apparent that:
Eq. A2-4: −π ≤ ω̂ ≤ π => −π ≤ ω/fS ≤ π, or −1/2 ≤ f/fS ≤ 1/2 ;[reference A2-1]
must be fulfilled for an analog signal to produce a distinct sampled sequence: the
signal must be below half the sampling frequency. This limit is known as the Nyquist
frequency, and the result as Shannon's sampling theorem, after Harry Nyquist and
Claude Shannon who derived it. An attempt to sample anything above the Nyquist
frequency will, as equation A2-3 indicates, produce an unwanted signal of which the
input is an alias. To avoid this, filtering must be performed before AD-conversion.
Likewise, filtering is done after DA-conversion to prevent images of the original
spectrum being reproduced from the digital sequence. Both pre-ADC and post-DAC
filtering are referred to as antialias-filtering or just antialiasing.
The other fundamental limitation in digital signals is the resolution, given by the
quantization. For B-bit quantization the smallest distance, the quantization step Q, is
given by R/2^B, where R is the signal range (see figure A2-1). A roundoff error is
subsequently made. If it is assumed to be random, it is uniformly (white) distributed
between:
Eq. A2-5: −Q/2 ≤ e ≤ Q/2 ;[reference A2-3]
Eq. A2-6: eRMS = √(e̅²) = √( (1/Q) ∫ from −Q/2 to Q/2 of e² de ) = Q/√12 ;[reference A2-3]
If the signal to be quantized is a random signal distributed between 0 and R the signal-
to-noise ratio (SNR) will be:
This is referred to as the ”6dB per bit rule”. For a sinusoidal input the SNR can easily
be calculated to 6.02B + 1.76 dB by using the RMS-value of a sinusoid with
amplitude R/2.
However, although these are the only fundamental limitations of a signal digitized at
fS with B bits, there are other nonidealities in the conversion that can compromise
performance.
Figure A2-2 shows the transfer characteristics of an ideal 2-bit ADC and DAC.
Figure A2- 2 Transfer characteristic for ideal 2-bit ADC and DAC [ref A2-2]
The ideal ADC assigns a new value exactly at the quantization interval while the ideal
DAC draws a completely straight line between the sample values. In real-life
however, there are several factors that compromise performance:
- Offset-error: DAC: The output that occurs for the input code that
should produce zero output. ADC: The output code for a zero volt
input level.
- Gain-error: The difference between the ideal and actual full-scale value
when the offset error has been reduced to zero.
- Differential nonlinearity error (DNL): the variation in analog step sizes
away from 1 LSB with the two above removed. DNL values are
defined for each digital value.
- Integral nonlinearity error (INL): The difference between the ideal and
actual transfer curve when offset- and gain-error has been removed.
The maximum INL is also often referred to as absolute accuracy.
Figure A2-3: INL error and reduction in SFDR (spurious free dynamic range) [ref A2-4]
Another non-ideality of data conversion is jitter. Jitter occurs when there is variation
in the sample period T due to inaccuracy in the system's clock signals. Jitter leads to
distortion of the signal, as shown in figure A2-4.
The final performance limitation reviewed here is granulation noise. It was previously
assumed that the quantization noise e is random. However, for low-level signals or
representations with very few bits this is not the case. This can easily be understood
by looking at the output and error from a few-bit ADC.
Figure A2- 5 Transfer curve and error for few-bit ADC [ref A2-2]
As we can see, there is correlation between the signal and the noise, which leads to a
distortion called granulation noise. Granulation noise is mostly audible at low
volumes and sounds much more unpleasant than plain white noise. Therefore
requantization is often done together with dithering, a process where white noise is
added to the signal before it is truncated. The point is to decorrelate the signal and
the noise and thus substitute the unpleasant distortion with white noise. Dithering is
displayed in figure A2-6 and its effect in figure A2-7.
Figure A2- 7 The effect of dithering on a signal with amplitude 2Q [ref. A2-3]
However, it can be shown that a plain random noise source is not an ideal dither
generator. Using uniform random noise is known as rectangular dither, because the
dither signal has a rectangular probability density function (PDF). It can be shown
that rectangular dither does not completely decorrelate the signal and the
quantization noise. Triangular dither does exactly that. Realized as the sum of two
independent noise sources (its PDF is the convolution of two rectangular PDFs), it
has a triangular PDF. Its amplitude can however reach ±Q, and it can be shown that
the nominal noise floor increases by 4.77dB as opposed to 3dB. This is made up for
by the resulting quantization noise, with triangular dithering, having a mean value
and variance completely independent of the signal, i.e. complete decorrelation
(white noise). Thus triangular dither is usually preferred in audio applications.
Triangular dithering can easily be realized digitally by passing the output from a
random noise source through a (1−z⁻¹) filter. The noise source is normally made
with a pseudo-random number generator.
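The (1 − z⁻¹) construction can be sketched as follows; rand() merely stands in for whatever pseudo-random generator the target system provides:

```c
#include <stdlib.h>

/* Triangular (TPDF) dither via a (1 - z^-1) filter: the difference
   of two successive uniform samples in [-Q/2, Q/2] has a triangular
   PDF spanning [-Q, Q] (with a mild high-pass spectral shaping). */
double tpdf_dither(double Q)
{
    static double prev = 0.0;
    double cur = ((double)rand() / RAND_MAX - 0.5) * Q; /* uniform +-Q/2 */
    double out = cur - prev;   /* (1 - z^-1) */
    prev = cur;
    return out;
}
```

The returned value would be added to the signal immediately before truncation to the target wordlength.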
When quantizing an analog signal, on the other hand, the dither source also has to be
analog. Generating triangular dither with solely analog components is not possible.
Analog dithering is often realized with a Gaussian PDF, since this is the same
probability distribution as natural white noise or thermal noise. Thermal noise is
generated by resistance in a circuit, and a ±Q Gaussian dither can thus be realized
with nothing more than a simple diode or resistor (diodes are normally used, to avoid
loading the input). Gaussian dithering is however less ideal than triangular, since it
increases the nominal noise floor by 6dB.
Figure A2-8: The PDF of Gaussian, rectangular and triangular dither [ref. A2-3]
References:
A2-1: Proakis, John et al.: ”Digital Signal Processing, Principles, Algorithms
and Applications”, Prentice Hall 1996.
A2-2: Johns, David et al.: ”Analog Integrated Circuit Design”, John Wiley
& Sons, 1992
A2-3: Løkken, Ivar et.al: ”One-O digital amplifier”, bachelor thesis, HiST
2002
A2-4: Løkken, Ivar: ”Delta-sigma Audio DAC for SoC applications”, project
report, NTNU 2003
Appendix 3. Schematics
Appendix 4. Components List
Appendix 5. PCB-Layout
Appendix 6. Source-Code, C.
DPCM encoder and decoder:
///////////////////////////////////////////////////
// DPCM encoder, 4:1 compression                  //
// Works with 16-bit mono WAV-files on            //
// big-endian systems                             //
//                                                //
// Ivar Løkken, NTNU, 2004                        //
///////////////////////////////////////////////////
#include <stdio.h>
int main(void)
{
FILE *fp, *op;
/* File-opening lines lost in extraction; reconstructed to mirror the decoder,
   filenames assumed. */
fp = fopen("in.wav", "rb"); //open wav-file for reading
op = fopen("out.dp", "wb"); //open output-file for writing
if (fp) {
//wav header data
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;
//data variables
short value = 0; //current input sample value
short value_temp = 0; //for endian change
unsigned char delta = 0; //current dpcm output value
int diff = 0; //difference, actual and predicted value
short valpred = 32767; //prediction value for feedback
unsigned char outputbuffer; //two-sample buffer
int bufferstep; //toggle between outputbuffer fill/write
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}
// RUN COMPRESSION
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {
delta |= 1;
}
}
}
//S0xx
} else {
if (diff >= 16) {
//S01x
delta |= 2;
if (diff >= 64) {
//S011
delta |= 1;
}
//S00x
} else {
if (diff >= 4) {
//S001
delta |= 1;
}
}
}
//////////////////////////////////////////////////////////////
// DPCM decoder, 4:1 compression                            //
// Works with 16-bit mono WAV-files on big-endian systems   //
// Ivar Løkken, NTNU, 2004                                  //
//////////////////////////////////////////////////////////////
#include <stdio.h>
int main(void)
{
FILE *fp, *op;
fp = fopen("in.dp", "rb"); //open wav-file for reading
op = fopen("out.wav", "wb"); //open output-file for writing
if (fp) {
//wav header variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;
//data variables
char delta = 0; //current dpcm input value
int valpred = 32767; //predicted output value
short valout; //output value for writing
char inputbuffer = 0; //2-sample input buffer
int bufferstep = 0; //toggle between inputbuffer/input
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}
//RUN DECOMPRESSION
for( ; data_size_sw>0; data_size_sw=data_size_sw-2) {
//step 1, read the 8-bit buffer containing two samples and put the right one to the delta variable
if (bufferstep) {
delta = inputbuffer & 0x0f;
} else {
fread(&inputbuffer, sizeof(char), 1, fp);
delta = (inputbuffer >> 4) & 0x0f;
}
//the above must be done every second run so that the char is split
//and read into two deltas, since there are two residuals in each
bufferstep = !bufferstep;
//update predicted output value (last value + dequantized current difference)
valpred += quantTable[delta & 0x0f];
//limit output value to 16-bits
if ( valpred > 32767 )
valpred = 32767;
else if ( valpred < -32768 )
valpred = -32768;
valout = 0;
//reverse endian and write to wav-file
valout = ((valpred & 0x00ff)<<8);
valout = (valout | ((valpred & 0xff00)>>8));
fwrite(&valout, sizeof(short), 1, op);
}
}
fclose(fp);
fclose(op);
}
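The decoder's inner step adds a dequantized residual from quantTable (defined on a page of the listing not reproduced here) to the running prediction and clamps to the 16-bit range. A minimal sketch with a hypothetical 4-bit sign-magnitude table — the table values are illustrative, not the thesis's actual quantTable:

```c
/* Hypothetical 16-entry dequantization table: 3 magnitude bits plus a
 * sign bit (entries 8..15 are the negated magnitudes). Illustrative only. */
static const int quant_table_demo[16] = {
     1,  4,  16,  64,  256,  1024,  4096,  16384,
    -1, -4, -16, -64, -256, -1024, -4096, -16384
};

/* One feedback step of the DPCM decoder: previous prediction plus
 * dequantized residual, clamped to 16 bits. */
int dpcm_decode_step(int valpred, unsigned char delta)
{
    valpred += quant_table_demo[delta & 0x0f];
    if (valpred > 32767)  valpred = 32767;
    if (valpred < -32768) valpred = -32768;
    return valpred;
}
```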
IMA ADPCM encoder and decoder:
//////////////////////////////////////////////////////////////
// IMA ADPCM compatible encoder, 4:1 compression            //
// Works with 16-bit mono WAV-files on big-endian systems   //
// Ivar Løkken, Mar. 2004                                   //
//////////////////////////////////////////////////////////////
#include <stdio.h>
int main(void) {
FILE *fp, *op;
if (fp) {
//wav info variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;
//data variables
struct adpcm_state *state; //encoder status structure
short value; //current input sample value
short value_temp; //temp value for endian-flip
int sign; //current adpcm sign bit
int delta; //current adpcm output value
int diff; //difference (prediction result)
int step; //stepsize
int valpred; //predicted output value
int vpdiff; //current change to valpred
int index; //step change index
char outputbuffer; //2 sample buffer
int bufferstep = 1; //toggle between outputbuffer/output
char out; //output variable
//START COMPRESSION
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {
if ( sign ) {
diff = (-diff);
}
//Quantize
delta = 0; //output value initialization
vpdiff = (step >> 3); //vpdiff = step/8
if (diff >= step) { //if the difference diff is bigger than step
delta = 4; //first value bit is set (4=100)
diff -=step; //decrement diff by value step
vpdiff += step; //vpdiff = step/8 + step = 9step/8
}
step >>=1; //rightshift step 1 bit
if (diff >= step) { //diff bigger than new step (step/2)?
delta |=2; //if yes, set second bit
diff -= step; //decrement diff by value step
vpdiff += step; //vpdiff = 9step/8 + step/2 = 13step/8
}
step >>=1; //rightshift step 1 bit
if (diff >= step) { //diff bigger than new step?
delta |=1; //set the third and final value bit
vpdiff += step; //vpdiff = 13step/8 + step/4 = 15step/8
} //(the same as absolute value for step + sign bit)
fclose(fp);
fclose(op);
}
//////////////////////////////////////////////////////////////
// IMA ADPCM compatible decoder, 1:4 compression            //
// Works with 16-bit mono WAV-files on big-endian systems   //
// Ivar Løkken, Mar. 2004                                   //
//////////////////////////////////////////////////////////////
#include <stdio.h>
int main(void)
{
FILE *fp, *op;
fp = fopen("in.adp", "rb"); //open wav-file for reading
op = fopen("out.wav", "wb"); //open output-file for writing
if (fp)
{
//wav info variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;
//data variables
struct adpcm_state *state; //encoder status structure
short value_out = 0; //output value
int sign; //current adpcm sign bit
int delta; //current adpcm output value
int diff; //difference, current and previous value
int step; //stepsize
int valpred; //predicted output value
int vpdiff; //current change to valpred
int index; //step change index
char inputbuffer; //place to keep previous 4-bit value
int bufferstep = 0; //toggle between outputbuffer/output
//START DECOMPRESSION
for( ; data_size_sw>0; data_size_sw=data_size_sw-2) {
bufferstep = !bufferstep;
//restore sign
if ( sign )
valpred -= vpdiff;
else
valpred += vpdiff;
µ-law encoder and decoder:
//////////////////////////////////////////////////////////////
// mu-law encoder, 2:1 compression                          //
// Works with 16-bit mono WAV-files on big-endian systems   //
// Ivar Løkken, Mar. 2004                                   //
//////////////////////////////////////////////////////////////
#include <stdio.h>
#include <string.h> //for strncmp
//If you do not want to use a lookup table, the exponent can be found with
//the following loop. The lookup table requires memory, but is faster:
// value_temp = (value << 1);
// for (exp = 7; exp > 0; exp--) {
// if (value_temp & 0x8000) break;
// value_temp = (value_temp << 1);
// }
int main(void)
{
FILE *fp, *op;
if (fp) {
//wav info variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;
//data variables
short value = 0; //current input sample value
short value_temp = 0; //temp value
short sign = 0; //sign-bit
char exp = 0; //exponent (position of rightmost 1)
short mantis = 0; //mantissa
unsigned char outputbuffer = 0; //output buffer
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}
// RUN COMPRESSION
// convert to sign-magnitude
if (value < 0) {
value = (-value);
sign = 0x0080;
} else {
sign = 0x0000;
}
// clip value
if (value > 32635) {
value = 32635;
}
// add bias
value = value + 0x84;
// get the mantissa
mantis = (value >> (exp + 3)) & 0x000f;
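The sign/clip/bias/exponent/mantissa steps above are standard µ-law companding. A self-contained sketch combining them, using the loop variant of the exponent search from the comment earlier; the final bitwise complement follows the usual G.711 convention, and the listing's own output packing is on pages not shown here:

```c
/* Standard-style mu-law encode of one 16-bit sample. Sketch only;
 * mirrors the sign/clip/bias/exponent/mantissa steps of the listing. */
unsigned char mulaw_encode(short sample)
{
    int value = sample;                    /* widen to avoid short overflow */
    int sign = (value < 0) ? 0x80 : 0x00;  /* sign-magnitude */
    if (value < 0) value = -value;
    if (value > 32635) value = 32635;      /* clip */
    value += 0x84;                         /* add bias */
    int exp = 7;                           /* exponent: leading-one search */
    int tmp = value << 1;
    for (; exp > 0; exp--) {
        if (tmp & 0x8000) break;
        tmp <<= 1;
    }
    int mantis = (value >> (exp + 3)) & 0x0f;
    return (unsigned char)~(sign | (exp << 4) | mantis); /* G.711 complements */
}
```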
//////////////////////////////////////////////////////////////
// mu-law decoder, 2:1 compression                          //
// Works with 16-bit mono WAV-files on big-endian systems   //
// Ivar Løkken, Mar. 2004                                   //
//////////////////////////////////////////////////////////////
#include <stdio.h>
int main(void)
{
FILE *fp, *op;
if (fp) {
//wav file info variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;
//data variables
short valout = 0; //output value for writing
unsigned char inputbuffer = 0; //input buffer
char sign = 0; //sign
char mantis = 0; //mantissa
char exp = 0; //exponent
short out = 0; //output variable
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}
//RUN DECOMPRESSION
for( ; data_size_sw>0; data_size_sw=data_size_sw-2) {
iLaw encoder and decoder:
//////////////////////////////////////////////////////////////
// Custom mu-law-based encoder                              //
// Works with 16-bit mono WAV-files on big-endian systems   //
// Ivar Løkken, Mar. 2004                                   //
//////////////////////////////////////////////////////////////
#include <stdio.h>
#include <string.h> //for strncmp
int main(void)
{
FILE *fp, *op;
if (fp) {
//wav file info variables
char id[4];
unsigned long size, data_size, data_size_sw, loop;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;
//predictor variables
short value = 0; //current input sample value
short value_temp = 0; //temp value
short valprev = 0; //previous value
int dif; //first order prediction value
int dif2 = 0; //second order prediction value
short d2 = 0; //predictor output
int difprev = 0; //previous first order prediction value
int d2o; //decoded error value for feedback
//encoder variables
unsigned short sign = 0; //sign-bit
unsigned short exp = 0; //exponent (position of rightmost 1)
unsigned short mantis = 0; //mantissa
unsigned short outputbuffer[8]; //output buffer
unsigned short shortbuffer[5]; //16-bit buffer for writing to file as short
int i = 0;
//read and write wav info
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}
// RUN COMPRESSION
for(i=0; i<=7;i++){
// convert to sign-magnitude
if (d2 < 0) {
d2 = (-d2);
sign = 0x0200;
} else {
sign = 0x0000;
}
// clip value
if (d2 > 32635) {
d2 = 32635;
}
// add bias
d2 = d2 + 0x84;
//////////////////////////////////////////////////////////////
// Custom mu-law-based decoder                              //
// Works with 16-bit mono WAV-files on big-endian systems   //
// Ivar Løkken, Mar. 2004                                   //
//////////////////////////////////////////////////////////////
#include <stdio.h>
static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};
//static int exp_lut[8] = {0,
int main(void)
{
FILE *fp, *op;
if (fp) {
//data variables
unsigned short inputbuffer[5]; //input buffer
unsigned short tempbuffer[8]; //decoded sample buffer
int valout = 0; //output value for writing
char i; //counting variable
//decoder variables
unsigned short sign = 0;
unsigned short mantis = 0;
unsigned short exp = 0;
//predictor
short out = 0; //output variable
int difout = 0; //difference
int d1out = 0; //difference of differences (2nd order)
printf("Error: not a RIFF-file\n");
}
//RUN DECOMPRESSION
for( ; loop>0; loop=loop-1) {
//decode one block of 8 samples
for(i=0; i<= 7; i++) {
//find sign, exponent, mantissa
sign = tempbuffer[i] & 0x0200;
exp = (tempbuffer[i]>>6) & 0x0007;
mantis = (tempbuffer[i] & 0x003f);
//prediction
d1out += difout;
valout += d1out;
Entropy coding tester: Rice, Pod and iPod encoder and decoder
/////////////////////////////////////////////////////////////
// Entropy coding test program                             //
// Pod vs. Rice vs. iPod test encoder                      //
// No prediction, but it can easily be included in main    //
// if desired                                              //
// Ivar Løkken, NTNU 2004                                  //
// x86 users: remove byteswapping                          //
/////////////////////////////////////////////////////////////
#include <stdio.h>
#include <string.h> //for strncmp
//data variables
short value = 0; //current input sample value
short value_temp = 0; //temp value
unsigned short out = 0; //output variable
unsigned short maxwordlength = 0; //max wordlength indicator
unsigned char coding = 0; //Pod or Rice selector
unsigned char prefixbits = 0; //number of bits in the prefix
//encoder variables
unsigned short sign = 0; //sign-bit
unsigned short overflow = 0; //binary part
unsigned char numzeros = 0; //number of zeros
unsigned char k = 6;
unsigned long A = 0; //accumulated value for calculation of k
unsigned char N = 0; //sample count
short i = 0; //counting variable
short j = 15; //counting variable
short x = 0; //how often is new k calculated
//encoder functions
void pod_encoder(void);
void rice_encoder(void);
void ipod_encoder(void);
int main(void)
{
if (fp) {
//read and write wav header
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}
// RUN COMPRESSION
printf("Please select encoding method (0 = Pod-coding, 1 = Rice-coding, 2 = iPod-coding): ");
scanf("%hhu", &coding); //coding is an unsigned char
if (coding == 0) {
pod_encoder();
} else if (coding == 1) {
rice_encoder();
} else if (coding == 2) {
ipod_encoder();
}
}
fclose(op);
fclose(fp);
fclose(tp);
}
//pod encoder
void pod_encoder(void)
{
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {
value = (value | ((value_temp & 0xff00)>>8));
// convert to sign-magnitude
if (value < 0) {
value = (-value);
sign = 1;
} else {
sign = 0;
}
// perform Pod-coding
// find overflow
overflow = 0;
overflow = value >> k;
fwrite(&numzeros, sizeof(char), 1, tp);
// sign
if (sign != 0) {
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
j=15;
out=0;
}
}
// overflow
for (i=numzeros; i>0; i--) {
if ((overflow & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
}
}
//rice encoder
void rice_encoder(void)
{
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {
// convert to sign-magnitude
if (value < 0) {
value = (-value);
sign = 1;
} else {
sign = 0;
}
// perform Rice-coding
// find overflow
overflow = 0;
overflow = value >> k;
// sign
if (sign != 0) {
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
//A new k is calculated every n samples when the if-test (x==n) is enabled;
//with the if-test commented out, as here, a new k is calculated for every sample
//if (x==4 || N ==255) {
for (k=0; (N<<k)<A; k++);
x=0;
//}
// reset accumulation every 255th sample
if (N==255) {
N=0;
A=0;
}
}
printf("Max wordlength: %d \n", maxwordlength);
}
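Rice coding, as used above, sends value>>k in unary followed by the k least significant bits verbatim, and the listing re-estimates k from the magnitude accumulator A over N samples. A minimal sketch of the resulting code length and the k estimate from the listing (the +1 stop bit follows the common Rice convention; the sign is a separate bit, as in the listing):

```c
/* Length in bits of a Rice code for a non-negative value with parameter k:
 * unary prefix (value >> k zeros), one stop bit, then k binary LSBs. */
int rice_code_length(unsigned int value, unsigned int k)
{
    return (int)(value >> k) + 1 + (int)k;
}

/* The adaptive parameter estimate used in the listing: the smallest k
 * such that N << k >= A, where A accumulates magnitudes over N samples. */
unsigned int estimate_k(unsigned long A, unsigned long N)
{
    unsigned int k = 0;
    while ((N << k) < A)
        k++;
    return k;
}
```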
//iPod encoder
void ipod_encoder(void)
{
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {
// convert to sign-magnitude
if (value < 0) {
value = (-value);
sign = 1;
} else {
sign = 0;
}
// perform iPod-coding
// find overflow
overflow = 0;
overflow = value >> k;
// shift coding up one number
overflow = overflow + 1;
fwrite(&prefixbits, sizeof(char), 1, tp);
// if the value is positive
if (sign == 0) {
// zeros followed by overflow
//zeros
for (i=prefixbits; i>0; i--) {
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out=0;
}
}
// overflow
for (i=prefixbits; i>0; i--) {
if ((overflow & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
}
// if the value is negative, output 1's and inverted overflow
} else if (sign == 1) {
//ones
for (i=prefixbits; i>0; i--) {
out = out | bittab[j];
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out=0;
}
}
// inverted overflow
for (i=prefixbits; i>0; i--) {
if ((overflow & bittab[i-1]) == 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
}
}
N++;
A+=value;
x++;
// With (x==n) enabled in the if-test, k is calculated every n samples; with
// the if-test commented out, as here, k is calculated for every sample
//if (x==64 || N == 255) {
for (k=0; (N<<k)<A; k++);
x = 0;
//}
// reset accumulation every 255th sample
if (N==255) {
N=0;
A=0;
}
}
printf("Max wordlength: %d \n", maxwordlength);
}
/////////////////////////////////////////////////////////////
// Entropy coding test program                             //
// Pod vs. Rice vs. iPod test decoder                      //
// No prediction, but it can easily be included in main    //
// if desired                                              //
// Ivar Løkken, NTNU 2004                                  //
// x86 users: remove byteswapping                          //
/////////////////////////////////////////////////////////////
#include <stdio.h>
#include <string.h> //for strncmp
//data variables
unsigned short in = 0; //input variable
short out = 0;
short valout = 0;
unsigned short maxwordlength = 0; //max wordlength indicator
unsigned char coding = 0; //Pod or Rice selector
unsigned char prefixbits = 0; //number of bits in the prefix
//decoder variables
unsigned short sign = 0; //sign-bit
unsigned short overflow = 0; //binary part
unsigned char numzeros = 0; //number of zeros
unsigned char k = 6;
unsigned long A = 0; //accumulated value for calculation of k
unsigned char N = 0; //sample count
short i = 0; //counting variable
short j = 15; //counting variable
short x = 0; //how often is new k calculated
//decoder functions
void pod_decoder(void);
void rice_decoder(void);
void ipod_decoder(void);
int main(void)
{
if (fp) {
//read and write wav header
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}
// RUN COMPRESSION
printf("Please select decoding method (0 = Pod-coding, 1 = Rice-coding, 2 = iPod-coding): ");
scanf("%hhu", &coding); //coding is an unsigned char
if (coding == 0) {
pod_decoder();
} else if (coding == 1) {
rice_decoder();
} else if (coding == 2) {
ipod_decoder();
}
}
fclose(op);
fclose(fp);
fclose(tp);
}
// read sign
sign = in & bittab[j];
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
// count zeros
numzeros = 0;
while ((in & bittab[j]) == 0){
numzeros++;
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
fwrite(&numzeros, sizeof(char), 1, tp);
A=0;
}
void rice_decoder(void)
{
//read the 16 first bits
fread(&in, sizeof(short), 1, fp);
// read sign
sign = in & bittab[j];
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
N++;
A += valout;
x++;
//A new k is calculated every n samples when the if-test (x==n) is enabled;
//with the if-test commented out, as here, a new k is calculated for every sample
//if (x==4 || N==65535) {
for (k=0; (N<<k)<A; k++);
x = 0;
//}
// reset accumulation every 255th sample
if (N==255) {
N=0;
A=0;
}
void ipod_decoder(void)
{
//read the 16 first bits
fread(&in, sizeof(short), 1, fp);
prefixbits++;
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
fclose(tp);
// and put the re-inverted overflow into the variable
for (i=(prefixbits); i>0; i--) {
if ((in & bittab[j]) == 0) {
overflow = overflow | bittab[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
}
// remove upshift
overflow = overflow - 1;
valout = 0;
valout = (overflow << k);
// read the next part (k bits) bit by bit and construct output
for (i=k ; i>0; i--) {
if ((in & bittab[j]) != 0) {
valout = valout | bittab[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
Final lossless codec, encoder and decoder
/////////////////////////////////////////////////////////////
// Compression test program, Pod-encoding                  //
// Selectable prediction: from no prediction up to         //
// fourth order; mono or stereo                            //
// Written for Macintosh; Intel users, remove endian       //
// conversion                                              //
// Encoder                                                 //
// Ivar Løkken, NTNU, 2004                                 //
/////////////////////////////////////////////////////////////
#include <stdio.h>
//predictor variables
long valprev[2] = {0,0}; //previous value
long diff[2] = {0,0}; //difference
long diffprev[2] = {0,0}; //previous difference
long diff2[2] = {0,0}; //second order difference
long diff2prev[2] = {0,0}; //previous second order difference
long diff3[2] = {0,0}; //and so forth, [2] because of stereo
long diff3prev[2] = {0,0}; //one for each channel
long diff4[2] = {0,0};
long diff4prev[2] = {0,0};
long residual = 0; //prediction residual
//encoder variables
unsigned short sign = 0; //sign-bit
unsigned short overflow = 0; //binary part
unsigned char numzeros = 0; //number of zeros
unsigned char k[2] = {6,6}; //k-variable, output wordlength estimation
unsigned long A[2] = {0,0}; //accumulated value for calculation of k
unsigned char N[2] = {0,0}; //sample count
int chandec = 0; //channel decorrelation indicator
//data variables
short value = 0; //current input sample value
short value_temp = 0; //temp value
short left = 0; //left channel value
short right = 0; //right channel value
long side = 0; //Side = L-R
//misc. variables
short i = 0; //counting variable
short j = 15; //counting variable
unsigned char m = 0; //left/right indicator
short x = 0; //how often is new k calculated
unsigned short out = 0; //output variable
unsigned short maxwordlength = 0; //max wordlength indicator
int order = 0; //prediction order
//compress function
void compress(long invalue);
FILE *fp, *op, *tp;
//main routine
int main(void)
{
fp = fopen("reference.wav", "rb"); //open wav-file for reading
op = fopen("out.comp", "wb"); //open output-file for writing
tp = fopen("test.hex", "wb"); //test file for whatever test data
//the user will include
if (fp) {
//wav header variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, channels_temp, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;
//select parameters
printf("Select prediction order (0, 1, 2, 3 or 4): ");
scanf("%d", &order);
printf("%d", order);
printf("\nDo you want to include channel decorrelation (0=no, 1=yes)? ");
scanf("%d", &chandec);
printf("%d", chandec);
// RUN COMPRESSION
// check if order is ok
if (order < 0 || order > 4) {
printf("Error, invalid prediction order \n");
break;
}
// byteswap channels variable
channels = ((channels_temp & 0x00ff)<<8);
channels = (channels | ((channels_temp & 0xff00)>>8));
//compression routine
void compress(long invalue)
{
//0th, 1st or 2nd order prediction, depending on what's chosen
if (order == 0) {
residual = invalue;
} else if (order == 1) {
residual = invalue - valprev[m];
valprev[m] = invalue;
} else if (order == 2) {
residual = diff[m] - diffprev[m];
diffprev[m] = diff[m];
diff[m] = invalue - valprev[m];
valprev[m] = invalue;
} else if (order == 3) {
residual = diff2[m]-diff2prev[m];
diff2prev[m] = diff2[m];
diff2[m] = diff[m] - diffprev[m];
diffprev[m] = diff[m];
diff[m] = invalue - valprev[m];
valprev[m] = invalue;
} else if (order == 4) {
residual = diff3[m]-diff3prev[m];
diff3prev[m] = diff3[m];
diff3[m] = diff2[m]-diff2prev[m];
diff2prev[m] = diff2[m];
diff2[m] = diff[m]-diffprev[m];
diffprev[m] = diff[m];
diff[m] = invalue - valprev[m];
valprev[m] = invalue;
}
// convert to sign-magnitude
if (residual < 0) {
residual = (-residual);
sign = 1;
} else {
sign = 0;
}
// perform Pod-coding
// find overflow
overflow = 0;
overflow = residual >> k[m];
fwrite(&numzeros, sizeof(char), 1, tp);
// sign
if (sign != 0) {
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
// overflow
for (i=numzeros; i>0; i--) {
if ((overflow & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
}
//}
// reset accumulation every 255th sample
if (N[m]==255) {
N[m]=0;
A[m]=0;
}
}
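compress() above realizes order-n prediction as a cascade of running differences, so the residual is the n-th finite difference of the input (delayed by the listing's update order). A single-channel sketch of the order == 2 branch, with state names of my own (suffix _d); for a linear ramp the residual settles to zero after the first two samples:

```c
/* Order-2 predictor state for one channel (names are illustrative). */
static long valprev_d = 0, diff_d = 0, diffprev_d = 0;

/* Order-2 residual, mirroring the order == 2 branch of compress():
 * the residual is the difference of the two most recent first
 * differences, then the difference chain is updated with the new input. */
long predict2(long invalue)
{
    long residual = diff_d - diffprev_d;
    diffprev_d = diff_d;
    diff_d = invalue - valprev_d;
    valprev_d = invalue;
    return residual;
}
```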
/////////////////////////////////////////////////////////////
// Compression test program, Pod-encoding                  //
// Selectable prediction: from no prediction up to         //
// fourth order; mono or stereo                            //
// Written for Macintosh; Intel users, remove endian       //
// conversion                                              //
// Decoder                                                 //
// Ivar Løkken, NTNU, 2004                                 //
/////////////////////////////////////////////////////////////
#include <stdio.h>
//data variables
unsigned short in = 0; //input variable
long outvar = 0; //output variable from function call
short PCMout_t = 0; //16-bit output data
short PCMout = 0; //16-bit output data right endian
long side = 0; //side band (left-right)
long left = 0; //left channel
long right = 0; //right channel
//decoder variables
unsigned short sign = 0; //sign-bit
unsigned short numzeros=0; //number of zeros
unsigned char k[2] = {6,6}; //k-variable, estimation of output length
unsigned long A[2] = {0,0}; //accumulated value for calculation of k
unsigned char N[2] = {0,0}; //sample count
//predictor variables
long residual = 0; //decoded residual
long diff[2] = {0,0}; //calculated difference when 2nd o. pred
long diff2[2] = {0,0}; //3rd order
long diff3[2] = {0,0}; //4th order
long out[2] = {0,0};
//misc variables
short i = 0; //counting variable
short j = 15; //counting variable
unsigned char m = 0; //channel indicator
short x = 0; //how often is new k calculated
unsigned int order = 0; //prediction order
int chandec = 0; //channel decorrelation indicator
//decompression function
long decompress(void);
//main routine
int main(void)
{
fp = fopen("out.comp", "rb"); //open wav-file for reading
op = fopen("out.wav", "wb"); //open output-file for writing
tp = fopen("test.hex", "wb"); //test-files for whatever the user will store
if (fp) {
//wav header variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, channels_temp, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;
//enter parameters
printf("Select prediction order (same as used in encoder): ");
scanf("%u", &order);
printf("\nIs channel decorrelation used in compressed file (0=no, 1=yes)? ");
scanf("%u", &chandec);
//RUN DECOMPRESSION
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {
//check if order is ok
if (order > 4) { //order is unsigned, so only the upper bound can be violated
printf("Error, invalid prediction order \n");
break;
}
fwrite(&PCMout, sizeof(short), 1, op);
data_size_sw=data_size_sw-2;
//right channel write
outvar = right;
if (outvar > 32767)
outvar = 32767;
if (outvar < -32768)
outvar = -32768;
PCMout_t = outvar;
//convert back to big endian
PCMout = ((PCMout_t & 0x00ff)<<8);
PCMout = (PCMout | ((PCMout_t & 0xff00)>>8));
//write output value
fwrite(&PCMout, sizeof(short), 1, op);
} else {
printf("Error, not 1 or 2 channels \n");
break;
}
}
}
fclose(fp);
fclose(op);
}
//decompression function
long decompress(void) {
// read sign
sign = in & bittabs[j];
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
// count zeros
numzeros = 0;
while ((in & bittabs[j]) == 0){
numzeros++;
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
fwrite(&numzeros, sizeof(char), 1, tp);
// read the next part (numzeros and k bits) bit by bit and construct output
residual = 0;
for (i=(numzeros+k[m]) ; i>0; i--) {
if ((in & bittabs[j]) != 0) {
residual = residual | bittab[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
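The decoder above parses each residual as a sign bit, a run of zeros, and then numzeros + k data bits. The structure is easiest to see in a plain Rice-style round trip; the sketch below is a simplified, self-contained illustration with hypothetical helper names, not the exact Pod bitstream (which folds the zero count into the data length):

```c
#include <stdint.h>
#include <assert.h>

//minimal MSB-first bit buffer
typedef struct { uint8_t buf[64]; int pos; } bitstream;

static void bs_init(bitstream *bs)
{
    for (int i = 0; i < 64; i++) bs->buf[i] = 0;
    bs->pos = 0;
}

static void put_bit(bitstream *bs, int bit)
{
    if (bit) bs->buf[bs->pos >> 3] |= (uint8_t)(0x80 >> (bs->pos & 7));
    bs->pos++;
}

static int get_bit(bitstream *bs)
{
    int bit = (bs->buf[bs->pos >> 3] >> (7 - (bs->pos & 7))) & 1;
    bs->pos++;
    return bit;
}

//write value as q zeros, a terminating one, then k remainder bits
static void rice_encode(bitstream *bs, uint32_t value, unsigned k)
{
    for (uint32_t q = value >> k; q > 0; q--) put_bit(bs, 0);
    put_bit(bs, 1);
    for (int i = (int)k - 1; i >= 0; i--) put_bit(bs, (int)((value >> i) & 1));
}

//invert: count zeros (cf. the while-loop in decompress), then read k bits
static uint32_t rice_decode(bitstream *bs, unsigned k)
{
    uint32_t q = 0;
    while (get_bit(bs) == 0) q++;
    uint32_t value = q << k;
    for (int i = (int)k - 1; i >= 0; i--)
        value |= (uint32_t)get_bit(bs) << i;
    return value;
}
```

Rewinding the stream (bs.pos = 0) after encoding and decoding with the same k returns the original values; the real codec additionally adapts k per channel from the running accumulator A[m].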
Hybrid lossless/lossy encoder and decoder
///////////////////////////////////////////
//Hybrid lossless/lossy codec............//
//.......................................//
//10 bit per sample output rate..........//
//LSB-removal or mono-samples............//
//lossy-mode.............................//
//fixed 2nd order prediction.............//
//and Pod-encoding.......................//
//.......................................//
//Written for Macintosh..................//
//Intel users, remove endian conversion..//
//.......................................//
//encoder................................//
//.......................................//
//Ivar Løkken............................//
//NTNU, 2004.............................//
///////////////////////////////////////////
#include <stdio.h>
#include <math.h>
#include <string.h> //for strncmp
//wav-header data
char id[4];
long size, data_size, data_size_sw;
short format_tag, channels, channels_temp, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;
//prediction variables
long valprev[2] = {0,0}; //previous value
long diff[2] = {0,0}; //difference
long diffprev[2] = {0,0}; //previous difference
long residual = 0; //prediction residual
//encoding variables
unsigned short sign = 0; //sign-bit
unsigned short overflow = 0; //binary part
unsigned char numzeros = 0; //number of zeros
unsigned char k[2] = {6,6}; //variable k, wordlength estimation
unsigned long A[2] = {0,0}; //accumulated value for calculation of k
unsigned char N[2] = {0,0}; //sample count
//lossy-mode variables
unsigned char lsb_rem = 0; //lsbs to be removed if lossy-mode 1
unsigned char mono_samples = 0; //samples in a frame to be sent if mono in lossy-mode 2
unsigned char header = 0; //frame header
unsigned char frame_length = 128; //frame length in samples
unsigned int lossy_mode = 0; //selects lossy mode (0 = none, 1 = lsb_removal, 2 = mono, 3 = mono test)
//counting variables
short i = 0;
short j = 15;
unsigned short outbuf_pos = 0;
short y = 0;
//functions
void mono_test_only(void);
void compress_frame(unsigned char length);
void compress_sample(void);
void read_write_wavinfo(void);
void read_frame(void);
void write_header(void);
void write_frame(void);
void check_lossy(void);
//main program
int main(void)
{
fp = fopen("modernlive2.wav", "rb"); //open wav-file for reading
op = fopen("out.comp", "wb"); //open output-file for writing
read_write_wavinfo();
printf("please select lossy-mode (0=none, 1 = lsb removal, 2 = mono samples, 3 = mono samp. test only): \n");
scanf("%u", &lossy_mode);
while (data_size_sw > 0) {
if (lossy_mode == 3) {
mono_test_only();
break;
} else {
if (channels == 1 || channels == 2) {
//overrides lossy-mode selection if signal is mono.
if (channels == 1)
lossy_mode = 0;
outbuf_pos = 0;
read_frame();
if(data_size_sw < 0)
break;
compress_frame(frame_length);
write_frame();
if (lossy_mode != 0)
check_lossy();
} else {
printf("Error, not 1 or 2 channels \n");
break;
}
}
}
fclose(op);
fclose(fp);
}
//function that reads wav header and copies it to output file
void read_write_wavinfo(void)
{
// read wave header
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels_temp, sizeof(short), 1, fp);
fwrite(&channels_temp, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}
}
//function that reads a frame of data and puts it in the input buffer
void read_frame(void)
{
for(y=0; y<frame_length; y++) {
if(data_size_sw<0)
break;
// read input value and change endian
fread(&value_temp, sizeof(short), 1, fp);
data_size_sw=data_size_sw-2;
inputbuffer[y] = 0;
inputbuffer[y] = ((value_temp & 0x00ff)<<8);
inputbuffer[y] = (inputbuffer[y] | ((value_temp & 0xff00)>>8));
//remove LSBs if lossy mode 1
if (lossy_mode == 1) {
//convert to sign magnitude
if (inputbuffer[y] < 0) {
inputbuffer[y] = (-inputbuffer[y]);
i = 1;
} else {
i = 0;
}
//remove LSBs
inputbuffer[y] = inputbuffer[y]>>lsb_rem;
//back to twos complement
if (i == 1) {
inputbuffer[y]=(-inputbuffer[y]);
}
}
}
}
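The LSB removal in lossy mode 1 shifts in sign-magnitude form so that positive and negative samples are quantized symmetrically toward zero. A small sketch of the removal and the matching restoration done on the decoder side (hypothetical function names):

```c
#include <assert.h>

//drop n least significant bits, quantizing the magnitude toward zero
static long remove_lsbs(long x, unsigned n)
{
    long mag = (x < 0) ? -x : x;   //to sign-magnitude
    mag >>= n;                     //drop LSBs
    return (x < 0) ? -mag : mag;   //back to two's complement
}

//approximate inverse: shift the magnitude back up
static long restore_lsbs(long x, unsigned n)
{
    long mag = (x < 0) ? -x : x;
    mag <<= n;
    return (x < 0) ? -mag : mag;
}
```

The round trip is lossy: restore_lsbs(remove_lsbs(x, n), n) returns x rounded toward zero to a multiple of 2^n, which is exactly the quality/bit-rate trade-off lossy mode 1 makes.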
//compression routine
void compress_sample(void)
{
//2nd order prediction
diff[LR] = inputbuffer[y] - valprev[LR];
valprev[LR] = inputbuffer[y];
residual = diff[LR] - diffprev[LR];
diffprev[LR] = diff[LR];
// convert to sign-magnitude
if (residual < 0) {
residual = (-residual);
sign = 1;
} else {
sign = 0;
}
// perform Pod-coding
// find overflow
overflow = 0;
overflow = residual >> k[LR];
// sign
if (sign != 0) {
out = out | bittab[j];
}
j--;
if (j<0) {
outputbuffer[outbuf_pos] = out;
outbuf_pos++;
j=15;
out = 0;
}
if (j<0) {
outputbuffer[outbuf_pos] = out;
outbuf_pos++;
j=15;
out=0;
}
}else{
//zeros
for (i=numzeros; i>0; i--) {
j--;
if (j<0) {
outputbuffer[outbuf_pos] = out;
outbuf_pos++;
j=15;
out=0;
}
}
// overflow
for (i=numzeros; i>0; i--) {
if ((overflow & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
outputbuffer[outbuf_pos] = out;
outbuf_pos++;
j=15;
out = 0;
}
}
}
}
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}
///////////////////////////////////////////
//Hybrid lossless/lossy codec............//
//.......................................//
//10 bit per sample output rate..........//
//LSB-removal or mono-samples............//
//lossy-mode.............................//
//fixed 2nd order prediction.............//
//and Pod-encoding.......................//
//.......................................//
//Written for Macintosh..................//
//Intel users, remove endian conversion..//
//.......................................//
//decoder................................//
//.......................................//
//Ivar Løkken............................//
//NTNU, 2004.............................//
///////////////////////////////////////////
#include <stdio.h>
//output buffer
long outbuffer[128];
//predictor variables
long residual = 0; //decoded residual
long diff[2] = {0,0}; //calculated difference when 2nd order
long out[2] = {0,0}; //output variable
//decoder variables
unsigned short sign = 0; //sign-bit
unsigned short numzeros=0; //number of zeros
unsigned char k[2] = {6,6}; //k-variable, compressed wordlength estimation
unsigned long A[2] = {0,0}; //accumulated value for calculation of k
unsigned char N[2] = {0,0}; //sample count
unsigned char LR = 0; //channel indicator
//counting variables
short i = 0; //counting variable
unsigned char y = 0; //counting variable
short j = 15; //counting variable
char x = 0; //counting variable
//functions
void decompress_frame(unsigned char length);
void decompress_sample(void);
void read_write_wavinfo(void);
void put_back_lsbs(void);
void back_to_stereo(void);
void write_frame_tofile(void);
FILE *fp, *op;
//main program
int main(void)
{
fp = fopen("out.comp", "rb"); //open compressed file for reading
op = fopen("out.wav", "wb"); //open output wav-file for writing
read_write_wavinfo();
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}
//frame decompression function
void decompress_frame(unsigned char length)
{
LR = 0;
// decompress the samples in the frame
if (lossy_mode != 2) {
for (y=0;y<length;y++) {
decompress_sample();
if (channels == 2) {
LR = !LR;
}
}
//if mono-mode, decompress stereo samples first, then mono samples
} else {
y=0;
for (y=0;y<(length-mono_samples);y++) {
decompress_sample();
LR = !LR;
}
LR = 0;
y=(length-mono_samples);
while (y<length) {
decompress_sample();
y=y+2;
}
}
}
//decompression routine
void decompress_sample(void)
{
// read sign
sign = in & bittabs[j];
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
// count zeros
numzeros = 0;
while ((in & bittabs[j]) == 0){
numzeros++;
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
//restore sign
if (sign != 0)
residual = (-residual);
// construct output data, depending on prediction order used
diff[LR] += residual;
out[LR] += diff[LR];
outbuffer[y] = out[LR];
}
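The two accumulator updates in decompress_sample are the exact inverse of the fixed second-order predictor used in the encoder: the residual is the second difference of the input, and summing it twice rebuilds the sample. A single-channel sketch of the round trip, with hypothetical names:

```c
#include <assert.h>

typedef struct { long valprev, diffprev; } pred2_enc;
typedef struct { long diff, out; } pred2_dec;

//encoder side: emit the second difference of the input
static long pred2_encode(pred2_enc *s, long x)
{
    long diff = x - s->valprev;          //first difference
    long residual = diff - s->diffprev;  //second difference
    s->valprev = x;
    s->diffprev = diff;
    return residual;
}

//decoder side: integrate twice, as decompress_sample does with diff[] and out[]
static long pred2_decode(pred2_dec *s, long residual)
{
    s->diff += residual;   //rebuild first difference
    s->out += s->diff;     //rebuild the sample itself
    return s->out;
}
```

Starting from matching zero states, feeding each encoder residual straight into the decoder reproduces the input exactly, which is what makes the prediction stage lossless.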
Dropped packet simulator
///////////////////////////////////////////
//File that emulates dropped packets.....//
//and different ways of handling them....//
//to see how it affects audio quality....//
//.......................................//
//Ivar Løkken, NTNU 2004.................//
///////////////////////////////////////////
#include <stdio.h>
#include <math.h>
//parameter variables
unsigned int packet_length = 0; //length of packet in samples
unsigned int drop_interval = 0; //how often packets are dropped
unsigned int lost_in_a_row = 0; //how many packets are lost in a row
unsigned int handling_mode = 0; //how to handle lost packets
unsigned int losson = 0; //packet loss on/off indicator
//counting variables
short i = 0;
short j = 0;
//main program
int main(void)
{
fp = fopen("modernlive.wav", "rb"); //open wav-file for reading
op = fopen("out.wav", "wb"); //open output-file for writing
read_write_wavinfo(); //read wav-header
printf("\nDo you want packets to be lost (0=no, 1=yes): ");
scanf("%u", &losson);
if (losson) {
//select parameters
printf("\nPlease select packet length in samples (max 256): ");
scanf("%u", &packet_length);
printf("\nPlease select packet drop interval: ");
scanf("%u", &drop_interval);
printf("\nPlease select how many packets should be dropped each time (default = 1): ");
scanf("%u", &lost_in_a_row);
printf("\nPlease select handling mode (1 = insert silence, 2 = repeat last OK packet): ");
scanf("%u", &handling_mode);
//run
run_with_drop();
} else {
//run with no packet loss
run_no_drop();
}
}
//back up last packet in case of repetition mode selected
last_packet[j] = value;
data_size_sw = data_size_sw - 2;
if (data_size_sw<0) break;
}
if (data_size_sw<0) break;
}
//dropped packets
for (i=(drop_interval-lost_in_a_row);i<drop_interval;i++) {
for (j=0;j<packet_length;j++) {
//if handling mode 1, write silence to output file
if (handling_mode==1) {
fread(&value, sizeof(short), 1, fp);
fwrite(&silence, sizeof(short),1,op);
data_size_sw = data_size_sw - 2;
if (data_size_sw<0) break;
//if handling mode 2, write last OK packet to output file
} else if (handling_mode == 2) {
fread(&value, sizeof(short), 1, fp);
fwrite(&last_packet[j], sizeof(short),1,op);
data_size_sw = data_size_sw - 2;
if (data_size_sw<0) break;
}
}
if (data_size_sw<0) break;
}
}
fclose(fp);
fclose(op);
}
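The two handling modes tested above can be summarized in a small buffer-level sketch (illustrative names; the simulator itself works directly on the wav stream): mode 1 substitutes silence for a lost packet, mode 2 repeats the last correctly received packet.

```c
#include <string.h>
#include <assert.h>

enum { HANDLE_SILENCE = 1, HANDLE_REPEAT = 2 };

//conceal one lost packet of len samples in out[];
//last_ok holds the most recent correctly received packet
static void conceal(short *out, const short *last_ok, int len, int mode)
{
    if (mode == HANDLE_SILENCE)
        memset(out, 0, (size_t)len * sizeof(short));       //mode 1: silence
    else if (mode == HANDLE_REPEAT)
        memcpy(out, last_ok, (size_t)len * sizeof(short)); //mode 2: repeat
}
```

Mode 2 trades a possible audible repeat for avoiding a hard gap in the waveform.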
Appendix 7. MatLab Scripts
Prediction with selectable filter and resulting entropy calculation
function [ErLeft, ErRight]=decorr(path, B, A)
%Matlab-function for intra-channel decorrelation of wavfile.
%The function plots histogram and calculates entropy
%
%FIR- or IIR-filters of any order may be used
%
%Designed for two-channel 16-bit wavefile
%
%[error_left, error_right]=decorr('c:\path\filename.wav', B, A)
%
%B and A are filter coefficients
%a(1)*y(n) = b(1)*x(n) + b(2)*x(n-1) + ... + b(nb+1)*x(n-nb)
% - a(2)*y(n-1) - ... - a(na+1)*y(n-na)
%
%Made by: Ivar Løkken, 19/1-04
signal=wavread(path);
samples=signal*(2^15-1);
%Removes normalization of wavefile.
%The function Wavread normalizes sample values to [-1 1]
%actual sample values for 16-bits is [-32767 32767]
vector=samples(:);
%Puts the two channels in one vector
LCH=vector(1:length(vector)/2);
RCH=vector(length(vector)/2+1:length(vector));
%separates left channel from right
ErLCH=filter(B,A,LCH);
ErRCH=filter(B,A,RCH);
%calculates prediction error
subplot(2,1,1);
hist(ErLCH,min(ErLCH):max(ErLCH));
title('Histogram, predicted error Left Channel');
ylabel('Number of occurrences');
xlabel('sample value');
subplot(2,1,2);
hist(ErRCH,min(ErRCH):max(ErRCH));
title('Histogram, predicted error Right Channel');
ylabel('Number of occurrences');
xlabel('sample value');
%plots normalized histograms
histoL=hist(ErLCH,min(ErLCH):max(ErLCH));
probsL=histoL/sum(histoL);
histoR=hist(ErRCH,min(ErRCH):max(ErRCH));
probsR=histoR/sum(histoR);
%generates probability distribution based on histogram
IL=-log2(probsL);
IL(IL==-Inf) = 0;
IL(IL==Inf) = 0;
prodL=probsL.*IL;
ErLeft=sum(prodL);
IR=-log2(probsR);
IR(IR==-Inf) = 0;
IR(IR==Inf) = 0;
prodR=probsR.*IR;
ErRight=sum(prodR);
%Calculates entropy using standard formula
signal=wavread(path);
samples=signal*(2^15-1);
%Removes normalization of wavefile.
%The function Wavread normalizes sample values to [-1 1]
%actual sample values for 16-bits is [-32767 32767]
vector=samples(:);
%Puts the two channels in one vector
LCH=vector(1:length(vector)/2);
RCH=vector(length(vector)/2+1:length(vector));
%separates left channel from right
histoL=hist(LCH,min(LCH):max(LCH));
probsL=histoL/sum(histoL);
histoR=hist(RCH,min(RCH):max(RCH));
probsR=histoR/sum(histoR);
%generates probability distribution based on histogram
IL=-log2(probsL);
IL(IL==-Inf) = 0;
IL(IL==Inf) = 0;
prodL=probsL.*IL;
Left=sum(prodL);
IR=-log2(probsR);
IR(IR==-Inf) = 0;
IR(IR==Inf) = 0;
prodR=probsR.*IR;
Right=sum(prodR);
%Calculates entropy using standard formula
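The "standard formula" referenced in the comments is the zeroth-order (memoryless) Shannon entropy of the empirical sample distribution, with empty histogram bins excluded, which is what the -Inf/Inf clean-up of IL and IR implements:

```latex
H = -\sum_{i} p_i \log_2 p_i \quad \text{[bits per sample]}
```

Comparing the entropy of the prediction error (ErLeft, ErRight) against that of the raw channels gives the theoretical coding gain of the chosen filter.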
Lossy compression error calculator
function [PCMerror, SER] = errorcal(infile, outfile)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Function that calculates SER and.........%
%error rate for lossy compression.........%
%.........................................%
%Ivar Løkken, NTNU 2002...................%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%[Error, SER] = errorcal(infile, outfile).%
%.........................................%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
in_t = wavread(infile);
compfile_t = wavread(outfile);
if length(in_t)<length(compfile_t)
siz = length(in_t);
else
siz = length(compfile_t);
end
in = in_t(1:siz);
compfile = compfile_t(1:siz);
%compute error signal, maximum error (dB) and signal-to-error ratio
error = compfile - in;
PCMerror = 20*log10(max(abs(error)));
SER = 10*log10(sum(in.^2)/sum(error.^2));
subplot(3,1,1), plot(in);
title('Input (normalised & quantised)');
subplot(3,1,2), plot(compfile);
title('Output');
subplot(3,1,3), plot(error);
title('Error');
s=sprintf('SER = %4.1fdB\n', SER);
text(0.5,-90,s);
s=sprintf('Max absolute error (normalized) = %4.1fdB\n', PCMerror);
text(0.5,-110,s);
Spectral centroid calculator
function centroid=speccent(path)
%function for calculating spectral centroid of wav-file
%
%Ivar Løkken
%
%centroid=speccent(path)
%where path is wavefile path,
%for instance 'c:\music\file.wav'
%read signal
[signal, FS, NBITS] = wavread(path);
N=length(signal);
FT=abs(fft(signal));
%calculate centroid
sumFA=0;
sumA=0;
for i=1:floor(N/2)
sumFA=sumFA+i*FT(i);
sumA=sumA+FT(i);
end
centroid=sumFA/sumA;
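The loop computes the spectral centroid as the magnitude-weighted mean of the FFT bin index over the first half of the spectrum:

```latex
\mathrm{centroid} = \frac{\sum_{i=1}^{N/2} i\,|X(i)|}{\sum_{i=1}^{N/2} |X(i)|}
```

The result is in bins; multiplying by FS/N converts it to Hz.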
Appendix 8. Tools Used During Development
Hardware
Apple PowerBook G4, 1 GHz/512 MB/40 GB/12”, running Mac OS-X 10.3 ”Panther”
and Windows 2000 SP4 through Virtual PC 6.
Toshiba Satellite 4070CDS Celeron, 366 MHz/192 MB/4 GB/13”, running Windows
2000 SP4.
Eizo FlexScan F57 external 17” CRT-monitor.
The WLS hardware developed as part of this thesis.
Software
General programming: Xcode Tools v.1.1 for OS-X.
Schematics design: DesignWorks Lite 4.5 for OS-X.
Calculations, testing: Mathworks MatLab 6.5 for Unix/OS-X.
Analog simulations: AimSpice 3.8 for Windows, MacInit Mi-Sugar 0.5.2 for OS-X.
Chart and diagram drawings: The Omni Group OmniGraffle for OS-X.
MCU-programming: SDCC microcontroller compiler for UNIX/OS-X.
CC2400-setup: Chipcon SmartLink RF for Windows.
General documentation: Microsoft Office-X for OS-X.