
Master Thesis

Wireless Loudspeaker
System With Real-Time
Audio Compression
Author: Ivar Loekken
Employer: Chipcon AS
University: Norwegian University of Science and Technology (NTNU)
Instructor: Robin Osa Hoel, Chipcon
Language: English
Number of pages: 240, including appendices.

Abstract: Hardware for a fully digital wireless loudspeaker system based
around the Chipcon CC2400 RF-transceiver has been designed.
Research on suitable low-complexity compression algorithms is
documented. This includes both lossy and lossless
compression. In both cases, algorithm suggestions have been
made based on measurements, complexity estimations and
listening tests. A lossy algorithm, iLaw, is presented, which
improves µ-law encoding to provide audio quality comparable
to MP3. A lossless algorithm is suggested, which features a
lossy-mode to provide constant bitrate with minimal quality
degradation. The algorithm is based around Pod-coding, a
scheme not previously used in any compression software. Pod-
coding is simple, efficient and has properties that are very
advantageous in a real-time application.

Keywords: Audio compression, low-complexity, lossless, lossy, Pod-encoding,
Rice-encoding, µ-law, ADPCM, wireless loudspeaker

Ivar Loekken, 28/5-2004.


Introduction

This thesis covers the work done developing a system for wireless audio
transmission. The intended application is a wireless loudspeaker system1 where a hi-fi
control and playback unit transmits data to remote active speakers using an RF
transceiver.

The concept is not new, but most such systems use analog FM transfer, which
inevitably compromises audio quality. The transmission in this WLS will be fully
digital, with AD-conversion in the transmitter and DA-conversion in the receiver. A
digital input will also be available. The transmission will be done using a Chipcon
CC2400 RF-transceiver with a 1Mbps transfer rate. Chipcon, the employer for
this project, intends to use the WLS as a demonstration or reference design for the
CC2400.

The informed reader might notice that the 1Mbps transfer rate is insufficient for CD-
quality audio, which requires about 1.4Mbps. This will be resolved using real-time
compression. The main focus of the thesis work has been on developing a low-
complexity, high-quality compression algorithm that can be run using only a
simple MCU. The employer required the design to be low-cost, so separate DSPs or
ASICs for compression were not an option. Both lossless and lossy algorithms have
been explored2.

Originally, development was intended to be done using an MCU evaluation board.
However, none of those available had the necessary peripherals. Design of the reference system
was thus included as part of the thesis. This led to some delays, and the
manufacturing of the PCB (done by Chipcon) was also significantly delayed3. Because of
this, a full implementation in hardware was not achieved before the thesis deadline.
Although implementation is an important task, this has not had any significant
effect on the thesis itself. As mentioned, the academic focus was on developing a
suitable compression scheme, and both a custom lossless and a custom lossy algorithm have
been suggested. These algorithms have been tested and documented by writing and
compiling them on a computer4 and running them on waveform audio files.

The thesis is divided into two main parts. The first covers audio compression theory
and gives the reader the basic knowledge necessary to understand how the algorithms
work. The second part covers the development itself and provides documentation of
the work done. This includes both hardware and software design. Finally, the project

1 Throughout the thesis, the target application will be referred to as the Wireless Loudspeaker System, or
simply the WLS.
2 A lossless compression algorithm is one where the output after decompression is identical to the
original data. In lossy algorithms, psychoacoustic models are used to remove audio information that is
not perceptible.
3 This is detailed in the project review, included at the end of the thesis.
4 Apple PowerBook G4 running Mac OS X 10.3 ”Panther”.

itself is reviewed, and a discussion of the work process, the achievements made and
the academic rewards is presented.

Finally, I’d like to thank the following people, who have been of great help during the
project:

- Robin Osa Hoel, my supervisor at Chipcon, for giving his time to
answer questions, review my work and provide general guidance throughout
the project.
- Albert Wegener of Soundspace Audio for providing an evaluation license
of his algorithm MusiCompress for study, and also for patiently answering
questions I’ve had regarding audio compression.
- Tore Barlindhaug, engineer at NTNU, for lending me a computer monitor
for the entire semester, so I was relieved from the ergonomic strain of
staring at a small laptop display ten hours a day.

Table of Contents

1 Wireless Loudspeaker System Description ......................................................11


2 Audio Compression; Theory and Principles ....................................................13
2.1 An information-based approach to digital audio...................................................13
2.2 Lossless compression of audio..................................................................................16
2.2.1 Framing................................................................................................................................. 16
2.2.2 Decorrelation ........................................................................................................................ 17
2.2.2.1 Inter-channel decorrelation ........................................................................................ 17
2.2.2.2 Intra-channel decorrelation ........................................................................................ 18
2.2.2.2.1 Linear prediction................................................................................................... 20
2.2.2.2.2 Adaptive prediction .............................................................................................. 23
2.2.2.2.3 Polynomial approximation ................................................................... 24
2.2.3 Entropy-coding..................................................................................................................... 26
2.2.3.1 Run-length encoding (RLE)....................................................................................... 26
2.2.3.2 Huffman-coding ......................................................................................................... 26
2.2.3.3 Adaptive Huffman coding.......................................................................................... 30
2.2.3.4 Rice-coding................................................................................................................. 33
2.2.3.4.1 Calculating the parameter k.................................................................................. 34
2.2.3.5 Pod-coding, a better way to code the overflow......................................................... 36
2.3 Lossy compression of audio ......................................................................................37
2.3.1 The human auditory system................................................................................................. 37
2.3.2 Lossy compression algorithms ............................................................................................ 41
2.3.2.1 MPEG-based algorithms............................................................................................ 42
2.3.2.2 Differential Pulse Code Modulation (DPCM) .......................................................... 44
2.3.2.3 Adaptive DPCM (ADPCM)....................................................................................... 45
2.3.2.3.1 IMA ADPCM adaptive quantizer ........................................................................ 46
2.3.2.4 µ-Law.......................................................................................................................... 49
3 Hardware Design.............................................................................................53
3.1 Selection of components ............................................................................................53
3.1.1 RF-transceiver: the Chipcon SmartRF® CC2400................................................ 54
3.1.2 Audio codec.......................................................................................................................... 55
3.1.3 SP-dif receiver...................................................................................................................... 57
3.1.4 Selection of microcontroller ................................................................................................ 59
3.1.4.1 Speed requirements .................................................................................................... 59
3.1.4.2 Memory requirements ................................................................................................ 60
3.1.4.3 I/O requirements......................................................................................................... 61
3.1.4.4 Evaluated microcontrollers ........................................................................................ 62
3.1.4.4.1 Atmel AVR Mega169L and Mega32L ................................................................ 63
3.1.4.4.2 Texas Instruments MSP430F1481 ....................................................................... 64
3.1.4.4.3 Motorola DSP56F801........................................................................................... 65
3.1.4.4.4 Hitachi/Renesas R8C/10 Tiny................................................................ 66
3.1.4.4.5 Silicon Laboratories C8051F005 ......................................................................... 67
3.1.5 Conclusions: ......................................................................................................................... 68
3.2 Audio transfer to MCU .............................................................................................69
3.2.1 Principle for data transfer, audio device - MCU................................................................. 69
3.2.2 Realization of data transfer, audio device - MCU .............................................................. 70
3.2.2.1 Serial-to-parallel and parallel-to-serial conversion ................................. 70
3.2.2.2 Design of logic to create necessary control signals .................................................. 73
3.3 Circuit design .............................................................................................................75
3.3.1 Configuration of the SP-dif receiver. .................................................................................. 76
3.3.2 Configuration of the audio codec ........................................................................................ 77

3.3.3 Configuration of the RF-transceiver.................................................................................... 79
3.3.4 Configuration of the MCU IO ............................................................................................. 80
3.3.5 The finished circuit .............................................................................................................. 83
4 Analysis of Lossy Compression Algorithms.....................................................86
4.1 Reference for comparison; 8-bit and 4-bit LPCM ................................................87
4.2 Analysis of 4-bit DPCM ............................................................................................88
4.3 Analysis of IMA ADPCM .........................................................................................90
4.4 Analysis of µ-law ........................................................................................................91
4.5 Reference for comparison II: MP3..........................................................................93
4.6 iLaw: a low-complexity, low-loss algorithm. ..........................................................96
4.7 Notes about the performance measurements .........................................................99
5 Design of Lossless Compression Algorithm...................................................100
5.1 Coding method .........................................................................................................103
5.1.1 Evaluation of Pod-coding and Rice-coding ...................................................................... 103
5.2 iPod: an attempt at improving the Pod-coding....................................................107
5.3 Prediction scheme ....................................................................................................110
5.4 Channel decorrelation.............................................................................................115
5.5 Final algorithm proposal and benchmark............................................................119
5.6 Lossy mode................................................................................................................121
5.6.1 LSB-removal lossy-mode .................................................................................................. 122
5.6.2 Mono samples lossy-mode................................................................................................. 125
6 WLS Implementation Considerations............................................................127
6.1 MCU implementation considerations....................................................................127
6.1.1 Wrap-around arithmetic ..................................................................................................... 127
6.1.2 Look-up tables.................................................................................................................... 128
6.2 RF-link implementation considerations................................................................129
6.2.1 Packet handling .................................................................................................................. 129
6.2.2 Transmission or calculation of k?..................................................................... 130
6.2.3 Lost packet handling .......................................................................................................... 130
7 Project Review ...............................................................................................135
8 Summary .......................................................................................................136
9 References .....................................................................................................137

Appendix 1 Data Formats _________________________________________142


Appendix 2 Data Converter Fundamentals____________________________148
Appendix 3 Schematics____________________________________________155
Appendix 4 Components List_______________________________________162
Appendix 5 PCB- Layout__________________________________________163
Appendix 6 Source Code, C_________________________________________176
Appendix 7 Matlab-Scripts ________________________________________235
Appendix 8 Tools Used During Development__________________________239

List of Figures
Figure 1 Wireless loudspeaker system......................................................................................... 12
Figure 2 Digital representation of audio signal............................................................................................ 13
Figure 3 Histogram of samples in Stevie Ray Vaughan, ”Voodoo Chile” wav-file .................................. 15
Figure 4 Basic principles of lossless audio compression............................................................................. 16
Figure 5 Histogram of mutual and side, "Voodoo Chile", 30s excerpt....................................................... 18
Figure 6 Prediction model [reference 2]....................................................................................................... 19
Figure 7 Histogram, prediction error e[n], "Voodoo Chile", 30s excerpt................................................... 20
Figure 8 Signal flow chart, difference prediction ........................................................................................ 21
Figure 9 General filter-based prediction [reference 2] ................................................................................ 21
Figure 10 Entropy vs. predictor order, fixed FIR predictor........................................................................ 23
Figure 11 The four polynomial approximations of x[n] [reference 2] ......................................... 25
Figure 12 Binary tree with prefix property code (code 2 from table 3)...................................................... 28
Figure 13 General depiction of Huffman-tree, seven symbols W1-W7 ..................................................... 29
Figure 14 Algorithm FGK processing the ensemble EX: (a) Tree after processing "aa bb"; 11 will be
transmitted for the next b. (b) After encoding the third b; 101 will be transmitted for the next
space; the tree will not change; 100 will be transmitted for the first c. (c) Tree after update
following first c. [reference 9] ............................................................................................................ 31
Figure 15 Complete Huffman-tree for example EX .................................................................................... 32
Figure 16 The human auditory system ......................................................................................................... 37
Figure 17 Cross-section of the cochlea ........................................................................................................ 38
Figure 18 Cochlea filter response................................................................................................................. 39
Figure 19 Masking threshold ........................................................................................................................ 39
Figure 20 The Fletcher-Munson curves (equal loudness curves)................................................................ 40
Figure 21 Temporal masking........................................................................................................................ 41
Figure 22 MP3 encoding and decoding block diagram ............................................................................... 42
Figure 23 AAC compression block diagram................................................................................................ 43
Figure 24 DPCM-encoder block diagram [reference 17] ............................................................................ 44
Figure 25 DPCM decoder block diagram [reference 17] ............................................................................ 45
Figure 26 ADPCM general block diagram [reference 18] ............................................................ 46
Figure 27 IMA ADPCM stepsize adaptation [reference 18]....................................................................... 47
Figure 28 IMA ADPCM quantization [reference 18].................................................................................. 48
Figure 29 Basic block diagram, wireless audio transceiver ........................................................................ 53
Figure 30 Typical application circuit, Chipcon CC2400 [reference 22]..................................................... 54
Figure 31 Texas Instruments TLV320AIC23B block diagram [reference 24] ........................................... 56
Figure 32 Block diagram, Crystal CS8416 [reference 28] .......................................................................... 58
Figure 33 Communication through a) 2 SPI-ports or b) 1 SPI-port and parallel IO via shift registers.... 61
Figure 34 I2S data transfer timing diagram .................................................................................................. 69
Figure 35 Principle for data transfer between audio device and MCU....................................................... 70
Figure 36 Simplified schematics, 74HC4094N [reference 37] ................................................................... 71
Figure 37 Timing diagram, transfer from audio device to MCU ................................................. 71
Figure 38 Logic diagram, 74HC166N [reference 38].................................................................................. 72
Figure 39 Timing diagram, transfer from MCU to audio device ................................................................ 72
Figure 40 Logic circuit for generation of control signals ............................................................................ 73
Figure 41 Timing diagram for control signals ............................................................................................. 74
Figure 42 Block diagram, wireless loudspeaker system............................................................................. 75
Figure 43 Configuration of SP-dif receiver.................................................................................................. 76
Figure 44 Recommended filter layout [reference 27].................................................................................. 77
Figure 45 220µF, 330µF, 470µF decoupling caps frequency response, 32/16Ω load ............................... 78
Figure 46 Configuration of audio codec....................................................................................................... 78
Figure 47 Connection, Chipcon CC2400 RF-transceiver............................................................................ 79
Figure 48 C8051F00x IO-system functional block diagram [reference 36]............................................... 80
Figure 49 C8051F00x priority decode table [reference 16] ........................................................................ 81
Figure 50 Configuration of MCU IO CrossBar Decoder ............................................................................ 82
Figure 51 Complete circuit diagram............................................................................................................. 83
Figure 52 Jumper settings ............................................................................................................................. 84
Figure 53 Logic analyzer standard connection ............................................................................................ 84

Figure 54 Logic analyzer connections.......................................................................................................... 85
Figure 55 Waveform and spectrum, "littlewing.wav" ................................................................................. 87
Figure 56 Performance measurements, 4-bit and 8-bit LPCM................................................................... 87
Figure 57 4:1 DPCM performance measurement, "Littlewing.wav".......................................................... 89
Figure 58 IMA ADPCM performance measurement, ”Littlewing.wav”.................................................... 90
Figure 59 µ-law performance measurement, ”Littlewing.wav”.................................................................. 92
Figure 60 Measured performance, 128kbps MP3, ”littlewing.wav”.......................................................... 94
Figure 61 Measured performance, 256kbps MP3, ”littlewing.wav”........................................................... 95
Figure 62 10-bit µ-law data format .............................................................................................................. 96
Figure 63 Flowchart, iLaw encoder designed for this thesis....................................................................... 97
Figure 64 Flowchart, iLaw decoder designed for this project..................................................................... 97
Figure 65 Measured performance, custom codec, "littlewing.wav". .......................................................... 98
Figure 66 Waveform of, from top to bottom, "littlewing.wav", "percussion.wav", "rock.wav",
"classical.wav", "jazz.wav" and "pop.wav", Audacity .................................................................... 101
Figure 67 Spectrum of the "littlewing.wav", "percussion.wav", "rock.wav", "classical.wav", "jazz.wav"
and "pop.wav”, Audacity .................................................................................................................. 102
Figure 68 Encoding performance and worst-case word length, all tests averaged................................... 106
Figure 69 Distribution of overflow, "littlewing.wav"................................................................................ 109
Figure 70 Bit-wise polynomial approximation encoder data structure ...................................... 111
Figure 71 Polynomial selection, framewise polynomial appr., 255 sample frames, Excel........ 113
Figure 72 Performance, different tested prediction schemes .................................................................... 114
Figure 73 Entropy of channels, mutual and side signals and filesize reduction, average results of files in
table 14 except ”dualmono.wav”...................................................................................................... 118
Figure 74 Performance evaluation, Shorten vs. suggested algorithm for WLS........................................ 120
Figure 75 Algorithm for LSB-removal lossy mode................................................................................... 123
Figure 76 Lossy-mode performance, "modernlive.wav", 30s excerpt, left channel................................. 124
Figure 77 Spectrum with mono-mode, 64-sample frames, ”modernlive.wav”, 30s excerpt. .................. 126
Figure 78 Chipcon CC2400 packet format [reference 22] ........................................................................ 129
Figure 79 Proposed frame for WLS-implementation with transfer of frame-static k .............................. 130
Figure 80 Left: Audibility of difference between method 1 (silence) and 2 (repetition), 1000 packet
"loose interval", 64 sample packet. .................................................................................. 131
Figure 81 Preferred lost packet handling method ...................................................................................... 132

List of Tables
Table 1 Higher-order FIR-prediction [reference 2] ..................................................................................... 21
Table 2 Entropy with FIR-prediction, first to third order, ”Little Wing”, 30s excerpt .............................. 22
Table 3 Two example binary codes [reference 7]....................................................................................... 27
Table 4 Pod-codes vs. Rice-codes ................................................................................................................ 36
Table 5 DPCM nonlinear quantization code [reference 17]........................................................................ 44
Table 6 First table lookup for IMA ADPCM quantizer adaptation [reference 18] .................................... 47
Table 7 Second table lookup for IMA ADPCM quantizer adaptation [reference 18]................................ 47
Table 8 AKM4550 versus TI TLV320AIC32B comparison [references 23 and 24] ................................. 55
Table 9 Crude MIPS requirement estimation for MCU .............................................................................. 59
Table 10 Comparison between seriously considered MCUs [references 30-36]........................................ 62
Table 11 Performance, 8-bit and 4-bit LPCM ............................................................................................. 88
Table 12 DPCM quantization table .............................................................................................................. 88
Table 13 Performance 4-bit DPCM, ”littlewing.wav” (see text) ................................................................ 89
Table 14 Performance 4-bit ADPCM, ”littlewing.wav ............................................................................... 91
Table 15 Performance 8-bit µ-law, ”littlewing.wav” and ”speedtest.wav”................................................ 92
Table 16 Measured performance, LAME MP3, ”littlewing.wav” .............................................................. 93
Table 17 Performance iLaw codec, ”littlewing.wav”.................................................................................. 98
Table 18 Wav-files used for characterization of lossless algorithms........................................................ 100
Table 19 Performance of Rice- and Pod-coding, A and N reset every 256th sample, no prediction,
”littlewing.wav” ................................................................................................................................ 104
Table 20 Performance of Rice- and Pod-coding, A and N reset every 256th sample, 1st order
prediction, ”littlewing.wav”.............................................................................................................. 105
Table 21 Performance of Pod- and Rice-coding with HF-rich file, no prediction, "percussion.wav". ... 105
Table 22 Performance of Pod and Rice coding with HF-rich file, 1st order prediction,
"percussion.wav"............................................................................................................................... 105
Table 23 Regular Pod-coding vs. iPod-coding .......................................................................................... 107
Table 24 Pod-coding vs. iPod coding, filesize reduction (no prediction)................................................. 108
Table 25 Filesize reduction, no pred., 1st order and 2nd order linear pred. ............................................ 111
Table 26 Filesize reduction, sample-wise polynomial approximation....................................... 111
Table 27 Performance, framewise polynomial approximation, 0th, 1st and 2nd order polynomial selection
............................................................................................................................................................ 112
Table 28 Third and fourth order fixed predictor, new k for every sample ............................................... 114
Table 29 Computational cost per sample for the different prediction schemes........................................ 115
Table 30 Recordings used to test stereo decorrelation .............................................................................. 116
Table 31 Results of inter-channel decorrelation ........................................................................................ 117
Table 32 Lossy-mode performance ............................................................................................................ 124

List of Acronyms and Abbreviations
A list of acronyms and abbreviations that are not explicitly explained in the text.

ADC: Analog to Digital Converter, also called A/D-converter.


ASIC: Application Specific Integrated Circuit. Circuit custom made for an application.
BPS: Bits Per Second.
CAD: Computer-Aided Design.
CMOS: Complementary Metal-Oxide Semiconductor. The most commonly used transistor technology
for digital circuits.
Codec: Coder/Decoder. An application or program containing both an encoder and a decoder.
CPLD: Complex Programmable Logic Device.
DAC: Digital to Analog Converter, also called D/A-converter.
DAT: Digital Audio Tape. Digital recording and playback medium introduced by Sony in 1987.
DFT: Discrete Fourier Transform. A method to transform signals from the time-domain to the
frequency-domain.
DSP: Digital Signal Processor.
FFT: Fast Fourier Transform. Fast algorithm to perform DFT.
FIR: Finite Impulse Response. Digital filter family that uses only previous input values (no
feedback).
FPGA: Field Programmable Gate Array. Logic device that can be programmed while in-circuit.
IC: Integrated Circuit.
IEC: International Electrotechnical Commission.
IIR: Infinite Impulse response. Digital filter family that uses both previous input and output values.
IO: Input/Output.
ISM: Industrial, Scientific and Medical radio bands. Reserved for non-commercial use or
license-free communications applications.
ISO: International Organization for Standardization.
LED: Light emitting diode.
LSB: Least Significant Bit. The last digit in a base-two (binary) number.
MCU: MicroController Unit. Single IC containing processor, memory, IO and peripherals.
MIPS: Million Instructions Per Second.
MPEG: Moving Picture Experts Group. Group defining the framework for a wide range of video and
audio compression standards.
MSB: Most Significant Bit. The first digit in a base-two (binary) number.
MUX: Multiplexer. Unit that allows a control signal to select one of several inputs to be routed to an
output.
PCB: Printed Circuit Board.
PCM: Pulse Code Modulation. Method to represent a signal as discrete-time and discrete-amplitude
(digital) values (samples).
PLL: Phase Locked Loop. Circuit with a voltage- or current-driven oscillator that is constantly
adjusted to match in phase (and thus lock on) the frequency of an input signal. Used for clock
recovery, in frequency synthesizers and in demodulators.
PWM: Pulse Width Modulation. A signal representation where the duty cycle (the percentage of a
period when the signal is high) of a high-frequency pulse wave represents the amplitude of the
modulated signal.
RAM: Random Access Memory. Volatile memory used for data storage during operation.
RF: Radio Frequency. Frequency range where a signal, if connected to an antenna, will generate an
electromagnetic field. From 9 kHz to thousands of GHz.
RISC: Reduced Instruction Set Computing. Processor architecture that uses a small set of simple
instructions to perform the necessary tasks.
RMS: Root-Mean-Square.
ROM: Read Only Memory. Nonvolatile memory often used as program memory.
SNR: Signal-to-Noise Ratio. The ratio between signal level and noise level. Usually expressed in dB.
SPICE: Simulation Program with Integrated Circuits Emphasis. General purpose analog circuit
simulator.
TTL: Transistor-Transistor Logic. Method to design digital circuits. Uses bipolar transistors which
act on direct-current pulses.

Part I

- Theory -

Albert Einstein – in his study at Princeton, 1937

1 Wireless Loudspeaker System Description

In the modern hifi-market, a system is required to provide high quality audio
playback as well as being user friendly and easy to place in a domestic environment.
Especially the latter factor has opened up the demand for wireless solutions. This
makes it possible to have one main playback central, communicating with active
loudspeakers elsewhere in the room or even in other rooms.

To date, most wireless loudspeaker systems have used analog FM-transfer. This
compromises the quality of playback, as analog transfer will inevitably decrease SNR
and increase distortion. More recently, however, fully digital RF-transceivers with
high data bandwidth have become cheap and available in the market. Norwegian
circuit manufacturer Chipcon offers, amongst others, the CC2400 RF-transceiver, a
1Mbps unit operating in the 2.4GHz ISM-band. They wanted to explore the
possibilities of using it in a wireless loudspeaker system and thus initiated the project
resulting in this thesis.

The wireless loudspeaker system is required to provide CD-quality or near CD-
quality audio. Also, compatibility with the digital SP-dif5 output provided with many
CD-players would be an advantage. The CD digital audio format (CD-DA or "Red-book")
is specified by the ISO-908 standard. It uses an LPCM (linear pulse code modulation)
digital representation of its audio content, with 44,100 stereo samples, each at 16
bits, per second. This gives a total bandwidth of

Eq. 1    44,100 Hz × 16 bits × 2 = 1,411,200 bits/sec

This is beyond the transfer capability of the Chipcon CC2400. Because of this, the
audio must be compressed, and compression must happen in real-time. Since the
hardware was required to have very low cost, the compression algorithm must be of
such nature that it does not require any dedicated hardware. Irrespective of audio
processing, a microcontroller unit (MCU) is necessary to control the data transfer and
setup of the hardware. If this MCU can do the compression as well, the system cost
will be lowered significantly. But this requires a low-complexity scheme. Besides
hardware design, research and development of a suitable compression algorithm has
been the main focus of this project.

5
Sony-Philips digital interface formats – it, and other formats and protocols relevant for this thesis, is
presented in appendix 1.

Figure 1 shows the intended system. An audio playback unit provides either analog or
digital signals to the transmission module. This performs either AD-conversion or SP-
dif decoding depending on whether the input signal is analog or digital. Then the data
is compressed and sent to the RF-transceiver. The receiver module sits in the
loudspeaker. Data is received and decompressed before being DA-converted and fed
to the loudspeaker's built-in amplifier. Since the transmission is digital, it should not
result in any loss of audio quality. The only significant loss factors are AD- and DA-
conversion, and possibly the compression. These will both be addressed thoroughly.

Figure 1 Wireless loudspeaker system

Audio compression can be divided into two main categories, lossless and lossy
compression. The former has no signal degradation; the decoded output is sample-to-
sample identical with the input. Lossy compression tries to model the human auditory
system to remove audio content that is not perceptible. The ratio between input and
output bandwidth, the compression ratio, of lossless algorithms is limited, usually in
the range of 2:1, while good lossy algorithms can provide ten times that ratio and still
maintain decent audio quality. Another advantage with the lossy approach is that the
output bitrate can be set at whatever the user desires. The effectiveness of lossless
algorithms varies with the input's data redundancy, or in other words its
"compressibility". In the WLS a quite small ratio is required, but the real-time
operation does add some complications when it comes to variable output bitrate. In
this thesis, lossless, lossy and hybrid6 algorithms have been developed and
studied, and suggestions are made for all alternatives.

6
What is referred to as a hybrid algorithm is one that is lossless during normal operation, but goes into
a lossy-mode if necessary, for instance when the compression ratio does not meet the instantaneous
bitrate requirements given by the transceiver operating in real-time.

2 Audio Compression; Theory and Principles

2.1 An information-based approach to digital audio

A digital audio signal is usually represented by uniformly sampled values with a fixed
word length N, which means that each sample can have a value between -2^(N-1) and
2^(N-1)-1. The digital sample value represents the signal amplitude at a specified instant
(the sample instant) as shown in figure 2. The number of samples per second is
specified by the sampling frequency fS. This technique is called linear quantization or
LPCM (Linear Pulse Code Modulation).

Figure 2 Digital representation of audio signal


LPCM-quantization performs a roundoff of the value to the nearest LSB. Thus an
error is introduced. Since the roundoff is random, the error is modeled as a
white noise source called quantization noise. The resulting SNR (signal-to-noise ratio)
is the ratio between the signal level and the quantization noise level. This, and a
limitation of the signal bandwidth, are the only fundamental nonidealities of LPCM. It
can be shown that the maximum signal bandwidth is fS/2 (the Nyquist frequency) and
that the maximum SNR is 6.02·N dB (the "6dB per bit rule", applicable for a maximum-
level, random signal)7. The wordlength N is therefore often referred to as the
resolution of the signal.
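As a numerical check of the 6dB per bit rule, the following Python sketch (an illustration for this text, not part of the thesis software; the function name is ours) quantizes a full-scale, uniformly distributed random signal to N bits and estimates the resulting SNR:

```python
import math
import random

def quantization_snr_db(n_bits, n_samples=200_000, seed=1):
    """Estimate the SNR of n_bits-bit uniform quantization of a
    full-scale random signal in [-1, 1). Should approach 6.02*n_bits dB."""
    rng = random.Random(seed)
    delta = 2.0 / (1 << n_bits)           # quantization step (one LSB)
    sig_pow = err_pow = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        xq = round(x / delta) * delta     # round off to the nearest LSB
        sig_pow += x * x
        err_pow += (x - xq) ** 2
    return 10.0 * math.log10(sig_pow / err_pow)

print(round(quantization_snr_db(16), 1))  # close to 6.02 * 16 = 96.3 dB
```

For N = 16 the estimate lands close to the 96 dB commonly quoted for CD-audio; for a sine rather than a random signal the well-known 1.76 dB offset would appear.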

Since each sample, regardless of its value, is represented with N bits, the bandwidth
requirement for transfer of the LPCM-signal will be given by

Eq. 2    B = N × fS [bits/sec]

7
The Nyquist theorem and the 6dB per bit rule are explained in appendix 2, ”Data converter
fundamentals”.

For CD-audio the sample frequency is 44.1kHz, the resolution is 16 bits and there are
two channels to transfer. Then the total bandwidth requirement B will be

Eq. 3    B = 16 bits × 44,100 Hz × 2 = 1,411,200 bits/sec

This number does not depend on the actual value of the samples, it depends on the
number of possible values they can have, the resolution. Thus it is natural to assume
that one could reduce the bandwidth by using a coding scheme where the code-length
depends on the actual values rather than the resolution.

Since the signal from an audio source is unknown (not deterministic) it must be
described using information theory. It can be shown that the average binary
information value of a sample S is quantifiable as

Eq. 4    Average information = -log2(p(S)) bits ;[reference 1]

where p(S) is the probability of the value S occurring. A measurement of the binary
information content of a statistically independent source derived by this is its entropy
H(s), given by the equation

Eq. 5    H(s) = Σ(i=1 to n) pi · log2(1/pi) ;[reference 1]

In which pi is the probability that the value i occurs. The entropy is in other words a
probability-weighted average of the information. If we look at a signal uniformly
distributed over all possible values within CD-audio, from i = -2^15 to i = 2^15-1, the
entropy is

Eq. 6    H(s) = -Σ(i=-2^15 to 2^15-1) 2^-16 · log2(2^-16) = 16 bits

This is hardly surprising. When you quantize to 16 bits, what you really do is to
assume that each sample can have any value between -2^15 and 2^15-1. The
probability of any given value occurring is then 2^-16. As equation 6 shows, this
corresponds to a uniform distribution between the two limit values.
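The entropy calculation of equation 5 can be sketched in a few lines of Python (a stand-in for the MatLab script of appendix 7; the function name and the toy distributions are ours):

```python
import math
from collections import Counter

def entropy_bits(samples):
    """H(s) = sum_i p_i * log2(1/p_i) (Eq. 5), with the probabilities
    p_i estimated from the samples' frequencies of occurrence."""
    counts = Counter(samples)
    n = len(samples)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# Uniform over all 2^4 values of a 4-bit signal: exactly 4 bits,
# analogous to equation 6 for the 16-bit case.
print(entropy_bits(list(range(-8, 8))))                  # 4.0
# A Laplacian-like pile-up of small values carries less information.
print(entropy_bits([0]*8 + [1]*4 + [-1]*2 + [5, -7]))    # 1.875
```

The second distribution illustrates the point of the next paragraphs: the more the samples pile up around a few small values, the further the entropy falls below the wordlength.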

When we know that the entropy gives us the average information content of a signal,
we can use this to draw some important conclusions:

- The entropy tells us how many bits the data will use when coded ideally (if
the coding does not remove any information and also contains no
unnecessary data it is ideal)
- The difference between the entropy and the coded binary wordlength tells
us how much redundancy there is in the coding scheme.

When quantizing to LPCM-code you assume that you have no knowledge about the
signal, except that it can have any given value between a minimum and a maximum.
You assume random values or in other words a uniform distribution. The question is
whether or not music actually has such a distribution, or if the entropy in reality is
smaller and we are coding with redundancy.

In practice the music signal almost always has a probability distribution that is closer
to a Laplacian one than a uniform one. In figure 3, a histogram is shown of a 30-
second excerpt from the music track "Voodoo Chile", a recording by the late guitar
legend Stevie Ray Vaughan. The histogram is made in MatLab. It shows that an
overwhelming majority of the samples have quite low values.

Figure 3 Histogram of samples in Stevie Ray Vaughan, ”Voodoo Chile” wav-file

The histograms show the left channel (upper) and right channel (lower). As one can
see, they are very similar and much closer to a Laplacian than a uniform distribution.
A script was made in MatLab [appendix 7] which reads a music-file and calculates the
entropy using equation 5. For the excerpt of "Voodoo Chile" it gave the results shown
in equations 7 and 8.

Eq. 7 H (SRVvoodoo.wav, L) = 13.62bits

Eq. 8 H (SRVvoodoo.wav, R) = 13.65bits

Since practically all music has a distribution similar to the one shown in figure 3, one
can make good assumptions about its probability distribution and therefore code it in
ways that in almost all cases give less redundancy than the uniform LPCM-variant.
In addition one can change the representation of the signal to reduce the entropy
further. These techniques make up the basis for all types of compression of audio
signals. If the compression only removes redundant data and not information, it is said
to be lossless. The other type, lossy coding, tries to find and remove any information
that is unnecessary. For audio data, models of the human auditory system are used to
find and remove information that we cannot hear even when it is there.

2.2 Lossless compression of audio


Lossless compression is based on representing the signal in a way that makes the
entropy as small as possible and then to employ coding based on the statistical
properties of this new representation (entropy coding). The former is made possible
by the fact that music in reality is not statistically independent; there is correlation in
the signal. By using techniques to decorrelate the signal one can reduce the amount of
information (and thus obtain a smaller entropy) without loss, since the deleted
information can be calculated and put back in the signal by exploiting the correlation
with the data that is retained.

Entropy coding is based on giving short codes to values with a high probability of
occurrence and longer codes to the values with lower probability. Then, if the
assumptions of probabilities are correct, there will be many short codes and few
longer ones.

Figure 4 Basic principles of lossless audio compression

Figure 4 shows a block schematic of how audio is compressed. Framing is to gather
the audio stream in blocks so it can easily be edited. The blocks often contain a header
that gives the decoder all necessary information. Decorrelation is done using a
mathematical algorithm. This algorithm should be effective, but not too
computationally complex, while entropy-coding can be done in several different ways
explained later.

2.2.1 Framing
In most lossless compression algorithms, the data is divided into frames before
compression. If the prediction or encoding is adaptive, information about which
parameters are used has to be sent with the audio data in the form of a header. To
send this header with each sample would give too much data overhead; thus frames are
used instead. Over the duration of a frame, the same parameters are used for
compression and it only needs one information block, for obvious reasons called the
frame header.

The application will determine how big each frame is. If the frames are small, it will
compromise the bandwidth reduction since the number of headers, which also use
data space, will increase. If the frame is too large, the same parameters will have to be
used over many samples for which they might not be ideal, and this will again reduce
the compression ratio. Determining the frame size is often a question of trying and
evaluating. There is no absolute answer to what is the best framesize; one just has to
find a reasonable tradeoff. It is generally sensible to make the framesize a multiple of
the wordlength so a fixed number of samples fit within one frame. The most usual framesize in
existing algorithms is 576-1152 samples [reference 2], but this can to a large extent be
adjusted to the intended application.

2.2.2 Decorrelation

2.2.2.1 Inter-channel decorrelation

As mentioned, correlation in the signal can be exploited to remove redundancy. In
figure 3 one can see that the left and right channels are very similar. For stereo
recordings there often exists correlation between the two channels because the
soundstage is panned between the two speakers. To remove redundancy, the
representation of the signal using L and R can be replaced with a representation using
M and S, where M (mutual) is the average of the two channels and S (side) is the
difference between them. Then correlation will be removed while the information
remains intact. M and S are given by equations 9 and 10.

Eq. 9    M = (L + R) / 2

Eq. 10   S = L - R

For the file ”Voodoo Chile” the histograms for M and S are as shown in figure 5.

Figure 5 Histogram of mutual and side, "Voodoo Chile", 30s excerpt

As we can see, S has many more small values than L or R. It should be evident by
looking at equation 5 that the entropy of S should be smaller than that of L or R. The
script that calculates entropy gives the following results for M and S:

Eq. 11   H(SRVvoodoo.wav, Mutual) = 13.60 bits

Eq. 12   H(SRVvoodoo.wav, Side) = 12.47 bits

As we can see, the information amount has been reduced. Still, it is easy to calculate L
and R in the decoder by using M and S. Redundancy due to inter-channel correlation
has been removed without losing any information.
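In integer arithmetic, the division in equation 9 drops the LSB of L+R, but since L+R and L-R always have the same parity, that bit can be recovered from S and the transform stays lossless. A Python sketch of the round-trip (ours, not the thesis code; function names are illustrative):

```python
def ms_encode(l, r):
    """Mid/side transform on integer samples (Eq. 9 and 10). The floor
    division drops the LSB of L+R, but that bit always equals the LSB
    of S, so the transform remains lossless."""
    return (l + r) >> 1, l - r            # M (mutual), S (side)

def ms_decode(m, s):
    # L + R = 2*M + (S & 1), because L+R and L-R have the same parity.
    l = m + ((s + (s & 1)) >> 1)
    return l, l - s

for pair in [(5, 2), (2, 5), (-7, 3), (0, 0), (-32768, 32767)]:
    assert ms_decode(*ms_encode(*pair)) == pair
print("lossless round-trip ok")
```

In Python, `>>` floors toward minus infinity, so the same reconstruction formula works for negative samples as well.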

2.2.2.2 Intra-channel decorrelation

In addition to correlation between the channels, there is also a varying degree of
correlation between the samples within a channel (autocorrelation). The signal can be
decorrelated and the entropy reduced by means of prediction. Prediction is to
approximate the next sample using the previous ones and transmit the error instead of
the original signal. If there is a significant extent of autocorrelation, the approximation
will be good and the errors will then be small. When the receiver or decoder knows
what type of approximation is used and also knows the error, it can calculate its way
back to the original values and the information will be regained without loss. A model
for the prediction process is shown in figure 6.

Figure 6 Prediction model [reference 2]

The easiest way to understand this is by looking at the simplest prediction possible: to
assume that the current sample has the same value as the last one. In other words

Eq. 13   x̂[n] = x[n-1]

Then the error will be

Eq. 14   e[n] = x[n] - x̂[n] = x[n] - x[n-1]

Simply the difference between the two adjacent samples. If there is absolutely no
correlation between them, e[n] will have a totally random value from sample to sample, or a
uniform probability distribution. However, if there is correlation, it is likely that the
error e[n] will be small and the entropy will then be reduced. It is also evident that
when the decoder knows what the difference between one sample and the next is, it
just needs an initial value to be able to calculate every sample with no other input than
e[n]. To check if the entropy really is decreased, the simple prediction from equation
13 was performed on the excerpt of the music file "Voodoo Chile". The result e[n] is
shown in figure 7.

Figure 7 Histogram, prediction error e[n], "Voodoo Chile", 30s excerpt

It is easy to see that the prediction error in general has much smaller values than the
actual signal shown in figure 3. A calculation of the entropy gives the result shown in
equations 15 and 16.

Eq. 15   H(SRVvoodoo.wav, ErLCH) = 10.81 bits

Eq. 16   H(SRVvoodoo.wav, ErRCH) = 10.94 bits

As the calculations clearly prove, even a simple prediction gives a significant
reduction of the entropy in the music file, so there is definitely some autocorrelation
in the signal. More advanced prediction methods will however be able to give even
greater improvement.
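The complete encode/decode round-trip implied by equations 13 and 14 can be sketched as follows (Python illustration; the thesis experiments themselves were done in MatLab, and the function names here are ours):

```python
def diff_encode(x):
    """First-order prediction (Eq. 13/14): transmit e[n] = x[n] - x[n-1].
    The first sample is sent as-is so the decoder has a start value."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]

def diff_decode(e):
    x = [e[0]]
    for n in range(1, len(e)):
        x.append(x[-1] + e[n])            # x[n] = x[n-1] + e[n]
    return x

x = [100, 104, 103, 103, 99]
e = diff_encode(x)
print(e)                                  # [100, 4, -1, 0, -4]
assert diff_decode(e) == x
```

The residuals are small precisely because adjacent samples are correlated; on uncorrelated input the differences would be as large as the samples themselves.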

2.2.2.2.1 Linear prediction

If you take a closer look at the simple prediction given by equation 14, you will see
that a signal flow chart will be like the one in figure 8.

Figure 8 Signal flow chart, difference prediction

Looking at it, it becomes evident that the figure actually shows a first-order FIR
high-pass filter. So difference prediction and first-order high-pass-filtering are the
same. This is logical when one considers what the prediction actually does. If the
frequency is low, the difference between adjacent samples, which is the output of the
predictor, is small. If the frequency is high, the differences are large. This is clearly
high-pass filtering. It is then obvious that more advanced prediction algorithms must be
based on higher order filters. First to third order FIR-prediction is shown in table 1.

Table 1 Higher-order FIR-prediction [reference 2]

Order   Transfer function       Prediction value
1.      H(z) = (1 - z^-1)       x̂[n] = x[n-1]
2.      H(z) = (1 - z^-1)^2     x̂[n] = 2x[n-1] - x[n-2]
3.      H(z) = (1 - z^-1)^3     x̂[n] = 3x[n-1] - 3x[n-2] + x[n-3]

In addition to higher order filtering, past values of the error can also be used for
prediction, in other words IIR-prediction. However, since prediction with very high
order FIR- or IIR-filters is beyond the capability of the hardware used in the
WLS, this thesis will not deal with such schemes in any greater detail.

A general schematic for all filter predictors is shown in figure 9.

Figure 9 General filter-based prediction [reference 2]

Q denotes quantization of the filter output to the same wordlength as the original
signal. The figure depicts the equation

Eq. 17   e[n] = x[n] - Q( Σ(k=1 to M) âk·x[n-k] - Σ(k=1 to N) b̂k·e[n-k] ) ;[reference 2]

The quantization operation makes the predictor a nonlinear predictor, but since it is
done with 16-bit precision, it is reasonable to neglect the effects it has on the level of
compression. This quantization is necessary in lossless codecs since we want to be
able to reconstruct x[n] exactly from e[n] and possibly on a different machine
architecture [reference 2]. Since the same quantization is done in the decoder's
inverse filter, the reconstruction is still exact, i.e. lossless.

A MatLab-script was developed which implements the general prediction shown in
figure 9 and calculates histogram and entropy [appendix 7]. The results are shown in
table 2.

Table 2 Entropy with FIR-prediction, first to third order, "Little Wing", 30s excerpt

Order   Entropy, left channel   Entropy, right channel
1.      10.81 bits              10.94 bits
2.      10.38 bits              10.29 bits
3.      10.34 bits              10.34 bits

It is clear that the gain in entropy reduction decreases rapidly when the order
increases. Thus a prediction of very high order is probably not worth the extra
computational complexity. Another MatLab script was written to examine the
effectiveness of different prediction orders when inter-channel decorrelation is
included. The results are presented in figure 10.

Figure 10 Entropy vs. predictor order, fixed FIR predictor

As we can see, there is a huge gain from no prediction to first order prediction. Also,
there is a clear improvement from first order to second order. After that, the gain is
small, and in some cases, a higher order predictor even gives worse results. This
underlines the conclusion that a very high order fixed predictor is unlikely to produce
results that are worth the extra cost in complexity.

2.2.2.2.2 Adaptive prediction

Although a fixed predictor can yield significant reduction in the entropy, it is evident
that it will not be optimal for every combination of input signals. For instance, when
the difference between adjacent samples is large, the difference predictor will provide
a poor result. Many good predictors are adaptive, which means that they adjust to the
input signal. To illustrate how this works, a simple example [reference 5] is used:

In this example, a factor m is used to adjust the predictor. The parameter m varies from
0 to 1024, where 0 is no prediction and 1024 is full prediction. After each prediction,
m is adjusted up or down depending on whether the prediction was helpful or not. For
the example we use a second order predictor (see table 1) and consider an input
sequence x=[2, 8, 24, ?]. Since the predictor is adaptive, it uses the value m to
determine the level of prediction and compares the result p[n] with the real value x[n]
to see if the prediction was good and to update m for the next one. Thus, the output
will be:

Eq. 18   x̂[n] = x[n] - p[n] = x[n] - pF[n] · (m / mmax)

where pF[n] is a second order fixed predictor, pF[n] = 2x[n-1] - x[n-2]. If, in the
example, ? = 45 and m = 512, then

Eq. 19   x̂[n] = ? - pF[n] · (m / mmax) = 45 - (2·24 - 8) · (512 / 1024) = 25

Since the prediction underestimated the real value, m will be adjusted upwards for the
next run.
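The worked example can be sketched as a single prediction step in Python (our illustration). Note that the size of each adjustment of m is not specified in the example, so the step of 32 below is an arbitrary assumption:

```python
M_MAX = 1024
STEP = 32   # size of each m-adjustment: an assumption, not given in the text

def predict_step(x2, x1, x_actual, m):
    """One step of the adaptive second-order predictor of Eq. 18:
    scale the fixed prediction by m/m_max, then nudge m toward more
    prediction if it undershot and less if it overshot."""
    p_fixed = 2 * x1 - x2                 # fixed second-order predictor
    p = (p_fixed * m) // M_MAX            # scaled prediction
    residual = x_actual - p
    if residual > 0:
        m = min(M_MAX, m + STEP)          # underestimated: predict harder
    elif residual < 0:
        m = max(0, m - STEP)              # overestimated: predict softer
    return residual, m

# The example from the text: x = [2, 8, 24, 45] with m = 512.
residual, m = predict_step(8, 24, 45, 512)
print(residual, m)   # 25 544
```

The residual 25 matches equation 19, and m is raised for the next sample since the prediction undershot.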

On a more general basis, the prediction coefficients âk and b̂k in equation 17 (the
general formula for all linear predictors) are the ones being adjusted depending on the
input signal. The filters Â(z) and B̂(z) are thus general adaptive filters, for which
many algorithms and methods of realization have been developed.

One of the best known algorithms is the least mean square, or LMS, algorithm where,
at each iteration, the predictor coefficients are updated in a direction opposite to that
of the instantaneous gradient of the squared prediction error surface [reference 3]. A
less computationally demanding algorithm, the exponential power estimation, or EPE,
is also much used. In this, the envelope of the magnitude of the input sequence x[n] is
tracked and used to adapt the prediction [reference 4].

2.2.2.2.3 Polynomial approximation

Although effective, adaptive prediction is quite demanding computationally and will
slow down a lossless compression algorithm significantly. For the program Shorten
[reference 6], one of the most successful lossless compression applications, an
alternative solution was proposed. It maintains adaptivity somewhat, but compared to
LMS and other schemes it is very simple to implement. The algorithm can be seen as
being "semi-adaptive" as it does not have sample-to-sample adaptivity, but frame-to-
frame adaptivity instead.

For each sample, four FIR-polynomials are computed. These are:

Eq. 20   x̂0[n] = 0
         x̂1[n] = x[n-1]
         x̂2[n] = 2x[n-1] - x[n-2]
         x̂3[n] = 3x[n-1] - 3x[n-2] + x[n-3]   ;[reference 6]

corresponding to a 0th to 3rd order FIR prediction respectively. An interesting
property of these approximations is that the resulting residual signal,
e[n] = x[n] - x̂[n], can be easily calculated as:

Eq. 21   e0[n] = x[n]
         e1[n] = e0[n] - e0[n-1]
         e2[n] = e1[n] - e1[n-1]
         e3[n] = e2[n] - e2[n-1]   ;[reference 6]

No multiplications are needed and the cost in extra resources is small. For each frame,
the four residuals e0[n], e1[n], e2[n] and e3[n] are computed, as well as the sums of the
absolute values of these residuals over the complete frame. The residual with the
smallest sum magnitude is then defined as the best approximation for this frame, and
sent to the entropy encoder. In figure 11, this principle is illustrated.

Figure 11 The four polynomial approximations of x[n] [reference 2]

Since the approximator selects the best predictor for each frame, the structure can be
said to be frame-adaptive. It yields a significant improvement over fixed predictors at
a low computational cost. However, since four sets of residuals need to be saved, as
well as variables containing the absolute values of the sums, the memory usage
increases. But this principle does not have to be locked to four polynomials as used in
Shorten; one can for instance calculate and choose the best between the 0th order and
1st order predictions, or maybe the 1st order and the 2nd order. This would have to be
decided depending on the compression ratio requirement and the available resources
in form of processing power and memory.
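The frame-adaptive selection can be sketched as follows (Python illustration; the warm-up of the first samples of each frame is handled simplistically here, whereas a real implementation would carry history across frame boundaries):

```python
def residual_cascade(x, max_order=3):
    """Eq. 21: each order's residual is a running difference of the
    previous order's, so no multiplications are needed. The first
    element of each cascade is passed through as a warm-up value."""
    orders = [list(x)]
    for _ in range(max_order):
        prev = orders[-1]
        orders.append([prev[0]] + [prev[n] - prev[n - 1]
                                   for n in range(1, len(prev))])
    return orders

def pick_order(frame):
    """Frame-adaptive selection: keep the order whose residuals have
    the smallest sum of absolute values over the frame."""
    orders = residual_cascade(frame)
    sums = [sum(abs(v) for v in e) for e in orders]
    k = sums.index(min(sums))
    return k, orders[k]

# A slowly rising ramp is modeled almost perfectly by 1st-order prediction.
k, e = pick_order([10, 11, 12, 13, 14, 15])
print(k, e)   # 1 [10, 1, 1, 1, 1, 1]
```

The chosen order k would be written to the frame header so the decoder can run the inverse cascade.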

2.2.3 Entropy-coding
As mentioned, lossless or entropy-based compression ignores the semantics of the
data; it is based purely on the statistics of the data content. These statistics can be the
frequencies of occurrence for different symbols or the existence of repetitive
sequences of symbols (in information theory, "symbol" is often used even if it in the
case of digital audio in reality is sampled values). For the former, statistical
compression which assigns variable-length codes to symbols based on their
frequencies of occurrence is used. For the latter, repetitive sequence encoding, like for
instance run-length encoding, is the simplest option.

2.2.3.1 Run-length encoding (RLE)

In some applications it is normal to have long sequences of repeating values or
symbols. For instance, in recordings of conversations it is common for there to be
pauses when nobody is talking. In still images it is not unusual for large areas to have
the same color. All of these situations have the same feature in their stream of
samples: long, identical sequences. Many bits are used to send a relatively small
amount of information.

The idea of run-length encoding is to replace long sequences of identical values with a
special code that indicates the value to be repeated and the number of times which to
repeat it. As an example, a text file with the input string "aaaaaaabbbbbaaaabbaaa"
will be replaced with "7a5b4abb3a". As we can see, the coding is only effective, and
thus only used, on runs longer than two samples.

Since in audio playback there are relatively few repeating strings to be found (in music,
long identical sequences usually only appear in pauses), the effectiveness of RLE-
coding in itself is very limited. However, it can be used as a step in more elaborate
compression schemes.
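A minimal run-length encoder reproducing the example above (Python sketch; the textual "&lt;count&gt;&lt;symbol&gt;" code is only illustrative, a real format would store the count in a fixed-size binary field to stay unambiguous for counts of 10 or more):

```python
def rle_encode(s, min_run=3):
    """Replace runs of at least `min_run` identical symbols with
    '<count><symbol>'; shorter runs are passed through unchanged."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                        # scan to the end of the run
        run = j - i
        out.append(f"{run}{s[i]}" if run >= min_run else s[i] * run)
        i = j
    return "".join(out)

print(rle_encode("aaaaaaabbbbbaaaabbaaa"))   # 7a5b4abb3a
```

The two-sample run "bb" is left alone since encoding it would save nothing.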

2.2.3.2 Huffman-coding

As shown earlier, a coding based on linear quantization, where every sample with a
possible value between 0 and 2^B-1 (or -2^(B-1) to 2^(B-1)-1) is represented by B bits, is not the
most space-efficient coding scheme, simply because some values are more common
than others. As the histograms have shown, in recorded audio small values are much
more frequent; thus it is inefficient to code using a fixed number of bits large enough
to contain even the biggest possible number. Huffman-coding uses a variable-length
representation where short codes are assigned to the most frequent values and longer
codes to the ones that appear more rarely. Huffman-coding can be shown to be
optimal only if all probabilities are integral powers of 1/2, but it still yields
significant improvement over normal LPCM-code even in audio applications.

Since the number of bits per symbol is variable, in general the boundary between
codes will not fall on byte boundaries; there is no built-in delimiter between
symbols. One could add a special "marker", but this would waste space. Instead, a set
of codes with a prefix property is generated: each symbol is encoded into a sequence
of bits so that no code for a symbol is the prefix of the code for any other. This
property allows decoding of a bit string by repeatedly deleting prefixes of the string
that are codes for symbols. The prefix property can be assured using binary trees. An
example [reference 7] will be used to show how it is done.

Table 3 Two example binary codes [reference 7]


Symbol Probability Code 1 Code 2
1 0.12 000 000
2 0.35 001 11
3 0.20 010 01
4 0.08 011 001
5 0.25 100 10

Two example codes with the prefix property are given in table 3. Decoding code 1
(standard binary code) is simple, as we can just read three bits at a time (for example
"001010011" is decoded to 2, 3, 4). For code 2, we must read one bit at a time so that,
for instance, "1101001" would be read as "11"=2, "01"=3 and "001"=4. Clearly, the
average number of bits per symbol is less for code 2 (2.2 vs. 3, for a data reduction of
27%).

When a set of symbols and their probabilities is known, the Huffman algorithm lets us
find a code with the prefix property such that the average length of code for each
symbol is a minimum. The basic principle is that we select the two symbols with the
lowest probabilities (in table 3: 1 and 4) and replace them with a symbol s1 that has a
probability equal to the sum of the original two (in the example, 0.20). The optimal
prefix for this set is the code for s1 with a zero appended for 1 and a one appended for
4. This process is repeated until all symbols have been merged into one symbol with
probability 1.00. This is equivalent to constructing a binary tree from the bottom up.
To find the code for a symbol, we follow the path from the root to the leaf that
corresponds to it. Along the way, we output a zero every time we follow a left link
and a one for each right link. If only the leaves of the tree are labeled with symbols,
then we are guaranteed that the code will have the prefix property (since we only
encounter one leaf on the path from the root to the symbol). An example code tree
(for the code in table 3) is shown in figure 12.

Figure 12 Binary tree with prefix property code (code 2 from table 3)

To compress a signal, we build a Huffman-tree (there are more efficient algorithms
which don’t actually build the tree) and then produce a look-up table (like table 3) that
allows us to generate a code for each symbol, or decode the symbol in the
decompression program. This table must of course be sent with the compressed signal
(or stored in the compressed file) so the decoder can access it. It can alternatively be
built into the decoder if (and only if) it is fixed for any input signal.

Huffman coding is clearly a bottom-up approach. It can be summarized in the
following steps:

1. Initialization: put all nodes in an OPEN list and keep it sorted by frequency at
all times (e.g. the symbols 1-5 of table 3).
2. Repeat until the OPEN list has only one node left:
a. From OPEN, pick the two nodes having the lowest frequencies and
create a parent node for them.
b. Assign the sum of the children’s frequencies to the parent node and
insert it into OPEN.
c. Assign codes 0 and 1 to the two branches of the tree and delete the
children from OPEN.
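The merging procedure above can be sketched in C. This is a minimal O(n²) illustration (names are mine; a real coder would use a heap and also emit the actual codewords, not just their lengths):

```c
#define MAXN 64

/* Build Huffman code lengths for n (<= MAXN) symbols with integer
 * weights. lens[i] receives the code length of symbol i. */
static void huffman_lengths(const int *w, int n, int *lens)
{
    int weight[2 * MAXN], parent[2 * MAXN], alive[2 * MAXN];
    int total = n;

    for (int i = 0; i < n; i++) {
        weight[i] = w[i];
        parent[i] = -1;
        alive[i]  = 1;
    }
    /* step 2: repeatedly merge the two lowest-weight live nodes */
    for (int m = 0; m < n - 1; m++) {
        int a = -1, b = -1;
        for (int i = 0; i < total; i++) {
            if (!alive[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b < 0 || weight[i] < weight[b]) { b = i; }
        }
        weight[total] = weight[a] + weight[b];   /* parent gets the sum */
        parent[total] = -1;
        alive[total]  = 1;
        parent[a] = parent[b] = total;           /* children leave OPEN */
        alive[a]  = alive[b]  = 0;
        total++;
    }
    /* the code length of a leaf equals its depth in the final tree */
    for (int i = 0; i < n; i++) {
        int d = 0;
        for (int p = parent[i]; p >= 0; p = parent[p])
            d++;
        lens[i] = d;
    }
}
```

For the probabilities of table 3 scaled to integer weights (12, 35, 20, 8, 25) this yields the lengths 3, 2, 2, 3, 2 — the same lengths as code 2, i.e. an average of 2.2 bits per symbol.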

Since the probabilities are usually estimates used for weighting of the different
symbols (the source is not deterministically known), they are expressed as a list of
weights {w(1), ... ,w(n)} where ∑w(n) over all n is 1. The Huffman-coding then in
reality is a merging of weights, and the Huffman tree is usually depicted as shown in
figure 13.

Figure 13 General depiction of Huffman-tree, seven symbols W1-W7

As we can see, there is a total of seven symbols arranged by weight, with W1 as
the smallest.

Mathematical analysis of the Huffman-encoding is very complex and will not be
included in this thesis. However, a few of its more important properties should be
mentioned (the interested reader is referred to reference 9 for more details): The
Huffman-mapping can be generated in O(n) time, where n is the number of messages
in the source ensemble. The algorithm maps a source message a(i) with probability p to
a codeword of length l (-log(p) ≤ l ≤ -log(p)+1). Encoding and decoding time depend
upon the representation of the mapping. If the mapping is stored as a binary tree, then
decoding the codeword for a(i) involves following a path of length l in the tree. A
table indexed by the source messages could be used for encoding; the code for a(i)
would be stored in position i of the table and encoding time would be O(l). It can also
be shown that the redundancy bound for Huffman coding is p(n)+0.086, where p(n) is
the probability of the least likely source message [reference 9]. This does not include
the cost of transmitting the code mapping, which can be significant (up to 2n bits). If
the transmitter and receiver agree on the code mapping, the real overhead can be
significantly reduced (the tables are stored both in sender and receiver and not
transmitted, as mentioned above), but this comes at the cost of less optimal coding.

2.2.3.3 Adaptive Huffman coding

The basic Huffman algorithm clearly requires statistical knowledge of the data,
which is often unavailable. For audio playback it is definitely not available, although,
as the histogram examinations show, an estimation can be made that will make
Huffman coding quite effective in most cases (the prediction residuals can be
modeled well with a laplacian probability density function: high probability for
small values, exponentially decreasing probability as the values increase). But even
if it is available, there could be a heavy overhead, especially when many tables have to
be sent because a non-zero-order model is used (i.e. taking into account the impact of
the previous symbol on the probability of the current symbol).

The adaptive Huffman algorithms determine the mapping of source messages to
codewords based upon a running estimate of the source message probabilities. The
code is adaptive, changing to remain optimal for the current estimates. In essence, the
encoder is ”learning” the characteristics of the source. The decoder must learn along
by continually updating the Huffman tree to stay in synchronization with the encoder.

The most frequently used adaptive Huffman algorithm is the FGK-algorithm [reference
9], which is based on the sibling property. A binary code tree has the sibling property if
each node (except the root) has a sibling and if the nodes can be listed in order of
nonincreasing weight with each node adjacent to its sibling. It can be proved that a
binary prefix code is a Huffman code if and only if the code tree has the sibling property.

In the algorithm, both sender and receiver maintain dynamically changing Huffman
code trees. The leaves of the code tree represent the source messages and the weights
of the leaves represent frequency counts for the messages. At any point in time, k of
the n possible source messages have occurred in the message ensemble.

To illustrate the algorithm, an example [reference 9] is shown using a message
containing a string of characters (it is much simpler to illustrate with characters than
with 16-bit audio codewords).

Eq. 22 EX = aa bbb cccc ddddd eeeeee fffffff gggggggg

Initially, the code tree consists of a single leaf node, called the 0-node. The 0-node is
a special node used to represent the n-k unused messages. For each message
transmitted, both parties must increment the corresponding weight and recompute the
code tree to maintain the sibling property.

Figure 14 Algorithm FGK processing the ensemble EX: (a) Tree after processing "aa
bb"; 11 will be transmitted for the next b. (b) After encoding the third b; 101 will be
transmitted for the next space; the tree will not change; 100 will be transmitted for
the first c. (c) Tree after update following first c. [reference 9]

At the point in time when t messages have been transmitted, k of them distinct, and
k<n, the tree is a legal Huffman code tree with k+1 leaves, one for each message and
one for the 0-node. If the (t+1)st message is one of the k already seen, the algorithm
transmits a(t+1)’s current code, increments the appropriate counter and recomputes
the tree. If an unused message occurs, the 0-node is split to create a pair of leaves, one

for a(t+1), and a sibling which is the new 0-node. Again the tree is recomputed. In
this case, the code for the 0-node is sent; in addition, the receiver must be told which
of the n-k unused messages have appeared. At each node a count of occurrences of the
corresponding message is stored. Nodes are numbered indicating their position in the
sibling property ordering. The updating of the tree can be done in a single traversal
from the a(t+1) node to the root. This traversal must increment the count for the
a(t+1) node and for each of its ancestors. Nodes may be exchanged to maintain the
sibling property, but all of these exchanges involve a node on the path from a(t+1) to
the root. The final code tree for the example is shown in figure 15.

Figure 15 Complete Huffman-tree for example EX

Adaptive Huffman coding basically updates the Huffman-tree for every new
occurrence of a symbol, since its frequency then increases. It is in many cases more
effective and produces less overhead (n·log(n) bits as compared to 2n for the static
Huffman code). However, it is more demanding computationally. It can be shown that
the time required for each encoding and decoding operation is O(l), where l is the
current length of the codeword.

2.2.3.4 Rice-coding

Although Huffman-coding is very common in compression algorithms, some of its
properties are not ideal for encoding of audio signals. The Huffman-table has to be
stored, which increases the memory usage; adaptive Huffman-coding is
computationally demanding; and a fixed Huffman-table can behave very poorly if it
does not correspond well to the distribution of the incoming signal. The concept of
Rice-coding has therefore become widespread in lossless audio (and video) codecs. It
has a high efficiency and is very simple to implement. Another attractive feature is
that there is no need to store any code tables.

Generalized Rice-coding is based on two steps: Rice preprocessing followed by run-
length encoding using Rice codes, also called Golomb-power-of-2 (GP2) codes. Rice
coding takes advantage of the fact that music usually has an exponentially decreasing
probability function with the highest probabilities for small numbers. It uses few bits
to represent smaller numbers while still maintaining the prefix property. Explained in
words, the algorithm works as follows:

1. Make a guess as to how many bits a number will take and call that k.
2. Store the rightmost k bits of the number in their original form.
3. Consider the binary number without these k rightmost bits; this is the
overflow that doesn’t fit in k.
4. Encode this value in unary with a corresponding number of zeros followed by a
terminating ’1’ to indicate the end of the encoded overflow.

The code will then consist of:

1. Sign bit (1 for positive, 0 for negative8)
2. n/(2^k) zeros
3. A terminating 1
4. The k least significant bits of the number.

As an example, if n=578 and k=8, then: sign = ’1’, n/(2^k) = 578/256 = 2
= ”00”, terminator = ’1’, k least significant bits = ”01000010”.

Eq. 23 (578)RICE = ”100101000010”

while, as a comparison,

Eq. 24 (578)16-bit PCM = ”1000001001000010”

As we can see, 4 bits are saved. It is also obvious from looking at the algorithm that
for this to work, absolute values must be used.
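The four steps can be collected into a small C routine. This is an illustrative sketch of my own (it emits the code as a ’0’/’1’ string for readability rather than packed bits); for n=578 and k=8 it reproduces the codeword of Eq. 23:

```c
/* Rice-encode n with parameter k into a '0'/'1' string. Sign bit first
 * ('1' for non-negative, '0' for negative, as in the text), then the
 * unary-coded overflow, a terminating '1', and the k LSBs of |n|.
 * The caller's buffer must be large enough for the unary part.
 * Returns the code length in bits. */
static int rice_encode(long n, int k, char *out)
{
    int len = 0;
    unsigned long mag = (n < 0) ? (unsigned long)(-n) : (unsigned long)n;

    out[len++] = (n < 0) ? '0' : '1';           /* sign bit */
    for (unsigned long q = mag >> k; q > 0; q--)
        out[len++] = '0';                       /* unary overflow */
    out[len++] = '1';                           /* terminator */
    for (int i = k - 1; i >= 0; i--)            /* k least significant bits */
        out[len++] = ((mag >> i) & 1) ? '1' : '0';
    out[len] = '\0';
    return len;
}
```

rice_encode(578, 8, buf) produces the 12-bit string ”100101000010”, four bits shorter than the 16-bit LPCM word of Eq. 24.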

8 The same as for LPCM, but if desired, the opposite sign representation can of course also be used.

It is clearly apparent that a good estimation of k is necessary; if not, the number of
zeros (n/2^k) will be large and the code will be ineffective. The optimum k is
determined by looking at the average value over a number of past samples (16-128 is
normal; this is a speed vs. efficiency trade-off) and choosing the optimum k for that
average. The optimum k can be calculated as:

Eq. 25 k_opt = log(n_avg)/log(2) ;[reference 5]

2.2.3.4.1 Calculating the parameter k

By looking at the algorithm it is evident that the crucial step is the calculation of the
parameter k. The exhaustive method of calculating the average of a large number of
past samples and employing formula 25 is computationally demanding.
Compensating by using very few samples will instead increase the redundancy, since
there is a larger possibility of k being far from optimal. During the development of
the JPEG-LS (JPEG Lossless) image compression standard [reference 10] an
alternative and much simpler method was proposed. However, understanding it
demands a more formal expression of the Rice algorithm.

Given a positive integer parameter m, the Golomb code Gm encodes an integer n ≥ 0 in
two parts: a binary representation of (n mod m), and a unary representation of (n div
m). Golomb codes are optimal for exponentially decaying (geometric) probability
distributions of the nonnegative integers, i.e. distributions of the form Q(n) = (1-θ)θ^n,
where 0<θ<1. For every distribution of this form, there exists a value of the
parameter m such that Gm yields the shortest possible average code length over all
codes for the nonnegative integers. The optimal value of m is given by

Eq. 26 m = ⌈log(1+θ)/log(θ^-1)⌉ ;[reference 10]

A special case of the Golomb codes is when m = 2^k. If m is a power of two, the code
for n consists of the k least significant bits of n, followed by the number formed by the
remaining higher-order bits of n, in unary representation. This is exactly the same
representation as described above (minus the sign bit, as this derivation assumed n ≥
0); thus the G_2^k-codes are the same as the Rice-codes described, and it also becomes
apparent why they are called GP2-codes. To match the assumption of a two-sided
exponential (laplacian) distribution of the prediction residuals to the optimality of
Golomb-codes for geometric distributions, the prediction residuals ε in the range -α/2
≤ ε ≤ α/2-1 are mapped to values M(ε) in the range 0 ≤ M(ε) ≤ α-1 by:

Eq. 27 M(ε) = 2ε for ε ≥ 0; M(ε) = 2|ε|-1 for ε < 0 ;[reference 10]

If the values ε follow a laplacian distribution centered at zero, then the distribution of
M(ε) will be close to (but not exactly) geometric, and can then be encoded using an
appropriate Golomb-Rice code9.
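The mapping of Eq. 27 is a one-liner in C (the function name is mine):

```c
/* Map a signed prediction residual to a non-negative integer:
 * e >= 0 -> 2e, e < 0 -> 2|e|-1, so the sequence 0,-1,1,-2,2,...
 * maps to 0,1,2,3,4,... */
static unsigned long rice_map(long e)
{
    return (e >= 0) ? 2UL * (unsigned long)e
                    : 2UL * (unsigned long)(-e) - 1;
}
```

This interleaves the two sides of the laplacian, so residuals of small magnitude still get the small (cheap) codewords.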

As mentioned, the original Rice-algorithm uses a sequential approach to calculate the
optimal value for k, using an average of a number of past values. The method
proposed in JPEG-LS is based on an estimation of the expectation E[|ε|] of the
magnitude of prediction errors in the past observed sequence. This results in a very
simple calculation of k.

In a discrete laplacian distribution P(ε) = p0·ρ^|ε| for prediction residuals in the range
-α/2 ≤ ε ≤ α/2-1, where 0<ρ<1 and p0 is such that the distribution sums to 1, the
expected prediction residual magnitude is given by

Eq. 28 a_ρ,α ≜ E[|ε|] = Σ_{ε=-α/2}^{α/2-1} |ε|·p0·ρ^|ε| ;[reference 10]

We are interested in the relation between the value of a_ρ,α and the average code length
L_ρ,k resulting from using the Golomb-Rice code Rk on the mapped prediction residuals
M(ε). In particular, we seek the value of k yielding the shortest code length. It can
be shown [reference 11] that a good estimate for the optimal value of k is

Eq. 29 k = ⌈log2 a_ρ,α⌉ ;[reference 11]

In order to implement this estimation, the encoder and decoder maintain two variables
per context: N, a count of the prediction residuals seen so far, and A, the accumulated
sum of magnitudes of the prediction residuals seen so far. The expectation a_ρ,α is
estimated by the ratio A/N and k is computed as

Eq. 30 k = min{k' | 2^k'·N ≥ A} ;[reference 10]

In software, the computation of k can be realized with one line of C:

for (k = 0; (N << k) < A; k++); ;[reference 10]
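Put together, the per-context state and the k-selection described above might be sketched as follows (struct and function names are mine; a real JPEG-LS coder additionally halves A and N periodically so the estimate stays adaptive):

```c
/* Per-context Rice parameter estimation: N counts residuals,
 * A accumulates their magnitudes. */
struct rice_ctx {
    long N;   /* number of residuals seen so far */
    long A;   /* accumulated |residual| */
};

/* Update the context with one residual and return the k to use,
 * i.e. the smallest k' with (N << k') >= A. */
static int next_k(struct rice_ctx *c, long residual)
{
    int k;

    c->N += 1;
    c->A += (residual < 0) ? -residual : residual;
    for (k = 0; (c->N << k) < c->A; k++)
        ;                                   /* the one-line search */
    return k;
}
```

With a fresh context, a residual of magnitude 100 gives k = 7 (since 2^7 = 128 ≥ 100), close to log2 of the running average as Eq. 29 prescribes.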

9 To do this with a two’s complement representation is very simple: one left shift for positive values and inverting the sign bit for negative values.

2.2.3.5 Pod-coding, a better way to code the overflow

Standard Rice-coding is very inefficient when the value k is not ideal. Any overflow
Ov that does not fit in the k-bit binary coded part is unary coded with Ov zeros
followed by a one. If these numbers are large, the code length will be very long and
the efficiency will suffer. An alternative is to use the Rice-preprocessing part of the
Rice algorithm (find a value k, store the k rightmost bits unchanged and encode the
overflow), but to use another method to encode the overflow remainder [reference
12]. A code suited for this is the Pod-code10. Instead of using Ov zeros, the Pod-code
works as follows:

1. For 0, send 1
2. For 1, send 01
3. For 2-bit number 1Z, send 001Z
4. For 3-bit numbers 1YZ, send 0001YZ
5. For 4-bit numbers 1XYZ, send 00001XYZ etc.

It is no problem for the decoder to know how many bits WXYZ… to expect: it is one
less than the number of 0s which precede the 1. Thus, the prefix property is
maintained. An integer of B significant bits is represented in 2B bits by the Pod-code,
while the unary part of the standard Rice-code can need up to 2^B bits. A comparison
is shown in table 4 (sign-bit is omitted for clarity).

Table 4 Pod-codes vs. Rice-codes


Overflow Binary Pod-code Rice-code Benefit in
value bits
0 00000 1 1 0
1 00001 01 01 0
2 00010 0010 001 -1
3 00011 0011 0001 0
4 00100 000100 00001 -1
5 00101 000101 000001 0
6 00110 000110 0000001 1
7 00111 000111 00000001 2
8 01000 00001000 000000001 1
9 01001 00001001 0000000001 2
10 01010 00001010 00000000001 3
11 01011 00001011 000000000001 4
12 01100 00001100 0000000000001 5
13 01101 00001101 00000000000001 6
14 01110 00001110 000000000000001 7
15 01111 00001111 0000000000000001 8
16 10000 0000010000 00000000000000001 7
17 10001 0000010001 000000000000000001 8
18 10010 0000010010 0000000000000000001 9
19 10011 0000010011 00000000000000000001 10
20 10100 0000010100 000000000000000000001 11

10 The code described is a variant of the Elias-γ-code, which itself belongs to the Elias group of codes; these will not be investigated in any further detail in this report. P. Elias: ”Universal Codeword Sets and Representations of the Integers”, IEEE Transactions on Information Theory, is recommended to the interested reader.

As the table shows, the gain when coding overflow values larger than 5 is positive.
When the parameter k is more than three bits off, Pod-coding will give better results
than Rice-coding. The potential loss in efficiency is small, just one bit inferior
performance for the overflow values 2 and 4.
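The Pod-encoding rule can be sketched in C (an illustrative helper of mine, with string output instead of packed bits); it reproduces the codewords of table 4:

```c
/* Pod-encode a non-negative overflow value into a '0'/'1' string:
 * 0 -> "1"; otherwise B zeros followed by the B significant bits
 * (the leading bit of which is always 1). Returns the code length. */
static int pod_encode(unsigned long v, char *out)
{
    int len = 0;

    if (v == 0) {
        out[len++] = '1';
    } else {
        int bits = 0;
        for (unsigned long t = v; t > 0; t >>= 1)
            bits++;                          /* B = significant bits */
        for (int i = 0; i < bits; i++)
            out[len++] = '0';                /* B-zero prefix */
        for (int i = bits - 1; i >= 0; i--)  /* the B bits themselves */
            out[len++] = ((v >> i) & 1) ? '1' : '0';
    }
    out[len] = '\0';
    return len;
}
```

For example, pod_encode(6, buf) gives ”000110” and pod_encode(20, buf) gives ”0000010100”, matching table 4.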

2.3 Lossy compression of audio

Lossy compression is based on using psychoacoustic models to find and remove
information that is not perceptible to the human auditory system. It is therefore often
referred to as perception-based compression. There are many methods available,
whose complexity and quality vary a lot. The best systems may provide close to CD-
quality even with high compression ratios (10:1 or more), but they are complex and
require fast processors or custom-made hardware (ASICs).

This section will contain a quick introduction to the human auditory system, with
emphasis on the aspects relevant to perception-based compression. Then the relevant
compression methods will be introduced and explained.

2.3.1 The human auditory system


The auditory system is probably the most complex and sensitive part of the entire
human anatomy. With a dynamic range of 120dB and a spectral range of 10 octaves it
can detect and process an extremely wide range of stimuli, and our ability to hear
even the smallest of differences has impressed scientists for ages and continues to do
so. Figure 16 shows a cross-section of the auditory system.

Figure 16 The human auditory system

In the outer ear we have the ear itself and an external auditory canal, leading to the
eardrum. The eardrum is a membrane which resonates as air pressure varies. To
maintain pressure equality on the two sides, we have a canal (the eustachian tube)
leading down to the nose. Inside the eardrum, in the middle ear, we have three bones
functioning as a mechanical transformer. These three bones, the hammer, the anvil
and the stirrup, are the smallest bones in the entire human body. They connect the
eardrum to the oval window, the ”entrance” to the cochlea. The cochlea is a fluid-
filled chamber where resonances in the oval window are processed. Inside the
cochlea, the basilar membrane transports the resonances. A cross section is shown in
figure 17.

Figure 17 Cross-section of the cochlea

The basilar membrane is connected to the inner haircells, which transform resonances
into neural signals, while the outer haircells provide feedback to increase sensitivity.
An interesting property of the cochlea is that it works as a spectral filter bank. High
frequencies excite resonances in the outer part, close to the oval window, while lower
frequencies excite resonances further inside. Thus different haircells transport
different frequencies and the system works like a bank of filters. The response might
look as shown in figure 18.

Figure 18 Cochlea filter response

In figure 18, the frequency axis is denoted ”Bark”. The Bark-scale is a standardized
scale where each ”Bark” constitutes one critical bandwidth. The Bark-scale is defined
as a table, but good mathematical approximations exist [reference 19]. The critical
bandwidth is defined as the width of a noise band beyond which increasing the
bandwidth does not increase the masking effect imposed by the noise signal upon a
sinusoid placed at the center frequency of the band. This leads to the concept of
masking: a dominating tone will render weaker signals inaudible. The distance in
frequency between the ”masker” and the masked sound decides how loud the
inaudible sounds can be (down to one critical band). This limit is known as the
masking threshold.

Figure 19 Masking threshold

We are not able to hear anything below the masking threshold, and this is what
perceptual audio algorithms exploit: if we can’t hear it, it can be removed. The signal
is divided into small frequency bands using a filter bank. Then, within each band, the
signal can be quantized down until the noise level is just below the masking threshold.
As figure 19 shows, high noise levels are allowable within each band and very
significant data reduction can be achieved. Furthermore, we see that the sensitivity of
the ear is lower in the bass and treble range than in the midrange (1-5kHz). The
frequency-dependent sensitivity of the hearing is quantified by the Fletcher-Munson
diagram, first published in 1933. As a result, a lowering of the resolution gives a
smaller degradation in sound quality if done in the bass and treble than in the
midrange. The Fletcher-Munson diagram, given in figure 20, also shows that the
sensitivity is dependent on the loudness. The curves, called equal loudness curves,
show what sound pressure level we perceive as being of a certain loudness. The
perceived loudness is denoted phon.

Figure 20 The Fletcher-Munson curves (equal loudness curves)

In addition to masking in the frequency domain, we also experience temporal
masking. In the moments after being ”hit” by a loud sound, the ear is less sensitive
than normally. This can also be exploited by allowing for a higher quantization noise
for a short time following a loud transient.

Figure 21 Temporal masking

Fascinating as it might be, the human auditory system has flaws that can be used to
reduce the amount of data without compromising audio quality. In general, lossy
compression algorithms introduce some degree of sonic degradation; how perceptible
it is depends on the application (high-end hifi-system or cheap computer speakers),
the level of compression and, of course, how good the algorithm is.

2.3.2 Lossy compression algorithms


There are many lossy compression algorithms available, ranging from the very simple
to the very sophisticated. For small embedded systems, DPCM and ADPCM are the
ones mostly used. These are simple algorithms, but do not allow much data reduction
without significantly compromising audio quality. Other much-used algorithms in the
same category are µ-law (pronounced ”mu-law”) and a-law, known from digital
telephone systems.

Recent advances in the processing capability of home computers and digital devices
(like ASICs, DSPs and FPGAs) have however pushed the development of much more
sophisticated systems. The spearhead of this development has been the Moving
Picture Experts Group (MPEG) that made the basic framework for the current
standard, MP3, as well as other up-and-coming systems. However, other vendors like
Microsoft and Sony have also made their own systems. In recent times, even open-
source alternatives have become competitive, much due to the development of the
Ogg-Vorbis project, now believed to be at least on par with most commercial systems.
Generally, these algorithms allow for a reduction in file-size to 1:10 or less of the
original with minimal quality loss.

2.3.2.1 MPEG-based algorithms

The most widespread compression standard is the MP3 or MPEG-1 Layer-3
algorithm, developed by the Moving Picture Experts Group and the Fraunhofer
Institute [reference 13]. It is based around the concept of masking in sound perception,
explained earlier. In the MP3-system, a filter-bank is used to divide the spectrum into
32 subbands (corresponding closely to the critical bands). Within each subband, the
quantization uses fewer bits, such that the quantization noise stays just below the
masking threshold. The subbands are processed in the frequency domain following an
MDCT (Modified Discrete Cosine Transform). MP3 also employs joint stereo coding
and Huffman-coding. The level of compression can be significant, and good quality is
obtained at 128-256kbps. A block-diagram of an MP3-codec is shown in figure 22.

Figure 22 MP3 encoding and decoding block diagram

Recent advances in processing power and the growing requirement for online
distribution of high-fidelity music have increased the demand for even more elaborate
compression algorithms. Microsoft’s Windows Media Audio [reference 14] and
Sony’s most recent ATRAC-algorithm [reference 16] use more advanced auditory
models than MP3. Also, the completely free and open-source Ogg-Vorbis [reference
21] algorithm has gained a reputation for being significantly better than MP3. The
Fraunhofer Institute has however responded by launching AAC or Advanced Audio
Coding [reference 16], a system utilizing the much more sophisticated MPEG-2
compression scheme.

Figure 23 AAC compression block diagram

As the figure shows, AAC also uses TNS (temporal noise shaping), intensity stereo,
adaptive prediction and more, in addition to the MP3 features. Research shows that
AAC allows around 1.4 times better compression ratios than MP3 at the same audio
quality.

It is however apparent that none of these algorithms are suitable for implementation
on a simple MCU. Thus they are not applicable in the wireless loudspeaker system
this report documents. They will therefore not be investigated in any further detail
here.

Much simpler algorithms for lossy audio compression existed long before the
introduction of MP3 and related systems. Back then processor power was very
limited, which forced quite crude models and calculations to be used. The result was
of course vastly inferior to modern systems, but in our application the required
compression ratio is very small (approximately 2:1), which makes high-fidelity
reproduction possible with much simpler schemes. While MP3 or other internet-audio
based algorithms must deliver almost CD-quality audio at 128kbps or even lower, we
can tolerate a system which is inferior at that bitrate, as long as it is transparent11 at the
1Mbps (including overhead) the CC2400 RF-transceiver allows.

11 In the digital audio vocabulary, ”transparent” usually refers to ”no detectable quality degradation”. If listeners can’t hear the difference between the uncompressed original and the compressed version of the music in a blind test environment, the codec is said to be ”transparent”.

2.3.2.2 Differential Pulse Code Modulation (DPCM)

One of the simplest and fastest methods for lossy audio compression is differential
pulse code modulation or DPCM. This algorithm utilizes the fact that the ear is
sensitive to small differences when the volume is low, while, when the volume is
loud, we cannot perceive subtle details to the same extent. Since there is no subband-
filtering, the noise level must be below the lowest masking threshold level (see figure
19) at any frequency (as compared to within the subband for algorithms with
filterbanks) for the compression to be transparent. Since the threshold is highly
dependent on the level of the signal, a non-linear quantization is performed, where
the quantization steps are fine for low values and coarse for large values. In addition,
the signal being quantized is the difference between adjacent samples, which has a
smaller probability of large values. As explained earlier, this is equivalent to a first-
order predictor where the prediction residuals are the ones being coded. Of course,
more sophisticated predictors can be constructed to decrease the entropy further
before re-quantization. An example [reference 17], showing a 2:1 DPCM compression
(from 8-bit PCM to 4-bit DPCM), is given to illustrate the algorithm.

Figure 24 DPCM-encoder block diagram [reference 17]

The encoder shown in figure 24 calculates the difference between a predicted sample
and the original sample. To avoid accumulation of errors, the predicted sample is the
previously decoded sample. The residual is then quantized to 4 bits using a non-linear
quantizer and fed to the output. The quantization operation is shown in table 5. By
using 15 values for encoding, the code is made symmetric and a level in the binary
search tree can be omitted.

Table 5 DPCM nonlinear quantization code [reference 17]


Code value        0    1    2    3   4   5   6   7   8   9  10  11  12  13  14  15
Coded difference  0  -64  -32  -16  -8  -4  -2  -1   0   1   2   4   8  16  32  64

The decoding is very simple. The 4-bit word is requantized to 8-bits using a quantizer
with an opposite transfer function of the one given in table 5. Then the necessary
prediction is done (when the input is the difference between two adjacent samples, the
next output value is obviously the sum of the current output value and the next
difference).
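The encoder/decoder pair described above can be sketched in C using the quantizer of table 5. This is an illustrative nearest-step search of my own rather than the binary search mentioned in the text, and it omits clipping to the 8-bit range:

```c
/* Quantizer steps from table 5 (code 0 is effectively unused). */
static const int dpcm_step[16] = {
    0, -64, -32, -16, -8, -4, -2, -1, 0, 1, 2, 4, 8, 16, 32, 64
};

/* Pick the code whose step is closest to the residual. */
static int dpcm_quantize(int diff)
{
    int best = 8, best_err = (diff > 0) ? diff : -diff;
    for (int c = 1; c < 16; c++) {
        int e = diff - dpcm_step[c];
        if (e < 0) e = -e;
        if (e < best_err) { best_err = e; best = c; }
    }
    return best;
}

/* Encode n samples; the predictor tracks the *decoded* value so the
 * decoder stays in sync and errors do not accumulate. */
static void dpcm_encode(const int *in, int *codes, int n)
{
    int pred = 0;
    for (int i = 0; i < n; i++) {
        int c = dpcm_quantize(in[i] - pred);
        codes[i] = c;
        pred += dpcm_step[c];       /* mirror the decoder's state */
    }
}

/* Decode: each output is the previous output plus the coded step. */
static void dpcm_decode(const int *codes, int *out, int n)
{
    int pred = 0;
    for (int i = 0; i < n; i++) {
        pred += dpcm_step[codes[i]];
        out[i] = pred;
    }
}
```

Encoding the samples 10, 12, 12, -50 and decoding again gives 8, 12, 12, -52: the coarse steps leave small errors on large jumps, while small differences are reproduced exactly.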

Figure 25 DPCM decoder block diagram [reference 17]

One other thing should also be noted regarding prediction in combination with
requantization: the prediction residuals are small when the differences between
samples are small and big when the differences are big. Small differences of course
mean low frequencies, while big differences mean high frequencies. Thus a noise-
shaping is performed, where the noise is moved up in frequency. When one looks at
the equal loudness curves or the masking curve in figure 19, it becomes evident that
moving the noise to high frequencies is a good thing. Also, the total perceived noise
will decrease, since less energy exists in the high treble range. Actually, prediction is
equivalent to delta-modulation, a technique often used in audio converters (delta-
sigma converters) where a low noise level in the baseband is desirable.

2.3.2.3 Adaptive DPCM (ADPCM)

Adaptive DPCM or ADPCM is a further development of DPCM where the quantizer
and/or the predictor is adaptive. This means they are adjusted according to the nature
of the input signal: if the input signal is small, the quantizer steps are small; if the
input signal is large, the quantizer steps are large. This gives less error than the fixed
nonlinear quantizer while the low, constant bitrate can be maintained.

ADPCM is very widespread in telephone communications and speech coding, and
many different algorithms exist. The Interactive Multimedia Association (IMA)
recommended a standard for ADPCM-codecs in multimedia applications, known as
IMA or DVI ADPCM, in the early 1990s [reference 18]. This algorithm is now used
in most cross-platform ADPCM-based audio applications. In this report, a general
explanation of the concepts behind ADPCM will be given, while any specifics will be
in accordance with the IMA-standard.

The ADPCM-structure is very similar to the normal DPCM-structure, the difference
being that the quantizer, the predictor or both are adaptive.

Figure 26 ADPCM general block diagram [reference 18]

2.3.2.3.1 IMA ADPCM adaptive quantizer

The proposed IMA-standard, now widely used in multimedia applications, uses an


adaptive quantizer, but a fixed predictor to limit the computational complexity. The
predictor is identical to the one showed previously in the DPCM chapter. The
compression level is 4:1, which means the 16-bit original signal is quantized to 4-bits.
The stepsize of the quantization depends on the input signal, thus making it adaptive.
The adaptation is based on the current stepsize and the quantizer output of the
immediately previous input. This adaptation is done as a sequence of two table
lookups. The three bits representing the quantized magnitude serve as an index
into the first table lookup whose output is an index adjustment for the second table
lookup. This adjustment is added to a stored index value, and the range-limited result
is used as the index to the second table lookup. The summed index value is stored for
use in the next iteration of the stepsize adaptation. The output of the second table
lookup is the new quantizer stepsize. If a start value is given for the index into the
second table lookup, the data used for adaptation is completely deducible from the
quantizer outputs; no side information is required for the quantizer adaptation. Tables
6 and 7 show the table lookup contents.

Table 6 First table lookup for IMA ADPCM quantizer adaptation [reference 18]
Three bits quantized magnitude Index adjustment
000 -1
001 -1
010 -1
011 -1
100 2
101 4
110 6
111 8

Table 7 Second table lookup for IMA ADPCM quantizer adaptation [reference 18]
Index Stepsize Index Stepsize Index Stepsize Index Stepsize
0 7 22 60 44 494 66 4026
1 8 23 66 45 544 67 4428
2 9 24 73 46 598 68 4871
3 10 25 80 47 658 69 5358
4 11 26 88 48 724 70 5894
5 12 27 97 49 796 71 6484
6 13 28 107 50 876 72 7132
7 14 29 118 51 963 73 7845
8 16 30 130 52 1060 74 8630
9 17 31 143 53 1166 75 9493
10 19 32 157 54 1282 76 10442
11 21 33 173 55 1411 77 11487
12 23 34 190 56 1552 78 12635
13 25 35 209 57 1707 79 13899
14 28 36 230 58 1878 80 15289
15 31 37 253 59 2066 81 16818
16 34 38 279 60 2272 82 18500
17 37 39 307 61 2499 83 20350
18 41 40 337 62 2749 84 22358
19 45 41 371 63 3024 85 24623
20 50 42 408 64 3327 86 27086
21 55 43 449 65 3660 87 29794
88 32767

Figure 27 shows how the step-size adaptation works based on these two look-up
tables.

Figure 27 IMA ADPCM stepsize adaptation [reference 18]
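The two-table adaptation can be sketched in C using the contents of tables 6 and 7. This is only a sketch; the function and variable names are illustrative and not taken from the IMA specification:

```c
#include <stdint.h>

/* Table 6: index adjustment, indexed by the 3-bit quantized magnitude. */
static const int8_t index_adjust[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };

/* Table 7: the 89-entry stepsize table. */
static const uint16_t stepsize_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878,
    2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894,
    6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289, 16818,
    18500, 20350, 22358, 24623, 27086, 29794, 32767
};

/* One iteration of the two-table adaptation: the 3-bit magnitude of the
   previous code word adjusts the stored index, the result is range
   limited to 0..88 and then selects the next stepsize from table 7. */
static int adapt_index(int index, uint8_t magnitude3)
{
    index += index_adjust[magnitude3 & 0x07];
    if (index < 0)  index = 0;
    if (index > 88) index = 88;
    return index;
}
```

The new stepsize is then simply `stepsize_table[index]`, so the whole adaptation costs two table lookups, one addition and a range limit per sample.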

When the quantizer knows its stepsize, the quantization is done based on binary
search. Figure 28 shows a flowchart for the quantizer.

Figure 28 IMA ADPCM quantization [reference 18]

The adaptively quantized value is output from the quantizer. Since the lookup-tables
can be stored in both the encoder and the decoder, no overhead in the form of
additional information exists. Thus the compression ratio is constant and exactly 4:1.

A fortunate side effect of the ADPCM scheme is that decoder errors caused by
isolated code word errors, or by edits, splices or random access of the compressed bit
stream, generally do not have a disastrous impact on the decoder output. Since
prediction relies on the correct decoding of previous samples, one might expect errors
in the decoder to propagate; the following analysis shows why their impact is limited.

The decoder reconstructs the audio sample Xp[n] by adding the previously decoded
audio sample Xp[n-1] to the signed magnitude product of the quantizer stepsize and
the code word C[n] plus an offset of one-half.

Eq. 31 Xp[n] = Xp[n-1] + stepsize[n-1]·C’[n], where C’[n] = C[n] + 1/2 ;[reference 18]
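In the integer implementation, the three magnitude bits of the code word weight the stepsize as a binary fraction, so the product in Eq. 31 is evaluated with shifts. The following is a sketch (the helper name is not from the IMA specification); the `stepsize >> 3` term is the scaled one-half-step offset:

```c
#include <stdint.h>

/* Reconstruct one sample from a 4-bit code word. Bits 2..0 weight the
   stepsize by 1, 1/2 and 1/4 respectively, bit 3 is the sign, and the
   stepsize>>3 term implements the "+1/2" offset of Eq. 31. */
static int32_t ima_reconstruct(int32_t prev, uint8_t code, int32_t stepsize)
{
    int32_t diff = stepsize >> 3;            /* the one-half-step offset */
    if (code & 4) diff += stepsize;
    if (code & 2) diff += stepsize >> 1;
    if (code & 1) diff += stepsize >> 2;
    if (code & 8)                            /* bit 3 is the sign */
        diff = -diff;
    int32_t x = prev + diff;
    if (x >  32767) x =  32767;              /* clip to the 16-bit range */
    if (x < -32768) x = -32768;
    return x;
}
```

The clipping at the end is the mechanism, discussed below, that corrects a constant offset caused by an isolated code word error.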

In the second lookup-table each successive entry is about 1.1 times the previous entry.
As long as range limiting of the second table index does not take place, the value of
stepsize[n] is approximately the product of the previous value, stepsize[n-1] and a
function of the codeword F(C[n-1]). The above two equations can be manipulated to
express the decoded audio sample Xp[n] as a function of the stepsize and the decoded
samplevalue at time m, and the set of codewords between time m and n.

48
n #% i '%
Eq. 32 Xp[n] = Xp[m] + stepsize[m]! *%" $ F(C[ j]) ( ! C'[i]
%)
[reference 18]
i= m +1 & j= m +1

Note that the terms in the summation are only a function of the codewords from time
m+1 onwards. An error in the codeword C[q] or a random access entry into the
bitstream at time q can result in an error in the decoded output Xp[q] and the
quantizer stepsize stepsize[q+1]. The above equation shows that an error in Xp[m]
amounts to a constant offset to future values of Xp[n]. This offset is inaudible unless
the decoded output exceeds its permissible range and is clipped. Clipping results in a
momentary audible distortion but also serves to correct the offset term. The equation
also shows that an error in stepsize[m+1] amounts to an unwanted gain or attenuation
of future values of Xp[n]. The shape of the output waveform is unchanged unless the
index to the second table is range limited. Range limiting results in a partial or full
correction to the value of the stepsize. The nature of stepsize-adaptation thus limits
the impact of an error in the stepsize.

As mentioned, adaptive prediction can be used as well as adaptive quantization. For an
explanation of adaptive prediction, the reader is referred to the chapter about
intrachannel decorrelation in lossless compression, which covers the principles of
prediction.

2.3.2.4 µ-Law

While DPCM and ADPCM are very popular for few-bit representations, they are not
as well suited to lower compression rates (that is, many bits in the compressed output
values). For instance, an 8-bit DPCM encoder would have to search through a 256-
level quantization code table instead of a 16-level like table 5. With good search
algorithms, the search time can be reduced to twice that of a 4-bit encoder, but still
alternative methods like adaptive quantization will give the same or higher
performance increase with a lower data rate and lower computational complexity.

ADPCM has fairly good performance, but the second lookup table would have to be
very large if a fair amount of adaptation were to be attainable with so many output
levels. In memory-critical applications, like MCU-systems, this is not a good way to
go. However, while IMA ADPCM is standardized for a 4-bit output, µ-law is an
adaptive algorithm developed and standardized for an 8-bit output (there is also a 12-
bit version). Thus it has become very popular in applications where higher bitrates are
allowed, but where the requirement for simple computation still prohibits algorithms
with large complexity. In digital telephony and audio DAT-recorders with a longplay-
option, µ-law is the standard algorithm in use (8-bit for telephony, 12-bit for DAT)12.
Like the DPCM-based algorithms, µ-law is based on fine quantization for low-level
signals and a more coarse quantization for loud levels (when the masking threshold is
also higher). But it uses an alternative approach where it compresses the dynamic
range of the signal during encoding and expands it again when decoding.

12 An alternative standard to µ-law, a-law, is used in some telephone systems. It is similar to µ-law and
has about the same performance. Since it has not been used during the work with this thesis it will not
be presented in any closer detail.

The standardized µ-law algorithm performs a 16-bit to 8-bit quantization by
employing the formula

Eq. 33 x̂µ(n) = Qµ[log2(1 + µ·|x(n)|)] ;[reference 20]

where Qµ[…] is a quantizer which produces a kind of logarithmic fixed-point number
with a 3-bit characteristic and a 4-bit mantissa, using a small table lookup for the
mantissa. The operation of this quantization is specified by the following algorithm
[reference 20]:

1. Convert input sample from two’s complement format to sign-magnitude
format.
2. Clip the magnitude to a value of 32635 to prevent integer arithmetic
overflow when the bias value is added.
3. Add a bias value of 132 (0x84) to the magnitude; this guarantees that a
’1’ will appear somewhere in the exponent region of the 15-bit
magnitude.
4. The exponent region is the 8-bits following the sign bit. The next step
is to find the leftmost ’1’ in the exponent region and to record the
position of this ’1’ counting from left to right. The result of the count is
the 3-bit exponent characteristic.
5. Extract and save a four-bit mantissa consisting of the four bits
immediately to the right of the 1-bit.
6. Output word consists of ”seeemmmm”, where ’s’ is sign, ’e’ is
exponent and ’m’ is mantissa.

As can be seen, the quantization depends on the input value. The roundoff error is the
discarded bits to the right of the mantissa, which vary depending on where the
leftmost 1-bit is located. Thus an exponential quantization has been applied, but
without using the comprehensive search routines of DPCM and ADPCM. Since the
number of bits saved is the same, the quantization stepsize depends on the input value.
The quantization can therefore in principle be said to be adaptive. It can be shown that
an 8-bit µ-law encoding has 13 bits dynamic range (the smallest value is when the
exponent region is ”00000001”, thus 3 bits will always be discarded), which means
its dynamic range is 78dB (ref. the 6dB per bit rule) as compared to 48dB for 8-bit
LPCM [reference 20]. However, the noise of any logarithmic quantization will of
course increase when the signal level increases.

Another advantage of µ-law is that the noise-levels are more evenly spread throughout
the signal range than with DPCM. The maximum roundoff error (when all discarded
bits are ’1’-s) is 36dB below the sample value at any time, which is more than for
most DPCM tables (for instance, an expansion of table 1 to 8-bits would give a
maximum roundoff error just 6dB below the sample (or difference) value). In
addition, if the available processing power allows it, µ-law can easily be combined
with prediction for even better results.

Part 2

- Practical work –
- Documentation –

Thomas Alva Edison – in his laboratory at Menlo Park, New Jersey, 1883

3 Hardware Design
Designing the hardware for the wireless loudspeaker system proved to be significantly
more work than first anticipated. Finding components that matched all the
requirements for communication capabilities and data handling proved to be quite
difficult. Custom logic circuitry for transfer of the audio data had to be designed.
The process of developing the hardware and the critical choices made in the different
stages are documented in the following sections.

3.1 Selection of components

A system for wireless loudspeakers must include some essential basic parts. The
transmitter has to receive audio signals, digitize and process them and transmit them
over an RF-modulated link. The receiver will receive, decode and convert the signal
back to its original analog form. In addition, if the signal source is a CD-drive with
digital output, a SP-dif receiver might be included in the transmitter.

For this project, the RF-transmitter and receiver were both decided in advance to be
implemented with the Chipcon CC2400 RF-transceiver. Control and data transfer to
and from the transceiver has to be done with a microcontroller unit (MCU). Since low
cost was of the essence in this project, a separate digital signal processor (DSP) for
audio processing was not an option. Thus the audio processing also has to be done by
the MCU, and it must be reasonably powerful yet cheap.

D/A-conversion and A/D-conversion can be done with separate converters; however,
flexibility dictates the use of an integrated audio codec (A/D and D/A converters in the
same chip). Then the receiver and transmitter can be implemented on identical PCBs
and the design will also be much more flexible since the modules are bidirectional.
For instance one can implement two-way communication (like a wireless headset) in
the system without any hardware modifications. All that has to be changed is the
MCU’s source code. The system will then be as shown in figure 29.

Figure 29 Basic block diagram, wireless audio transceiver

In addition, some control logic will be necessary to ensure the right timing and data
transfer between the main units. Since we want a single-clock system it is also
preferable if the MCU or the control logic generates the required clocks and the audio
devices slave off these. The CC2400, however, needs its own clock, but this will not
interfere with the rest of the system.

3.1.1 RF-transceiver: the Chipcon SmartRF® CC2400


The Chipcon CC2400 transceiver is an integrated wireless solution allowing two-way
communication in the 2.4-2.4835GHz unlicensed ISM-band. It supports over-the-air
data rates of 10kbps, 250kbps and 1Mbps without requiring any modifications to the
hardware. The main operating parameters of the CC2400 can be programmed via an
SPI-bus, which is also used for normal data transfer. Some key features of the
CC2400 are:

- True single-chip 2.4GHz RF-transceiver with baseband modem
- 10kbps, 250kbps and 1Mbps over-the-air data rates
- Low current consumption (RX: 23mA)
- Data buffering
- Programmable output power
- Standard 1.8V or 3.3V I/O voltages
- QLP 48-pin package

In a typical application the CC2400 is connected to a microcontroller using the 4-wire
SPI-interface. In addition it needs one or two supply voltages: a 2V (max) core supply
and a 3.6V (max) digital IO-supply if the IO-voltages are in that range. A 16MHz
clock (crystal or external clock generator) and a few passive components are also
needed. A typical application circuit from its datasheet [reference 22] is depicted in
figure 30.

Figure 30 Typical application circuit, Chipcon CC2400 [reference 22]

3.1.2 Audio codec
To make the design flexible an integrated audio codec was preferred over separate
converters. The system can then have all analog inputs as well as outputs on a single
chip regardless of whether it’s a receiver or a transmitter. Thus true bidirectional
modules can be designed. To have the reference design ready for duplex
communication, it was also regarded as advantageous if the system could include a
microphone input and a headphone output. Under normal circumstances this would be
implemented with opamps. However, some codecs offer a very high level of
integration and have analog amplifiers for mics and headphones, some even
loudspeakers, built in. The criteria for choosing an audio codec were:

- Price and availability
- Level of integration and flexibility
- Ease of implementation
- Standard I/O-interface and voltages (3.3V)
- Close to CD-quality performance13
- 16bit/44.1kHz compatibility

Two codecs were under serious consideration, the AKM AK4550 and the Texas
Instruments TLV320AIC23B. They are briefly presented and compared in table 8.

Table 8 AKM4550 versus TI TLV320AIC23B comparison [references 23 and 24]

AKM4550 TI TLV320AIC23B
Architecture Audio codec with integrated Audio codec with integrated
ADC and DAC ADC, DAC, microphone and
headphone amplifiers.
Inputs/outputs Line in, line out Line in, line out, mic in,
headphone out
Audio interface I2S I2S, left justified or right justified
Control interface None (hardware config.) I2C- or SPI-compatible
ADC 1-bit (ΔΣ), 89dB dynamic range multibit (ΔΣ), 90dB dynamic range
DAC 1-bit (ΔΣ), 92dB dynamic range multibit (ΔΣ), 100dB dynamic range
Audio format 16-bit, 18-bit or 20-bit, 16-bit, 20-bit, 24-bit or 32-bit,
8kHz-50kHz sample rate 8kHz-96kHz sample rate.
Package 16-pin TSSOP 28-pin TSSOP
Approx. price $2 $3
Pros. - very low cost - Built-in headphone
- small package and microphone amp.
- easy to integrate - High performance
- Flexible
Cons. - low flexibility - No stand-alone mode,
needs configuration
interface

13 Basic ADC and DAC performance parameters are explained in appendix 2.

As the comparison shows, the TLV320 is a bit more advanced than the AK4550. The
specifications are also better, especially for the DAC which features a 100dB dynamic
range. The microphone input includes a low-noise bias supply for electret
microphones (often called phantom power) and the headphone output is compatible
with standard 32Ω and 16Ω loads. The unit outputs audio data according to the I2S-
standard and is configured by a system processor over a three-wire (SPI) or two-wire
(I2C) compatible control interface. The control interface allows for many additional
functions, like volume control, power down and audio path control.

Figure 31 Texas Instruments TLV320AIC23B block diagram [reference 24]

Even though the TLV320 is a bit more expensive, its higher degree of integration
would probably make the total system cost lower, since opamps for microphone and
headphone amplifiers, components surrounding these as well as PCB area will be
saved. Thus, the TLV320 ended up being the preferred audio codec.

3.1.3 SP-dif receiver
When transferring audio digitally it makes sense to design a system which can receive
digital signals from external sources. The most common digital output on CD-players
is the SP-dif interface. In addition to audio data, a SP-dif frame also contains other
information. For details on SP-dif or other formats used in this thesis, see appendix 1.

Decoding the information content in the MCU would demand too many resources.
The SP-dif voltages are also not compatible with standard digital TTL or CMOS
levels, so an external receiver is necessary. Preferably, it should have a sample
rate converter and run on an external clock so the data transfer can easily be
synchronized with the MCU. Resamplers are usually quite expensive, but since the
digital input is meant to be optional, one could offer a version with digital input and
one without; this is acceptable at least for the prototype. Thus, the criteria for choosing
a receiver are:
receiver are:

- Standard I2S-compatible data output
- 16bit/44.1kHz compatible input (SP-dif) and output
- 3.3V compatible IO and supply voltage
- Sample rate conversion preferable
- Reasonable price and availability
- Easy to implement in circuit

There are currently two sample-rate converters with integrated SP-dif receivers widely
available on the market: the Analog Devices AD1892 [reference 25] and the Crystal
Semiconductors CS8420 [reference 26]. Both units offer state of the art performance
(dynamic range in the 130dB range) and both have an arbitrary sample rate conversion
factor so the clock frequency can be chosen independent of the input sample rate. The
AD1892 has a fixed output sample rate of fCLK/512, while the CS8420 lets the user
choose a factor of 256, 384 or 512.

Although widely available, both chips are aimed at the high-end hifi market and are
therefore quite expensive. But an even bigger problem is that they are both made in a
5V process. Without I/O voltage conversion they cannot be used in a 3.3V system.

If one has to integrate the receiver and the sample rate converter on the same circuit
there is only one 3.3V chip that does the trick, the AKM AK4122 [reference 27]. Like
the AD1892 and the CS8420, this is an integrated asynchronous sample rate converter
and receiver which accepts SP-dif at 32kHz, 44.1kHz, 48kHz or 96kHz and outputs
an I2S-compatible data stream at an arbitrary sample rate of between 32kHz and
96kHz. However, the chip is not yet (Q2 2004) in mass production and only
engineering samples are currently available.

Since suitable sample rate converters are not available, one has to look at stand-alone
receivers. Since the MCU will control the clock signal generation and the data
transfer to it from either the codec or the receiver, the receiver should have some sort
of slave mode. Usually receivers regenerate the incoming clock from the SP-dif
signal and, through a PLL, generate an output clock which is used to control the unit
they transfer data to, usually a digital filter or a DAC. In this application however, the
data is to be transferred to a MCU which has its own clock. The MCU has to work

even when the SP-dif receiver is not connected or does not receive data on its input,
and then it can’t be slaved off the receiver. The MCU needs to be the master, and the
receiver needs to be the slave.

There is a 3.3V SP-dif receiver with such a slave mode, the Crystal Semiconductors
CS8416 [reference 28]. It does not have a sample rate converter, but in slave-mode
the LR-clock and bit-clock are inputs which are used to clock data out on the I2S-bus.
If they drift apart from the SP-dif input clock the circuit will either skip or repeat a
sample to get back on track.

Figure 32 Block diagram, Crystal CS8416 [reference 28]

The method for managing slave mode is called ”slip/repeat behavior” [reference 28].
An interrupt bit, OSLIP, in the Interrupt 1 Status register is provided to indicate
whether repeated or dropped samples have occurred. After a fixed delay from the
Z/X-preamble, the circuit will look back in time until the previous Z/X-preamble and
check if one of three possibilities occurred:

1. If during that time, the internal data buffer was not updated, a slip has
occurred. Data from the previous frame will be output and OSLIP set to 1.
OSLIP will remain 1 until the register is read. It will then reset until
another slip/repeat occurs.
2. If during that time, the internal data buffer did not update between two
positive and two negative edges of ORLCK a repeat has occurred. In this
case the buffer data was updated twice, so the part has lost one frame of
data. This event will also trigger OSLIP to be set to 1. It will remain 1 until
the register is read.
3. If during that time, it did see a positive edge on ORLCK then no slip or
repeat has happened and the OSLIP will remain in its previous state.

If the user reads OSLIP as soon as the event triggers, over a long period of time the
rate of occurring interrupts will be equal to the difference in frequency between the
input SP-dif data and the master’s serial output LRCK. To avoid excessive slip/repeat
events due to jitter14 on the LR-clock the CS8416 uses a clock hysteresis window.

14 Jitter is explained in appendix 2.

3.1.4 Selection of microcontroller
Finding a suitable microcontroller was definitely the most complicated task when it
came to hardware design. There are many architectures to choose from and the
requirements cannot be fully established since the software at that stage is not yet
written. So one has to make an estimate of how demanding the application will be,
and then add some headroom to be on the safe side. In addition, the microcontroller
will have to meet the requirements set by the other hardware in the system, like I/O
capabilities and supply-voltages. It has to be easy to implement in circuit, compilers
and other development tools must be available, it’s preferable if the architecture is
fairly standard, it must run at suitable clock speeds and last but not least, it has to be
low-price and widely available.

3.1.4.1 Speed requirements

The microcontroller will have to transfer data to or from the CC2400 and the codec or
the interface in real-time while doing compression. The sample speed is 44.1kHz, so
the data rate from the codec/interface will on average be 1.41Mbps while the data rate
to the CC2400 on average will be 1Mbps. How much resources the data compression
will use is unknown, but a study of existing algorithms showed that the fastest ones
need between 25 and 35 instructions per sample when running on a 16-bit processor
[reference 2, 29]. In an 8-bit architecture, arithmetic operations on 16-bit numbers will
be significantly more demanding, so close to 100 instructions per sample is a crude,
but fair estimate. A typical serial data transfer requires approximately 8-10
instructions per register transfer [reference 31], which translates to a bit more than one
instruction per bit in an 8-bit architecture and half that in a 16-bit architecture. In
addition the MCU will have to run control routines, the timers will be used to
generate clocks and so on, so considerable headroom must be allowed for. For stereo
16-bit/44kHz audio this leads to the estimate shown in table 9.

Table 9 Crude MIPS requirement estimation for MCU

Task MIPS, 8-bit MIPS, 16-bit
Audio compression 100 instr./sample * 44100 samples/s = 4.4 MIPS 35 instr./sample * 44100 samples/s = 1.5 MIPS
Transfer audio 1.41 Mbps * 1.25 instr./bit = 1.8 MIPS 1.41 Mbps * 0.75 instr./bit = 1.1 MIPS
Transfer CC2400 1 Mbps * 1.25 instr./bit = 1.3 MIPS 1 Mbps * 0.75 instr./bit = 0.8 MIPS
Other / headroom 2 MIPS 2 MIPS
Total 9.5 MIPS 5.4 MIPS

One must keep in mind that this is a very crude estimation and just a guideline. The
algorithm can be simplified or improved, different MCU architectures may use fewer
or more instructions to transfer data and so on, so one cannot discard an MCU just
because it is slightly below the estimate given in table 9. But it cannot be too far
off; a 2 MIPS processor will not do the trick. Also, since the codec and interface
need a clock speed of at least 256fS, the MCU should be able to run at this frequency.
If it can even run at 512fS, this would be advantageous, but it is not a requirement.
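The 8-bit column of table 9 can be reproduced with a few lines of C. This is only a sketch of the arithmetic above, with illustrative names:

```c
/* Reproduce the crude 8-bit MIPS estimate of table 9 (result in MIPS). */
static double mips_total_8bit(void)
{
    double compress = 100.0 * 44100.0 / 1e6;  /* 100 instr./sample at 44.1kHz */
    double audio_io = 1.41 * 1.25;            /* 1.41 Mbps, 1.25 instr./bit   */
    double rf_io    = 1.00 * 1.25;            /* 1 Mbps to/from the CC2400    */
    double headroom = 2.0;                    /* control routines, timers     */
    return compress + audio_io + rf_io + headroom; /* roughly 9.4-9.5 MIPS */
}
```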

3.1.4.2 Memory requirements

In addition to speed, memory is an important factor when choosing a microcontroller.
An estimate of the memory requirement must also be made and met with some
headroom when selecting the appropriate MCU.

A study of existing lossless audio codecs showed that a frame size of between 576
and 1152 samples is commonly used [reference 2]. For a stereo signal this translates
to approximately 2-4 kBytes of memory usage. In a microcontroller this has to be
decreased, but not too much since the overhead from any frame headers should not be
too significant. An estimated ”least useful frame size” of 64 samples is defined. This
translates to 1.45ms of music, or 256 bytes of memory when uncompressed. A
compressed frame will require an estimated 180 bytes (including overhead) since the
maximum transfer rate is 950kbps; with double-buffering and some overhead this
should require approximately 400 bytes (double-buffering is needed because the
data rate from lossless compression will vary, while the transmission rate through the
CC2400 will be constant). In addition some headroom must be given to allow for other
variables, tables and such in the software.
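The frame-size arithmetic above can be sketched as follows (illustrative helper names): 64 stereo 16-bit samples occupy 64·2·2 = 256 bytes uncompressed and last 64/44100 s, roughly 1.45ms.

```c
#include <stdint.h>

/* Uncompressed size of one audio frame in bytes. */
static int32_t frame_bytes(int32_t samples, int32_t channels, int32_t bytes_per_sample)
{
    return samples * channels * bytes_per_sample;
}

/* Frame duration in microseconds at 44.1kHz. */
static int32_t frame_duration_us(int32_t samples)
{
    return (int32_t)((int64_t)samples * 1000000 / 44100);
}
```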

These estimations lead to the following requirements for RAM:

- Minimum RAM: 1kByte
- 2kByte preferable

In addition one must take into consideration the requirement for program memory.
The size of the program itself, as well as possible constants or look-up tables will
determine the need for program memory. The object code for the smallest algorithm
available, MusiCompress [reference 29], takes up a total of 14.8kBytes for
compression and packing (decompression and unpacking uses 9kBytes). The MCU-
algorithm will probably be simpler, but communication with the codec and the
CC2400 will also need some room for implementation.

The estimations done lead to the following program memory requirements:

- Minimum program memory: 16kBytes
- 32kBytes preferable
- Program memory should be FLASH for easy re-programming

3.1.4.3 I/O requirements

The microcontroller needs to communicate with both the CC2400 and the audio codec
or receiver simultaneously. The data transfer to both has to be synchronous. This can
be done in two ways. Either the MCU can have two SPI (Serial Peripheral Interface)
ports or the data from one of the units can be parallelized and transferred through a
general I/O-port. Since the data from the audio units consists of fixed wordlength
samples it makes most sense to parallelize the data from them.

Figure 33 Communication through a) 2 SPI-ports or b) 1 SPI-port and parallel IO
via shift registers

Converting the serial data to parallel and vice versa can be done via parallel-to-serial
and serial-to-parallel shift registers, for instance the 74HC166 and the 74HC4094.

The I/O must also be capable of the necessary transfer rates, 1.41Mbps to or from the
audio unit and 1Mbps over SPI to or from the CC2400. In addition it will need an I2C-
interface (or a free SPI-interface) to be able to configure the preferred audio codec.

3.1.4.4 Evaluated microcontrollers

In the process of finding the right microcontroller, several were taken into
consideration. When evaluating microcontrollers, the following factors have been the
most important:

- Does the microcontroller meet the speed and memory requirements?
- Does it have the necessary I/O-capabilities and is it easy to implement in
system?
- Is it a widespread architecture with easy-to-get programming tools and
program examples?
- Is it widely available at a low price?
- Are development kits available at reasonable prices?

Based on these criteria, numerous alternatives were evaluated. The following sections
will give a brief description and comparison of the microcontrollers that ”made it to
the final round”. The MCUs listed in table 10 were seriously considered and closely
studied before a final decision was made.

Table 10 Comparison between seriously considered MCUs [references 30-36]
Atmel Mega Texas Instr. Motorola Hitachi/Renesas Silicon Labs.
32L/169L MSP430F1481 DSP56F801 R8C/10 Tiny C8051F005
Architecture 8-bit RISC 16-bit RISC 16-bit DSP- 16-bit RISC 8-bit CISC
hybrid
Clock speed <8 MHz <8 MHz 8 MHz (80 MHz core) <16 MHz <25 MHz
Max inst./s 8 MIPS 8 MIPS 40 MIPS 16 MIPS 25 MIPS
Instr. set AVR MSP430 56800 M16C 8051
Memory 2kB/1kB 2kB 2kB 1kB 2.25kB
Program 32kB/16kB 48kB FLASH 16kB FLASH 16kB FLASH 32kB FLASH
memory FLASH
I/O 1 SPI 2 SPI/USART 1 SPI 1 SPI/USART 1 SPI/UART
(169L:2 SPI) 48 IO-pins 1 SCI 16 IO-pins 1 SMbus
1 two-wire (I2C-comp)
(I2C-comp) 32 IO-pins
1 USART
32 IO-pins
Package pins 48/64 64 48 32 64
Approx unit $6.50/$7.20 $7.00 $7.90 $1.50 $13.50
price15
Free Yes Yes No Trial version Yes
programming only
tools
Development STK500+501 MSP-STK430 DSP56F801EVM M3A-0111 C8051F00DK
kit / approx $150 $100 $350 Price unknown $99
retail price

15 Prices are found either from the manufacturer’s website or www.digikey.com. Prices are in quantities
of 100, except Renesas R8C/Tiny for which the price is given for a quantity of 1000.

3.1.4.4.1 Atmel AVR Mega169L and Mega32L

Early in the hardware design phase, Atmel AVR was considered the most likely
architecture to use in the WLS. It is a much-used processor series with good
performance at a reasonable price. It is also easy to program, with a small and efficient
instruction set. The AVR-architecture has the disadvantage of being 8-bit, but at 1
MIPS per MHz, the performance is still good.

The only AVR-series with enough memory is the Mega-series ranging from the
Mega16 with 1kByte RAM and 16kByte flash to the Mega128 with 8k/128k. The
”L”-units are 3.3V-compatible and thus the only ones which can be integrated into the
system with ease. The Mega169L [reference 30] and Mega32L [reference 31] were
considered to be the most suitable.

While the Mega169L has the advantage of two SPI-interfaces it also has the
disadvantage of having only half the memory compared to the 32L. Even if the
communication was made easier by being able to opt for the scheme shown in figure
33a), there was always the question of memory. The 32L on the other hand has more
memory and is also both cheaper and comes in a smaller package, so the extra cost of
some external logic will probably balance out. Thus the Mega32L was considered the
best of the two.

However, a more crucial problem became apparent when it came to speed. The ”L”-
versions are both rated at only 8MHz. Although the 169L has a typical performance of
12MHz at 3.3V [ref 31, figure 138], Atmel will not guarantee stable operation at this
frequency.

If the MCU is to be run at 8MHz, much external logic is needed to generate the 256fS
clock for the audio circuits. In addition the timing between the MCU and the audio
circuitry would be more complicated. The alternative would be to divide the 256fS
clock and run the controller at 128fS or 5.64MHz. But with an 8-bit architecture and
5.6 MIPS, the speed requirements are not met. Therefore other alternatives had to be
taken into consideration. The AVR Mega169L/32L suitability is summarized with
advantages and disadvantages.

+ Widespread standard with easy-to-get software and program examples
+ Fast and efficient RISC architecture, 1 MIPS per MHz
+ 2 SPI-interfaces on the Mega169L
+ Mega32L meets memory requirements, Mega169L is on the limit
+ Reasonable price and very good availability
- Cannot be run at frequencies higher than 8MHz
- Not very powerful, 8-bit architecture and relatively low speed

3.1.4.4.2 Texas Instruments MSP430F1481

The MSP430-family of mixed signal devices from Texas Instruments is a series of
true 16-bit RISC microcontrollers with a one-clock-cycle (1T) register-to-register
operation execution time. The F148-models [reference 32] have 48kBytes program
memory and 2kBytes of RAM, so they meet the RAM requirements easily. They have
two SPI-interfaces, and with no LCD-driver and no AD-converter the price of the
basic model F1481 is low. The processor is sold by many stores as well as the
manufacturer themselves, but the architecture is not as widespread as the AVR or
8051, so the availability of program examples is not as good. TI offers tools and
development kits at low prices.

The MSP430, however, has the same problem as the Mega L series: it is not rated faster than 8 MHz. It thus has to run at 128 fS, or 5.6 MHz, to avoid excessive external logic. Although this speed limitation is the same as for the AVR, the TI part is still considered better performance-wise, since it has a 16-bit architecture. Since all register-to-register instructions are 1T-executable, it will probably exceed five 16-bit MIPS, which is close to the performance requirement. Another drawback with the MSP430 is that it has no I2C-compatible interface. Because of this, one SPI interface must be used for the CC2400 and one for the TLV320, and the audio transfer scheme would still have to be the one from figure 33b).

In summary, the evaluation of the MSP430F1481 is as follows:

+ 16-bit RISC architecture
+ Fast and efficient instruction set
+ Easily meets the memory requirements
+ 2 SPI interfaces
+ Very low price
- Cannot be run at frequencies above 8 MHz
- Not a widespread standard
- No two-wire/I2C interface

3.1.4.4.3 Motorola DSP56F801

The Motorola DSP56800 series is a new generation of integrated MCU/DSP hybrids designed for portable and integrated multimedia applications. The DSP performance is significantly better than that of standard microcontrollers, and the 16-bit unit also includes peripherals such as SPI and PWM outputs. The 56F801 [reference 33] has 16 kB/2 kB of memory and runs at a blistering 40 MIPS with an 80 MHz core frequency. The built-in 10x clock multiplier accommodates the use of a standard crystal resonator of 8 MHz or less. The package is small (48-pin), and the price is not significantly higher than that of a standard MCU like the Atmel AVR.

The instruction set [reference 34] is somewhat more complex than for a basic MCU, since it also includes some signal processing instructions. However, a single-instruction 16-bit barrel shifter, a single-instruction 16x16 multiplier and two 36-bit accumulators included in hardware simplify mathematical operations significantly. The major disadvantage of the DSP56800 is that it is a relatively new processor family, so code examples and reference designs are not as available as for more established architectures. Also, the development tools are very expensive.

To summarize the evaluation of the Motorola DSP56F801:

+ Very powerful
+ Built-in DSP-features
+ 16-bit architecture
+ Meets memory and I/O requirements
+ Efficient C-compiler
+ Small package and single power supply
+ Competitive price
- Relatively new architecture, not as established as others
- Development tools are very expensive.

3.1.4.4.4 Hitachi/Renesas R8C/10 Tiny

The Hitachi/Renesas R8C/10 [reference 35] is a powerful 16-bit microcontroller built into a small 32-pin package. It can nevertheless be run at up to 16 MHz and features an SPI interface as well as 21 general-purpose I/O ports. The price is very low, a quarter of that of the Atmel or Motorola units, and the small package makes it ideal for portable solutions.

There are three alternative memory configurations of the R8C/10, the biggest at 16 kB/1 kB. This is the same as the minimum set in the requirements. In addition, there seems to be another major problem with the R8C/10: availability. It is not easy to find from distributors other than Renesas themselves, and the selection of development tools seems small. Compilers and debuggers are not freely available, and literature, code examples and other practical information is very difficult to come across. It also does not meet the I/O requirements when using the TLV320, since it has only one SPI interface and no I2C.

The R8C/10 seems to have great potential, but some uncertainties make it a bit risky to include in a reference design without prior knowledge of the processor family.

+ Good performance, up to 16 MHz
+ 16-bit MCU architecture
+ Very small package
+ Very cheap
- Availability seems unclear, not many distributors
- Not a standard architecture, little literature or information available
- No development tools freely available
- Does not meet the I/O requirements when using the TLV320 codec

3.1.4.4.5 Silicon Laboratories C8051F005

Silicon Laboratories (formerly known as Cygnal) produces high-performance microcontrollers based on the very well-established 8051 8-bit architecture. The C8051F005 [reference 36] features a 32 kB/2.25 kB memory configuration and 32 digital I/O ports. Although 8-bit immediately seems like a disadvantage, the C8051 series runs at up to 25 MHz, and since it executes 70% of its instructions in one or two clock cycles, it enables up to 25 MIPS throughput. In terms of speed, memory and I/O capabilities it should be sufficient. As the only chip of those considered running at over 20 MHz (although the DSP56F801 runs at up to 80 MHz internally), it can be clocked with a 512 fS, or 22.6 MHz, clock. Performance should thus not be a problem, and all necessary clocks are easily generated.

The 8051 architecture is very well established, and compilers and other tools are widely available. Silicon Laboratories also offers development kits at very reasonable prices. The only disadvantages of the C8051F005 are that it, like most of the others, has only one SPI interface, and that the chip itself is more expensive than the rest. If 16 I/O ports are sufficient, a slightly cheaper but otherwise identical model, the C8051F006, is available in a 48-pin package.

To summarize, the evaluation of the C8051F005 comes down to the following conclusions:

+ High performance and clock frequency
+ Can be run at 512 fS
+ Very well-established architecture
+ Meets the I/O and memory requirements
+ Tools widely available at reasonable cost
- 8-bit architecture
- Quite expensive

3.1.5 Conclusions
The process of finding the right components was extensive but ultimately rewarding work, which gave insight into the hardware market as well as experience in evaluating the possibilities and limitations of different kinds of circuits.

The CC2400 was decided on in advance, since the target application is a demonstration system for that very chip. The decision to use the TLV320 audio codec was also made at an early stage, since it met all the requirements and is also highly integrated and thus quite easy to implement in a circuit.

Finding the right SP-dif interface and microprocessor, however, was a more difficult task. An SP-dif receiver with an integrated sample rate converter was initially thought to be the solution, but no such circuits are available for 3.3 V supply voltages. The arrival of the AK4122 may change this in the near future, but for now, a receiver without sample rate conversion must be used. The Crystal CS8416 seems to be the most suitable one, since it features a slave mode as well as 3.3 V operation. When the AK4122 arrives, however, it is highly likely to be preferable.

As far as microcontrollers go, there are so many models and architectures to choose from, and so many factors to take into account, that a line simply had to be drawn somewhere. Consequently, a few models were moved on to ”round two” and evaluated further. They are the ones presented in this document.

The final decision fell on the Silicon Laboratories C8051F005, due to its performance, availability, low-cost tools and well-known architecture. Its great performance and competitive price also make the 16-bit Motorola DSP56F801 a very strong contender, especially if software upgradability is taken into consideration. The Motorola is probably powerful enough to run more advanced audio algorithms, like subband filtering or even Ogg Vorbis fixed-rate lossy compression, in real time. But the unit is less widespread, the development tools are much more expensive and literature is scarce, so opting for an 8051 architecture was considered the safest bet.

It should also be mentioned that although the price given in table 10 seems very high compared to the others, my instructor at Chipcon informed me that very good deals could be made with the distributor, which would make the chip much more competitively priced. This also had significance for the final decision.

3.2 Audio transfer to MCU
The preferred MCU, the Silicon Laboratories C8051F005, has only one SPI port, which will be occupied by the CC2400 RF module. Since the data rate to and from the audio codec or SP-dif device is more than 1.4 Mbps, creating a second SPI in software would put too much strain on the processor. A different scheme is therefore proposed, where the data is converted from serial to parallel form and sent word-wise to the microcontroller. The microcontroller will read or write 8-bit words on its I/O port, and appropriate logic will be implemented to convert them to and from serial form.

3.2.1 Principle for data transfer, audio device - MCU


The audio device outputs serial data in accordance with the I2S bus specification. Four signals are used: the LRCK clock signal indicates whether the left or the right channel is being transferred, SCLK is used to clock the data, and SDTO and SDTI are the data output and input lines. This is shown in figure 34. Special attention should be given to the fact that the MSB of a sample is delayed one clock cycle with respect to LRCK.

Figure 34 I2S data transfer timing diagram

The principle for the communication scheme is shown in figure 35. The data is transformed from serial to parallel form, so the MCU receives or transmits SD[15..8] in one read/write and SD[7..0] in the next.

Figure 35 Principle for data transfer between audio device and MCU

The control signals tell the serial-to-parallel interface when to latch data onto the 8-bit bus (the data flow from the I2S interface is continuous) when data goes from the audio device to the MCU, and when to read data from the bus when the flow is in the opposite direction. There also have to be control signals to the MCU, so that it knows when to write or read data on its I/O port, and whether it is left- or right-channel data it is dealing with.

3.2.2 Realization of data transfer, audio device - MCU

3.2.2.1 Serial-to-parallel and parallel-to-serial conversion

To make the data transfer possible, appropriate logic devices had to be found. The 74HC4094N 8-stage shift-and-store bus register [reference 37] is ideal for converting data from serial to parallel form. It has a serial input and a strobe input. For each clock tick, the data on the serial input is shifted one step to the right in the shift register. When the strobe is set high, the data in the 8-stage shift register is latched into the 8-bit storage register. Whenever the output enable signal OE is high, the contents of the storage register are available on the parallel outputs. When OE is low, the outputs are in tri-state. This is shown in figure 36.

Figure 36 Simplified schematics, 74HC4094N [reference 37]

To use this device to transfer data from the audio device to the MCU, a control signal is needed for the STR input. The strobe signal has to be set high when a complete set of data has been shifted in. This is shown in figure 37.

Figure 37 Timing diagram, transfer from audio device to MCU

As can be seen, a STR pulse is needed every eighth BCLK cycle. Since the 74HC4094N holds its output value constant while STR is low, the MCU can read the data at any given time before the next STR pulse. A delayed STR signal, for instance by one BCLK cycle, can thus be used to interrupt the MCU and make it read its I/O port. The falling edge of STR provides an ideal interrupt source.

To transfer the data from the MCU to the audio codec, the 74HC166N 8-bit parallel-in/serial-out shift register [reference 38] is used. It latches an 8-bit word on its inputs and shifts it out serially, MSB first. The device is activated with an active-low /CE signal, and the data is latched in using the /PE input. A logic diagram of the circuit is shown in figure 38.

Figure 38 Logic diagram, 74HC166N [reference 38]

The I2S audio device reads the SDTI input on the positive clock edge. To assure valid data with good timing margins on the I2S interface, the data on the SDTI input should change state on the negative clock edge and have a stable, valid value on the positive one. This can be seen from figure 34. Since the 74HC166N shifts data out on its positive clock edge, it should therefore be run on an inverted clock. The timing diagram is then as shown in figure 39.

Figure 39 Timing diagram, transfer from MCU to audio device

The arrows indicate when the audio device reads the SDTI data. The data is valid at this instant, and there is significant time to and from the nearest transition on SDTI, so the timing requirements are not very stringent. The requirement for the MCU is that it has valid data on its outputs before /PE goes low. The falling edge of STR can therefore provide the interrupt source for the write too.

3.2.2.2 Design of logic to create necessary control signals

The control signals that need to be generated are the strobe and /PL, in addition to the SCLK and LRCK signals. At first I intended to use the PWM outputs from the MCU to generate these signals, but this proved to be unfeasible. The C8051 has a programmable counter array (PCA) consisting of 5 separate capture/control modules that can provide separate PWM outputs. These are all controlled by a single PCA counter/timer. The low byte of the counter register is compared to a user-defined value to provide a PWM output with selectable duty cycle and a frequency of fT/256, where fT is the timebase frequency of the counter (see reference 36, chapter 20, for details). Since the maximum fT is SYSCLK/4, the maximum PWM frequency is SYSCLK/(4·256), or fS/2. SCLK, /PL, STR and LRCK must run at 32 fS, 8 fS, 8 fS and fS respectively, so it is impossible to generate them using the PWM outputs of the C8051.

Both the strobe and /PE signals are active only every eighth SCLK cycle. Without the possibility of using PWM, an external counter is needed to generate them. A gate on the counter outputs can give a high value when the counter has a specific value (e.g. ”000” or ”111”) and a low value otherwise. /PE is delayed one half clock cycle with respect to STR and is also inverted. This does not have to be done externally, since the 74HC166N is running on an inverted clock and therefore detects /PE one half clock cycle later. A fast ripple counter, like the 74LV4040, can also be used to create SCLK and LRCK when it is clocked with the master clock. Since the master clock is 512 fS, SCLK is 32 fS and LRCK runs at fS, the proposed scheme is as shown in figure 40.

Figure 40 Logic circuit for generation of control signals

The 256 fS output provides a master clock signal for the audio device. The bit clock BCLK and its inverted counterpart /BCLK are given by the ripple counter when b[5..7] = ’1’. The output b8 provides a SCLK/16 signal that will be used to tell the MCU whether it is the MSW (most significant word) or the LSW (least significant word) of a sample that is being transferred. If LRCK and SCLK/16 are both ’1’, it is a right-channel LSW; if [LRCK, SCLK/16] is [’1’,’0’], it is a right-channel MSW; [’0’,’1’] is a left-channel LSW; and [’0’,’0’] is a left-channel MSW.

The control signals are shown in figure 41. The MCLK and 256 fS signals, which are of very high frequency, are omitted for clarity.

Figure 41 Timing diagram for control signals

As we can see, STR is high and /PE is low at the critical points, when their respective
circuits are supposed to latch and load data.

The components needed to realize the control signals are:

1 pc. 74LV4040 ripple counter
1 pc. 74HC27 triple 3-input NOR (also used to realize the two inverters)
1 pc. 74HC4094N 8-stage shift-and-store bus register
1 pc. 74HC166N parallel-in/serial-out shift register

These are all low-cost circuits, so compared to an MCU with two SPI interfaces (which would also need some logic to be made I2S-compatible), the extra cost in hardware is not significant. Another alternative is to integrate all of this into a small and cheap CPLD, for instance the Xilinx XC9536. The cost would then be the CPLD plus the connector needed to program it.

In the prototype, the communication is realized with logic devices.

3.3 Circuit design
After deciding which components to use and developing the communications system,
the next step was to design the complete circuit. The system consists of a total of eight
IC’s; the C8051F005, the TLV320, the CS8416, the CC2400, the 74LV4040, the
74HC27, the 74HC4094N and the 74HC166. The block diagram is showed in figure
42.

Figure 42 Block diagram, wireless loudspeaker system

This diagram is highly simplified, although all major signals and buses are included. The thickest lines are buses and the thinnest are clock lines; the normal ones are signal lines. As can be seen, there is a fair amount of routing to be done, especially between the MCU and the audio units and logic. The switch indicates the analog/digital input selector. Rather than opting for an electronic selector, like a mux, jumpers are used, since they were necessary to include some other functionality anyway.

3.3.1 Configuration of the SP-dif receiver
The configuration of the Crystal Semiconductor CS8416 SP-dif receiver was done in accordance with its datasheet. The unit has 8 SP-dif inputs routed through an 8:2 input MUX, but only one input was used in our application. To keep the physical dimensions small and to avoid extra cost, the possibility of using more inputs was not utilized. To simplify implementation, the stand-alone mode is used, so the MCU does not need to use resources communicating with the receiver. The input-select pins are hardwired to choose input 0, while the indicator outputs, with the exception of /AUDIO, are not used. The /AUDIO output indicates whether valid data is being received and is connected to a general I/O pin on the MCU, so that it knows when a signal is coming. The connection of the chip is shown in figure 43. The SP-dif input is terminated with a 75Ω load resistance, as specified by the SP-dif standard.

Figure 43 Configuration of SP-dif receiver

Special care should be taken when routing the PLL filter, which is very sensitive to stray capacitances. To achieve the correct filter characteristics, and thus good jitter performance, the layout should be as shown in figure 44. The ground connection for the PLL filter should also be returned directly to AGND, independently of the ground plane.

Figure 44 Recommended filter layout [reference 27]

If this recommendation is followed, the PLL in the CS8416 should provide very good
jitter attenuation.

3.3.2 Configuration of the audio codec


Unlike the SP-dif receiver, the Texas Instruments TLV320AIC23B audio codec has to be set up by a microcontroller. This is done using a 2-wire I2C-compatible interface. The configuration inputs can also be set up to be SPI-compatible, but since the MCU's SPI interface is occupied by the CC2400, I2C is used for the codec. This is selected by hardwiring the MODE and /CS inputs. The data outputs are routed to the logic devices handling the audio transfer.

The line inputs and the mic input are set up and filtered as recommended in the datasheet, and the electret biasing output is connected to the mic input, so the system can be used with all kinds of microphones. It is connected through a large resistor (10 kΩ) to prevent the DC voltage from inflicting damage on dynamic microphones.

The headphone output, however, was changed slightly from the recommended layout. In their reference design, Texas Instruments used 220µF decoupling capacitors. However, simulations showed that this would compromise bass performance when used with a low-impedance 32Ω or 16Ω headphone. Since the system is supposed to have high-fidelity performance, a frequency response covering the entire audible range from 20 Hz to 20 kHz (-3 dB) is desirable, so the capacitor size had to be increased. Figure 45 shows SPICE simulations with two widely available alternatives, into standard 32Ω and 16Ω headphone loads.

Figure 45 220µF, 330µF, 470µF decoupling caps frequency response, 32/16Ω load

The 220µF capacitor gives a 4 dB drop at 20 Hz, which is outside the specification even with a 32Ω headphone. 330µF gives almost 2 dB, while 470µF leads to just a 1 dB drop, well within the demands. With a 16Ω load, only the 470µF capacitor fulfilled the spec. However, the 470µF capacitors from our supplier turned out to be much larger physically than the 330µF ones. Because of this, and also since 16Ω headphones are quite rare, the middle value was chosen as a compromise. The complete connection of the TLV320AIC23B is shown in figure 46.
is as shown in figure 46.

Figure 46 Configuration of audio codec

3.3.3 Configuration of the RF-transceiver
The Chipcon CC2400 RF transceiver is in this application set up identically to the 2400DB demonstration board. The microcontroller interface is connected for hardware packet-handling support, which allows for hardware insertion of preambles, sync words and CRC into the data stream by the CC2400. If this is not to be utilized, the relevant pins can simply be ignored by the MCU. The transceiver uses its own 16 MHz crystal and two voltage levels (1.8 V core and 3.3 V I/O). Data transfer and communication is, in addition to the pins used for packet handling, done through a standard SPI interface connected to the MCU's SPI pins.

The CC2400, being an RF device, is rather sensitive to PCB layout. Separate voltage and ground planes, as well as low-impedance connections from all critical nodes to these, are highly recommended. The layout itself was done by Chipcon, using professional CAD tools, and will not be reviewed in detail in this thesis. Interested readers are referred to the Chipcon CC2400 datasheet and the complete PCB layout included in appendix 5.

The connection of the Chipcon CC2400 RF-transceiver is shown in figure 47.

Figure 47 Connection, Chipcon CC2400 RF-transceiver

3.3.4 Configuration of the MCU IO
The C8051F005 I/O system uses a Priority CrossBar Decoder to assign the internal digital resources to the I/O pins. This gives the designer complete control over which functions are assigned, limited only by the physical number of I/O pins in the selected package. A block diagram of the system is displayed in figure 48.

Figure 48 C8051F00x IO-system functional block diagram [reference 36]

The CrossBar assigns the selected internal digital resources to the I/O pins based on the Priority Decode Table [reference 36], shown in figure 49. It starts at the top with the SMBus, which means that when the SMBus is selected, it will be assigned to P0.0 and P0.1. The decoder always fills I/O bits from LSB to MSB, starting with Port 0, then Port 1, finishing if necessary with Port 2. If a resource is not used, the next function in the priority table will fill the priority slot.

Figure 49 C8051F00x priority decode table [reference 36]

In the design of the wireless audio system, the SMBus will be used to configure the audio codec, so it must be assigned. Next, the SPI interface will be used to send data to and from the CC2400 RF transceiver. The UART will not be used, and neither will the timer outputs, since all control signals are generated by external logic. The interrupt input /INT0 will be used, however, since the MCU must receive an interrupt when sending or receiving data. /INT1 is used by the Chipcon CC2400. The /SYSCLK output will also be used, to clock external circuits, while the rest will be unused. This results in the configuration of the CrossBar Decoder shown in figure 50.

Figure 50 Configuration of MCU IO CrossBar Decoder

The I/O pins P0.0 to P0.7 will be assigned to digital functions as shown in figure 50, while the rest of the ports will be general I/O (GIO) ports, used to transmit and receive the necessary data and other signals. The complete circuit schematics (appendix 4) show the entire allocation of the MCU's I/O pins and the complete connections for the circuit.

3.3.5 The finished circuit
The complete circuit with all connections is shown in figure 51 (a bigger, higher-resolution version is found in appendix 4). For clarity, some connections are shown as buses.

Figure 51 Complete circuit diagram

In addition to all the circuits, including the logic, the power supply and analog connections are also shown. Some of the lines are routed through a 10-pin connector to provide extra flexibility. One jumper selects between normal mode and digital loopback. In the latter mode, the audio output from the codec is fed back directly to its inputs. This gives the user the opportunity to test whether the codec works, whether it is properly set up and so on, without having to connect or program the entire system. This should also enhance the circuit's testability significantly, since locating errors will be much easier. If the digital input is selected, it is routed back to the codec in loopback mode. The second jumper selects master or slave mode for the MCU, while the third one is a digital/analog input selector. The jumper settings are shown in figure 52.

Figure 52 Jumper settings

To further enhance testability, several zero-ohm resistors are placed on critical lines. In addition, the circuit has two logic analyzer connections, compatible with the standard logic analyzer port of figure 53.

Figure 53 Logic analyzer standard connection

The pinout is such that the logic analyzer can be used both to monitor all critical signals during operation and to take direct control over the audio codec and the SP-dif receiver if necessary during testing. This is useful if, for instance, the MCU for some reason fails to provide the clock or control signals necessary to operate the other devices and thus test them. The complete logic analyzer port pinout is shown in figure 54.

Figure 54 Logic analyzer connections

In addition, there are two LEDs in the circuit to indicate power-on and /AUDIO from the SP-dif receiver, respectively. A third LED is connected to an MCU I/O port and can be used for whatever the user finds desirable.

4 Analysis of Lossy Compression Algorithms

The lossy compression algorithms examined were written in C, ensured to be wav-compatible, and run on an Apple PowerBook G4 laptop computer. The compressed and decompressed files were analyzed in Matlab to see how big the errors were. A much-used measure of the loss in a compressed audio file is the ratio between the signal power and the error power, also referred to as the SER. Since the error (or noise) level in a lossy compressor should follow the signal level (to stay below the masking threshold), the SER gives a better indication of its loss than looking at the error itself. In addition, a plot should be made to ensure that the error level actually follows the signal level. A Matlab script was written that calculates the SER and the maximum absolute error, and plots the signal and the error as functions of time. The source code as well as the Matlab script is given in appendixes 6 and 7. The maximum absolute error is simply the ratio between the maximum error and the maximum allowable signal level, while the SER is given by the equation:

Eq. 34   SER = 10 \cdot \log_{10}\left( \frac{\frac{1}{N}\sum_{n=0}^{N-1} (x[n])^2}{\frac{1}{N}\sum_{n=0}^{N-1} (e[n])^2} \right) = 10 \cdot \log_{10}\left( \frac{\sum_{n=0}^{N-1} (x[n])^2}{\sum_{n=0}^{N-1} (e[n])^2} \right)

The analysis was done with a file called ”littlewing.wav”, a recording of myself playing guitar. The recording has much dynamics, so performance could be evaluated at both low and high signal levels; it has a fairly wide spectrum, but even more importantly, a very clear and unedited sound. When doing subjective, listening-based quality tests, it is important to have a reference that sounds both natural and familiar. Distortion and colouring of the sound can then more easily be identified, since one knows how it is supposed to sound. The necessity of subjective listening tests is obvious: although the SER together with an error plot gives a good indication of how much loss there is, it tells quite little about the nature of the loss. Lossy compression algorithms use perception-based models, whose quality can affect the resulting fidelity significantly, even if the loss is the same in absolute quantity.

Figure 55 shows the waveform and spectrum of the used test file.

Figure 55 Waveform and spectrum, "littlewing.wav"

4.1 Reference for comparison: 8-bit and 4-bit LPCM

To put the numbers into perspective, the tests are first presented for 8-bit and 4-bit LPCM requantization of the audio data. For LPCM quantization, the 6 dB-per-bit rule tells us that the maximum achievable SNR, the resolution, is 6·B dB, where B is the number of bits. Since LPCM quantization does a random roundoff, the noise is almost white and its level is thus constant, about 6·B dB below the maximum signal level. For a maximum-level signal, the SER would then be identical to the resolution, but for normal music signals it will be significantly lower, as the results for ”littlewing.wav” show.

Figure 56 Performance measurements, 4-bit and 8-bit LPCM

Table 11 Performance, 8-bit and 4-bit LPCM

                 8-bit LPCM   4-bit LPCM
SER                 28.8 dB       8.3 dB
Maximum error         0.004         0.07

As we can see, the SER is well below the resolution. This is of course because the signal level, and thus the signal power, is lower than the maximum, while there is no related shaping of the noise. We can clearly see that the quantization noise is white, at least for the 8-bit version. For the 4-bit version there actually is some visible correlation between the signal and the noise. It can be shown that LPCM quantization noise in reality is not completely white, but produces some distortion, especially for low-level signals or very coarse quantizations. Since distortion sounds worse than white noise, this is often compensated for by adding random noise, also called dithering16.

It should also be noted that the noise is not in any way psychoacoustically shaped. When the signal level is low, the masking threshold is also low, but the noise level is still high, and it is then very audible. Perception-based shaping of the noise can provide significant improvements in audio fidelity, even when the SER value is the same. Both 8-bit and 4-bit LPCM are classified as low-fidelity.

4.2 Analysis of 4-bit DPCM


A 4-bit (or 4:1) DPCM compression algorithm was written and compiled on the PowerBook. It uses the scheme described in the DPCM theory chapter, with an exponential quantization table. The quantization is shown in table 12.

Table 12 DPCM quantization table

Code          0       1      2      3     4    5    6   7
Difference    0  -16536  -4096  -1024  -256  -64  -16  -4
Code          8       9     10     11    12    13    14     15
Difference    0       4     16     64   256  1024  4096  16384

The source code is given in appendix 6. As one can see, the quantization steps are small for low levels and very large for high levels. It is therefore assumed that the DPCM will perform poorly when the levels (or rather, the differences, since first-order prediction is used) are high. Since some music recordings are very dynamic, it is likely that DPCM will be less suitable in a hifi application than for voice coding, where the levels are usually quite low. The algorithm was tested for performance using ”littlewing.wav”.

As expected, the 4:1 DPCM compressor was very fast, but did not perform well when it came to audio quality. Especially for loud signals, the quantization error is huge (as can be seen by looking at its exponential quantization table) and the distortion is clearly audible. At low volumes, the noise level is far better than for 4-bit LPCM quantization, and the music quality is improved somewhat. But the error ”bursts” seen in figure 57 are far above the masking threshold and are clearly audible.

16 See appendix 2, ”Data converter fundamentals”, for details

Figure 57 4:1 DPCM performance measurement, "Littlewing.wav"

As table 13 shows, the calculated SER gave little improvement over 4-bit LPCM
despite the prediction17, but the shaping of the noise still gave a clear improvement in
subjective performance, underlining the need for listening tests as well as
measurements. Still, the performance is nowhere near acceptable quality. 4-bit
DPCM is suitable for voice applications, but more or less useless on high-fidelity
audio.

Table 13 Performance 4-bit DPCM, ”littlewing.wav” (see text)


SER 8.5dB
Maximum absolute error 0.70
Complexity estimation Approx. 100 inst./sample

17
It can be shown that if the variance of the difference between samples (i.e. the predicted residual),
σ²Δx, is larger than the variance of the samples, σ²x, prediction will give more distortion, since the
bitrate/distortion ratio is dependent on variance. Also, the nonlinear quantization can yield worse
results when the signal is in the area where the quantization steps are larger than the linear ones (i.e.
>4096). For most music signals however, one would be likely to get an improvement with DPCM over
LPCM, and for speech signals even more so.

4.3 Analysis of IMA ADPCM
The algorithm written to test ADPCM was made compliant with the IMA ADPCM
standard. The reader is referred to the IMA ADPCM theory chapter and the source-
code for a more detailed insight into how it works. It was tested with the same file as
the DPCM algorithm for a subjective (listening test) and an objective (Matlab)
evaluation of audio quality. The result was a massive improvement over normal
DPCM. There is still some audible distortion on loud or dynamic passages, but
nothing compared to DPCM. Subjectively, IMA ADPCM does provide fairly high-
fidelity music; the quality is reasonable for background music or casual use, but still
not sufficient for critical listening on a high-performance hifi-system. Again, the
noise ”bursts” are clearly above the masking threshold, although nowhere near
DPCM, while the average background noise is very low, almost inaudible.

The analysis done with Matlab is given in the figure and table below.

Figure 58 IMA ADPCM performance measurement, ”Littlewing.wav”.

The errors are now much smaller, between –0.05 and 0.05 in amplitude and with a
very low nominal noise level. We can also see that it clearly follows the signal level
and thus also the hearing threshold. The calculated values are given in table 14.

Table 14 Performance 4-bit ADPCM, ”littlewing.wav”


SER 32.5dB
Maximum absolute error 1.23
Complexity estimation Approx 250 inst./sample

As we can see, the SER has increased dramatically. 32.5dB is still not true hi-fi
performance, even when the noise is psychoacoustically shaped, but compared to the
8.5dB achieved with the DPCM algorithm the improvement is very significant indeed.
It is still ”bursts” of distortion at dynamic passages that dominate. With less dynamic
music the subjective results were, as expected, better. The huge maximum absolute
error in table 14 is not as worrying as it seems; it is just a result of the very first
prediction being way off, since the index-variable and the previous-sample variable
must be given start values before the first run (see source-code).

The penalty of using ADPCM is increased complexity; it is about 2.5 times slower
than the basic DPCM algorithm. With effective programming however, real-time
IMA ADPCM should be possible to implement on a reasonably powerful MCU.

4.4 Analysis of µ-law

Both algorithms above produce a 4:1 compression ratio while, in the intended
application, 2:1 compression is sufficient. However, doing for instance 8-bit DPCM
with the method described above would require an 8-level search-tree (ending in 255
nodes), which would make it very ineffective (approximately a 100% increase in
computation time). It would be as slow as, and probably not better than, IMA
ADPCM. An 8-bit translation of the ADPCM algorithm would also be difficult or
impossible to implement on an MCU. The stepsize-table would be very large and
would probably not fit in the limited memory available in such a system.

The µ-law algorithm is designed for 2:1 (16-bit to 8-bit) compression and is
frequently used in digital telephony (it is also used in DAT-recorders with longplay-
function). It is adaptive since the quantization depends on the input level, and it
provides a significant improvement in dynamic range over 8-bit LPCM. The reader is
referred to the theory section for details. The algorithm is standardized, fast and easy
to implement. A µ-law codec was developed in C and run on the Powerbook using
the same test setup as for DPCM and ADPCM.
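For reference, the standard G.711-style µ-law compressor (sign, 3-bit exponent, 4-bit mantissa) can be sketched as below. The thesis' own codec is in its appendix; this is the textbook form, not necessarily the exact code used:

```c
/* Classic G.711-style mu-law: the magnitude is biased so a leading one is
   guaranteed in the exponent window, the exponent is the position of that
   leading one, and the 4-bit mantissa holds the bits just below it.  The
   result is bit-inverted, as in the standard. */
static unsigned char mulaw_encode(int pcm)
{
    const int BIAS = 0x84, CLIP = 32635;
    int sign = 0;
    if (pcm < 0) { sign = 0x80; pcm = -pcm; }
    if (pcm > CLIP) pcm = CLIP;
    pcm += BIAS;
    int exponent = 7;
    for (int mask = 0x4000; (pcm & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;
    int mantissa = (pcm >> (exponent + 3)) & 0x0F;
    return (unsigned char)~(sign | (exponent << 4) | mantissa);
}

static int mulaw_decode(unsigned char code)
{
    code = (unsigned char)~code;
    int exponent = (code >> 4) & 0x07;
    int mantissa = code & 0x0F;
    int pcm = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (code & 0x80) ? -pcm : pcm;
}
```

Because the step size doubles for each exponent segment, the quantization error tracks the signal level, which is exactly the behaviour praised in the listening tests below.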

Figure 59 µ-law performance measurement, ”Littlewing.wav”.

The 8-bit µ-law algorithm is clearly better than the 4-bit ADPCM, which was as
expected. In numbers, the performance is as shown in table 15.

Table 15 Performance 8-bit µ-law, ”littlewing.wav” and ”speedtest.wav”


SER 42.6dB
Maximum absolute error 0.022
Complexity estimation Approx. 200 inst./sample

During programming and testing, it became evident that µ-law is actually faster than
ADPCM and provides higher audio quality, though at twice the output bitrate. Since
the bitrate is still within the requirements, µ-law is definitely a viable alternative that
provides decent-fidelity music quality and is fast and well-tested.

Another advantage with µ-law that became evident when listening is that the errors
are better spread throughout the signal range. The noise is, as it should be, highest for
loud signal levels, but the ”bursts” found in DPCM and ADPCM are not nearly as
present in µ-law. The error follows the signal level, and thus the masking threshold,
in a much better way. Subjective listening tests confirm and reinforce the superiority
µ-law has over 8-bit LPCM and also IMA ADPCM. If the wireless loudspeaker
system is to use lossy compression with a standard algorithm, µ-law is regarded as
the most suitable of the ones tested.

4.5 Reference for comparison II: MP3

Although MP3 is impossible to run on a small MCU-system and therefore is
irrelevant when it comes to implementation, it serves well as a performance reference.
MP3 is well-known and there is a general subjective opinion of its quality, so an
analysis of MP3 will help to put the numbers achieved by the above compression
methods into perspective.

The ”littlewing.wav” audio-file was compressed and decompressed using what is
considered the best MP3-codec, LAME. The performance was measured using the
same error-calculating Matlab-script as for the other algorithms. Speed measurements
were not taken, since the MP3-application utilizes special hardware within the
Powerbook (like the G4 Velocity Engine and more) and a comparison therefore would
not be representative18.

Measurements were made on 128kbps, 192kbps and 256kbps MP3. 128kbps is
generally considered to be of good hifi-quality. It is often referred to as ”CD-quality”
or ”near CD-quality” in the literature, but blind tests have shown that MP3 is not
quite transparent at this bitrate. 192kbps is considered to be of very high quality, in
most cases transparent, but with slight audible loss on some material when played
back over high-end stereo systems under near-optimal listening conditions.
256kbps is generally accepted to be completely transparent, as blind tests have not
consistently proved audible differences. However, the most discriminating
audiophiles claim that even this bitrate is inferior to CD, pushing the envelope for
even more sophisticated algorithms like AAC. The measurements are summarized in
table 16; 128kbps and 256kbps are shown in figures 60 and 61.

Table 16 Measured performance, LAME MP3, ”littlewing.wav”


Bitrate SER Max error
128kbps 49.0dB 0.011
192kbps 60.4dB 0.0027
256kbps 67.1dB 0.0011

18
Velocity Engine is a special instruction set within the G4, used to increase multimedia performance.
It also has a dedicated maths co-processor and other special hardware which is not utilized by the
compression routines written for this thesis. The encoding of a 10min wav-file takes less than 20s with
either MP3 or AAC on the 1GHz Powerbook, almost as fast as the simple DPCM codec. To write
dedicated compression programs that utilize the Mac hardware is beyond the scope of, and not the
focus of, this thesis.

Figure 60 Measured performance, 128kbps MP3, ”littlewing.wav”

Figure 61 Measured performance, 256kbps MP3, ”littlewing.wav”.

We can see that MP3 is better than any of the above methods, even at 128kbps (12:1
compression ratio). This proves that much can be gained by using advanced
algorithms. Unfortunately, dedicated hardware or powerful processors are needed for
real-time implementation. If low compression ratios (2:1 to 4:1) are sufficient, even
simple algorithms can give quite good results. However, for ratios below 2:1,
dynamic quantization does not seem to be a good alternative, both because of the
”bursts” of distortion and because the complexity of the quantizer rises fast, the
number of output levels growing exponentially with the number of bits.

4.6 iLaw: a low-complexity, low-loss algorithm.
For this part of the project, a low-loss compression algorithm was designed especially
to meet the requirements of the wireless loudspeaker system, and to be an alternative
to lossless compression if implementing the latter proved unfeasible. The demands are
as basic as they are fundamental:

- Bitrate < 1Mbps (some headroom should be available for other information)
- Very low computational complexity
- High-fidelity audio quality

Since the DPCM quantizer and ADPCM tables quickly increase in size and
complexity with the number of bits in the compressed stream, they were discarded
from further development. Instead, the coding is based on µ-law coding, whose
complexity is in principle independent of the number of output bits. The maximum
output word length, and thus the minimum compression, is given by:

Eq. 35   WLO = bps / (2·fS) = 1×10^6 / (2·44,100) = 11.3

Where WLO is the maximum output word length. Since some headroom is desired, a
10-bit version of the µ-law encoding scheme was developed. This allows for a 15-bit
dynamic range using a 3-bit exponent and a 6-bit mantissa, as described in the µ-law
theory section. Thus, the 10-bit output word will be of the form:

Figure 62 10-bit µ-law data format

Since the exponent can hold a zero count up to 8, the sign bit holds the MSB and the
mantissa the 6 LSBs, the dynamic range is 15 bits. It is just an expansion of the
standard 8-bit µ-law coding, which has a 4-bit mantissa and thus 2 bits lower
performance.

In addition, to minimize the number of high values (with correspondingly high
quantization errors), second-order prediction is performed. This can be done at very
little computational cost, since e2[n] = e1[n]-e1[n-1], where e1[n] = x[n]-x[n-1], as
shown in the theory section where prediction is discussed. To avoid accumulation of
errors, the value fed back has to be decoded from the compressed data. The complete
predictor and encoder was made as shown in figure 63.

Figure 63 Flowchart, iLaw encoder designed for this thesis.

In the case of second-order prediction, the filter is of the form H(z) = 2z^-1 - z^-2;
however, no multiplications are used, since the residuals are calculated as shown
earlier. The differences are also rounded to 16 bits, while they may actually be 17.
Since the 10-bit µ-law throws away the LSB anyway (the sign bit, 8 zeros and the
6-bit mantissa is the most it can hold, giving a 15-bit dynamic range), this will not
lead to any further degradation of the signal quality, and the encoder’s complexity is
reduced.
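The 10-bit quantizer can be realized in several ways; the sketch below is one reading of the format (a G.711-style biased float with a 6-bit mantissa) combined with the second-order predictor with decoded feedback. The bias value, clip point and helper names are assumptions, not the thesis' exact code:

```c
/* One possible realization of the 10-bit iLaw word:
   bit 9 = sign, bits 8..6 = exponent, bits 5..0 = mantissa.
   BIAS = 0x81 plays the same role as 0x84 in G.711: it guarantees a
   leading one in the exponent window and centres the rounding error. */
#define ILAW_BIAS 0x81
#define ILAW_CLIP (32767 - ILAW_BIAS)

static unsigned ilaw_quant(int res)
{
    unsigned sign = 0;
    if (res < 0) { sign = 0x200; res = -res; }
    if (res > ILAW_CLIP) res = ILAW_CLIP;
    res += ILAW_BIAS;
    unsigned expn = 7;
    for (int mask = 0x4000; (res & mask) == 0 && expn > 0; mask >>= 1)
        expn--;
    unsigned mant = ((unsigned)res >> (expn + 1)) & 0x3F;
    return sign | (expn << 6) | mant;
}

static int ilaw_dequant(unsigned code)
{
    int expn = (code >> 6) & 0x07;
    int mant = code & 0x3F;
    int mag  = (((mant << 1) + ILAW_BIAS) << expn) - ILAW_BIAS;
    return (code & 0x200) ? -mag : mag;
}

/* Second-order prediction, H(z) = 2z^-1 - z^-2, with decoded feedback.
   x1/x2 hold the two previously decoded samples; the real encoder forms
   the same residual from differences, without the multiplication. */
static unsigned ilaw_encode_sample(int x, int *x1, int *x2)
{
    int pred = 2 * (*x1) - *x2;
    unsigned code = ilaw_quant(x - pred);
    *x2 = *x1;
    *x1 = pred + ilaw_dequant(code);      /* feed back the decoded value */
    return code;                          /* 10-bit word */
}
```

With this reading, a residual of 1000 reconstructs as 999, i.e. well within the 16-step quantization interval of its segment.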

The decompression is very simple and easy to implement; it uses the same filter and
the same decoder as used in the compression.

Figure 64 Flowchart, iLaw decoder designed for this project.

This special iLaw codec was written in C and compiled for Mac OS-X to enable a
performance evaluation. The results for the same tests as the others are shown in table
17.

Table 17 Performance iLaw codec, ”littlewing.wav”
SER 49.5dB
Maximum absolute error 0.0055
Complexity estimation Approx. 250 inst./sample

As can be seen, the results are significantly better than for the traditional µ-law codec.
Actually, they exceed the measured numbers achieved with 128kbps MP3, and
subjective listening tests also show very little degradation of signal quality.

Figure 65 Measured performance, custom codec, "littlewing.wav".

This codec provides high-fidelity performance and should also be quite easy to
implement in an MCU. It is thus a viable alternative to lossless coding.

4.7 Notes about the performance measurements
Although measurements for only one reference file are shown in this report, the
codecs were tested on several music tracks to ensure that the results were
representative for the algorithms and not caused by special circumstances. The
”littlewing.wav” file has both a quite large dynamic range and a wide spectrum, so it
won’t mask any bad performance. The checks done with other files confirmed this.

The signal-to-error ratio is a standard method to evaluate compression performance.
However, even though it gives an accurate representation of the error magnitude, it
does not take into consideration the more advanced properties of hearing. Although
the error levels can be quite high, one must remember that they consistently follow
the masking threshold and because of this may still not be very audible. How audible
depends on the quality of the encoder. No good measurement methods have been
developed that include distortion or error perceptibility; consequently, evaluation was
also done through subjective listening tests. These corresponded quite well with the
measurements: 128kbps MP3 and the iLaw codec were evaluated to be of about the
same quality and offered very good performance. On standard 8-bit µ-law the
distortion was clearly audible, while with the 4-bit codecs it was directly annoying.

Estimations of complexity were done by compiling a single compression run with the
SDCC MCU-compiler and counting the instructions in the resulting assembly-file. It
should be noted that this is a very rough estimation, since no data retrieval or sending
operations were included; the variables were just given certain values. Also, the code
was not significantly optimized for the MCU. But although these estimations are not
very precise, they give an indication of how demanding the different algorithms are.
To do a full implementation of every codec would be too much work and at this stage
rather pointless, since the estimations were just meant to indicate whether or not the
different encoders are at all feasible to implement in an MCU. And since there are
512 instructions available per sample, they are.

5 Design of Lossless Compression Algorithm
The goal for the WLS is to use lossless compression to restrict the datarate to within
the 1Mbps capability of the RF-transceiver while maintaining full audio quality. In
addition, since the algorithm must be able to run in real-time on an 8-bit MCU, it has
to be very fast. Different solutions were tested by writing programs in C performing
the necessary functions and then evaluating them by compiling for OS-X and running
them on wave-files.

As explained earlier, lossless compression algorithms necessarily produce variable-
length output words, since the coding continuously adapts to the ”compressability” of
the input signal (in other words, continuously eliminates redundancy). For shorter
periods one can actually experience a negative compression ratio, which complicates
real-time use. Buffering must be implemented, and if the buffer runs empty, one has
to enter some kind of lossy mode until it fills up again. This will only happen for very
short time-periods and will probably not be audible. However, it is advantageous if
the lossless algorithm does not produce excessive word lengths even in a worst-case
scenario.

Due to the signal-dependent performance of lossless compression, a range of wav-
files was used to characterize the algorithms with reasonable accuracy. Six files were
picked to be the basis of the evaluations done in this part of the thesis. These music
pieces are of a very varied nature and should, combined, give a good estimate of real-
life performance. The files are listed in table 18, and their waveforms and spectra are
shown in figures 66 and 67. When results are given, references are made to the wav-
file(s) for which they were found.
Table 18 Wav-files used for characterization of lossless algorithms

Filename           | Contents                                         | Characteristics
”Littlewing.wav”   | Recording of myself playing the intro to Jimi    | Quite dynamic, some reverb,
                   | Hendrix’ ”Little Wing” on guitar, 38 seconds.    | solo instrument only.
”Percussion.wav”   | Ed Thigpen – ”Could Break”, 60sec excerpt.       | Just percussion, much high-
                   |                                                  | frequency content due to
                   |                                                  | cymbal and hi-hat use.
”Rock.wav”         | Stevie Ray Vaughan – ”Couldn’t Stand The         | Rock/blues quartet. Fast,
                   | Weather”, 50sec excerpt.                         | loud and energetic.
”Classical.wav”    | Berlin Philharmonic Orchestra – ”Eine Kleine     | Symphony orchestra. Quiet
                   | Nachtmusik” – allegro, W.A. Mozart, 60sec exc.   | in places.
”Jazz.wav”         | John Coltrane – ”Blues to You”, 60sec excerpt.   | Instrumental, medium
                   |                                                  | dynamics and loudness.
”Pop.wav”          | Robbie Williams ft. Kylie Minogue – ”Kids”,      | Typical modern pop recording,
                   | 60sec excerpt.                                   | very loud all the time, highly
                   |                                                  | compressed.

Figure 66 Waveform of, from top to bottom, "littlewing.wav", "percussion.wav",
"rock.wav", "classical.wav", "jazz.wav" and "pop.wav", Audacity

The waveform and FFT give a good indication of compressability. The louder the
waveform, the higher the entropy. The effectiveness of prediction (how much the
entropy is reduced) is, as explained in the theory chapter, dependent on the high-
frequency content. Since the entropy is related to the signal power, and the entropy-
reduction possibility to the HF-content, the ”compressability” can to some degree be
quantified using the mean signal power level and the spectral centroid (the spectral
”center of gravity”), as well as by looking at waveforms and FFTs. The Matlab-files
in appendix 7 include calculation of both signal power (the SER calculator) and
spectral centroid for the interested reader to explore.
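The spectral centroid is simply the magnitude-weighted mean frequency of the spectrum. A minimal C version of that calculation (the bin-frequency mapping is an assumption about how the spectrum is laid out) could look like:

```c
/* Spectral centroid of a one-sided magnitude spectrum with nbins bins
   covering 0..fs/2: sum(f_i * |X_i|) / sum(|X_i|). */
static double spectral_centroid(const double *mag, int nbins, double fs)
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < nbins; i++) {
        double f = (double)i * fs / (2.0 * (nbins - 1)); /* bin frequency */
        num += f * mag[i];
        den += mag[i];
    }
    return (den > 0.0) ? num / den : 0.0;
}
```

A flat spectrum gives a centroid at the middle of the band, while energy concentrated at low frequencies pulls it towards zero, which is what makes it a rough proxy for how much prediction can help.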

For simplicity all files are, as can be seen, mixed down to mono. The gain of using
channel decorrelation was tested separately. To reduce the workload when testing
other parameters like prediction and coding schemes, only mono codecs were used
during this phase of the development.

Figure 67 Spectrum of the "littlewing.wav", "percussion.wav", "rock.wav",
"classical.wav", "jazz.wav" and "pop.wav”, Audacity

As we can see, the files’ characteristics are very different: some have much high-
frequency content and others less, while some are definitely much louder than the
others. Combined, these files should give a good indication of how well the tested
algorithms will perform.

It should be noted that when the ”pop.wav” file in table 18 is described as ”highly
compressed”, it is not with reference to any data compression, but to amplitude
compression. The volume of all the tracks in the recording is truncated and amplified
to full level using an amplitude compressor. This technique is very commonly used in
pop recordings to maximise perceptability over low-fidelity playback systems like
radios, car-stereos and TVs. Popular music is sold through mass media and it is
important that the music is ”catchy”, i.e. easy to remember even when listening to it
casually or with low-quality sound. When everything is loud, it is easy to perceive.
Audiophiles will of course argue that this makes the music ”flat” and lifeless, but they
are not the target audience anyway. However, this poses a problem when it comes to
data compression as well. Since the signal amplitude is very high at all times, the
entropy is also high and the music is difficult to compress. Lossless compression can
therefore be expected to have a lower performance level on such recordings, and they
are often used as worst-case scenario benchmarks.

5.1 Coding method
One of the most crucial steps in a lossless compression algorithm is the entropy
coding. It should be fast, memory efficient and at the same time eliminate almost all
redundancy.

Huffman-coding and adaptive Huffman-coding were discarded during theoretical
evaluation due to the difficulty of the necessary estimations for the former, and the
computational complexity of the latter. Also, studies showed that very few existing
programs use Huffman-coding; the approach used in almost all available software is
Rice-coding.

Rice-coding has the advantage of being very fast and easy to implement, and there is
no need to store tables. The clear disadvantage of Rice-coding, as shown in the theory
chapter, is the huge codelengths produced when there is significant overflow (that is,
when the real sample value is significantly larger than the k-bit estimated value that is
sent uncoded). Thus the estimation of the factor k is very critical. A fast method to
calculate k has been shown, but this calculation is still the most computationally
demanding part. One can trade off effectiveness for speed by using the same k for a
larger number of samples, but then some very long codes will be produced. As
mentioned, this is much more critical in a real-time system than in a computer
application.

Another alternative is the Pod-code. Here, the overflow is also sent uncoded, but
ahead of it comes a number of zeros that indicates how many bits the overflow is.
Consequently, the codelength is reduced from (overflow+1+k) for Rice-coding to
(2·log2(overflow)+k), while the prefix-property is preserved.
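The two codes can be written out explicitly. The sketch below builds the codewords as strings for illustration only; the real encoders in appendix 6 pack bits, and the bit conventions follow the Pod table given later in the text:

```c
#include <string.h>

static void put_bit(char *out, int bit)
{
    size_t n = strlen(out);
    out[n] = (char)('0' + bit);
    out[n + 1] = '\0';
}

/* Rice code: the overflow q = v >> k in unary (q zeros and a one),
   followed by the k low bits of v.  Length = q + 1 + k. */
static void rice_code(unsigned v, int k, char *out)
{
    unsigned q = v >> k;
    out[0] = '\0';
    for (unsigned i = 0; i < q; i++) put_bit(out, 0);
    put_bit(out, 1);
    for (int b = k - 1; b >= 0; b--) put_bit(out, (v >> b) & 1);
}

/* Pod code: q is sent in binary, preceded by as many zeros as it has
   bits (q = 0 is just "1").  Length is roughly 2*log2(q) + k. */
static void pod_code(unsigned v, int k, char *out)
{
    unsigned q = v >> k;
    out[0] = '\0';
    if (q == 0) put_bit(out, 1);
    else {
        int n = 0;
        while ((q >> n) != 0) n++;             /* bit length of q */
        for (int i = 0; i < n; i++) put_bit(out, 0);
        for (int b = n - 1; b >= 0; b--) put_bit(out, (q >> b) & 1);
    }
    for (int b = k - 1; b >= 0; b--) put_bit(out, (v >> b) & 1);
}
```

For an overflow of 16 the Rice prefix costs 17 bits where the Pod prefix costs 10, and the gap widens rapidly; this is exactly the worst-case difference examined in the tests below.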

5.1.1 Evaluation of Pod-coding and Rice-coding

A Rice-codec and a Pod-codec were written in C and run on wave-files to examine
their effectiveness. Both are included in appendix 6. The most interesting thing was to
see how the algorithms behaved for different calculations of k. So the reader does not
have to look it up, the calculation of k is repeated:

Eq. 36   k = min{ k' : 2^k'·N ≥ A }
Where A is the accumulated sum of previous residual magnitudes and N is a count of
residuals. Programmed in C, this translates to:

for( k=0; (N <<k)<A; k++);

The two critical factors in this calculation are how often it is done and how often A
and N should be reset. Ideally one should calculate a new k for every sample.
However, this will slow down the codec, since this calculation is the most complex in
the algorithm. If k is calculated too rarely, the effectiveness will be reduced; the
question is by how much. Also, one has to reset N and A at some interval, so they
don’t use up too much memory. However, doing it too often will decrease the
performance, since fewer previous samples are averaged.

If A is to be limited to 3 bytes, it has to be reset at least every 256th sample; then N is
kept within 1 byte. If A is limited to 4 bytes, it has to be reset at least every 65536th
sample and N will use 2 bytes. Remember that in an 8-bit microcontroller, the time
used to increment A and N will increase significantly with their length; a 32-bit
addition is much more time-consuming than a 16-bit one, and the incrementation must
be performed for each sample passed. As a consequence, an upper limit to the reset
cycle was set at every 256th sample19.
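The bookkeeping described above can be sketched as follows; the names and the recalculation interval are illustrative choices, not the thesis' final values:

```c
/* Running estimate of the Rice/Pod parameter k: A accumulates residual
   magnitudes, N counts residuals, and both are reset every 256th sample
   so A stays within 3 bytes on an 8-bit MCU. */
typedef struct { unsigned long A; unsigned N; int k; } kstate;

#define K_RECALC 8        /* recompute k every 8th sample */
#define K_RESET  256      /* reset A and N every 256th sample */

static int update_k(kstate *s, int residual)
{
    s->A += (unsigned long)(residual < 0 ? -residual : residual);
    s->N++;
    if (s->N % K_RECALC == 0) {
        int k = 0;
        while (((unsigned long)s->N << k) < s->A)  /* k = min{k': 2^k'*N >= A} */
            k++;
        s->k = k;
    }
    if (s->N == K_RESET) { s->A = 0; s->N = 0; }
    return s->k;
}
```

Starting from the initial value k = 6 used in the tests, a stream of residuals with mean magnitude 16 settles at k = 4 after the first recalculation, as Eq. 36 predicts.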

It should be noted that using the same k for several samples will give the largest
performance decrease for signals with much high-frequency content. This is obvious
because the sample values will then vary more within such a frame, and a larger
number of samples are likely to produce much overflow. Prediction was earlier shown
to be equivalent to a high-pass filtering of the signal, and the effect of a non-ideal k
should be different when prediction is performed. To see if this has a big effect on the
performance of the two coding methods, they were tested both without prediction and
with first-order prediction.

The results are shown in the following tables, without prediction and with first-order
prediction.

Table 19 Performance of Rice- and Pod-coding, A and N reset every 256th sample,
no prediction, ”littlewing.wav”

Calculation     | Rice - filesize | Rice - max | Pod - filesize | Pod - max
frequency of k  | reduction       | wordlength | reduction      | wordlength
Every sample    | 25.8%           | 29 bits    | 25.3%          | 20 bits
Every 4th       | 25.4%           | 48 bits    | 25.2%          | 22 bits
Every 8th       | 25.3%           | 48 bits    | 25.1%          | 22 bits
Every 16th      | 25.3%           | 48 bits    | 25.0%          | 22 bits
Every 32nd      | 25.1%           | 52 bits    | 24.9%          | 22 bits
Every 64th      | 25.0%           | 64 bits    | 24.8%          | 22 bits
Fixed k = 6     | -28.0%          | 371 bits   | 14.4%          | 26 bits

19
Experiments were done with A incrementing over a larger number of samples between each reset, but
the gain in compression ratio was not significant and was therefore considered not worthwhile to
explore further, given the limitations in processing power.

Table 20 Performance of Rice- and Pod-coding, A and N reset every 256th sample,
1st order prediction, ”littlewing.wav”

Calculation     | Rice - filesize | Rice - max | Pod - filesize | Pod - max
frequency of k  | reduction       | wordlength | reduction      | wordlength
Every sample    | 42.0%           | 30 bits    | 41.8%          | 20 bits
Every 4th       | 42.0%           | 44 bits    | 41.7%          | 20 bits
Every 8th       | 41.9%           | 52 bits    | 41.7%          | 20 bits
Every 16th      | 41.9%           | 62 bits    | 41.6%          | 20 bits
Every 32nd      | 41.8%           | 85 bits    | 41.6%          | 20 bits
Every 64th      | 41.8%           | 116 bits   | 41.6%          | 21 bits
Fixed k = 6     | 31.8%           | 172 bits   | 37.5%          | 24 bits

As we can see, the Rice codec performs significantly better when k is calculated for
every single sample. This is not unexpected, as table 4 in the theory chapter shows
that Rice is the more effective code for very low overflow values. It was a bit
surprising to see that the Rice encoder held up even when a new k was calculated only
every 32nd or 64th sample. However, the difference evens out, and with a fixed k,
Rice-coding performs very badly. It was also a bit surprising to see that with first-
order prediction too, the Rice codec held up very well even with the same k over
frames of 64 samples. This, along with the big gain when doing the prediction,
indicates that there is quite little high-frequency content in the signal. To validate this
assumption, as well as the one stating a decrease in performance for the combination
of big frames and much HF-energy, tests were done on ”percussion.wav”; a recording
of percussion instruments with much high-frequency energy. The simulations were
done for the two extremes: a new k for each sample, and a new k for every 64th
sample.

Table 21 Performance of Pod- and Rice-coding with HF-rich file, no prediction,
"percussion.wav".

Coding | Calc. frequency of k | Filesize reduction | Max wordlength
Rice   | Every sample         | 31.7%              | 2397 bits
Rice   | Every 64th sample    | 31.4%              | 2397 bits
Pod    | Every sample         | 31.5%              | 26 bits
Pod    | Every 64th sample    | 31.4%              | 26 bits

Table 22 Performance of Pod- and Rice-coding with HF-rich file, 1st order
prediction, "percussion.wav".

Coding | Calc. frequency of k | Filesize reduction | Max wordlength
Rice   | Every sample         | 36.3%              | 1105 bits
Rice   | Every 64th sample    | 36.0%              | 2917 bits
Pod    | Every sample         | 36.2%              | 23 bits
Pod    | Every 64th sample    | 36.1%              | 26 bits

Again the Rice-codec performs surprisingly well even when the same k is held over
64 samples. But the gap to the Pod-codec has closed, which shows that k is not as
accurately estimated when there is much high-frequency energy. One should also note
that the process of prediction has much less effect on the percussion track. This is
obvious, since high frequencies mean big differences between adjacent samples. That
the compression ratio is as good as it is, is probably due to the fact that parts of this
track are quite silent, and in these periods the datarate produced by the encoders is
quite low.

There is no doubt, however, that the worst-case performance of the Pod-encoder is
much better than that of the Rice-encoder. As the results show, the maximum
wordlength for Rice-coding increases dramatically even when the parameter
calculation frequency is only reduced to every fourth sample. In a computer
compression program this is not a problem; in a real-time system, the huge variance
in wordlengths can represent a very big problem. When there is much high-frequency
content, the wordlengths can reach thousands of bits. It should be noted that the
identical results in the first test of the percussion track are probably due to the biggest
miss being at the very first sample, where k is set to the initial value 6. The track
starts with a very loud cymbal smash beginning at its very first sample.

The average performance of the Rice- and Pod-encoders in all the tests listed above is
shown in figure 68. The cases with a fixed k are excluded from this average, since
that is something that would not be considered in any final algorithm and thus has
little relevance when it comes to evaluating the practical results.

Figure 68 Encoding performance and worst-case word length, all tests averaged

The conclusion after examining and comparing Pod-coding and Rice-coding is that
the gain of using Pod-coding is most significant in real-time systems, where the
excessive wordlengths produced in some cases by Rice-coding can cause serious
problems and would demand a big buffer not to interfere with the data throughput. In
computer compression applications, where real-time operation is not needed, Pod-
coding is unlikely to give any performance improvement. As figure 68 shows, the
performance is in Rice-coding’s favour, although only by 0.2%. Rice-coding is also
the preferred method in almost all commercial lossless audio codecs. But computer
programs are not the target application for this thesis. The codec is to be used in a
low-power, low-memory real-time system, and due to the enormous difference in
worst-case behaviour, Pod-coding is clearly considered the better alternative of the
two.

5.2 iPod: an attempt at improving the Pod-coding

When developing a lossless encoding scheme, or anything else for that matter, one
always tries to find ways to improve on the existing algorithms. One way of
improving the Pod entropy coding that has not been shown before is suggested here
together with test results. The coding is called iPod, for improved Pod20. The idea is
to put the sign-bit into the coded prefix/overflow while still preserving the crucial
prefix-property. The scheme, together with the gain in output sample wordlength, is
shown in table 23.

Table 23 Regular Pod-coding vs. iPod-coding

Overflow | Pod-code   | iPod-code  | iPod-code  | Change in | Gain due to  | Net gain
         |            | (res > 0)  | (res < 0)  | code bits | sign removal | in bits
0        | 1          | 01         | 10         | -1        | 1            | 0
1        | 01         | 0010       | 1101       | -2        | 1            | -1
2        | 0010       | 0011       | 1100       | 0         | 1            | 1
3        | 0011       | 000100     | 111011     | -2        | 1            | -1
4        | 000100     | 000101     | 111010     | 0         | 1            | 1
5        | 000101     | 000110     | 111001     | 0         | 1            | 1
6        | 000110     | 000111     | 111000     | 0         | 1            | 1
7        | 000111     | 00001000   | 11110111   | -2        | 1            | -1
8        | 00001000   | 00001001   | 11110110   | 0         | 1            | 1
9        | 00001001   | 00001010   | 11110101   | 0         | 1            | 1
10       | 00001010   | 00001011   | 11110100   | 0         | 1            | 1
11       | 00001011   | 00001100   | 11110011   | 0         | 1            | 1
12       | 00001100   | 00001101   | 11110010   | 0         | 1            | 1
13       | 00001101   | 00001110   | 11110001   | 0         | 1            | 1
14       | 00001110   | 00001111   | 11110000   | 0         | 1            | 1
15       | 00001111   | 0000010000 | 1111101111 | -2        | 1            | -1
16       | 0000010000 | 0000010001 | 1111101110 | 0         | 1            | 1

As the table shows, the encoded part is simply inverted when the value is less than zero, so the sign bit can be discarded. To retain the prefix property (the code always has to start with a zero when positive and a one when negative), the code had to be "shifted up" one number. It is then obvious that this scheme would give no benefit if applied to Rice-coding, since the loss from shifting up is always one bit and the net gain would always be zero. Applied to Pod-coding, however, it gives a one-bit net benefit for most overflow values and a one-bit loss for a few. The only extra processing needed is to invert the n-bit overflow after the n ones when the number is below zero. If the overflow is frequently large (>3 bits) this scheme should give an improvement; if it is not, it can actually give a net loss.

20 The name iPod is a registered trademark of Apple Computer Corp. and if the suggested scheme is to be used in any commercial application, the name should be changed.

An iPod-coder was written in C and the results were compared to a traditional Pod-coder. No prediction was used in this comparison, shown in table 24. A new k is calculated on the fly for each sample.

Table 24 Pod-coding vs. iPod-coding, filesize reduction (no prediction)

File                 Filesize reduction,   Filesize reduction,
                     regular Pod-coding    iPod-coding
"Littlewing.wav"     25.3%                 24.0%
"Percussion.wav"     31.2%                 30.0%
"Rock.wav"           11.7%                 10.4%
"Classical.wav"      23.3%                 22.0%
"Jazz.wav"           12.0%                 10.7%
"Pop.wav"             1.6%                  2.9%

The proposed scheme actually gives a decrease in performance for all files except "pop.wav". The loss is also bigger than when k is calculated more rarely, as can be seen by comparing the results for "littlewing.wav" to table 19.

A study of the overflow shows that the calculation of k is very effective: the overflow is 0 or 1 for most of the samples, which also explains why Rice-coding gave better compression than Pod-coding. The value 1 gives a one-bit loss with iPod encoding compared to ordinary Pod and, as figure 69 shows, it appears far more often than all the values for which iPod gives a net gain put together. Note the logarithmic y-axis in the figure; the overflow is 0 or 1 for more than 90% of the samples. Based on these results, the proposed scheme was discarded.

Figure 69 Distribution of overflow, "littlewing.wav"

5.3 Prediction scheme
For intra-channel decorrelation, different prediction schemes were considered. It is important that the predictors are simple, yet efficient. Adaptive predictors, very high order linear predictors and polynomial approximations with many polynomials were considered unfeasible due to the hardware constraints, and the options were narrowed down to a few low-complexity alternatives:

1. First-order linear prediction, where the residual is the difference between
   two adjacent samples.
2. Second-order linear prediction, where the residual is the difference
   between two adjacent differences from 1.
3. A simple two-alternative polynomial approximation:
   a. One polynomial being x^0[n] = 0 (no prediction) and the second being
      x^1[n] = x[n-1] (first-order, as in 1).
   b. Or with one polynomial being x^1[n] = x[n-1] (same as in 1) and the
      other x^2[n] = 2x[n-1] - x[n-2] (same as in 2). Also, two ways of
      handling them can be used:
      i. Sample-to-sample adaptivity, where the smaller of the two
         residuals is encoded for each sample, and a dedicated bit tells
         the decoder whether it is residual 1 or 2.
      ii. Frame-to-frame adaptivity, where the residual magnitudes are
         accumulated over a given frame and the smallest sum decides the
         predictor for that particular frame. This saves the dedicated bit
         for each sample, but the frame needs a small header (for instance
         a "10" / "11" after the sign bit, where the first '1' indicates the
         start of a frame and the second bit which residual is encoded).
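The two fixed predictors in alternatives 1 and 2 can be written as plain difference operators. The following is a minimal sketch, assuming 16-bit input samples widened to 32 bits so the intermediate differences cannot overflow; the function names are illustrative, not taken from the thesis codec.

```c
#include <assert.h>
#include <stdint.h>

/* First-order fixed prediction (alternative 1): the residual is the
 * difference between two adjacent samples, e1[n] = x[n] - x[n-1]. */
static int32_t residual1(int16_t x0, int16_t x1)   /* x1 = x[n-1] */
{
    return (int32_t)x0 - x1;
}

/* Second-order fixed prediction (alternative 2): the residual is the
 * difference between two adjacent first-order differences,
 * e2[n] = (x[n] - x[n-1]) - (x[n-1] - x[n-2]),
 * i.e. the implied predictor is x^[n] = 2*x[n-1] - x[n-2]. */
static int32_t residual2(int16_t x0, int16_t x1, int16_t x2)
{
    return ((int32_t)x0 - x1) - ((int32_t)x1 - x2);
}
```

Note that a perfectly linear (ramp-shaped) signal is predicted exactly by the second-order operator, which is why low-frequency-dominated material compresses so well with it.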

Alternative 3-a does not demand any extra calculations compared to 1, since the first-order prediction is the only one performed in both cases (the same relation holds for 3-b vs. 2). The extra work is either to find the smaller of the residuals once per sample (i) or to accumulate and compare over an entire frame (ii). The first of the two is probably faster, but there is an extra overhead of 1 bit per sample needed to tell the decoder which residual is used. However, this can be made up for by the fact that the smaller residual is chosen each time, and the gain from this might on average be more than 1 bit. The second alternative produces less overhead, but will not choose the best predictor for every sample, so experiments must be done to find the better one. Statistically, it is obvious that the third alternative will not work very well if one residual is smaller than the other almost every single time. For case 3-b-ii, one can just as well choose from three alternatives (zero, 1st or 2nd order), since one extra bit per frame (to indicate which of the three is chosen) produces minimal overhead.
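The three-way frame-to-frame selection just described can be sketched as follows. This is an illustration under the stated assumptions; select_order() is a hypothetical name, and the frame is assumed small enough that the 64-bit accumulators never overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Accumulate the residual magnitudes of the 0th-, 1st- and 2nd-order
 * predictors over one frame and return which of the three (0, 1 or 2)
 * gives the smallest total. A two-bit frame header would then tell the
 * decoder which residual is encoded for the whole frame. */
static int select_order(const int16_t *x, int n)
{
    int64_t acc[3] = {0, 0, 0};
    for (int i = 0; i < n; i++) {
        int32_t e0 = x[i];                                      /* 0th order */
        int32_t e1 = (i >= 1) ? x[i] - x[i-1] : x[i];           /* 1st order */
        int32_t e2 = (i >= 2) ? x[i] - 2*(int32_t)x[i-1] + x[i-2]
                              : e1;                             /* 2nd order */
        acc[0] += (e0 < 0) ? -e0 : e0;
        acc[1] += (e1 < 0) ? -e1 : e1;
        acc[2] += (e2 < 0) ? -e2 : e2;
    }
    int best = 0;                      /* ties resolve to the lowest order */
    for (int k = 1; k < 3; k++)
        if (acc[k] < acc[best]) best = k;
    return best;
}
```

For a ramp-shaped (linear) frame the second-order accumulator is smallest, while a silent frame keeps the 0th-order choice, matching the selection statistics discussed around figure 71.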

For testing alternatives 1 and 2, a codec with a selectable predictor was written. Since Pod-coding has been shown to be the preferred coding scheme, only this was used while testing the other parameters. The performance is summarized on the following pages.

Table 25 Filesize reduction, no pred., 1st order and 2nd order linear pred.

                     No prediction   1st order pred.   2nd order pred.
"Littlewing.wav"     25.3%           41.8%             48.1%
"Percussion.wav"     31.2%           35.8%             32.9%
"Rock.wav"           11.7%           27.1%             31.7%
"Classical.wav"      23.3%           40.4%             47.3%
"Jazz.wav"           12.0%           33.5%             38.7%
"Pop.wav"             1.6%           17.2%             18.5%

As the results show, there is a clear correlation between the mean amplitude of the signal and the compression level achieved. Also, the effect of increasing the predictor order is strongly dependent on the spectrum of the signal. This is expected and in harmony with the theoretical assumptions. For the wireless loudspeaker system, the requirement is about 30% (from 1.4Mbps to <1Mbps). The results show that this is within reach for most inputs even with quite simple predictors, but also that, at least for some music, a lossy compression mode will have to be used quite frequently.

To see if polynomial approximation performed better than fixed predictors, the alternatives sketched above were tested. First, the sample-wise approximation, where the better of two alternative polynomials is chosen for each sample. The encoded data is output from the encoder as shown in the figure below.

Figure 70 Bit-wise polynomial approximation encoder data structure

Here Pi is the prediction indicator, which tells the decoder which prediction is used, S is the sign bit and the rest is normal Pod-encoded data. A codec was written where the user can select between alternatives 3a and 3b above. The results are presented in table 26.

Table 26 Filesize reduction, sample-wise polynomial approximation

                     0th and 1st order        1st and 2nd order
                     polynomial selection     polynomial selection
"Littlewing.wav"     42.7%                    50.8%
"Percussion.wav"     38.7%                    41.3%
"Rock.wav"           27.9%                    35.0%
"Classical.wav"      41.1%                    50.0%
"Jazz.wav"           34.0%                    42.1%
"Pop.wav"            19.8%                    23.6%

As we can see, the polynomial approximation gives a noticeable improvement in performance, even though an extra bit is sent with each sample. The improvement is however not as great as when moving up one order in prediction (i.e. 0th and 1st order polynomial approximation does not perform as well as 2nd order linear prediction), which suggests that a fixed predictor will give a better performance/complexity ratio. The polynomial approximation however has the advantage that excessive wordlengths are largely avoided, since the biggest overflows are eliminated in the polynomial selection process.

The extra bit sent with every sample is the major drawback of this approach. If one chooses between polynomials only every n'th sample, the overhead is reduced by a factor of n. However, it is no longer certain that the best polynomial is chosen for each sample; one has to choose the one giving the smallest total magnitude over an n-sample frame. A major factor, of course, is how big this frame should be. Obviously it should be several samples, to minimize the overhead, but at the same time, the larger n is, the more "wrong" selections are made within each frame. Since the codec should not operate with several frame lengths, it is logical to do this selection at the same time as the code parameter k is calculated; the same variables can then be used for accumulation and counting too. Since two bits per frame is an insignificant overhead, we can in this scenario choose between the 0th, 1st and 2nd order residuals and use the two-bit frame header to tell the decoder which is chosen. The results for two different frame lengths are given in table 27.

Table 27 Performance, framewise polynomial approximation, 0th, 1st and 2nd order polynomial selection

                     16 sample frame   256 sample frame
"Littlewing.wav"     48.2%             47.8%
"Percussion.wav"     39.5%             36.0%
"Rock.wav"           30.9%             31.5%
"Classical.wav"      50.5%             47.2%
"Jazz.wav"           40.9%             38.6%
"Pop.wav"            21.5%             19.0%

As we can see, performance increased very little. The cause of this can be a combination of two things: the long frame length means k is calculated less frequently, and more wrong polynomial selections are made within each frame. This degrades performance somewhat; with three polynomials to choose from, the algorithm should otherwise have performed better than the 1st or 2nd order sample-wise approximation. However, the differences are small, and the gain compared to a fixed second-order predictor is also very limited. The reason for the small performance improvement then probably lies in the fact that the same polynomial is chosen almost all the time. To examine this, variables were included in the code to count the number of times each polynomial was used. The result is given in figure 71.

Figure 71 Polynomial selection, framewise polynomial approximation, 255-sample frames

As we can see, the 2nd order residual is chosen most of the time. The exception is the very HF-heavy "percussion.wav", where the distribution is very different. This is also supported by the fact that polynomial approximation clearly gives the biggest improvement over fixed prediction for just that file. One can also see that in some files, the sample value itself is actually chosen more often than the 1st order residual. This is most notable for "littlewing.wav" and "jazz.wav" and probably due to the fact that these pieces have a couple of seconds of silence at the start, for which the sample values are chosen.

The conclusion is that in a real-time, processor-weak application like the wireless loudspeaker system, fixed prediction is preferable. The gain is small and a significant number of instructions is used for accumulating all the residuals and choosing between them. Any extra processing power could, if available, be spent implementing a higher-order fixed predictor. However, one should note that the gain from increasing the predictor order decreases rapidly, as shown earlier in figure 10. To see whether any extra processing power would be better spent on polynomial approximation or a higher-order predictor, third and fourth order prediction was also tested.

Table 28 Third and fourth order fixed predictor, new k for every sample

                     Third order   Fourth order
"Littlewing.wav"     51.1%         52.0%
"Percussion.wav"     28.6%         24.7%
"Rock.wav"           34.6%         36.5%
"Classical.wav"      48.1%         49.7%
"Jazz.wav"           38.8%         36.3%
"Pop.wav"            19.2%         19.8%

As we can clearly see, the gain diminishes rapidly, even being negative in some cases. It is obvious that a brute-force method with very high order fixed predictors will give a low performance/complexity ratio. This is probably also the reason that many of the best available applications use some sort of polynomial approximation; the latter is guaranteed to perform better as more polynomials are used. Shorten [reference 6], one of the most successful lossless compression programs available today, uses a four-polynomial approximation. However, this is beyond the capability of the wireless loudspeaker system. Figure 72 shows a comparison of the average performance over all test files for all prediction schemes.

Figure 72 Performance, different tested prediction schemes

Table 29 Computational cost per sample for the different prediction schemes

No prediction                           0
1st order fixed                         1 16-bit subtraction
2nd order fixed                         1 24-bit assertion
                                        1 24-bit subtraction
                                        1 quantization (24 to 16 bits)
                                        1 16-bit subtraction
3rd order fixed                         2 24-bit assertions
                                        2 24-bit subtractions
                                        1 quantization (24 to 16 bits)
                                        1 16-bit subtraction
4th order fixed                         3 24-bit assertions
                                        3 24-bit subtractions
                                        1 quantization (24 to 16 bits)
                                        1 16-bit subtraction
Samplewise pol. appr., 0th and 1st      1 16-bit subtraction
                                        1 16-bit compare
                                        1 16-bit assertion
                                        1 8-bit assertion
Samplewise pol. appr., 1st and 2nd      1 24-bit subtraction
                                        1 24-bit assertion
                                        1 quantization (24 to 16 bits)
                                        1 16-bit subtraction
                                        1 16-bit compare
                                        1 16-bit assertion
                                        1 8-bit assertion
Framewise pol. appr., 0th, 1st and 2nd  1 24-bit subtraction
                                        1 24-bit assertion
                                        1 quantization (24 to 16 bits)
                                        1 16-bit subtraction
                                        2 24-bit accumulations
                                        (3 24-bit compares, 1 16-bit assertion and
                                        3 24-bit clears at the start of each frame)

As can be seen, there is a big leap in performance from no prediction to 1st order, then a significant jump to 2nd order. Increasing the fixed predictor order further has little effect. The polynomial approximations are a few percent better than the highest-order prediction they contain, but the 0th and 1st order selection, probably the most likely to be achievable on the WLS MCU, is not as good as a fixed second-order prediction. Generally, moving to polynomial approximation increases the number of operations per sample more than moving up an order or two in the predictor. A fixed predictor also gives a constant processor load, which is much easier to handle when the operation is real-time. But if resources are available, polynomial approximation should definitely be considered, as it seems to give the best compression ratio.

5.4 Channel decorrelation

When a suitable coding and prediction scheme had been found, the next step was to expand the compression to handle stereo. The test codec was written so that channel decorrelation could be selected when running it, making it easy to compare the filesize reduction with and without it enabled. Five test files were chosen to measure performance: normal modern live and studio recordings, a live classical recording with a big and wide soundstage, and a 60's recording where different instruments are placed in each of the two channels21. The files are described in table 30.

Table 30 Recordings used to test stereo decorrelation

"Modernstereo.wav"    The Cardigans - "Erase/Rewind" from the album "Gran Turismo" (1998)
                      Electric modern pop recording. Few instruments; bass and vocal in
                      center, guitar and synth panned to the left and right.

"Modernstereo2.wav"   R.E.M. - "Find the River" from the album "Automatic For The
                      People" (1992)
                      Acoustic modern pop recording. Bass and vocal in center, guitars in
                      both channels. Bigger soundstage than "modernstereo.wav".

"Modernlive.wav"      Lenny Kravitz - "Always On The Run" from the album "MTV
                      Unplugged" (1994)
                      Modern live recording. Band playing live in a small arena.

"Symphoniclive.wav"   Sarah Chang and The London Symphony Orchestra - "Paganini
                      Violin Concerto in D, 3. mov." (live, 1997)
                      Live classical recording. Large orchestra playing live in a big hall.
                      Very big soundstage.

"Oldstereo.wav"       The Beatles - "Sgt. Pepper's Lonely Hearts Club Band" from the
                      album "Sgt. Pepper's Lonely Hearts Club Band" (1967)
                      Old-style stereo recording with some instruments only in the left
                      channel and others only in the right channel.

In addition to these five tracks, a file consisting of two identical channels, "dualmono.wav", was used to verify the stereo decorrelation's functionality.

As described in the theory section, the normal way of decorrelating the channels is to replace the L (left) and R (right) signals with M (mutual) and S (side), one consisting of the average of L and R and the other of their difference. However, a complication arises when using only integer arithmetic. The mutual signal is calculated by

Eq. 37    M = (L + R) / 2

which gives a roundoff error unless a floating-point representation is used. To keep the algorithm fast, it should be restricted to integer-only arithmetic. One could in theory remove the divide-by-two operation, but this would make the M-signal as large as L and R combined, and any performance increase would be lost. An alternative approach is used here: one channel is sent directly to the encoder while the other is replaced with the S-signal. The channels are then decorrelated. This should give about the same performance as using M and S, since a given channel will normally be the strongest (i.e. larger than M) 50% of the time and the weakest (i.e. smaller than M) 50% of the time. However, performance can suffer in special cases where the channel being sent uncoded is consistently stronger than the other, or if the channels are often in opposite phase. A way to overcome this, and also improve performance, is to find out which channel has the smallest absolute value and ensure that this channel is sent directly for each sample. However, this would demand quite a bit of resources, and a 1-bit indicator would have to be added to each sample pair to tell the decoder which channel is sent directly. Consequently, this is not investigated further in this thesis, but it could be considered if high-performance compression programs for home computers are to be developed.

21 In the early days of stereo recording, it was often utilized by placing some instruments, like drums and rhythm guitar, in one channel and the rest, for instance lead guitar, bass and vocals, in the other. During the late 60's and early 70's recording engineers gradually learned to use stereo to pan the sound between the speakers, which gives a more natural soundstage and also more signal correlation between the channels.
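The integer-only alternative described above can be sketched as a pair of trivial functions (the left channel passed through, the right channel replaced by the side signal; the function names are illustrative, not from the thesis codec). Because no division is involved, decoding is exact:

```c
#include <assert.h>
#include <stdint.h>

/* Encoder side: send L unchanged and replace R with the side signal
 * S = L - R. S can need up to 17 bits, hence the wider type. */
static void lr_to_ls(int16_t l, int16_t r, int16_t *out_l, int32_t *out_s)
{
    *out_l = l;
    *out_s = (int32_t)l - r;
}

/* Decoder side: reconstruct R exactly from L and S. */
static void ls_to_lr(int16_t l, int32_t s, int16_t *out_l, int16_t *out_r)
{
    *out_l = l;
    *out_r = (int16_t)(l - s);
}
```

For a dual-mono signal S is identically zero, which is consistent with the large gain seen for "dualmono.wav" in table 31.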

The test codec was designed so stereo decorrelation could be switched on and off to make comparison easy. The results for the test files, obtained with a second-order fixed predictor and the Pod-coder, are shown in table 31.

Table 31 Results of inter-channel decorrelation

File                   Filesize reduction        Filesize reduction with
                       without inter-channel     inter-channel
                       decorrelation             decorrelation
Oldstereo.wav          38.9%                     37.1%
Modernlive.wav         27.8%                     27.4%
Modernstereo.wav       25.4%                     25.4%
Modernstereo2.wav      29.4%                     29.4%
Philharmonic.wav       49.9%                     48.5%
Dualmono.wav           47.8%                     60.0%

As we can see, inter-channel decorrelation gives little or no improvement, and "dualmono.wav" shows that this is not due to implementation issues. These results correspond with those found earlier by Mat Hans of AudioPak [reference 2] as well as Al Wegener of MusiCompress [reference 29]. Because of time differences between the channels, and the separate track processing often used during mastering, there usually is not much sample-to-sample correlation between them (even if there is much correlation within a larger time window). To ensure that the implementation follows the theoretical entropy differences, the MatLab script developed earlier to do decorrelation and calculate entropy was used, and its results compared to the difference in real-life compression with and without stereo decorrelation. Also, the L and S entropy was compared to the M and S entropy to see if substituting M with L caused any loss. The results are displayed as an average over the five files above. Second-order prediction was used both in the MatLab script and in the codec. To make the results comparable, they are normalized to a percentage of the original data size, i.e. the entropy is shown as a percentage of 16 bits, the entropy sum as a percentage of 32 bits and the compression as the new-to-old filesize percentage.

Figure 73 Entropy of channels, mutual and side signals and filesize reduction,
average results of files in table 14 except ”dualmono.wav”

As we can see, the theoretical and practical results are almost identical. The average performance is lower with inter-channel decorrelation on, since the side signal has higher entropy than the channel it replaces. We can see that the mutual signal is smaller than the channel signals; this is an obvious consequence of the side signal being larger, so slightly better performance would be achieved if the mutual signal had been calculated as well. However, the conclusion is that the sample-to-sample channel correlation is negligible and that implementing inter-channel decorrelation is not worthwhile.

Better results could probably have been achieved by exploiting channel correlation over larger time windows. There is much correlation between left and right, but because of the time differences it will not be evident when only one or a few sample instants are compared at a time. By searching for correlation over larger time periods, much redundancy could probably be removed, but this requires much memory and processing power and is thus not feasible on the wireless loudspeaker system. It would probably also produce too much latency for use in any real-time system, but if compression for personal computers and file storage is developed, it should definitely be considered.

These results correspond with those found by the aforementioned Mat Hans and Al Wegener, among others, and inter-channel decorrelation is not recommended for products like the wireless loudspeaker system.

5.5 Final algorithm proposal and benchmark
As seen in the previous sections, a large number of methods have been tested. The results lay the foundation for the final algorithm proposal. One should of course keep in mind that the target application is an embedded real-time system. Thus, the demands include:

- Good worst-case as well as average performance.
- Low complexity.
- Non-variable or low-variable computational load. Since the algorithm operates in a real-time, low-memory system, it should be able to process data at a constant speed; there is no room for heavy computations even if they are done rarely.

Based on these requirements, some conclusions have been drawn:

- Pod-encoding is preferred over Rice-encoding since its worst-case performance is far better.
- The iPod encoding is discarded since the overflow values are low, even if k is calculated quite rarely.
- The predictor should not be of higher order than 2, since increasing the order beyond this gives very little performance increase.
- The framewise polynomial approximation adds too much complexity, and calculating k less frequently somewhat compromises its performance. In addition to several continuous accumulations, there is a significant processor load increase at the start of each frame, which a real-time system running "on the edge" might not be able to handle.
- The 1st and 2nd order samplewise polynomial approximation is interesting, but the comparison and selection done for each sample adds complexity to the algorithm. However, if the extra processing power is available, it is preferable in a real-time system to a framewise polynomial approximation or higher-order fixed predictors.
- Inter-channel decorrelation is not worthwhile to implement, since the gain on most recordings is non-existent or very limited.

Based on these criteria, a second-order fixed predictor with Pod-encoding, no channel decorrelation and a sample-wise computation of k is considered the best compromise22. Implementation in the MCU will finally determine whether this is indeed feasible and whether any resources are left over. If so, these are probably best spent implementing a sample-wise polynomial approximation.

The suggested algorithm was finally tested for performance against the compression application Carbon Shorten 1.1a for Macintosh. Shorten is considered one of the best compression algorithms, and both Carbon Shorten and Shorten for Windows are amongst the most popular lossless compression utilities for their respective platforms. It is thus very relevant as a benchmark. Shorten is a highly developed scheme based on a higher-order polynomial approximation and Rice-encoding and will therefore presumably give a better compression ratio than the much simpler algorithm devised for our purpose. The point of the comparison is to see how close we get to the more sophisticated algorithm in terms of compression ratio with just a second-order predictor and the Pod-coding. The comparison was done using all the files in table 18 (the six mono test files) and table 30 (the six stereo test files).

22 It might be that using the same k within a frame is the best option when it comes to implementation in the WLS; see the implementation considerations chapter for details.

Figure 74 Performance evaluation, Shorten vs. suggested algorithm for WLS

As we can see, there is as expected a performance gap to Shorten, but only of between two and five percent. This again shows that even a very simple predictor performs surprisingly well and that the encoding is not significantly less efficient than the more advanced one used in the benchmark. Given the simplicity of the proposed algorithm, this result is very satisfying. Also, the proposed algorithm was compiled as a single run using SDCC. The results indicate a complexity of 300-400 instructions per sample, depending on the input signal. This is within the capabilities of the MCU, but definitely on the limit, as data handling must be done simultaneously. However, if the code is optimized and, if necessary, written in assembly, it should be feasible to implement the proposed lossless compression algorithm in the WLS.

5.6 Lossy mode
As mentioned before, the wireless loudspeaker system has to include some sort of lossy mode for when the compression ratio over a period of time does not meet the requirement set by the 1Mbps transfer rate of the transceiver. The data must be buffered in the MCU memory, and if the buffer is about to fill up (if more data is being produced by the encoder than the CC2400 is able to transmit), the lossy mode must be engaged. It must stay on for a short while, until the buffer is empty again.

The time period the lossy mode is on will be very short, a few ms at most, but on files with low compressibility it will be used quite often. It is still unlikely to be audible, but a lossy-mode scheme must be chosen that does not compromise performance too much, to minimize the probability of perceptible degradation.

To realize this, the data must be split into frames. A header is needed to tell the decoder whether the frame is encoded in lossless or lossy mode. The frame should be short enough to minimize the audibility of the distortion, but long enough that the header does not give too much overhead.

Three different schemes for the lossy mode were considered. A model of the system was written in C so listening tests and measurements could be made. The three alternatives are:

- If the data rate is too high, employ a fixed-wordlength lossy compression scheme, then revert to lossless.
- If the output data rate is too high, remove a number of LSBs from the data in the frame to compensate.
- If the output data rate is too high, send some samples in mono to compensate.

If we consider the first method, some kind of low-complexity lossy encoding must be employed, probably µ-law, iLaw or an equivalent. However, as shown earlier, each of these methods will, unless they are very adaptive, give low noise on low-level signals and high noise on high-level signals, often higher than an LPCM quantization to the same number of bits. Of course, the lossless encoding produces the longest output words when the signal is loud, i.e. exactly when the lossy compression performs badly. Thus, alternative 2, removing some LSBs when the bitrate is too high, will almost certainly give a better result than, and also be much simpler than, moving to a special lossy encoding scheme. The first option is therefore discarded. The other two are evaluated in the following subchapters.

5.6.1 LSB-removal lossy mode

When the data rate from the lossless encoding is too high, it is likely that the signal is loud. Removing one or a few LSBs when the signal is loud is not very perceptible, and even less so over short periods of time. Unlike for instance µ-law lossy encoding, a hybrid scheme like this causes loss only when the bitrate is too high to transfer (as shown earlier, µ-law gives high dynamic range, but the instantaneous quantization error is just (n-4) bits below the sample value for an n-bit encoding).

Note that requantization should ideally be combined with dithering to avoid distortion. Dithering has been used during testing, but can be left out if the MCU does not have the available resources. Since the number of bits removed is quite small, the distortion is unlikely to degrade audio quality significantly.

The frame header needs to tell the decoder two things. First, whether the frame itself is encoded in lossy or lossless mode. Secondly, how many bits are removed from the samples in the frame. Since the data input is 16 bits/sample and the encoder output is around 10 bits/sample, the number of LSBs to be chopped off can be represented with three bits. A value of zero tells the decoder that no LSBs are chopped off, which is the same as lossless transmission, so a separate indicator for this is not needed. One should note that if k is not calculated for every sample, the wordlength can increase for samples where k is way off. So if k is calculated rarely, some frames can be large and one might want to increase the number of bits in the header to accommodate this. Since a sample-wise calculation of k is suggested for this system, a three-bit header is used during testing.

During testing, a frame size of 64 samples was used. The MCU has 2kB RAM and can thus hold two frames, the uncompressed input frame and the compressed output frame. Since the application is real-time, it was also important to develop a scheme which is causal. The result was a low-complexity algorithm for engaging the lossy mode: during compression of frame N, the length of the output data is counted. If frame N is larger than a threshold, corresponding to 1Mbps, LSBs are removed when reading frame N+1. The number of LSBs removed always corresponds to the overshoot of frame N's length relative to the threshold. Thus the average output datarate will always converge towards 1Mbps. There are 64 samples in each frame and the desired output data rate was set to 10 bits per sample. The threshold is then:

Eq. 38    Threshold = 64 samples × 10 bits/sample = 640 bits

And the number of LSBs removed from a given frame is

Eq. 39    LSB_rem = (B_OUT - 640) / 64

where B_OUT is the number of bits produced when compressing the previous frame. When mono files are read, the bitrate is already below 1Mbps and no lossy mode is engaged.
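The feedback rule of Eq. 38 and Eq. 39 can be sketched as follows. This is a simplified illustration, not the thesis implementation: the names are assumptions, dithering is omitted, and the result is clamped to fit the 3-bit frame header.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_SAMPLES 64
#define TARGET_BITS   (FRAME_SAMPLES * 10)   /* Eq. 38: 640-bit threshold */

/* Eq. 39: how many LSBs to strip from frame N+1, based on how far the
 * compressed size of frame N overshot the threshold. Zero means the next
 * frame stays lossless. */
static unsigned lsb_rem(unsigned bits_out_prev)
{
    if (bits_out_prev <= TARGET_BITS)
        return 0;
    unsigned rem = (bits_out_prev - TARGET_BITS) / FRAME_SAMPLES;
    return (rem > 7) ? 7 : rem;   /* must fit the 3-bit frame header */
}

/* Requantization applied when reading frame N+1; plain truncation is
 * shown here, but it should ideally be combined with dither. */
static int16_t strip_lsbs(int16_t x, unsigned rem)
{
    return (int16_t)((x >> rem) << rem);
}
```

For example, a previous frame of 768 bits gives (768 - 640) / 64 = 2 LSBs removed from the next frame, so the average rate converges towards 10 bits per sample.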

Figure 75 Algorithm for LSB-removal lossy mode

The lossy mode was tested on several files, and very little if any degradation of audio quality was detected. Figure 76 shows the performance on a 30s excerpt of "modernlive.wav", a file of normal loudness. The performance is also compared to the iLaw codec and the LAME MP3 codec at 192kbps.

Figure 76 Lossy-mode performance, "modernlive.wav", 30s excerpt, left channel

As we can see, the error clearly follows the frames. For many frames no bits are removed, while for a few others up to four bits are removed; the vast majority, however, lie between zero and two. The measured results are shown in table 32.

Table 32 Lossy-mode performance

Lossy-mode                     SER       Max absolute error
Lossless with LSB-removal      88.5dB    0.00023
iLaw                           56.6dB    0.0043
MP3, 192kbps                   62.1dB    0.0024

As we can see, the measured loss is far smaller than for iLaw or MP3. This was not unexpected, as listening tests showed no audible degradation. Figure 76 also shows that for most frames, zero or one LSB is removed, two for quite a few, and in some rare instances three to four. But this happens in very loud parts of the track and on high-frequency signals (due to the prediction), and does not appear to be audible. The lossy mode as suggested here works very well.

5.6.2 Mono samples lossy-mode
The mono mode lossy algorithm developed is very similar to the LSB-removal
algorithm. It checks the output length of frame N. Then, if it is too long, it sends some
samples in frame N+1 in mono to compensate for the overshoot. The threshold is
calculated in the same way as with the LSB-removal. Since 16 bits are saved for each
sample sent in mono, the number of mono samples for frame N will be

Eq. 40 SMONO = (BOUT - 640) / 16

Where SMONO is the number of samples in frame N+1 to be sent in mono and BOUT is
the number of bits used in frame N. Thus, the output bitrate will average 640 bits
per frame, or 10 bits per sample. The algorithm is the same as shown in figure 75,
except that SMONO is calculated instead of LSB_rem, and samples are sent in mono
instead of LSBs being removed. When in mono-mode, only the left channel is sent and
the decoder copies it to both left and right after decompressing.

During testing, it soon became evident that the mono samples lossy-mode had
problems that could not be solved without significantly compromising performance.
For most frames, the right channel will toggle between being itself and being a copy
of the left. Since this happens twice per frame (when the lossy-mode is engaged)
and the frames are 64 samples, the right channel will toggle its mode about 1,400
times per second. This introduces very audible high-frequency distortion. To confirm
that this was not an implementation issue, a very simple program converting a fixed
number of samples per frame to mono, without any compression or signal processing,
was written. This produced the same result. The only way to avoid this distortion was
to force the mono-mode to stay on for quite long periods each time, at least five to
ten thousand samples (so the toggling rate is below any audible frequency). And even
then it was easy to hear the audio going from stereo to mono and back again; the
soundstage was rendered almost unrecognisable.

Figure 77 Spectrum with mono-mode, 64-sample frames, ”modernlive.wav”, 30s
excerpt.

As figure 77 shows, the HF noise level added to the right channel is significant, and
the result is by no means of high-fidelity standard. Since the LSB-removal lossy-mode
gave excellent results, this is a no-brainer: the mono-samples lossy-mode is
discarded.

Appendix 6 includes source-code for an evaluation program where LSB-removal or
mono-mode can be selected by the user and tested. A mono-mode-only test (without
encoding or decoding) can also be run.

6 WLS Implementation Considerations
As mentioned in the introduction, delays in design and manufacturing of the hardware
made it impossible to do a full implementation before the thesis deadline. This is
detailed in the project review. But even so, algorithm design has consistently been
done with MCU implementation in mind. As a result, some optimization suggestions
and general considerations are presented here, as well as the work actually done on
the hardware.

6.1 MCU implementation considerations

6.1.1 Wrap-around arithmetic


As mentioned in the lossless compression theory chapter, the output from the
prediction filter is quantized to 16-bit precision. This makes the predictor slightly
nonlinear, but the effect on performance is negligible. Since the same quantization is
done in the decoder's filter as well, the system will of course output the same sample
values it received and will still be completely lossless.

The residual being sent is the difference between the real value and the predicted
value. Since both of these are 16 bits in length, the residual can, although it is highly
unlikely, be a 17-bit value (footnote 23). In a powerful computer or DSP, which uses
32- or 64-bit instructions, this is not a problem. In a 16- or 8-bit MCU, however, the
requirement to handle 17-bit values instead of 16-bit gives a significant performance
reduction: every operation has to use a significantly higher number of instructions.

But this problem can be avoided with wrap-around arithmetic. When the arithmetic
only involves addition and subtraction, and the operations in the decoder are the
inverse of the ones in the encoder, using a 16-bit variable for the residual is not a
problem, even if its value overflows. This is easiest explained using an example.

A 16-bit two’s complement variable has the value range [-32768, 32767]. If you try to
go outside these values, it will wrap around. For instance:

32,767     0111 1111 1111 1111
+ 1        0000 0000 0000 0001
= -32,768  1000 0000 0000 0000

or

-32,768    1000 0000 0000 0000
- 2        0000 0000 0000 0010
= 32,766   0111 1111 1111 1110

23 For the residual to use 17 bits, the difference between the real and predicted value must be more
than ±32,767. This rate of change is very unlikely to occur in music signals. If, for instance, a first
order predictor were to give such a residual, the signal would have to be at almost 0dBfs (full level)
and close to 20kHz; no normal recording has such an output level at those frequencies. For a higher
order predictor it is even more unlikely.

If we use a first order predictor and the last value x[n-1] was 19,000 and the current
value x[n] is –32,000, the residual, x[n]-x[n-1], will be –51,000. This is outside the
value range and the residual will wrap around to 14,536. The decoder now has the last
sample value 19,000 and receives a residual of 14,536. In the decoder the sample
value is of course found by adding the residual or difference to the last value, which
gives 19,000+14,536 = 33,536. This is outside the range and will again wrap around
to –32,000, the correct value. As long as the encoder and decoder do the same
operations, this is not a problem.

It should be noted that the wrap-around process will affect the compression, since a
different value is compressed. However, one must remember that this is an event that
is very unlikely to happen; the probability of the prediction residual being outside the
16-bit value range is almost non-existent. Thus the practical compression ratio will
not be affected, and by restricting oneself to 16-bit values, significant hardware
resources are saved.
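A minimal C sketch of the wrap-around scheme, using the example values above. The function names are illustrative; the casts make the mod-2^16 wrap explicit and portable:

```c
#include <stdint.h>

/* Wrap-around residual coding (sketch): encoder and decoder both work in
 * 16-bit two's complement, so any overflow wraps identically on both
 * sides and the reconstructed sample is always exact. */

/* First-order prediction residual, deliberately kept in 16 bits. */
int16_t residual(int16_t x, int16_t x_prev)
{
    return (int16_t)((uint16_t)x - (uint16_t)x_prev); /* wraps mod 2^16 */
}

/* Decoder inverse: the same mod-2^16 wrap undoes the encoder's. */
int16_t reconstruct(int16_t x_prev, int16_t res)
{
    return (int16_t)((uint16_t)x_prev + (uint16_t)res);
}
```

With x[n-1] = 19,000 and x[n] = -32,000, the residual wraps to 14,536 and the decoder's addition wraps back to -32,000, exactly as in the worked example above.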

6.1.2 Look-up tables


Generally, shift operations are much slower in an MCU than in a computer processor.
While a P4 or G4 can shift many bits at a time, the MCU can shift only one bit per
instruction. To avoid extensive shifting, look-up tables should be used. For instance,
if the task is to check the sign bit of an 8-bit signed variable, it can be done by either

if(variable>>7) {
….;
}

or by

if(variable&0x80) {
….;
}

Clearly, the second alternative is much faster. If the sign bit (MSB) is to be set, one can likewise use

variable=variable|0x80;

The easiest way to handle single bits is to use a bit-table:

char bittable[8] = {0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01};

Then we can use bittable[i] to either check or assign bit i in any variable. For 16-bit
variables the bit-table should of course start with 0x8000 and end with 0x0001. This is
done throughout the code.

Look-up tables should also, where possible, be used to replace for-loops when
deriving one variable from another. For instance, finding the
exponent in the µ-law encoder (the number of zeros before the leftmost ’1’ in the
magnitude) is done by

value = sample<<1;
for (exp=7;exp>0;exp--) {
if(value&0x8000) break;
value = value<<1;
}

This can easily be replaced with a table, as shown in the source-code in appendix 6.
Static look-up tables like this can be placed in program memory rather than RAM,
and will thus not affect the system's memory resources (assuming there is
enough program memory, of course).
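A possible table-based replacement for the shift loop might look as follows. This is a generic highest-set-bit lookup, a sketch rather than the actual appendix 6 code; the table is generated at startup here, but on the MCU it would be declared const so it can live in program memory:

```c
/* exp_table[b] = index of the highest set bit in byte b (0 for b == 0).
 * Generated once at startup; a const precomputed version could live in
 * flash on the MCU, as suggested in the text. */
static unsigned char exp_table[256];

static void init_exp_table(void)
{
    int b, e;
    exp_table[0] = 0;
    for (b = 1; b < 256; b++) {
        for (e = 7; b >> e == 0; e--)
            ;                         /* find highest set bit of b */
        exp_table[b] = (unsigned char)e;
    }
}

/* Exponent of a 16-bit magnitude without a shift loop: test the high
 * byte first, fall back to the low byte. */
int exponent16(unsigned int mag)
{
    unsigned int hi = (mag >> 8) & 0xFF;
    return hi ? exp_table[hi] + 8 : exp_table[mag & 0xFF];
}
```

Two table reads and a compare replace up to fifteen shift-and-test iterations, which matters on a one-bit-per-instruction shifter.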

6.2 RF-link implementation considerations

6.2.1 Packet handling


The RF-link for the wireless loudspeaker system is realized with the Chipcon CC2400
RF-transceiver. The CC2400 features hardware packet handling support to allow
flexible and robust data transfer without stealing resources from the system. The
start of a packet is identified through a preamble and a sync word. To avoid
multiple headers within a packet, it would therefore be sensible to set the packet size
identical to the frame size. The frame size suggested by the lossy-mode tests is
64 samples.

Eq. 41 Packet size = frame size

The packet format is shown in figure 78.

Figure 78 Chipcon CC2400 packet format [reference 22]

The ”data field” would in our case then start with the frame header and also contain
the compressed audio data for the next 64 samples. The optional 8/10 coding in the
figure is an encoding of the data (IBM 8B/10B encoding scheme, see reference 22)
that is in some applications used for its spectral properties and error detection
capability. However, it reduces the data rate to 80% of the original 1Mbps. In the
WLS, 8/10 coding is not considered necessary. However, CRC (cyclic redundancy
check) should be included so that packets corrupted by noise can be detected and
handled. As seen in figure 78, CRC adds an overhead of 16 bits per frame.

6.2.2 Transmission or calculation of k?


In the test applications, the Pod-parameter k is calculated for each sample in both the
encoder and the decoder using the same formula. This means no encoding parameters
need to be transmitted, and it also gives the most symmetric codec (encoder and
decoder of about the same complexity). However, if data corruption occurs during
transfer, the k calculation in the decoder might not work as intended. Tests on the
compressed file showed that changing some of its data content could have
catastrophic consequences for the calculation of k, and might well freeze the
application. During implementation, it should therefore be considered whether k
should be sent with the compressed data rather than being calculated in the decoder.
The parameter k can take any value from 0 to 15 and thus requires 4 bits. It is then
obvious that a new k cannot be calculated for each sample; rather, one should be
calculated per frame so it can be included in the frame header. As tables 19 and 20
show, the negative effect this has on compression ratio is very limited, within a few
tenths of a percent, while the added overhead is a minimal 4 bits per frame. This
should be tested during implementation, but the proposed frame would then be as
shown in figure 79.

Figure 79 Proposed frame for WLS-implementation with transfer of frame-static k

When all the decoder parameters are transferred instead of being calculated, the
consequences of transmission errors are likely to be much smaller. Also, errors will
not be able to accumulate, at least not from one frame to the next.
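As a sketch of how a frame-static k could be packed into the proposed header of figure 79, the following shows one possible layout. The struct, field names and nibble layout are assumptions for illustration, not the final WLS frame format:

```c
#include <stdint.h>

/* Hypothetical frame header carrying an explicit 4-bit Pod parameter k.
 * Sending k costs 4 bits per 64-sample frame but keeps the decoder from
 * deriving k out of corrupted data. */
typedef struct {
    uint8_t k;      /* Pod parameter, 0..15 */
    uint8_t flags;  /* e.g. lossy-mode LSB count, mono flag (illustrative) */
} frame_header;

/* Pack the header into one byte: k in the high nibble, flags in the low. */
uint8_t pack_header(frame_header h)
{
    return (uint8_t)(((h.k & 0x0F) << 4) | (h.flags & 0x0F));
}

frame_header unpack_header(uint8_t byte)
{
    frame_header h;
    h.k = (uint8_t)(byte >> 4);
    h.flags = (uint8_t)(byte & 0x0F);
    return h;
}
```

Because the decoder reads k directly from the header, a corrupted packet can at worst damage one frame; the next header resynchronizes the parameter.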

6.2.3 Lost packet handling


Even though CRC allows you to detect and handle corrupted packets, it might happen
that entire packets are lost. This can be due to very high noise levels or interference
from another ISM-unit. To minimize the latter, the transmitter should do frequency-
hopping (FH) from packet to packet. A frequency table must then be defined and the
frequency information added to the packet header (footnote 24). Also, the compressed packet
audio data (see figure 79) should contain two original sample values (rather than
prediction errors) to allow the second order predictor to get back on track in case the
previous packet is lost.

Even so, packets will be lost. This can be handled either by repeating the last packet
or by putting out silence. 64 samples correspond to 0.73ms of audio for a stereo signal
sampled at fS = 44,100Hz; the question is whether occasional periods of silence of this

24 For details on frequency hopping, the reader is referred to Chipcon Application Note AN24.

length are audible at all and, if so, whether a repetition of the previous frame instead
of silence gives better or worse sound.

A program that emulates the loss of frames was written to test the audibility and
compare the alternatives. The source-code is given in appendix 6. The program lets
the user select the packet length in samples, how often packets are lost (a fixed
”loss interval”, where a value of 1,000 means that 1,000 packets are sent for each
loss), how many successive packets are lost, and whether silence or a repetition of the
last packet should be inserted to compensate.

It soon became evident that when only one packet was lost at a time, the two methods
of handling it sounded more or less identical. A 64-sample packet is just 0.7ms of
audio, and in both cases the loss of a single packet sounded like a small ”tick”.
Differences were not heard until several successive packets were dropped. A blind
test using five different audio files was set up to see how many packets had to be lost
before a difference between the two methods could be identified, and when it could,
which alternative was preferred (footnote 25).

[Figure 80: bar chart; x-axis ”Number of successive packets lost” (1-20), y-axis
”Number of audio files for which difference was audible” (0-6).]

Figure 80 Audibility of difference between method 1 (silence) and 2 (repetition);
1,000 packet ”loss interval”, 64 sample packets.

25 For information on how to set up a scientifically credible blind test, the reader is recommended the
website of The ABX Company, http://www.pcavtech.com/abx/index.htm

As we can see, 5 successive packets (3.6ms) must be lost before the difference
between the two methods is audible on a majority of the files. When many packets are
lost, the methods sound clearly different, which leads to the question of which one is
preferable.

[Figure 81: bar chart of the preferred alternative (silence vs. repetition); y-axis:
number of occasions, 0-80.]

Figure 81 Preferred lost packet handling method

Of the 75 occasions on which a difference was detected, silence was preferred in 72.
It should be noted that when more than 8-10 successive frames were lost, the listener
could easily identify whether silence or repetition was used and knew which one he
”voted for”. The evaluation in figure 81 is thus highly subjective. Nevertheless, the
distortion-like effect caused by repeating a packet several times was perceived as
worse than moments of silence.

When testing what rate of packet loss could be tolerated, the loss in audio quality
was characterized as ”significant” when the loss interval was less than 1,000-1,500
(i.e. more than about one packet loss per second). Below 300-500 it was characterized
as ”annoying”.

The conclusion from the test is that a packet loss rate of less than one per second can
be tolerated and that inserting silence is preferable when packet loss happens.
Inserting silence should also be easier to implement in the WLS, as no extra buffering
or calculations are required.
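A minimal sketch of the silence-insertion concealment in C. The function name and signature are illustrative, and the normal decode path is elided:

```c
#include <string.h>
#include <stdint.h>

/* Packet-loss concealment by silence insertion (sketch). When the CRC
 * check fails or a packet never arrives, the decoder zero-fills the
 * 64-sample output frame. Unlike repetition, no copy of the previous
 * frame needs to be buffered. */
#define FRAME_SAMPLES 64

void conceal_or_decode(int16_t *out, const uint8_t *packet, int packet_ok)
{
    if (!packet_ok) {
        /* Lost or corrupted packet: output 64 samples of silence. */
        memset(out, 0, FRAME_SAMPLES * sizeof(int16_t));
        return;
    }
    /* ... normal frame decode into 'out' would go here ... */
    (void)packet;
}
```

This matches the conclusion above: zero-filling needs no extra state, while repetition would require keeping the last decoded frame around.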

Part 3

- Summary -

Engineering for the sake of music;


the Fender Stratocaster of the late guitar extraordinaire Stevie Ray Vaughan (1954-1990)

7 Project Review
In this part of the thesis a review of the project itself and the work done will be
presented.

The Wireless Loudspeaker System was a project initiated by the Norwegian
semiconductor company Chipcon to develop a demonstration platform for their
CC2400 RF-transceiver. The requirements for the design were that it should be low-
cost, relatively simple and implemented using only standard hardware, i.e. a
microcontroller, logic circuits and the RF-chip; no dedicated DSPs, FPGAs or the
like.

From the beginning, the goal of the thesis was to evaluate and find a suitable low-
complexity and high-fidelity compression algorithm and implement it using an MCU
demonstration board strapped to the CC2400EWB. However, it soon became apparent
that no such demonstration board had the necessary peripherals. Since Chipcon
wanted a reference design anyway, we agreed on including hardware design as part of
the thesis. The WLS was thus designed from scratch.

Designing the WLS proved to be more work than anticipated; as shown earlier in this
thesis, a custom communications system using logic devices had to be developed. As
a consequence, the hardware design phase took almost a month longer than planned,
and the complete and verified circuit design was delivered to Chipcon for
manufacturing in mid-March instead of mid-February as intended.

The plan was then for me to do compression algorithm research on my computer
while waiting for the finished PCB. I was supposed to receive it soon after Easter and
use the last four to six weeks on implementation. However, there were also significant
delays in the manufacturing of the circuit. Since this was beyond my control, I used
the time to do much more extensive research and development on compression than
first planned, and both a custom lossless and a custom lossy algorithm have been
proposed. The finished WLS-hardware would not leave production before the thesis
deadline, and an implementation was thus not possible to achieve before finishing the
thesis.

However, although I recognise implementation as both an important and instructive
process, I do not believe that the academic reward of the project was compromised
by these delays. The extra effort put into audio compression gave valuable insight
into a field in which I have great interest, and it also produced some very good
results, both for the lossy and the lossless part. Designing the hardware also proved
to be very educational. I learned a lot about embedded systems design as well as
gaining insight into how the business works: how a design process is handled, how
verification and frequent reviews are of utmost importance, and generally how to
administrate a fairly extensive project.

8 Summary
This thesis covers the work done developing the Wireless Loudspeaker System. It
has been my intention to make it a complete document, and I have therefore presented
a theory section so the reader is able to understand the work and the results even
without looking up the references. Theory that is not directly related to the thesis'
main focus, but has still been relevant to the development process, is presented in
appendixes 1 and 2. This covers the different formats and protocols used in the
system (appendix 1) as well as general data conversion theory (appendix 2). The other
appendixes include the circuit, the PCB-design, the equipment used and the
source-code.

Regarding the source-code, only the most relevant applications are included. During
the project, over 50 different versions of various compression algorithms were
compiled and tested. Including all of these would make the report much too extensive.
The source-code in the appendix includes DPCM, ADPCM, µ-law, iLaw, a Rice-
/Pod-/iPod-entropy coder and decoder, a lossless codec with selectable prediction, a
hybrid lossy/lossless codec and the frame drop test algorithm. In addition, the MatLab
scripts referred to in the thesis are given in appendix 7.

The practical part of the thesis documents the work done, from finding the appropriate
components to hardware and software design. The hardware documentation ends in a
finished design, while the software documentation, due to the implementation not
being done, ends with considerations and suggestions. Based on measurements and
subjective listening tests, combined with an assessment of computational complexity
and MCU implementation feasibility, conclusions are drawn and algorithms
suggested. For the lossy option, a custom-made iLaw algorithm is proposed, which
features high performance (comparable to 128kbps MP3) and very low complexity
(estimated at around 250 instructions per sample in an 8-bit MCU). The suggested
lossless algorithm uses a second order predictor and Pod-coding. It features
compression ratios within a few percent of the widely recognised home computer
application Shorten, a lossy-mode for constant bitrate, and an encoding with very
good worst-case performance to ensure minimum influence from this lossy-mode.
Complexity tests show it should be feasible to implement in an MCU-based system.

To summarize, I think this project, despite not being completely finished within the
thesis deadline, has been an academic success. I have learned a lot about audio
compression, digital signal processing, general programming, and embedded systems
and hardware design. These are all important areas for an engineer to master, and I
consider the gained knowledge extremely valuable. Also, much practical
engineering work has been done, which I find very rewarding since it has made me
better equipped to face the challenges that await me outside the university.

9 References
1. ”Data Compression Basics”, Slides, EECC694
2. ”Optimization of Digital Audio for Internet Transmission”, Ph.d Thesis
Mathieu Claude Hans
3. Widrow, B. et.al.: ”Stationary and nonstationary learning
characteristics of the LMS adaptive filter.” Proc. IEEE, 1976
4. G. Mathew et.al: ”Computationally simple ADPCM based on exponential
power estimator” Proc. IEEE, 1992
5. Monkey’s Audio theory
6. T. Robinson: ”Shorten: simple lossless and near-lossless waveform
compression.”, Technical Report 156, Cambridge University, 1994
7. Lesson 8: Compression Basics, Computing and Software Systems lecture
notes, University of Washington Bothell
8. Introduction to multimedia, Cardiff University
9. Debra A. Lelewer , Daniel S. Hirschberg, “Data compression”, ACM
Computing Surveys (CSUR), v.19 n.3, p.261-296, Sept. 1987
10. LOCO-1: Weinberger, M, Seroussi, G, Sapiro, G: “A low-complexity,
Context-based, lossless image compression algorithm.” IEEE Data
Compression Conference, 1996
11. Weinberger, M, Seroussi, G: “Modeling and low-complexity adaptive
coding for image prediction residuals”, IEEE.
12. Robin Whittle: “First Principles, Lossless compression of audio”
13. Fraunhofer institute, ”MPEG-1 Layer 3 overview”.
http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html
14. Microsoft corp.: ”Windows Media Encoder whitepaper”
http://download.microsoft.com/download/winmediatech40/Update/2/W98
NT42KMeXP/EN-US/Encoder_print.exe
15. Sony: ”ATRAC whitepaper” http://www.minidisc.org/aes_atrac.html
16. Fraunhofer Institute, “MPEG-2 AAC overview”.
http://www.iis.fraunhofer.de/amm/techinf/aac/index.html
17. Chipcon Application Note 126 ”Wireless Audio using CC1010”
18. IMA Digital Audio Focus and Technical Working Groups:
”Recommended practices for enhancing digital audio compatibility in
multimedia systems.” rev. 3.00, October 21, 1992
19. W.M. Hartmann ”Signals, sounds and sensations”, AIP Press, 1997
20. R.G. Baldwin: ”Java Sound, Compressing Audio with mu-Law encoding”
21. ”Vorbis-1 specification”, Xiph.org foundation,
http://www.xiph.org/ogg/vorbis/doc/Vorbis_I_spec.html
22. Chipcon SmartRF CC2400 datasheet
http://www.chipcon.com/files/CC2400_Data_Sheet_1_1.pdf
23. AKM AK4550 datasheet
24. Texas Instruments TLV320AIC23B datasheet
25. Analog Devices AD1892 datasheet
http://www.analog.com/UploadedFiles/Data_Sheets/294553517AD1892_0
.pdf
26. Crystal Semiconductors CS8420 datasheet
http://www.cirrus.com/en/pubs/proDatasheet/CS8420-5.pdf

27. AKM AK4122 datasheet preliminary
www.akm.com/datasheets/ak4122.pdf
28. Crystal Semiconductors CS8416 datasheet
http://www.cirrus.com/en/pubs/proDatasheet/CS8416-4.pdf
29. Wegener, Albert: ”MUSICompress: Lossless, Low-MIPS Audio
Compression in Software and Hardware.” Soundspace Audio, 1997
30. Atmel AVR Mega169 datasheet
http://www.atmel.com/dyn/resources/prod_documents/doc2514.pdf
31. Atmel AVR Mega32 datasheet
http://www.atmel.com/dyn/resources/prod_documents/doc2503.pdf
32. Texas Instruments MSP430F1481 datasheet.
http://www-s.ti.com/sc/ds/msp430f1481.pdf
33. Motorola DSP56F801 datasheet
http://e-www.motorola.com/files/dsp/doc/data_sheet/DSP56F801.pdf
34. Motorola DSP56800-family reference manual
http://e-www.motorola.com/files/dsp/doc/ref_manual/DSP56800FM.pdf
35. Hitachi/Renesas R8C/10 datasheet
http://www.eu.renesas.com/documents/mpumcu/pdf/r8c10ds.pdf
36. Silicon Laboratories C8051F005 datasheet
http://www.silabs.com/products/pdf/C8051F0xxRev1_7.pdf
37. 74HC4094N datasheet
38. 74HC166N datasheet
39. 74HC4020 datasheet
40. Kernighan, Brian W. & Ritchie, Dennis M. : ”The C Programming
Language”, 2nd edition, Prentice Hall 1989.

APPENDIXES

Appendix 1. Data Formats

The wireless audio system must, both in hardware and software, be compliant with
several standard interfaces used by the various chips. The digital audio input is based
on SP-dif (the Sony/Philips digital interface format) and is decoded with a
dedicated receiver. Both this receiver and the audio codec which manages the analog
inputs and outputs use the I2S (Inter IC Sound) standard for communicating with other
circuits. Thus, the communication between the MCU and these units must be
compatible with their I2S interfaces. Finally, the communication between the MCU
and the RF-chip uses the SPI (Serial Peripheral Interface) format.

In addition, the compression algorithms used in this project have been developed and
tested on Mac OS-X and Windows computers. The most widespread uncompressed
audio format for computer use is the WAV-format (Waveform Audio Format), which
has been used during testing and development. The WAV-file format is also
examined in the following sections.

SP-dif (Sony/Philips digital interface format)


The Sony/Philips digital interface format [reference A1-1] is a consumer version of
the AES/EBU (Audio Engineering Society / European Broadcasting Union) format
and is given by the IEC958 standard of 1989. While AES/EBU was developed as a
digital audio interface for professional use, SP-dif is intended for home audio
equipment and therefore has some changes in the data being transferred. Also, the
physical connection is unbalanced with much lower signal levels, since the cabling
length and surrounding noise levels are much lower in a home audio system than
in a professional recording studio. The main differences between AES/EBU and SP-
dif are listed in table A1-1.

Table A1- 1 SP-dif vs AES/EBU digital audio interfaces [referenceA1- 2]


AES/EBU SP-dif
Cabling 110ohm shielded TP 75ohm coaxial or fiber (Toslink)
Connector 3-pin XLR RCA (or BNC)
Signal level 3-10V 0.5-1V
Modulation Biphase-mark-code Biphase-mark-code
Subcode information ASCII ID text SCMS copy protection info
Max. resolution 24-bits 20-bits (24-bits optional)

Every sample is transferred in a 32-bit subframe. The left and right channel subframes
together represent one frame. The subframes and frames are separated by preambles,
bit-patterns containing a deliberate biphase-coding violation, so that the receiver can
identify the start of a sample or a data block. Figure A1-1 shows how the subframes
and frames are built up.

Figure A1- 1 SP-dif subframes and frames [reference A1-1]

The different preambles have the following meaning:

- Preamble X: Tells us the subframe has data for the left channel. The
subframe is not at the start of the data block.
- Preamble Y: Tells us the subframe has data for the right channel. The
subframe is not at the start of a data block.
- Preamble Z: The subframe has data for the left channel and we are at
the start of a new data block.

In a subframe, the first four bits are the preamble. After these, four AUX data bits
follow. They are used to transfer information about tracks, like name, track number
and so forth. Bits 8 to 27 contain the actual audio data, maximum 20 bits. If the data
wordlength is 24 bits, the AUX-bits are also used for audio data. After the audio data
come a validity-bit, a user bit, a channel status bit and a parity bit.

Figure A1- 2 The content of SP-dif subframes and data blocks [ref.A1-1].

As seen in figure A1-2, each data block contains 192 frames and will always start
with a left channel sample. In each data block, a total of 384 channel-status and
subcode-information bits is transferred. This information must be decoded by the
SP-dif receiver as shown in figure A1-3.

Figure A1- 3 Channel status block data, SP-dif (left) and AES/EBU (right) [ref.A1-1]

Figure A1-3 also shows the difference between the SP-dif consumer format and the
AES/EBU professional format. The latter does not have copyright information, but it
does contain some other information, such as reliability, reference and when the data
was recorded. It also contains some user-configurable bits, like channel setup override
and sample frequency. This information is not needed in consumer equipment, which
is meant only to play back the data, not to alter it.

It must also be mentioned that the IEC958 standard was renamed IEC60958 in 1998
and has been expanded to also carry IEC61937 datastreams. IEC61937 data can
contain multichannel sound like MPEG-2, AC3 or DTS [reference A1-2].

I2S (Inter IC Sound)


I2S (Inter IC Sound) [reference A1-3] is a bus developed by Philips for transmission
of digital audio between different chips within a system. The bus only transfers audio
data, while control and information signals are sent on the components' other IO-
pins. I2S is a three-wire bus with one data connection, one bitclock connection (to
clock the bits in the serial data stream) and one word-clock or LR-clock connection
(to clock the samples; left channel sample when the LR-clk is ’0’ and right channel
sample when it is ’1’). The unit generating the clock signals functions as master.
The audio samples are transferred as two’s complement PCM, MSB first.
Since the MSB is transferred first, the transmitter does not depend on knowing how
many bits the receiver can handle. If a 24-bit transmitter is connected to a 16-bit
receiver, the 8 LSBs are ignored on reception. If a 16-bit source is connected to a
24-bit receiver, the 8 LSBs are set to zero. All timing demands in the I2S-protocol

are proportionate to the clock frequency, thus higher sample-rates can be allowed in
future applications.

Figure A1- 4 I2S-interface data transfer diagram [reference A1-3]

Figure A1-4 shows the I2S data transfer, where SCLK is the serial- or bit-clock while
LRCK is the left-right- or word-clock. SDTI is the data transfer pin. As shown, SCLK
usually runs at 32fS or 64fS, where fS is the sample frequency. At the latter
frequency the PCM word-length can be 16-bit, 20-bit or, in theory, up to 31-bit (or
32-bit with left- or right-justification, which is explained below). However, no current
audio equipment exceeds 24-bit resolution (footnote 26). In a 16-bit or less system, the
SCLK usually runs at 32fS, easing the timing requirements. The LRCLK runs at the
sample frequency fS.

One should also notice that the sample MSB comes one SCLK-cycle after a transition
on LRCK. This is how the I2S-standard is specified and is often referred to as I2S-
justification. However, most audio components also allow for left-justification (the
MSB comes when LRCK toggles, one cycle earlier than with I2S-justification) or right-
justification (the LSB comes at the last SCLK cycle before LRCK toggles) of the data
stream. Note that right-justification, as mentioned, demands the same wordlength on
transmitter and receiver.

SPI (Serial Peripheral Interface)


The SPI (Serial Peripheral Interface) format is a synchronous, full-duplex data
transfer bus developed for low-complexity data interfacing between peripherals
in computer systems or embedded systems. It uses four wires: MOSI (Master Out,
Slave In), MISO (Master In, Slave Out), SCK (Serial Clock) and NSS (Negative Slave
Select). The active-low NSS-pin is used to select a slave device and enable data
26 Even though the digital resolution or wordlength in modern audio equipment is usually 24 bits, the
effective resolution, given by A/D- and D/A-converter linearity and system noise levels, is currently
only around 20 bits in state-of-the-art systems. However, a seemingly excessive wordlength (true 24-bit
resolution seems impossible with today's technology) allows for more accurate digital signal
processing, with less degradation of signal quality.

transfer between it and the master. The MOSI and MISO are the data lines between the
master and slave, and the SCK is used to clock the transfer. A typical SPI-system with
a master (for instance a microcontroller) and three slave devices is shown in figure
A1-5.

Figure A1- 5 Typical SPI system [reference A1-5]

The data on both the MOSI and MISO pins is transferred MSB-first. An SPI slave also
places its MISO-pin in tristate (high-impedance) when it is not selected, so its output
does not load the bus.

WAV (Waveform Audio Format)


The WAV (Waveform Audio) format [reference A1-6] is a proprietary Microsoft
format and part of the RIFF family (Microsoft's media-format family). It stores raw,
uncompressed audio samples as PCM-values and can be used with any normal
sample-rate or wordlength. The number of channels can be one (mono), where the
samples are stored successively, or two (stereo), where every other sample is a left- or
right-channel sample. In computer audio, WAV has become the standard for storing
uncompressed digital audio. Even though Apple has its own format, AIFF, almost
any audio software and all computers, including all Macintosh models, can read and
write WAV-files.

The WAV-format is very simple. In addition to the raw audio data, it consists of a
header which identifies it as a WAV-file. The header also tells the application whether
the file is mono or stereo, what the sample rate and resolution (wordlength) are, the
file size and some other information. The 44-byte header is stored at the start of the
file and is followed by the audio data as shown in figure A1-6.

Figure A1- 6 WAV audio file header [reference A1-6]

The header is often referred to as consisting of three ”chunks” of information. These
are identified as follows:

1. RIFF Chunk
a. Byte 0-3: ”RIFF” (ASCII characters); identifies the file as a RIFF-file.
b. Byte 4-7: Total length of package to follow (binary, little-endian).
c. Byte 8-11: ”WAVE” (ASCII characters); identifies the file as a WAV-
file.
2. FORMAT Chunk
a. Byte 0-3: ”fmt_” (ASCII characters); identifies the start of the format
chunk.
b. Byte 4-7: Length of format chunk (binary, always 0x10).
c. Byte 8-9: Format tag (always 0x01 for uncompressed PCM).
d. Byte 10-11: Number of channels (1 – mono, 2 – stereo).
e. Byte 12-15: Sample rate (binary, in Hz).
f. Byte 16-19: Bytes per second (sample rate × channels × bits per sample / 8).
g. Byte 20-21: Bytes per sample (block align: 1 = 8-bit mono, 2 = 8-bit stereo
or 16-bit mono etc.).
h. Byte 22-23: Bits per sample (wordlength).
3. DATA Chunk
a. Byte 0-3: ”data” (ASCII characters); identifies the start of the data chunk.
b. Byte 4-7: Length of data to follow.
c. Byte 8…: Audio data.

The header can in some cases also contain other chunks that specify index marks, a
textual description of the sound etc., but these are not relevant for this project, so they
will not be investigated further in this report. The interested reader is referred to
reference A1-6, ”The File Format Handbook” by Gunter Born.
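For illustration, the 44-byte header described above maps directly onto a flat C struct. This is a sketch only: the field names are chosen here for readability, all multi-byte fields are little-endian in the file, and on typical compilers the natural field alignment happens to leave no padding, so the struct is exactly 44 bytes.

```c
#include <stdint.h>

/* The canonical 44-byte WAV header as one flat record.
 * Offsets match the RIFF/FORMAT/DATA chunk layout listed above. */
struct wav_header {
    char     riff_id[4];       /* "RIFF"                                */
    uint32_t riff_size;        /* total length of package to follow     */
    char     wave_id[4];       /* "WAVE"                                */
    char     fmt_id[4];        /* "fmt "                                */
    uint32_t fmt_size;         /* length of format chunk, always 0x10   */
    uint16_t format_tag;       /* 0x01 = uncompressed LPCM              */
    uint16_t channels;         /* 1 = mono, 2 = stereo                  */
    uint32_t sample_rate;      /* in Hz                                 */
    uint32_t bytes_per_sec;    /* sample_rate * channels * bits / 8     */
    uint16_t block_align;      /* bytes per sample frame                */
    uint16_t bits_per_sample;  /* wordlength                            */
    char     data_id[4];       /* "data"                                */
    uint32_t data_size;        /* length of audio data to follow        */
};
```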

References:
A1-1: IEC 958 ”Digital Audio Interface” whitepaper, European Broadcasting
Union, 1989.
A1-2: ”About SP-dif”, Tomi Engdahl
A1-3: ”The Inter IC Sound” whitepaper, Philips corp.
A1-4: AKM 4553 datasheet, AK corp.
A1-5: Silicon Laboratories C8051F00x datasheet rev. 1.7
A1-6: ”The File Format Handbook”, Gunter Born, 1995, ITP-Boston

Appendix 2. Data Converter Fundamentals
As mentioned in the theory chapter, digitizing an audio signal involves two processes,
sampling and quantization. When sampling, the amplitude of the signal is measured at
a fixed sampling interval T. The interval is usually described with the sampling
frequency fS = 1/T. Sampling converts the signal from continuous time to discrete time.
When quantizing, the amplitude is assigned to one of 2^B discrete values, where B is
the number of bits in the digital representation. This is, as previously explained,
LPCM code. The result is a discrete-time and discrete-amplitude digital signal. The
illustration from figure 2 is repeated for clarity.

Figure A2- 1 Sampling and quantization of audio signal

A sampled discrete-time sinusoid may be expressed as

Eq. A2- 1 x[n] = A·cos(ω̂n + φ), −∞ < n < +∞ ;[reference A2-1]

where n is an integer variable (the sample instant) and ω̂ is the normalized frequency
given by ΩT. Ω is the signal's ”analog” frequency in radians per second (Ω = 2πf,
where f is the frequency in hertz) and T is the sample period. By definition, a discrete-
time signal x[n] is periodic if and only if there exists an integer N such that:

Eq. A2- 2 x[n+N] = x[n] , for all n ;[reference A2-1]

The smallest period N for which this is true is called the fundamental period; for the
discrete-time sinusoid this requires the frequency ω̂/2π to be a rational number. It can
also easily be shown that ω̂ itself is only unique modulo 2π, because:

Eq. A2- 3 cos[(ω̂ + 2π)n + φ] = cos(ω̂n + 2πn + φ) = cos(ω̂n + φ) ;[reference A2-1]

This means that all discrete sinusoidal sequences with frequencies ω̂_k = ω̂ + 2kπ are
indistinguishable from the one with ω̂ in the range [−π, π]. On the other hand, any two
sinusoids with distinct frequencies within [−π, π] give distinct sequences. The
frequencies outside this range are thus described as aliases of the distinct frequencies.
Since ω̂ = ΩT = Ω/fS, it becomes apparent that:

Eq. A2- 4 −π ≤ ω̂ ≤ π  =>  −π ≤ Ω/fS ≤ π  or  −1/2 ≤ f/fS ≤ 1/2 ;[reference A2-1]

must be fulfilled for any analog signal to be given a distinct sampled sequence: the
signal must lie below half the sampling frequency. Half the sampling frequency is
known as the Nyquist frequency, and the requirement itself as the sampling theorem,
after Harry Nyquist and Claude Shannon who derived it. An attempt to sample
anything above the Nyquist frequency will, as equation A2-3 indicates, produce an
unwanted signal of which the input is an alias. To avoid this, filtering must be
performed before AD-conversion. Likewise, filtering is done after DA-conversion to
avoid images (aliases) of the original spectrum being reproduced from the digital
sequence. Both pre-ADC and post-DAC filtering is referred to as antialias-filtering or
just antialiasing.

The other fundamental limitation in digital signals is the resolution, given by the
quantization. For a B-bit digital quantization the smallest distance, the quantization
step Q, is given by R/2^B, where R is the signal range (see figure A2-1). A roundoff
error is subsequently made. If it is assumed to be random, it is given as a white
distribution between:

Eq. A2- 5 −Q/2 ≤ e ≤ Q/2 ;[reference A2-3]

This gives an RMS-error of

Eq. A2- 6 e_RMS = √(mean(e²)) = √( (1/Q) · ∫ from −Q/2 to +Q/2 of e² de ) = Q/√12 ;[reference A2-3]

If the signal to be quantized is a random signal distributed between 0 and R, the signal-
to-noise ratio (SNR) will be:

Eq. A2- 7 SNR = 20·log( V_in(RMS) / e_RMS ) = 20·log( (R/√12) / (Q/√12) ) = 20·log(2^B) = 6.02·B [dB]

This is referred to as the ”6dB per bit rule”. For a sinusoidal input the SNR can easily
be calculated to 6.02·B + 1.76 dB by using the RMS-value of a sinusoid with amplitude
R/2.

However, although these are the only fundamental limitations of a signal digitized at fS
with B bits, there are other nonidealities in the conversion that can compromise
performance.

Figure A2-2 shows the transfer characteristics of an ideal 2-bit ADC and DAC.

Figure A2- 2 Transfer characteristic for ideal 2-bit ADC and DAC [ref A2-2]

The ideal ADC assigns a new value exactly at each quantization interval, while the ideal
DAC draws a completely straight line between the sample values. In real life,
however, there are several factors that compromise performance:

- Offset-error: DAC: The output that occurs for the input code that
should produce zero output. ADC: The output code for a zero volt
input level.
- Gain-error: The difference between the ideal and actual full-scale value
when the offset error has been reduced to zero.
- Differential nonlinearity error (DNL): the variation in analog step sizes
away from 1 LSB with the two above removed. DNL values are
defined for each digital value.
- Integral nonlinearity error (INL): The difference between the ideal and
actual transfer curve when offset- and gain-errors have been removed.
The maximum INL is also often referred to as absolute accuracy.
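The last two measures are related: the INL at a given code is the accumulated DNL up to that code. A minimal sketch (function and array names hypothetical), assuming the analog step widths have already been measured in LSB units with offset- and gain-error removed:

```c
/* Given the measured analog step widths of a converter (in LSB units,
 * after offset- and gain-error correction), DNL[k] is the deviation of
 * step k from the ideal 1 LSB, and INL[k] is the running sum of DNL. */
void dnl_inl(const double *step_lsb, int n, double *dnl, double *inl)
{
    double acc = 0.0;
    for (int k = 0; k < n; k++) {
        dnl[k] = step_lsb[k] - 1.0; /* deviation from the ideal step */
        acc += dnl[k];
        inl[k] = acc;               /* cumulative deviation          */
    }
}
```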

Figure A2- 3 INL error and red. in SFDR (spurious free dynamic range) [ref A2-4]

As can be seen, these errors introduce nonlinearity or distortion. The resulting
resolution is often referred to as SFDR or spurious-free dynamic range and is
measured in dB or effective number of bits. For current state-of-the-art 24-bit DACs,
the effective number of bits is in the range of 20 bits.

Another non-ideality of data conversion is jitter. Jitter occurs when there is variation
in the sample period T due to inaccuracy in the system's clock signals. Jitter leads to
distortion of the signal as shown in figure A2-4.

Figure A2- 4 Distortion as a consequence of jitter [ref A2-3]


It can be shown that for a 16-bit system reproducing a 20kHz tone at full level, the
jitter distortion will be higher than the quantization noise for jitter above 127ps, and it
will thus reduce the SFDR. In high-end audio applications, jitter is currently one of
the performance bottlenecks.

The final performance limitation reviewed here is granulation noise. It was previously
assumed that the quantization noise e is random. However, for low-level signals or
representations with very few bits this is not the case. This can easily be understood by
looking at the output and error of a few-bit ADC.

Figure A2- 5 Transfer curve and error for few-bit ADC [ref A2-2]

As we can see, there is correlation between the signal and the noise, which leads to a
distortion called granulation noise. Granulation noise is mostly audible at low
volumes and sounds much more uncomfortable than plain white noise. Therefore
requantization is often done together with dithering, a process where white noise is
added to the signal before it is truncated. The point is to decorrelate the signal and
the noise and thus substitute the uncomfortable distortion with white noise. Dithering is
displayed in figure A2-6 and its effect in figure A2-7.

Figure A2- 6 Dithering and quantization [ref. A2-3]

Figure A2- 7 The effect of dithering on a signal with amplitude 2Q [ref. A2-3]

The dither-signal is often generated by an independent random noise-source and
should then span the range [−Q/2, Q/2]. This will lead to a 3dB decrease of the SNR,
but the distortion will be reduced significantly and the sonic result is an improvement.
Almost every requantization in modern hifi-circuits is done with dithering.

However, it can be shown that a random noise-source as dither generator is not
ideal. Using uniform random noise is known as rectangular dither, because the dither
signal has a rectangular probability density function (PDF). It can be shown that
rectangular dither does not completely decorrelate the signal and the quantization
noise. Triangular dither does exactly that. Realized as a convolution of two random
noise-sources, it has a PDF that decreases linearly away from zero, i.e. a triangular
PDF. Its amplitude can however reach ±Q, and it can be shown that the nominal noise
floor increases by 4.77dB as opposed to 3dB. This is made up for by the resulting total
error, with triangular dithering, having a mean value and variance that are completely
independent of the signal, i.e. complete decorrelation from the signal (white noise).
Thus triangular dither is usually preferred in audio applications. Triangular dithering
can easily be realized digitally by passing the output from a random noise-source
through a (1−z⁻¹) filter. The noise-source is normally made with a pseudo-random
number generator.
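The (1 − z⁻¹) construction can be sketched in a few lines of C (rand() stands in for the pseudo-random number generator; amplitudes are expressed in units of the quantization step Q):

```c
#include <stdlib.h>

/* Triangular-PDF dither: a uniform source in [-Q/2, Q/2] passed through
 * a (1 - z^-1) filter, i.e. the difference between the current and the
 * previous uniform sample. The output spans (-Q, Q), has a triangular
 * PDF and is additionally high-pass shaped by the difference filter. */
double tpdf_dither(void)
{
    static double prev = 0.0;                     /* z^-1 state        */
    double cur = rand() / (double)RAND_MAX - 0.5; /* uniform, +/- Q/2  */
    double out = cur - prev;                      /* (1 - z^-1) filter */
    prev = cur;
    return out;
}
```

The alternative of summing two independent uniform sources gives the same triangular PDF without the high-pass spectral shaping.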

When quantizing an analog signal, on the other hand, the dither-source also has to be
analog. Generating triangular dither with solely analog components is not
possible. Analog dithering is therefore often realized with a gaussian PDF, since this is
the same probability distribution as for natural white noise or thermal noise. Thermal
noise is generated by resistance in a circuit, and a ±Q gaussian dither can thus be
realized with nothing more than a simple diode or resistor (diodes are normally used,
to avoid loading the input). Gaussian dithering is however less ideal than triangular,
since it increases the nominal noise-floor by 6dB.

Figure A2- 8 The PDF of gaussian, rectangular and triangular dither [ref. A2-3]

References:
A2-1: Proakis, John et al.: ”Digital Signal Processing, Principles, Algorithms
and Applications”, Prentice Hall, 1996.
A2-2: Johns, David et al.: ”Analog Integrated Circuit Design”, John Wiley
& Sons, 1992.
A2-3: Løkken, Ivar et al.: ”One-O digital amplifier”, bachelor thesis, HiST,
2002.
A2-4: Løkken, Ivar: ”Delta-sigma Audio DAC for SoC applications”, project
report, NTNU, 2003.

Appendix 3. Schematics

155
156
157
158
159
160
161
Appendix 4. Components List

Appendix 5. PCB-Layout

Appendix 6. Source-Code, C.
DPCM encoder and decoder:
//////////////////////////////////////////////////////////////////////////////
//DPCM encoder, 4:1 compression..............………..//
//Works with 16-bit mono WAV-file on big-endian//
//systems....................................…………………….//
//...........................................………………………..//
//Ivar Løkken, NTNU, 2004....................………….//
/////////////////////////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

//DPCM logarithmic quantization table


//One set for positive values, one for negative
static int quantTable[16] = {
0, -4, -16, -64, -256, -1024, -4096, -16384,
0, 4, 16, 64, 256, 1024, 4096, 16384
};

int main(void)
{
FILE *fp, *op;

fp = fopen("in.wav", "rb"); //open wav-file for reading


op = fopen("out.dp", "wb"); //open output-file for writing

if (fp) {
//wav header data
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//data variables
short value = 0; //current input sample value
short value_temp = 0; //for endian change
unsigned char delta = 0; //current dpcm output value
int diff = 0; //difference, actual and predicted value
short valpred = 32767; //prediction value for feedback
unsigned char outputbuffer; //two-sample buffer
int bufferstep = 1; //toggle between outputbuffer fill/write, must start at 1

//read and write wav header


fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) {
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) {
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);

fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

// RUN COMPRESSION
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

// read input value and change endian


fread(&value_temp, sizeof(short), 1, fp);
value = 0;
value = ((value_temp & 0x00ff)<<8);
value = (value | ((value_temp & 0xff00)>>8));

//first order prediction (difference)


diff=value-valpred;

//set signbit and work with absolute values.


if (diff < 0) {
diff = (-diff);
delta = 0;
} else {
delta = 8;
}

//find the four bit output code


//(binary tree bit 2-4, first is sign bit)
//Sxxx, where S is sign

//check second bit
if (diff >= 256) {
  //set second bit => S1xx
  delta |= 4;
  //check third bit
  if (diff >= 4096) {
    //set third bit => S11x
    delta |= 2;
    //check fourth bit
    if (diff >= 16384) {
      //S111
      delta |= 1;
    }
  } else {
    //third bit 0 => S10x
    //check fourth bit
    if (diff >= 1024) {
      //S101
      delta |= 1;
    }
  }
//S0xx
} else {
  if (diff >= 16) {
    //S01x
    delta |= 2;
    if (diff >= 64) {
      //S011
      delta |= 1;
    }
  //S00x
  } else {
    if (diff >= 4) {
      //S001
      delta |= 1;
    }
  }
}

//feedback dequantized delta


valpred += quantTable[delta];

//put two samples in the 8-bit output-buffer and write it to file


//if bufferstep == 1; buffer = cccc0000 (c=current adpcm sample)
//else: buffer = ccccpppp (c=current, p=previous)
//then write it to file
//bufferstep toggles - two samples in buffer before write
if (bufferstep) {
outputbuffer = (delta << 4) & 0xf0;
} else {
outputbuffer = (delta & 0x0f) | outputbuffer;
fwrite(&outputbuffer, sizeof(char), 1, op);
}
bufferstep = !bufferstep;
}
//output last step, if necessary
if (!bufferstep)
fwrite(&outputbuffer, sizeof(char), 1, op);
}
fclose(fp);
fclose(op);
}

//////////////////////////////////////////////////////////////////////////////
//DPCM decoder, 4:1 compression.............………...//
//Works with 16-bit mono WAV-file on big-endian//
//systems....................................……………………..//
//...........................................………………………...//
//Ivar Løkken, NTNU, 2004....................…………..//
/////////////////////////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

//DPCM logarithmic quantization table


//One set for positive values, one for negative

static int quantTable[16] = {


0, -4, -16, -64, -256, -1024, -4096, -16384,
0, 4, 16, 64, 256, 1024, 4096, 16384
};

int main(void)
{
FILE *fp, *op;
fp = fopen("in.dp", "rb"); //open wav-file for reading
op = fopen("out.wav", "wb"); //open output-file for writing

if (fp) {
//wav header variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//data variables
char delta = 0; //current dpcm input value
int valpred = 32767; //predicted output value
short valout; //output value for writing
char inputbuffer = 0; //2-sample input buffer
int bufferstep = 0; //toggle between inputbuffer/input

//read and write wavinfo


fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //if it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);

} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged withmsbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

//RUN DECOMPRESSION
for( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

//step 1, read the 8-bit buffer containing two samples and put the right one to the delta variable
if (bufferstep) {
delta = inputbuffer & 0x0f;
} else {
fread(&inputbuffer, sizeof(char), 1, fp);
delta = (inputbuffer >> 4) & 0x0f;
}
//the above must be done every second run so that the char is split
//and read into two deltas, since there are two residuals in each
bufferstep = !bufferstep;
//update predicted output value (last value + dequantized current difference)
valpred += quantTable[delta & 0x0f];
//limit output value to 16-bits
if ( valpred > 32767 )
valpred = 32767;
else if ( valpred < -32768 )
valpred = -32768;
valout = 0;
//reverse endian and write to wav-file
valout = ((valpred & 0x00ff)<<8);
valout = (valout | ((valpred & 0xff00)>>8));
fwrite(&valout, sizeof(short), 1, op);
}
}
fclose(fp);
fclose(op);
}

IMA ADPCM encoder and decoder:
/////////////////////////////////////////////////////////////////////////////////
//IMA ADPCM compatible encoder, 4:1 compression//
//Works with 16-bit mono WAV-file on big-endian…//
//systems....................................……………………….//
//...........................................…………………………...//
//Ivar Løkken, Mar. 2004.....................……………….//
////////////////////////////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

//adpcm state variable structure


struct adpcm_state {
short valprev;
char index;
};

//ADPCM index adjustment table as given by IMA ADPCM standard


//one set for positive values and one set for negative

static int indexTable[16] = {


-1, -1, -1, -1, 2, 4, 6, 8,
-1, -1, -1, -1, 2, 4, 6, 8
};

//Quantization step table as given by IMA ADPCM standard

static int stepsizeTable[89] = {


7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

int main(void) {
FILE *fp, *op;

fp = fopen("in.wav", "rb"); //open wav-file for reading


op = fopen("out.adp", "wb"); //open output-file for writing

if (fp) {
//wav info variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//data variables
struct adpcm_state state_mem = {0, 0}; //encoder state storage, zero-initialized
struct adpcm_state *state = &state_mem; //encoder status structure
short value; //current input sample value
short value_temp; //temp value for endian-flip
int sign; //current adpcm sign bit
int delta; //current adpcm output value
int diff; //difference (prediction result)
int step; //stepsize
int valpred; //predicted output value
int vpdiff; //current change to valpred
int index; //step change index
char outputbuffer; //2 sample buffer
int bufferstep = 1; //toggle between outputbuffer/output

char out; //output variable

//read and write wav header info


fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //if it is a wavefile, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

//Initiate encoder state


valpred = state->valprev;
index = state->index;
step = stepsizeTable[0];

//START COMPRESSION
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

//read input sample and change endian


fread(&value_temp, sizeof(short), 1, fp);
value = 0;
value = ((value_temp & 0x00ff)<<8);
value = (value | ((value_temp & 0xff00)>>8));

//calculate difference from previous value


diff=value-valpred;
//set adpcm sign bit: sign set to 8 (1000) if diff<0, set to 0 (0000) else, change to absolute values
//makes algorithm faster since the quantization is symmetric around zero (works on 3bits, instead of 4)
sign = (diff<0) ? 8 : 0;

if ( sign ) {
diff = (-diff);
}

//Quantize
delta = 0; //output value initialization
vpdiff = (step >> 3); //vpdiff = step/8
if (diff >= step) { //if the difference diff is bigger than step
delta = 4; //first value bit is set (4=100)
diff -=step; //decrement diff by value step
vpdiff += step; //vpdiff = step/8 + step = 9step/8
}
step >>=1; //rightshift step 1 bit
if (diff >= step) { //diff bigger than new step (step/2)?
delta |=2; //if yes, set second bit
diff -= step; //decrement diff by value step
vpdiff += step; //vpdiff = 9step/8 + step/2 = 13step/8
}
step >>=1; //rightshift step 1 bit
if (diff >= step) { //diff bigger than new step?
delta |=1; //set the third and final value bit
vpdiff += step; //vpdiff = 13step/8 + step/4 = 15step/8
} //(the same as absolute value for step + sign bit)

//Update predicted value (apply sign)

if (sign) {
valpred -= vpdiff;
} else {
valpred += vpdiff;
}
//Limit previous value to 16-bits
if (valpred > 32767) {
valpred = 32767;
} else if (valpred < -32768) {
valpred = -32768;
}

//Assemble value, update index and step values


delta |= sign; //Put sign-bit back to output value
index += indexTable[delta]; //Update index-table

//Make sure index does not exceed index table length


if ( index < 0 )
index = 0;
if ( index > 88 )
index = 88;
step = stepsizeTable[index];
//step updated to the table entry given by index

//Fill buffer (previous and current sample) and


//output value when buffer is full (every second run)
if (bufferstep) {
outputbuffer = (delta << 4) & 0xf0;
} else {
out = (delta & 0x0f) | outputbuffer;
fwrite(&out, sizeof(char), 1, op);
}
//Bufferstep makes sure the above goes right
bufferstep = !bufferstep;
}
//Output last value, if necessary
if (!bufferstep)
fwrite(&outputbuffer, sizeof(char), 1, op);
//Update state
state->valprev = valpred;
state->index = index;
}

fclose(fp);
fclose(op);
}

//////////////////////////////////////////////////////////////////////////////////
//IMA ADPCM compatible decoder, 1:4 compression//
//Works with 16-bit mono WAV-file on big-endian…//
//systems....................................……………………….//
//...........................................…………………………...//
//Ivar Løkken, Mar. 2004.....................……………….//
////////////////////////////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

//adpcm state variable structure


struct adpcm_state {
short valprev;
char index;
};

//ADPCM index adjustment table as given by IMA ADPCM standard


//one set for positive values and one set for negative

static int indexTable[16] = {


-1, -1, -1, -1, 2, 4, 6, 8,
-1, -1, -1, -1, 2, 4, 6, 8
};

//Quantization step table as given by IMA ADPCM standard

static int stepsizeTable[89] = {


7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

int main(void)
{
FILE *fp, *op;
fp = fopen("in.adp", "rb"); //open wav-file for reading
op = fopen("out.wav", "wb"); //open output-file for writing

if (fp)
{
//wav info variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//data variables
struct adpcm_state state_mem = {0, 0}; //decoder state storage, zero-initialized
struct adpcm_state *state = &state_mem; //decoder status structure
short value_out = 0; //output value
int sign; //current adpcm sign bit
int delta; //current adpcm output value
int diff; //difference, current and previous value
int step; //stepsize
int valpred; //predicted output value
int vpdiff; //current change to valpred

int index; //step change index
char inputbuffer; //place to keep previous 4-bit value
int bufferstep = 0; //toggle between outputbuffer/output

//read and write wav header


fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //if it is a wavefile, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged withmsbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

//Initiate decoder state


valpred = state->valprev;
index = state->index;
step = stepsizeTable[0];

//START DECOMPRESSION
for( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

//Get the delta value


//every second value stored in the 4msbs and 4lsbs of a file char
if (bufferstep) {
delta = inputbuffer & 0x0f;
} else {
fread(&inputbuffer, sizeof(char), 1, fp);
delta = (inputbuffer >> 4) & 0x0f;
}
//Bufferstep controls the read operation

bufferstep = !bufferstep;

//Find and limit the new index value


//limit it so it stays within the table length
index += indexTable[delta];
if (index < 0)
index = 0;
if (index > 88)
index = 88;

//Separate sign and magnitude


sign = delta & 8;
delta = delta & 7;

//Compute difference and new predicted value (de-quantize), bitwise update


vpdiff = step >> 3;
if (delta & 4)
vpdiff += step;
if (delta & 2)
vpdiff += step>>1;
if (delta & 1)
vpdiff += step>>2;

//restore sign
if ( sign )
valpred -= vpdiff;
else
valpred += vpdiff;

//Limit output value to 16-bits


if ( valpred > 32767 )
valpred = 32767;
else if ( valpred < -32768 )
valpred = -32768;

//Update step value


step = stepsizeTable[index];

//Change endian and copy the value to output variable


value_out = ((valpred & 0x00ff)<<8);
value_out = (value_out | ((valpred & 0xff00)>>8));
//write it to file
fwrite(&value_out, sizeof(short), 1, op);
}
//update state
state->valprev = valpred;
state->index = index;
}
}

µ-law encoder and decoder:
/////////////////////////////////////////////////////////
// mu-law encoder, 2:1 compression
// Works with 16-bit mono WAV-file on big-endian systems
// Ivar Løkken, Mar. 2004
/////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

// mu-law exponential lookup table


static char exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};

//if you do not want to use table, the exponent can be found with the following
//equation. Lookup-table requires memory, but is faster
// value_temp = (value << 1);
// for (exp = 7; exp > 0; exp--) {
// if (value_temp & 0x8000) break;
// value_temp = (value_temp << 1);
// }

int main(void)
{
FILE *fp, *op;

fp = fopen("in.wav", "rb"); //open wav-file for reading


op = fopen("out.mul", "wb"); //open output-file for writing

if (fp) {
//wav info variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//data variables
short value = 0; //current input sample value
short value_temp = 0; //temp value
short sign = 0; //sign-bit
char exp = 0; //exponent (position of rightmost 1)
short mantis = 0; //mantissa
unsigned char outputbuffer = 0; //output buffer

//read and write wav header info


fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);

fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

// RUN COMPRESSION

for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

// read input value and change endian


fread(&value_temp, sizeof(short), 1, fp);
value = 0;
value = ((value_temp & 0x00ff)<<8);
value = (value | ((value_temp & 0xff00)>>8));

// convert to sign-magnitude
if (value < 0) {
value = (-value);
sign = 0x0080;
} else {
sign = 0x0000;
}

// clip value
if (value > 32635) {
value = 32635;
}

// add bias
value = value + 0x84;

// find exponent value (0 to 7, can also be done with equation)


exp = exp_lut[(value >> 7) & 0xFF];

// get the mantissa
mantis = (value >> (exp + 3)) & 0x000f;

// put together output byte


outputbuffer = (sign | (exp << 4) | mantis);
fwrite(&outputbuffer, sizeof(char), 1, op);
}
}
fclose(fp);
fclose(op);
}

/////////////////////////////////////////////////////////
// mu-law decoder, 2:1 compression
// Works with 16-bit mono WAV-file on big-endian systems
// Ivar Løkken, Mar. 2004
/////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

// exponent recovery table


static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};

int main(void)
{
FILE *fp, *op;

fp = fopen("in.mul", "rb"); //open mu-law file for reading


op = fopen("out.wav", "wb"); //open output-file for writing

if (fp) {
//wav file info variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//data variables
short valout = 0; //output value for writing
unsigned char inputbuffer = 0; //input buffer
char sign = 0; //sign
char mantis = 0; //mantissa
char exp = 0; //exponent
short out = 0; //output variable

//read and write wav header info


fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //if it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);

fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

//RUN DECOMPRESSION
for( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

// read input value


fread(&inputbuffer, sizeof(char), 1, fp);

// get sign, exp and mantissa


sign = (inputbuffer & 0x80);
exp = (inputbuffer >> 4) & 0x07;
mantis = inputbuffer & 0x0f;

// restore output value and sign


valout = exp_lut[exp] + (mantis << (exp + 3));
if (sign != 0) {
valout = -valout;
}

// convert back to big endian


out = ((valout & 0x00ff)<<8);
out = (out | ((valout & 0xff00)>>8));

// write output value


fwrite(&out, sizeof(short), 1, op);
}
}
fclose(fp);
fclose(op);
}

iLaw encoder and decoder:
/////////////////////////////////////////////////////////
// Custom mu-law-based encoder (iLaw)
// Works with 16-bit mono WAV-file on big-endian systems
// Ivar Løkken, Mar. 2004
/////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

// mu-law exponential lookup table


static char exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};

static int exp_lut2[8] = {0,132,396,924,1980,4092,8316,16764};

int main(void)
{
FILE *fp, *op;

fp = fopen("in.wav", "rb"); //open wav-file for reading


op = fopen("out.mul", "wb"); //open output-file for writing

if (fp) {
//wav file info variables
char id[4];
unsigned long size, data_size, data_size_sw, loop;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//predictor variables
short value = 0; //current input sample value
short value_temp = 0; //temp value
short valprev = 0; //previous value
int dif; //first order prediction value
int dif2 = 0; //second order prediction value
short d2 = 0; //predictor output
int difprev = 0; //previous first order prediction value
int d2o; //decoded error value for feedback

//encoder variables
unsigned short sign = 0; //sign-bit
unsigned short exp = 0; //exponent (position of rightmost 1)
unsigned short mantis = 0; //mantissa
unsigned short outputbuffer[8]; //output buffer
unsigned short shortbuffer[5]; //16-bit buffer for writing to file as short
int i = 0;

//read and write wav info
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
loop = data_size_sw/16;
printf("Data size: %d \n", data_size_sw);

// RUN COMPRESSION

for ( ; loop>0; loop=loop-1) {

for(i=0; i<=7;i++){

// read input value and change endian


fread(&value_temp, sizeof(short), 1, fp);
value = 0;
value = ((value_temp & 0x00ff)<<8);
value = (value | ((value_temp & 0xff00)>>8));

// second order linear prediction


dif = value - valprev;
dif2 = dif - difprev;

//toss away LSB since the mulaw will do that anyway


d2 = dif2>>1;

// convert to sign-magnitude
if (d2 < 0) {
d2 = (-d2);

sign = 0x0200;
} else {
sign = 0x0000;
}

// clip value
if (d2 > 32635) {
d2 = 32635;
}

// add bias
d2 = d2 + 0x84;

// find exponent value (0 to 7)


exp = exp_lut[(d2 >> 7) & 0xFF];

// get the mantissa


mantis = (d2 >> (exp + 1)) & 0x003f;

// put together output byte


outputbuffer[i] = (sign | (exp << 6) | mantis) & 0x03ff;

// decode error value


d2o = (exp_lut2[exp] + (mantis << (exp + 1)))<<1;
if (sign != 0) {
d2o = -d2o;
}
difprev += d2o;
valprev += difprev;
}

//put together output variable
//shortbuffer holds the 8 compressed 10-bit samples packed into five 16-bit words (80 bits)
shortbuffer[0]=outputbuffer[0]|outputbuffer[1]<<10;
shortbuffer[1]=(outputbuffer[1]>>6)|(outputbuffer[2]<<4)|(outputbuffer[3]<<14);
shortbuffer[2]=(outputbuffer[3]>>2)|(outputbuffer[4]<<8);
shortbuffer[3]=(outputbuffer[4]>>8)|(outputbuffer[5]<<2)|(outputbuffer[6]<<12);
shortbuffer[4]=(outputbuffer[6]>>4)|outputbuffer[7]<<6;

// write output value


for(i=0; i <= 4; i++) {
fwrite(&shortbuffer[i], sizeof(short), 1, op);
}
}
}
fclose(fp);
fclose(op);
}

/////////////////////////////////////////////////////////
// Custom mu-law-based decoder (iLaw)
// Works with 16-bit mono WAV-file on big-endian systems
// Ivar Løkken, Mar. 2004
/////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

//exponent recovery table

static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};

int main(void)
{
FILE *fp, *op;

fp = fopen("in.mul", "rb"); //open compressed file for reading


op = fopen("out.wav", "wb"); //open output-file for writing

if (fp) {

//wav info variables


char id[4];
unsigned long size, data_size, data_size_sw, loop;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//data variables
unsigned short inputbuffer[5]; //input buffer
unsigned short tempbuffer[8]; //decoded sample buffer
int valout = 0; //output value for writing
char i; //counting variable

//decoder variables
unsigned short sign = 0;
unsigned short mantis = 0;
unsigned short exp = 0;

//predictor
short out = 0; //output variable
int difout = 0; //difference
int d1out = 0; //difference of differences (2nd order)

//read and write wav info


fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //if it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {

printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);
loop = data_size_sw/16;

//RUN DECOMPRESSION
for( ; loop>0; loop=loop-1) {

// read input values


for(i=0; i <= 4; i++) {
fread(&inputbuffer[i], sizeof(short), 1, fp);
}

//put samples in to separate buffer places


tempbuffer[0]= inputbuffer[0] & 0x03ff;
tempbuffer[1]= 0x03ff & ((0x003f&(inputbuffer[0]>>10))|(0x03c0&(inputbuffer[1]<<6)));
tempbuffer[2]= 0x03ff & (inputbuffer[1]>>4);
tempbuffer[3]= 0x03ff & ((0x0003&(inputbuffer[1]>>14))|(0x03fc&(inputbuffer[2]<<2)));
tempbuffer[4]= 0x03ff & ((0x00ff&(inputbuffer[2]>>8))|(0x0300&(inputbuffer[3]<<8)));
tempbuffer[5]= 0x03ff & (inputbuffer[3]>>2);
tempbuffer[6]= 0x03ff & ((0x000f&(inputbuffer[3]>>12))|(0x03f0&(inputbuffer[4]<<4)));
tempbuffer[7]= 0x03ff & (inputbuffer[4]>>6);

//RUN DECOMPRESSION
for(i=0; i<= 7; i++) {
//find sign, exponent, mantissa
sign = tempbuffer[i] & 0x0200;
exp = (tempbuffer[i]>>6) & 0x0007;
mantis = (tempbuffer[i] & 0x003f);

// restore output value


difout = (exp_lut[exp] + (mantis << (exp+1)))<<1;
if (sign != 0) {
difout = -difout;
}

//prediction
d1out += difout;
valout += d1out;

//clip output value


if (valout>32767)
valout = 32767;
if (valout<-32768)
valout = -32768;

// convert back to big endian


out = ((valout & 0x00ff)<<8);
out = (out | ((valout & 0xff00)>>8));

// write output value


fwrite(&out, sizeof(short), 1, op);
}
}
}
fclose(fp);
fclose(op);
}

Entropy coding tester, Rice-, Pod- and iPod encoder and decoder:
/////////////////////////////////////////////////////////
// entropy coding test program
// pod vs rice vs ipod test encoder
// no prediction, but it can easily be
// included in main if desired
// Ivar Løkken, NTNU 2004
// x86 users, remove byteswapping
/////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

// table for output bitshift


static unsigned short bittab[16] = {
0x0001,0x0002,0x0004,0x0008,0x0010,0x0020,0x0040,0x0080,
0x0100,0x0200,0x0400,0x0800,0x1000,0x2000,0x4000,0x8000};

FILE *fp, *op, *tp;

//wav file info variables


char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//data variables
short value = 0; //current input sample value
short value_temp = 0; //temp value
unsigned short out = 0; //output variable
unsigned short maxwordlength = 0; //max wordlength indicator
unsigned char coding = 0; //Pod or Rice selector
unsigned char prefixbits = 0; //number of bits in the prefix

//encoder variables
unsigned short sign = 0; //sign-bit
unsigned short overflow = 0; //binary part
unsigned char numzeros = 0; //number of zeros
unsigned char k = 6;
unsigned long A = 0; //accumulated value for calculation of k
unsigned char N = 0; //sample count
short i = 0; //counting variable
short j = 15; //counting variable
short x = 0; //how often is new k calculated

//encoder functions
void pod_encoder(void);
void rice_encoder(void);
void ipod_encoder(void);

int main(void)
{

fp = fopen("in.wav", "rb"); //open wav-file for reading


op = fopen("out.comp", "wb"); //open output-file for writing
tp = fopen("test.hex", "wb"); //test file for whatever the user wants to store

if (fp) {
//read and write wav header
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue

fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

// RUN COMPRESSION
printf("Please select encoding method (0 = Pod-coding, 1 = Rice-coding, 2 = iPod-coding): ");
scanf("%hhu", &coding); //coding is an unsigned char, so read with %hhu
if (coding == 0) {
pod_encoder();
} else if (coding == 1) {
rice_encoder();
} else if (coding == 2) {
ipod_encoder();
}
}
fclose(op);
fclose(fp);
fclose(tp);
}

//pod encoder
void pod_encoder(void)
{
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

// read input value and change endian


fread(&value_temp, sizeof(short), 1, fp);
value = 0;
value = ((value_temp & 0x00ff)<<8);

value = (value | ((value_temp & 0xff00)>>8));

// convert to sign-magnitude
if (value < 0) {
value = (-value);
sign = 1;
} else {
sign = 0;
}

// perform Pod-coding

// find overflow
overflow = 0;
overflow = value >> k;
// find number of zeros

numzeros = 0;
// overflow can be max (16-k) bits
for (i=0;i<(16-k);i++) {
if (overflow > (bittab[i]-1)) {
numzeros++;
} else {
break;
}
}
fwrite(&numzeros, sizeof(char), 1, tp); //log the prefix length to the test file

// find max wordlength just to see how the coding performs


if (((numzeros<<1)+k+1) > maxwordlength) {
maxwordlength = (numzeros<<1)+k+1;
}

// put together and write output data bit by bit


// data fills the out-variable continuously from
// MSB and downwards
// using bit-table, bittable[j] is a 1 in position
// j counting from LSB to MSB
// when out-variable is filled (j<0), it starts
// filling out a new one immediately

// sign
if (sign != 0) {
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}

// zeros followed by overflow or just a one if the overflow is 0


if (numzeros == 0) {
out = out | bittab[j];
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out=0;
}
}else{
//zeros
for (i=numzeros; i>0; i--) {
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);

j=15;
out=0;
}
}

// overflow
for (i=numzeros; i>0; i--) {
if ((overflow & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
}
}

// uncoded part (bit 1 to bit k of value)


for (i=k; i>0; i--) {
if ((value & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j = 15;
out = 0;
}
}

// calculate k for next sample


N++;
A+=value;
x++;
// if x=n in IF, k is calculated every n samples. Remove if to
// calculate k for every sample
//if (x==64 || N == 255) {
for (k=0; (N<<k)<A; k++);
x = 0;
//}
// reset accumulation every 255th sample
if (N==255) {
N=0;
A=0;
}
}
printf("Max wordlength: %d \n", maxwordlength);
}

//rice encoder
void rice_encoder(void)
{
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

// read input value and change endian


fread(&value_temp, sizeof(short), 1, fp);
value = 0;
value = ((value_temp & 0x00ff)<<8);
value = (value | ((value_temp & 0xff00)>>8));

// convert to sign-magnitude
if (value < 0) {
value = (-value);
sign = 1;
} else {

sign = 0;
}
// perform Rice-coding

// find overflow
overflow = 0;
overflow = value >> k;

// find max wordlength just to see how the coding performs


// in worst-case
if ((overflow+k+2) > maxwordlength) {
maxwordlength = overflow + k + 2;
}
// put together and write output data bit by bit
// data fills the out-variable continuously from
// MSB and downwards
// using bit-table, bittable[j] is a 1 in position
// j counting from LSB to MSB
// when out-variable is filled (j<0), it starts
// filling out a new one immediately

// sign
if (sign != 0) {
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}

// (overflow) zeros followed by terminating 1


for (i=overflow ; i>0; i--) {
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out=0;
}
}
out = out | bittab[j];
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out=0;
}

// uncoded part (bit 1 to bit k of value)


for (i=k; i>0; i--) {
if ((value & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j = 15;
out = 0;
}
}

// calculate k for next sample


N++;
A+=value;
x++;

//new k is calculated for every n samples, where (x==n) is given in if
//comment out if when you want new k calculated for every sample
//if (x==4 || N ==255) {
for (k=0; (N<<k)<A; k++);
x=0;
//}
// reset accumulation every 255th sample
if (N==255) {
N=0;
A=0;
}

}
printf("Max wordlength: %d \n", maxwordlength);
}

//iPod encoder
void ipod_encoder(void)
{
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

// read input value and change endian


fread(&value_temp, sizeof(short), 1, fp);
value = 0;
value = ((value_temp & 0x00ff)<<8);
value = (value | ((value_temp & 0xff00)>>8));

// convert to sign-magnitude
if (value < 0) {
value = (-value);
sign = 1;
} else {
sign = 0;
}

// perform iPod-coding

// find overflow
overflow = 0;
overflow = value >> k;
// shift coding up one number
overflow = overflow + 1;
// find number of bits in prefix

prefixbits = 0;
// overflow can be max (16-k) bits
for (i=0;i<(16-k);i++) {
if (overflow > (bittab[i]-1)) {
prefixbits++;
} else {
break;
}
}
fwrite(&prefixbits, sizeof(char), 1, tp); //log the prefix length to the test file

// find max wordlength just to see how the coding performs


if (((prefixbits<<1)+k+1) > maxwordlength) {
maxwordlength = (prefixbits<<1)+k+1;
}

// put together and write output data bit by bit


// data fills the out-variable continuously from
// MSB and downwards
// using bit-table, bittable[j] is a 1 in position
// j counting from LSB to MSB
// when out-variable is filled (j<0), it starts
// filling out a new one immediately

// if the value is positive
if (sign == 0) {
// zeros followed by overflow

//zeros
for (i=prefixbits; i>0; i--) {
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out=0;
}
}
// overflow
for (i=prefixbits; i>0; i--) {
if ((overflow & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
}
// if the value is negative, output 1's and inverted overflow
} else if (sign == 1) {
//ones
for (i=prefixbits; i>0; i--) {
out = out | bittab[j];
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out=0;
}
}
// inverted overflow
for (i=prefixbits; i>0; i--) {
if ((overflow & bittab[i-1]) == 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
}
}

// uncoded part (bit 1 to bit k of value)


for (i=k; i>0; i--) {
if ((value & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j = 15;
out = 0;
}
}

// calculate k for next sample

N++;
A+=value;
x++;
// if x=n in IF, k is calculated every n samples. Remove if to
// calculate k for every sample
//if (x==64 || N == 255) {
for (k=0; (N<<k)<A; k++);
x = 0;
//}
// reset accumulation every 255th sample
if (N==255) {
N=0;
A=0;
}
}
printf("Max wordlength: %d \n", maxwordlength);
}

/////////////////////////////////////////////////////////
// entropy coding test program
// pod vs rice vs ipod test decoder
// no prediction, but it can easily be
// included in main if desired
// Ivar Løkken, NTNU 2004
// x86 users, remove byteswapping
/////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

// table for output bitshift


static unsigned short bittab[16] = {
0x0001,0x0002,0x0004,0x0008,0x0010,0x0020,0x0040,0x0080,
0x0100,0x0200,0x0400,0x0800,0x1000,0x2000,0x4000,0x8000};

FILE *fp, *op, *tp;

//wav file info variables


char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//data variables
unsigned short in = 0; //input variable
short out = 0;
short valout = 0;
unsigned short maxwordlength = 0; //max wordlength indicator
unsigned char coding = 0; //Pod or Rice selector
unsigned char prefixbits = 0; //number of bits in the prefix

//decoder variables
unsigned short sign = 0; //sign-bit
unsigned short overflow = 0; //binary part
unsigned char numzeros = 0; //number of zeros
unsigned char k = 6;
unsigned long A = 0; //accumulated value for calculation of k
unsigned char N = 0; //sample count
short i = 0; //counting variable
short j = 15; //counting variable
short x = 0; //how often is new k calculated

//decoder functions
void pod_decoder(void);
void rice_decoder(void);
void ipod_decoder(void);

int main(void)
{

fp = fopen("out.comp", "rb"); //open compressed file for reading


op = fopen("out.wav", "wb"); //open output-file for writing
tp = fopen("testd.hex", "wb"); //test file for whatever the user wants to store

if (fp) {
//read and write wav header
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels, sizeof(short), 1, fp);
fwrite(&channels, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

// RUN COMPRESSION
printf("Please select decoding method (0 = Pod-coding, 1 = Rice-coding, 2 = iPod-coding): ");
scanf("%u", &coding);
if (coding == 0) {
pod_decoder();
} else if (coding == 1) {
rice_decoder();
} else if (coding == 2) {
ipod_decoder();
}
}
fclose(op);

fclose(fp);
fclose(tp);
}

//Pod decoder function


void pod_decoder(void)
{
//read the 16 first bits
fread(&in, sizeof(short), 1, fp);

for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

// read sign
sign = in & bittab[j];
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}

// count zeros
numzeros = 0;
while ((in & bittab[j]) == 0){
numzeros++;
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
fwrite(&numzeros, sizeof(char), 1, tp);

// if numzeros = 0, skip the "1" prefix


if (numzeros == 0) {
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
// read the next part (numzeros and k bits) bit by bit and construct output
valout = 0;
for (i=(numzeros+k) ; i>0; i--) {
if ((in & bittab[j]) != 0) {
valout = valout | bittab[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}

// calculate k for next sample


// if x=n in IF, k is calculated every n samples. Remove if to
// calculate k for every sample
N++;
A += valout;
x++;
//if (x==3 || N == 255) {
for (k=0; (N<<k)<A; k++);
x = 0;
//}

// reset accumulation every 255th sample


if (N==255) {
N=0;

A=0;
}

// restore sign representation


if (sign != 0)
valout = -valout;

// convert back to big endian


out = ((valout & 0x00ff)<<8);
out = (out | ((valout & 0xff00)>>8));

// write output value


fwrite(&out, sizeof(short), 1, op);
}
}

void rice_decoder(void)
{
//read the 16 first bits
fread(&in, sizeof(short), 1, fp);

for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

// read sign
sign = in & bittab[j];
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}

// count zeros (number of zeros correspond to overflow)


overflow = 0;
while ((in & bittab[j]) == 0){
overflow++;
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
fwrite(&overflow, sizeof(short), 1, tp);

// skip the terminating 1


j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}

// read the next, uncoded part (k bits)


for (i=k ; i>0; i--) {
if ((in & bittab[j]) != 0) {
valout = valout | bittab[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}

// put together output value


valout = valout | (overflow<<k);

// calculate k for next sample

N++;
A += valout;
x++;
//new k is calculated for every n+1 samples, where (x==n) is given in if
//comment out if when you want new k calculated for every sample
//if (x==4 || N==65535) {
for (k=0; (N<<k)<A; k++);
x = 0;
//}
// reset accumulation every 255th sample
if (N==255) {
N=0;
A=0;
}

// restore sign representation


if (sign != 0)
valout = -valout;

// convert back to big endian


out = ((valout & 0x00ff)<<8);
out = (out | ((valout & 0xff00)>>8));

// write output value


fwrite(&out, sizeof(short), 1, op);
valout = 0;
out = 0;
}
}

void ipod_decoder(void)
{
//read the 16 first bits
fread(&in, sizeof(short), 1, fp);

for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

// read first bit to see if value is positive or negative
// (reset overflow first; it accumulates across samples otherwise)

overflow = 0;
if ((in & bittab[j]) == 0) {
sign = 0;
// if first bit zero, non inverted prefix, count zeros
prefixbits = 0;
while ((in & bittab[j]) == 0){
prefixbits++;
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
// and put overflow to variable
for (i=(prefixbits); i>0; i--) {
if ((in & bittab[j]) != 0) {
overflow = overflow | bittab[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
} else {
// if first bit is one, inverted prefix, count ones
sign = 1;
prefixbits = 0;
while ((in & bittab[j]) != 0){
prefixbits++;
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
// and put "deinverted" overflow to variable
for (i=(prefixbits); i>0; i--) {
if ((in & bittab[j]) == 0) {
overflow = overflow | bittab[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
}
// remove upshift
overflow = overflow - 1;
valout = 0;
valout = (overflow << k);

// read the next part (k bits) bit by bit and construct output
for (i=k ; i>0; i--) {
if ((in & bittab[j]) != 0) {
valout = valout | bittab[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}

// calculate k for next sample


// if x=n in IF, k is calculated every n samples. Remove if to
// calculate k for every sample
N++;
A += valout;
x++;
//if (x==3 || N == 255) {
for (k=0; (N<<k)<A; k++);
x = 0;
//}

// reset accumulation every 255th sample


if (N==255) {
N=0;
A=0;
}

// restore sign representation


if (sign != 0)
valout = -valout;

// convert back to big endian


out = ((valout & 0x00ff)<<8);
out = (out | ((valout & 0xff00)>>8));

// write output value


fwrite(&out, sizeof(short), 1, op);
}
}

Final lossless codec, encoder and decoder
//////////////////////////////////////////////////////
// compression test program                         //
// pod-encoding                                     //
// selectable prediction                            //
// from no prediction up to fourth order            //
// mono or stereo                                   //
//                                                  //
// Written for Macintosh, Intel users remove        //
// endian conversion                                //
//                                                  //
// encoder                                          //
//                                                  //
// Ivar Løkken, NTNU, 2004                          //
//////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

// table for output bitshift


static unsigned long bittab[32] = {
0x00000001,0x00000002,0x00000004,0x00000008,0x00000010,0x00000020,0x00000040,0x00000080,
0x00000100,0x00000200,0x00000400,0x00000800,0x00001000,0x00002000,0x00004000,0x00008000,
0x00010000,0x00020000,0x00040000,0x00080000,0x00100000,0x00200000,0x00400000,0x00800000,
0x01000000,0x02000000,0x04000000,0x08000000,0x10000000,0x20000000,0x40000000,0x80000000};

//predictor variables
long valprev[2] = {0,0}; //previous value
long diff[2] = {0,0}; //difference
long diffprev[2] = {0,0}; //previous difference
long diff2[2] = {0,0}; //second order difference
long diff2prev[2] = {0,0}; //previous second order difference
long diff3[2] = {0,0}; //and so forth, [2] because of stereo
long diff3prev[2] = {0,0}; //one for each channel
long diff4[2] = {0,0};
long diff4prev[2] = {0,0};
long residual = 0; //prediction residual

//encoder variables
unsigned short sign = 0; //sign-bit
unsigned short overflow = 0; //binary part
unsigned char numzeros = 0; //number of zeros
unsigned char k[2] = {6,6}; //k-variable, output wordlength estimation
unsigned long A[2] = {0,0}; //accumulated value for calculation of k
unsigned char N[2] = {0,0}; //sample count
int chandec = 0; //channel decorrelation indicator

//data variables
short value = 0; //current input sample value
short value_temp = 0; //temp value
short left = 0; //left channel value
short right = 0; //right channel value
long side = 0; //Side = L-R

//misc. variables
short i = 0; //counting variable
short j = 15; //counting variable
unsigned char m = 0; //left/right indicator
short x = 0; //how often is new k calculated
unsigned short out = 0; //output variable
unsigned short maxwordlength = 0; //max wordlength indicator
int order = 0; //prediction order

//compress function

void compress(long invalue);
FILE *fp, *op, *tp;

//main routine
int main(void)
{
fp = fopen("reference.wav", "rb"); //open wav-file for reading
op = fopen("out.comp", "wb"); //open output-file for writing
tp = fopen("test.hex", "wb"); //test file for whatever test data
//the user will include
if (fp) {
//wav header variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, channels_temp, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

// read wave header and copy it to output file


fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels_temp, sizeof(short), 1, fp);
fwrite(&channels_temp, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

//select parameters
printf("Select prediction order (0, 1, 2, 3 or 4): ");
scanf("%u", &order);
printf("%u", order);
printf("\nDo you want to include channel decorrelation (0=no, 1=yes)? ");
scanf("%u", &chandec);

printf("%u", chandec);

// RUN COMPRESSION

for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

// check if order is ok
if (order < 0 || order > 4) {
printf("Error, invalid prediction order \n");
break;
}
// byteswap channels variable
channels = ((channels_temp & 0x00ff)<<8);
channels = (channels | ((channels_temp & 0xff00)>>8));

//if the file is mono


if (channels == 1) {
// read input value and change endian
fread(&value_temp, sizeof(short), 1, fp);
value = 0;
value = ((value_temp & 0x00ff)<<8);
value = (value | ((value_temp & 0xff00)>>8));
m=0;
compress(value);
} else if (channels == 2) {
// read left and right value and put in left and right variables
fread(&value_temp, sizeof(short), 1, fp);
left = 0;
left = ((value_temp & 0x00ff)<<8);
left = (left | ((value_temp & 0xff00)>>8));
fread(&value_temp, sizeof(short), 1, fp);
data_size_sw = data_size_sw-2;
right = 0;
right = ((value_temp & 0x00ff)<<8);
right = (right | ((value_temp & 0xff00)>>8));
if (chandec == 0) {
//no channel decorrelation
m=0;
compress(left);
m=1;
compress(right);
} else {
// channel decorrelation
side = left - right;
m=0;
compress(left);
m=1;
compress(side);
}
} else {
printf("Error, not 1 or 2 channels \n");
break;
}
}
}
fclose(op);
fclose(fp);
fclose(tp);
}

//compression routine
void compress(long invalue)
{
//0th, 1st or 2nd order prediction, depending on what's chosen
if (order == 0) {
residual = invalue;

} else if (order == 1) {
residual = invalue - valprev[m];
valprev[m] = invalue;
} else if (order == 2) {
residual = diff[m] - diffprev[m];
diffprev[m] = diff[m];
diff[m] = invalue - valprev[m];
valprev[m] = invalue;
} else if (order == 3) {
residual = diff2[m]-diff2prev[m];
diff2prev[m] = diff2[m];
diff2[m] = diff[m] - diffprev[m];
diffprev[m] = diff[m];
diff[m] = invalue - valprev[m];
valprev[m] = invalue;
} else if (order == 4) {
residual = diff3[m]-diff3prev[m];
diff3prev[m] = diff3[m];
diff3[m] = diff2[m]-diff2prev[m];
diff2prev[m] = diff2[m];
diff2[m] = diff[m]-diffprev[m];
diffprev[m] = diff[m];
diff[m] = invalue - valprev[m];
valprev[m] = invalue;
}

// convert to sign-magnitude
if (residual < 0) {
residual = (-residual);
sign = 1;
} else {
sign = 0;
}

// perform Pod-coding

// find overflow
overflow = 0;
overflow = residual >> k[m];
fwrite(&numzeros, sizeof(char), 1, tp);

// find number of zeros


numzeros = 0;

// overflow can be max (18-k) bits


for (i=0;i<(18-k[m]);i++) {
if (overflow > (bittab[i]-1)) {
numzeros++;
} else {
break;
}
}

// find max wordlength just to see how the coding performs


if (((numzeros<<1)+k[m]+1) > maxwordlength) {
maxwordlength = (numzeros<<1)+k[m]+2;
}

// put together and write output data bit by bit


// data fills the out-variable continuously from
// MSB and downwards
// using bit-table, bittable[j] is a 1 in position
// j counting from LSB to MSB
// when out-variable is filled (j<0), it starts
// filling out a new one immediately

// sign
if (sign != 0) {
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}

// zeros followed by overflow or just a one if the overflow is 0


if (numzeros == 0) {
out = out | bittab[j];
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out=0;
}
}else{
//zeros
for (i=numzeros; i>0; i--) {
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out=0;
}
}

// overflow
for (i=numzeros; i>0; i--) {
if ((overflow & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j=15;
out = 0;
}
}
}

// uncoded part (bit 1 to bit k of value)


for (i=k[m]; i>0; i--) {
if ((residual & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
fwrite(&out, sizeof(short), 1, op);
j = 15;
out = 0;
}
}

// calculate k for next sample


N[m]++;
A[m]+=residual;
x++;
// if x=n in IF, k is calculated every n samples. Remove if to
// calculate k for every sample
//if (x==64 || N == 255) {
for (k[m]=0; (N[m]<<k[m])<A[m]; k[m]++);
x = 0;

//}
// reset accumulation every 255th sample
if (N[m]==255) {
N[m]=0;
A[m]=0;
}
}

//////////////////////////////////////////////////////
// compression test program                         //
// pod-encoding                                     //
// selectable prediction                            //
// from no prediction up to fourth order            //
// mono or stereo                                   //
//                                                  //
// Written for Macintosh, Intel users remove        //
// endian conversion                                //
//                                                  //
// decoder                                          //
//                                                  //
// Ivar Løkken, NTNU, 2004                          //
//////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

// table for output bitshift


static unsigned long bittab[32] = {
0x00000001,0x00000002,0x00000004,0x00000008,0x00000010,0x00000020,0x00000040,0x00000080,
0x00000100,0x00000200,0x00000400,0x00000800,0x00001000,0x00002000,0x00004000,0x00008000,
0x00010000,0x00020000,0x00040000,0x00080000,0x00100000,0x00200000,0x00400000,0x00800000,
0x01000000,0x02000000,0x04000000,0x08000000,0x10000000,0x20000000,0x40000000,0x80000000};

//table for 16-bit bit comparison


static unsigned short bittabs[16] = {
0x0001,0x0002,0x0004,0x0008,0x0010,0x0020,0x0040,0x0080,
0x0100,0x0200,0x0400,0x0800,0x1000,0x2000,0x4000,0x8000};

//data variables
unsigned short in = 0; //input variable
long outvar = 0; //output variable from function call
short PCMout_t = 0; //16-bit output data
short PCMout = 0; //16-bit output data right endian
long side = 0; //side band (left-right)
long left = 0; //left channel
long right = 0; //right channel

//decoder variables
unsigned short sign = 0; //sign-bit
unsigned short numzeros=0; //number of zeros
unsigned char k[2] = {6,6}; //k-variable, estimation of output length
unsigned long A[2] = {0,0}; //accumulated value for calculation of k
unsigned char N[2] = {0,0}; //sample count

//predictor variables
long residual = 0; //decoded residual
long diff[2] = {0,0}; //calculated difference when 2nd o. pred
long diff2[2] = {0,0}; //3rd order
long diff3[2] = {0,0}; //4th order
long out[2] = {0,0};

//misc variables
short i = 0; //counting variable
short j = 15; //counting variable
unsigned char m = 0; //channel indicator

short x = 0; //how often is new k calculated
unsigned int order = 0; //prediction order
int chandec = 0; //channel decorrelation indicator

//decompression function
long decompress(void);

FILE *fp, *op, *tp;

//main routine
int main(void)
{
fp = fopen("out.comp", "rb"); //open compressed file for reading
op = fopen("out.wav", "wb"); //open output-file for writing
tp = fopen("test.hex", "wb"); //test-files for whatever the user will store

if (fp) {
//wav header variables
char id[4];
unsigned long size, data_size, data_size_sw;
short format_tag, channels, channels_temp, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//read and write wav header content


fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels_temp, sizeof(short), 1, fp);
fwrite(&channels_temp, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

//enter parameters
printf("Select prediction order (same as used in encoder): ");
scanf("%u", &order);
printf("\nIs channel decorrelation used in compressed file (0=no, 1=yes)? ");
scanf("%u", &chandec);

//read the 16 first bits


fread(&in, sizeof(short), 1, fp);

//RUN DECOMPRESSION
for ( ; data_size_sw>0; data_size_sw=data_size_sw-2) {

//check if order is ok
if (order < 0 || order > 4) {
printf("Error, invalid prediction order \n");
break;
}

//byteswap channels variable


channels = ((channels_temp & 0x00ff)<<8);
channels = (channels | ((channels_temp & 0xff00)>>8));

//if the file is mono


if (channels == 1) {
// decompress file and put in output variable
m=0;
outvar = decompress();
//make sure it is within allowed range
if (outvar > 32767)
outvar = 32767;
if (outvar < -32768)
outvar = -32768;
PCMout_t = outvar;
// convert back to big endian
PCMout = ((PCMout_t & 0x00ff)<<8);
PCMout = (PCMout | ((PCMout_t & 0xff00)>>8));
// write output value
fwrite(&PCMout, sizeof(short), 1, op);
} else if (channels == 2) {
if (chandec == 0) {
//no channel decorrelation
m=0;
left = decompress();
m=1;
right = decompress();
} else {
//restore channel information
m=0;
outvar = decompress();
left = outvar;
m=1;
outvar = decompress();
side = outvar;
right = left - side;
}
//left channel write
outvar = left;
if (outvar > 32767)
outvar = 32767;
if (outvar < -32768)
outvar = -32768;
PCMout_t = outvar;
//convert back to big endian
PCMout = ((PCMout_t & 0x00ff)<<8);
PCMout = (PCMout | ((PCMout_t & 0xff00)>>8));
//write output value

fwrite(&PCMout, sizeof(short), 1, op);
data_size_sw=data_size_sw-2;
//right channel write
outvar = right;
if (outvar > 32767)
outvar = 32767;
if (outvar < -32768)
outvar = -32768;
PCMout_t = outvar;
//convert back to big endian
PCMout = ((PCMout_t & 0x00ff)<<8);
PCMout = (PCMout | ((PCMout_t & 0xff00)>>8));
//write output value
fwrite(&PCMout, sizeof(short), 1, op);
} else {
printf("Error, not 1 or 2 channels \n");
break;
}
}
}
fclose(fp);
fclose(op);
}

//decompression function
long decompress(void) {

// read sign
sign = in & bittabs[j];
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}

// count zeros
numzeros = 0;
while ((in & bittabs[j]) == 0){
numzeros++;
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
fwrite(&numzeros, sizeof(char), 1, tp);

// if numzeros = 0, skip the "1" prefix


if (numzeros == 0) {
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}

// read the next part (numzeros and k bits) bit by bit and construct output
residual = 0;
for (i=(numzeros+k[m]) ; i>0; i--) {
if ((in & bittabs[j]) != 0) {
residual = residual | bittab[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}

}

// calculate k for next sample


// if x=n in IF, k is calculated every n samples. Remove if to
// calculate k for every sample
N[m]++;
A[m] += residual;
x++;
//if (x==3 || N == 255) {
for (k[m]=0; (N[m]<<k[m])<A[m]; k[m]++);
x = 0;
//}

// reset accumulation every 255th sample


if (N[m]==255) {
N[m]=0;
A[m]=0;
}

// restore sign representation


if (sign != 0)
residual = -residual;

// construct output data, depending on prediction order used


if (order == 0) {
out[m] = residual;
} else if (order == 1) {
out[m] += residual;
} else if (order == 2) {
diff[m] += residual;
out[m] += diff[m];
} else if (order == 3) {
diff2[m] += residual;
diff[m] += diff2[m];
out[m] += diff[m];
} else if (order == 4) {
diff3[m] +=residual;
diff2[m] += diff3[m];
diff[m] += diff2[m];
out[m] += diff[m];
}
return out[m];
}

Hybrid lossless/lossy encoder and decoder

//////////////////////////////////////////////////////
// Hybrid lossless/lossy codec                      //
//                                                  //
// 10 bit per sample output rate                    //
// LSB-removal or mono-samples                      //
// lossy-mode                                       //
// fixed 2nd order prediction                       //
// and Pod-encoding                                 //
//                                                  //
// Written for Macintosh                            //
// intel users, remove endian conv.                 //
//                                                  //
// encoder                                          //
//                                                  //
// Ivar Løkken                                      //
// NTNU, 2004                                       //
//////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp
#include <math.h>

// table for output bitshift


static unsigned long bittab[32] = {
0x00000001,0x00000002,0x00000004,0x00000008,0x00000010,0x00000020,0x00000040,0x00000080,
0x00000100,0x00000200,0x00000400,0x00000800,0x00001000,0x00002000,0x00004000,0x00008000,
0x00010000,0x00020000,0x00040000,0x00080000,0x00100000,0x00200000,0x00400000,0x00800000,
0x01000000,0x02000000,0x04000000,0x08000000,0x10000000,0x20000000,0x40000000,0x80000000};

//buffers that hold one frame of input (128 samples) and the packed output


short inputbuffer[128];
short outputbuffer[255];

//wav-header data
char id[4];
long size, data_size, data_size_sw;
short format_tag, channels, channels_temp, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//variables for data read and write


short value = 0; //current input sample value
short value_temp = 0; //temp value
unsigned short out = 0; //output variable

//prediction variables
long valprev[2] = {0,0}; //previous value
long diff[2] = {0,0}; //difference
long diffprev[2] = {0,0}; //previous difference
long residual = 0; //prediction residual

//encoding variables
unsigned short sign = 0; //sign-bit
unsigned short overflow = 0; //binary part
unsigned char numzeros = 0; //number of zeros
unsigned char k[2] = {6,6}; //variable k, wordlength estimation
unsigned long A[2] = {0,0}; //accumulated value for calculation of k
unsigned char N[2] = {0,0}; //sample count

//lossy-mode variables
unsigned char lsb_rem = 0; //lsbs to be removed if lossy-mode 1
unsigned char mono_samples = 0; //samples in a frame to be sent if mono in lossy-mode 2
unsigned char header = 0; //frame header
unsigned char frame_length = 128; //frame length in samples
unsigned int lossy_mode = 0; //selects lossy mode (0 = none, 1 = lsb_removal, 2 = mono, 3 = mono test)

//counting variables
short i = 0;
short j = 15;
unsigned short outbuf_pos = 0;
short y = 0;

unsigned char LR = 0; //left/right indicator


unsigned short maxwordlength = 0; //max wordlength indicator

//functions
void mono_test_only(void);
void compress_frame(unsigned char length);
void compress_sample(void);
void read_write_wavinfo(void);
void read_frame(void);
void write_header(void);
void write_frame(void);
void check_lossy(void);

FILE *fp, *op, *tp;

//main program
int main(void)
{
fp = fopen("modernlive2.wav", "rb"); //open wav-file for reading
op = fopen("out.comp", "wb"); //open output-file for writing
read_write_wavinfo();
printf("please select lossy-mode (0=none, 1 = lsb removal, 2 = mono samples, 3 = mono samp. test only): \n");
scanf("%u", &lossy_mode);
while (data_size_sw > 0) {
if (lossy_mode == 3) {
mono_test_only();
break;
} else {
if (channels == 1 || channels == 2) {
//overrides lossy-mode selection if signal is mono.
if (channels == 1)
lossy_mode = 0;
outbuf_pos = 0;
read_frame();
if(data_size_sw < 0)
break;
compress_frame(frame_length);
write_frame();
if (lossy_mode != 0)
check_lossy();
} else {
printf("Error, not 1 or 2 channels \n");
break;
}
}
}
fclose(op);
fclose(fp);
}

//function that reads wav header and copies it to output file
void read_write_wavinfo(void)
{
// read wave header
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels_temp, sizeof(short), 1, fp);
fwrite(&channels_temp, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

// byteswap channels variable


channels = ((channels_temp & 0x00ff)<<8);
channels = (channels | ((channels_temp & 0xff00)>>8));
}

//function that reads a frame of data and puts it in the input buffer
void read_frame(void)
{
for(y=0; y<frame_length; y++) {
if(data_size_sw<0)
break;
// read input value and change endian
fread(&value_temp, sizeof(short), 1, fp);
data_size_sw=data_size_sw-2;
inputbuffer[y] = 0;

inputbuffer[y] = ((value_temp & 0x00ff)<<8);
inputbuffer[y] = (inputbuffer[y] | ((value_temp & 0xff00)>>8));
//remove LSBs if lossy mode 1
if (lossy_mode == 1) {
//convert to sign magnitude
if (inputbuffer[y] < 0) {
inputbuffer[y] = (-inputbuffer[y]);
i = 1;
} else {
i = 0;
}
//remove LSBs
inputbuffer[y] = inputbuffer[y]>>lsb_rem;
//back to twos complement
if (i == 1) {
inputbuffer[y]=(-inputbuffer[y]);
}
}
}
}

//function that handles compression of an entire frame


void compress_frame(unsigned char length)
{
LR=0;
// if lossy-mode is used, header must be included
if (lossy_mode != 0) {
if (lossy_mode == 1) {
header = lsb_rem;
} else if (lossy_mode == 2) {
header = mono_samples;
}
//8-bit header is used,
//put header in output framebuffer
for (i=8; i>0; i--) {
if ((header & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
outputbuffer[outbuf_pos] = out;
outbuf_pos++;
j=15;
out = 0;
}
}
}
//if lossy mode is 0 or 1, compress entire frame
if (lossy_mode != 2){
for (y=0;y<length;y++) {
compress_sample();
if (channels == 2)
LR = !LR;
}
//if not compress some samples in mono (left chn. only)
} else {
for (y=0;y<(length-mono_samples);y++) {
compress_sample();
LR = !LR;
}
LR=0;
y=(length-mono_samples);
while(y<length) {
compress_sample();
y=y+2;
}
}

}

//compression routine
void compress_sample(void)
{
//2nd order prediction
residual = diff[LR] - diffprev[LR];
diffprev[LR] = diff[LR];
diff[LR] = inputbuffer[y] - valprev[LR];
valprev[LR] = inputbuffer[y];

// convert to sign-magnitude
if (residual < 0) {
residual = (-residual);
sign = 1;
} else {
sign = 0;
}

// perform Pod-coding

// find overflow
overflow = 0;
overflow = residual >> k[LR];

// find number of zeros


numzeros = 0;
// overflow can be max (18-k) bits
for (i=0;i<(18-k[LR]);i++) {
if (overflow > (bittab[i]-1)) {
numzeros++;
} else {
break;
}
}

// find max wordlength just to see how the coding performs


if (((numzeros<<1)+k[LR]+1) > maxwordlength) {
maxwordlength = (numzeros<<1)+k[LR]+2;
}

// put together and write output data bit by bit


// data fills the out-variable continuously from
// MSB and downwards
// using bit-table, bittable[j] is a 1 in position
// j counting from LSB to MSB
// when out-variable is filled (j<0), it starts
// filling out a new one immediately

// sign
if (sign != 0) {
out = out | bittab[j];
}
j--;
if (j<0) {
outputbuffer[outbuf_pos] = out;
outbuf_pos++;
j=15;
out = 0;
}

// zeros followed by overflow or just a one if the overflow is 0


if (numzeros == 0) {
out = out | bittab[j];
j--;

if (j<0) {
outputbuffer[outbuf_pos] = out;
outbuf_pos++;
j=15;
out=0;
}
}else{
//zeros
for (i=numzeros; i>0; i--) {
j--;
if (j<0) {
outputbuffer[outbuf_pos] = out;
outbuf_pos++;
j=15;
out=0;
}
}

// overflow
for (i=numzeros; i>0; i--) {
if ((overflow & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
outputbuffer[outbuf_pos] = out;
outbuf_pos++;
j=15;
out = 0;
}
}
}

// uncoded part (bit 1 to bit k of value)


for (i=k[LR]; i>0; i--) {
if ((residual & bittab[i-1]) != 0){
out = out | bittab[j];
}
j--;
if (j<0) {
outputbuffer[outbuf_pos] = out;
outbuf_pos++;
j = 15;
out = 0;
}
}

// calculate k for next sample


N[LR]++;
A[LR]+=residual;
for (k[LR]=0; (N[LR]<<k[LR])<A[LR]; k[LR]++);
// reset accumulation every frame
if (N[LR]==255) {
N[LR]=0;
A[LR]=0;
}
}

//function that writes the frame to file


void write_frame(void)
{
//write the frame output to file; since j is not reset, the remainder will be sent
//in the next write. This is for no-lossy or lossy-mode 1
for(y=0; y<outbuf_pos;y++) {
fwrite(&outputbuffer[y], sizeof(short), 1, op);
}
outputbuffer[0] = outputbuffer[outbuf_pos+1];

}

// lossy mode check and calculation


void check_lossy(void)
{
//lsb_removal lossy mode
if (lossy_mode == 1) {
// 80 because 80 words x 16 bits / 128 samples = 10 bits per sample,
// which translates to ca. 1 Mbps
if (outbuf_pos > 80) {
// each word over 80 is 16 bits too much; removing 1 LSB from every sample
// in the frame saves 128 bits = 8 words, use a shift to compensate
// i.e. 128-sample frame => right shift by three (divide by 8)
lsb_rem = (outbuf_pos-80)>>3;
} else {
lsb_rem = 0;
}
}

//mono samples lossy mode


if (lossy_mode == 2) {
if (outbuf_pos > 80) {
//if n samples over 80, n samples must be sent in mono
//since 1 sample is saved for each mono send
//also make sure it's an even number to avoid mixing channels
mono_samples = ((outbuf_pos-80)>>1)<<1;
} else {
mono_samples = 0;
}
}
}

//mono test only routine


void mono_test_only(void)
{
tp = fopen("monotest.wav", "wb");
// read wave header
fread(id, sizeof(char), 4, fp); //read in first four bytes
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, tp);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, tp);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, tp);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, tp);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, tp);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, tp);
fread(&channels_temp, sizeof(short), 1, fp);
fwrite(&channels_temp, sizeof(short), 1, tp);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, tp);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, tp);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, tp);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, tp);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, tp);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, tp);
} else {

printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

// byteswap channels variable


channels = ((channels_temp & 0x00ff)<<8);
channels = (channels | ((channels_temp & 0xff00)>>8));
if (channels == 2) {
while(data_size_sw>0) {
//write 40 stereo samples
for (z=0;z<40;z++) {
if(data_size_sw <0)
break;
fread(&value_temp, sizeof(short), 1, fp);
fwrite(&value_temp, sizeof(short),1,tp);
data_size_sw = data_size_sw-2;
if(data_size_sw<0)
break;
}
//and 24 mono samples per frame
for(z=40;z<64;z++){
if(data_size_sw<0)
break;
fread(&value_temp, sizeof(short), 1, fp);
//write left input sample to left output
fwrite(&value_temp, sizeof(short),1,tp);
//and to right
fwrite(&value_temp, sizeof(short),1,tp);
//read and discard the right input sample
fread(&value_temp, sizeof(short), 1, fp);
data_size_sw = data_size_sw-4;
z++;
}
}
} else {
printf("Error, mono-mode test can only be done on a stereo file\n");
fclose(tp);
}
}

///////////////////////////////////////////
// Hybrid lossless/lossy codec           //
//                                       //
// 10 bit per sample output rate         //
// LSB-removal or mono-samples           //
// lossy-mode                            //
// fixed 2nd order prediction            //
// and Pod-encoding                      //
//                                       //
// Written for Macintosh;                //
// intel users, remove endian conv.      //
//                                       //
// decoder                               //
//                                       //
// Ivar Løkken                           //
// NTNU, 2004                            //
///////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp

// table for output bitshift


static unsigned long bittab[32] = {
0x00000001,0x00000002,0x00000004,0x00000008,0x00000010,0x00000020,0x00000040,0x00000080,
0x00000100,0x00000200,0x00000400,0x00000800,0x00001000,0x00002000,0x00004000,0x00008000,
0x00010000,0x00020000,0x00040000,0x00080000,0x00100000,0x00200000,0x00400000,0x00800000,
0x01000000,0x02000000,0x04000000,0x08000000,0x10000000,0x20000000,0x40000000,0x80000000};

//table for 16-bit bitcheck


static unsigned short bittabs[16] = {
0x0001,0x0002,0x0004,0x0008,0x0010,0x0020,0x0040,0x0080,
0x0100,0x0200,0x0400,0x0800,0x1000,0x2000,0x4000,0x8000};

//output buffer
long outbuffer[128];

//wav header variables


char id[4];
long size, data_size, data_size_sw;
short format_tag, channels, channels_temp, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//lossy mode variables


unsigned char frame_length = 128; //frame length
unsigned short header = 0; //header variable
unsigned char lsb_rem = 0; //lsbs to be removed for a given frame
unsigned char lsb_rem_last = 0; //same but from last frame
unsigned int lossy_mode = 0; //lossy-mode variable
unsigned char mono_samples=0; //number of mono samples

//read and write data variables


unsigned short in = 0; //input variable
long outvar = 0; //output variable
short PCMout_t = 0; //16-bit output data
short PCMout = 0; //16-bit output data right endian
long left = 0; //left channel variable
long right = 0; //right channel variable

//predictor variables
long residual = 0; //decoded residual
long diff[2] = {0,0}; //calculated difference when 2nd order
long out[2] = {0,0}; //output variable

//decoder variables
unsigned short sign = 0; //sign-bit
unsigned short numzeros=0; //number of zeros
unsigned char k[2] = {6,6}; //k-variable, compressed wordlength estimation
unsigned long A[2] = {0,0}; //accumulated value for calculation of k
unsigned char N[2] = {0,0}; //sample count
unsigned char LR = 0; //channel indicator

//counting variables
short i = 0; //counting variable
unsigned char y = 0; //counting variable
short j = 15; //counting variable
char x = 0; //counting variable

//functions
void decompress_frame(unsigned char length);
void decompress_sample(void);
void read_write_wavinfo(void);
void put_back_lsbs(void);
void back_to_stereo(void);
void write_frame_tofile(void);

FILE *fp, *op;

//main program
int main(void)
{
fp = fopen("out.comp", "rb"); //open wav-file for reading
op = fopen("out.wav", "wb"); //open output-file for writing
read_write_wavinfo();

//read first 16 bits


fread(&in, sizeof(short), 1, fp);
printf("please select lossy-mode (0=none, 1 = lsb removal, 2 = mono samples): \n");
scanf("%u", &lossy_mode);
while(data_size_sw > 0) {
if (channels == 1 || channels == 2) {
//overrides lossy mode selection if mono signal
if (channels == 1)
lossy_mode = 0;
//decompress frame and check lossy modes
decompress_frame(frame_length);
if (lossy_mode == 1) {
put_back_lsbs();
} else if (lossy_mode == 2){
back_to_stereo();
}
//write output data to file
write_frame_tofile();
} else {
printf("Error, not 1 or 2 channels \n");
break;
}
}
fclose(fp);
fclose(op);
}

//function that reads and writes data from wav header


void read_write_wavinfo(void)
{
if (fp) {
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels_temp, sizeof(short), 1, fp);
fwrite(&channels_temp, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);

fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);
// byteswap channels variable
channels = ((channels_temp & 0x00ff)<<8);
channels = (channels | ((channels_temp & 0xff00)>>8));
}
}

//function that handles decompression of one frame


void decompress_frame(unsigned char length)
{
//if lossy-mode is used, header is included
if (lossy_mode != 0) {
// read frame header
header = 0;
for(i=8;i>0;i--) {
if((in & bittabs[j]) != 0) {
header = header | bittabs[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
//header content is put in right variable
if (lossy_mode == 1) {
lsb_rem_last = lsb_rem;
lsb_rem = header;
} else if (lossy_mode == 2) {
mono_samples = header;
}
}

LR = 0;
// decompress the samples in the frame
if (lossy_mode != 2) {
for (y=0;y<length;y++) {
decompress_sample();
if (channels == 2) {
LR = !LR;
}
}
//if mono-mode, decompress stereo samples first, then mono samples
} else {
y=0;
for (y=0;y<(length-mono_samples);y++) {
decompress_sample();
LR = !LR;
}

LR = 0;
y=(length-mono_samples);
while (y<length) {
decompress_sample();
y=y+2;
}
}
}

//decompression routine
void decompress_sample(void)
{
// read sign
sign = in & bittabs[j];
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}

// count zeros
numzeros = 0;
while ((in & bittabs[j]) == 0){
numzeros++;
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}

// if numzeros = 0, skip the "1" prefix


if (numzeros == 0) {
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}
// read the next part (numzeros and k bits) bit by bit and construct output
residual = 0;
for (i=(numzeros+k[LR]) ; i>0; i--) {
if ((in & bittabs[j]) != 0) {
residual = residual | bittab[i-1];
}
j--;
if (j<0){
fread(&in, sizeof(short), 1, fp);
j = 15;
}
}

// calculate k for next sample


N[LR]++;
A[LR] += residual;
for (k[LR]=0; (N[LR]<<k[LR])<A[LR]; k[LR]++);

// reset accumulation every 255th sample


if (N[LR]==255) {
N[LR]=0;
A[LR]=0;
}

// restore sign representation


if (sign != 0)
residual = -residual;

// construct output data, depending on prediction order used
diff[LR] += residual;
out[LR] += diff[LR];
outbuffer[y] = out[LR];
}

//routine to correct amplitude if LSBs have been removed


//convert to sign-magnitude and put back LSBs
//lsb_rem_last for the first two samples, since prediction
//gives a two-sample delay (from residual n to sample n)
//which must be alligned with the LSB-removal
void put_back_lsbs(void)
{
for(y=0;y<frame_length;y++){
if(outbuffer[y]<0){
outbuffer[y] = (-outbuffer[y]);
i = 1;
} else {
i = 0;
}
if (y == 0 || y == 1) {
outbuffer[y] = outbuffer[y]<<lsb_rem_last;
} else {
outbuffer[y] = outbuffer[y]<<lsb_rem;
}
if (i == 1) {
outbuffer[y] = -outbuffer[y];
}
}
}

//function to produce stereo output if mono-mode is used


//same compensation for delay as in put_back_lsb
void back_to_stereo(void)
{
outbuffer[1] = outbuffer[0];
y = (frame_length-mono_samples)+2;
while(y<frame_length){
outbuffer[y+1] = outbuffer[y];
y = y+2;
}
}

//write output data to file


void write_frame_tofile(void)
{
for(y=0;y<frame_length;y++){
outvar = outbuffer[y];
//limit value
if (outvar > 32767)
outvar = 32767;
if (outvar < -32768)
outvar = -32768;
PCMout_t = outvar;
// convert back to big endian
PCMout = ((PCMout_t & 0x00ff)<<8);
PCMout = (PCMout | ((PCMout_t & 0xff00)>>8));
// write output value
fwrite(&PCMout, sizeof(short), 1, op);
data_size_sw = data_size_sw - 2;
}
}

Dropped packet simulator
/////////////////////////////////////////
// File that emulates dropped packets  //
// and different ways of handling      //
// them, to see how it affects         //
// audio quality                       //
//                                     //
// Ivar Løkken, NTNU 2004              //
/////////////////////////////////////////

#include <stdio.h>
#include <string.h> //for strncmp
#include <math.h>

void read_write_wavinfo(void); //read and write wav header


void run_with_drop(void); //run with packet loss
void run_no_drop(void); //run without packet loss

//wav-header data variables


char id[4];
long size, data_size, data_size_sw;
short format_tag, channels, channels_temp, block_allign, bits_per_sample;
long format_length, sample_rate, avg_bytes_sec;

//audio data variables


short last_packet[256]; //buffer last packet in case of repeat
short value = 0; //value in
short silence = 0; //digital silence for signed 16-bit PCM (0x8000 would be full-scale negative)

//parameter variables
unsigned int packet_length = 0; //length of packet in samples
unsigned int drop_interval = 0; //how often packets are dropped
unsigned int lost_in_a_row = 0; //how many packets are lost in a row
unsigned int handling_mode = 0; //how to handle lost packets
unsigned int losson = 0; //packet loss on/off indicator

//counting variables
short i = 0;
short j = 0;

FILE *fp, *op;

//main program
int main(void)
{
fp = fopen("modernlive.wav", "rb"); //open wav-file for reading
op = fopen("out.wav", "wb"); //open output-file for writing
read_write_wavinfo(); //read wav-header
printf("\nDo you want packets to be lost (0=no, 1=yes): ");
scanf("%u", &losson);
if (losson) {
//select parameters
printf("\nPlease select packet length in samples (max 256): ");
scanf("%u", &packet_length);
printf("\nPlease select packet drop interval: ");
scanf("%u", &drop_interval);
printf("\nPlease select how many packets should be dropped each time (default = 1): ");
scanf("%u", &lost_in_a_row);
printf("\nPlease select handling mode (1 = insert silence, 2 = repeat last OK packet): ");
scanf("%u", &handling_mode);
//run
run_with_drop();
} else {
//run with no packet loss

run_no_drop();
}
}

//function that reads wav header and copies it to output file


void read_write_wavinfo(void)
{
// read wave header
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "RIFF", 4)) { //if it is a RIFF, continue
fwrite(id, sizeof(char), 4, op);
fread(&size, sizeof(long), 1, fp);
fwrite(&size, sizeof(long), 1, op);
fread(id, sizeof(char), 4, fp);
if(!strncmp(id, "WAVE", 4)) { //If it is a WAVE, continue
fwrite(id, sizeof(char), 4, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&format_length, sizeof(long), 1, fp);
fwrite(&format_length, sizeof(long), 1, op);
fread(&format_tag, sizeof(short), 1, fp);
fwrite(&format_tag, sizeof(short), 1, op);
fread(&channels_temp, sizeof(short), 1, fp);
fwrite(&channels_temp, sizeof(short), 1, op);
fread(&sample_rate, sizeof(long), 1, fp);
fwrite(&sample_rate, sizeof(long), 1, op);
fread(&avg_bytes_sec, sizeof(long), 1, fp);
fwrite(&avg_bytes_sec, sizeof(long), 1, op);
fread(&block_allign, sizeof(short), 1, fp);
fwrite(&block_allign, sizeof(short), 1, op);
fread(&bits_per_sample, sizeof(short), 1, fp);
fwrite(&bits_per_sample, sizeof(short), 1, op);
fread(id, sizeof(char), 4, fp);
fwrite(id, sizeof(char), 4, op);
fread(&data_size, sizeof(long), 1, fp);
fwrite(&data_size, sizeof(long), 1, op);
} else {
printf("Error: RIFF-file, but not a wave-file\n");
}
} else {
printf("Error: not a RIFF-file\n");
}

//byteswapping of data_size since it is arranged with msbyte last


data_size_sw = 0;
data_size_sw = ((data_size&0x000000ff)<<24);
data_size_sw = (data_size_sw|((data_size&0x0000ff00)<<8));
data_size_sw = (data_size_sw|((data_size&0x00ff0000)>>8));
data_size_sw = (data_size_sw|((data_size&0xff000000)>>24));
printf("Data size: %d \n", data_size_sw);

// byteswap channels variable


channels = ((channels_temp & 0x00ff)<<8);
channels = (channels | ((channels_temp & 0xff00)>>8));
}

//function that reads and writes data with packet loss


void run_with_drop(void)
{
while (data_size_sw > 0) {
//read and write the packets that shall not be lost
for (i=0;i<(drop_interval-lost_in_a_row);i++) {
for (j=0;j<packet_length;j++) {
fread(&value, sizeof(short), 1, fp);
fwrite(&value, sizeof(short),1,op);

//back up last packet in case of repetition mode selected
last_packet[j] = value;
data_size_sw = data_size_sw - 2;
if (data_size_sw<0) break;
}
if (data_size_sw<0) break;
}
//dropped packets
for (i=(drop_interval-lost_in_a_row);i<drop_interval;i++) {
for (j=0;j<packet_length;j++) {
//if handling mode 1, write silence to output file
if (handling_mode==1) {
fread(&value, sizeof(short), 1, fp);
fwrite(&silence, sizeof(short),1,op);
data_size_sw = data_size_sw - 2;
if (data_size_sw<0) break;
//if handling mode 2, write last OK packet to output file
} else if (handling_mode == 2) {
fread(&value, sizeof(short), 1, fp);
fwrite(&last_packet[j], sizeof(short),1,op);
data_size_sw = data_size_sw - 2;
if (data_size_sw<0) break;
}
}
if (data_size_sw<0) break;
}
}
fclose(fp);
fclose(op);
}

//function that reads and writes data with no loss


void run_no_drop(void)
{
while (data_size_sw > 0) {
fread(&value, sizeof(short), 1, fp);
fwrite(&value, sizeof(short),1,op);
data_size_sw = data_size_sw - 2;
}
fclose(fp);
fclose(op);
}

Appendix 7. MatLab Scripts
Prediction w. selectable filter and resulting entropy
calculation
function [ErLeft, ErRight]=decorr(path, B, A)
%Matlab-function for intra-channel decorrelation of wavfile.
%The function plots histogram and calculates entropy
%
%FIR- or IIR-filters of any order may be used
%
%Designed for two-channel 16-bit wavefile
%
%[error_left, error_right]=diffdecorr('c:\path\filename.wav', B, A)
%
%B and A are filter coefficients
%a(1)*y(n) = b(1)*x(n) + b(2)*x(n-1) + ... + b(nb+1)*x(n-nb)
% - a(2)*y(n-1) - ... - a(na+1)*y(n-na)
%
%Made by: Ivar Løkken, 19/1-04

signal=wavread(path);
samples=signal*(2^15-1);
%Removes normalization of wavefile.
%The function Wavread normalizes sample values to [-1 1]
%actual sample values for 16-bits is [-32767 32767]

vector=samples(:);
%Puts the two channels in one vector

LCH=vector(1:length(vector)/2);
RCH=vector(length(vector)/2+1:length(vector));
%separates left channel from right

ErLCH=filter(B,A,LCH);
ErRCH=filter(B,A,RCH);
%calculates prediction error

subplot(2,1,1);
hist(ErLCH,min(ErLCH):max(ErLCH));
title('Histogram, predicted error Left Channel');
ylabel('Number of occurrences');
xlabel('sample value');
subplot(2,1,2);
hist(ErRCH,min(ErRCH):max(ErRCH));
title('Histogram, predicted error Right Channel');
ylabel('Number of occurrences');
xlabel('sample value');
%plots normalized histograms

histoL=hist(ErLCH,min(ErLCH):max(ErLCH));
probsL=histoL/sum(histoL);
histoR=hist(ErRCH,min(ErRCH):max(ErRCH));
probsR=histoR/sum(histoR);
%generates probability distribution based on histogram

IL=-log2(probsL);
IL(IL==-Inf) = 0;
IL(IL==Inf) = 0;
prodL=probsL.*IL;
ErLeft=sum(prodL);

IR=-log2(probsR);

IR(IR==-Inf) = 0;
IR(IR==Inf) = 0;
prodR=probsR.*IR;
ErRight=sum(prodR);
%Calculates entropy using standard formula

Wavfile entropy calculator


function [Left, Right]=entropy(path)
%Matlab-function for calculation of entropy.
%Designed for two-channel 16-bit wavefile
%
%
%Calculates entropy for left and right channel separately
%[left, right]=entropy('c:\path\filename.wav')
%
%Made by: Ivar Løkken, 14/1-04

signal=wavread(path);
samples=signal*(2^15-1);
%Removes normalization of wavefile.
%The function Wavread normalizes sample values to [-1 1]
%actual sample values for 16-bits is [-32767 32767]

vector=samples(:);
%Puts the two channels in one vector

LCH=vector(1:length(vector)/2);
RCH=vector(length(vector)/2+1:length(vector));
%separates left channel from right

histoL=hist(LCH,min(LCH):max(LCH));
probsL=histoL/sum(histoL);
histoR=hist(RCH,min(RCH):max(RCH));
probsR=histoR/sum(histoR);
%generates probability distribution based on histogram

IL=-log2(probsL);
IL(IL==-Inf) = 0;
IL(IL==Inf) = 0;
prodL=probsL.*IL;
Left=sum(prodL);

IR=-log2(probsR);
IR(IR==-Inf) = 0;
IR(IR==Inf) = 0;
prodR=probsR.*IR;
Right=sum(prodR);
%Calculates entropy using standard formula

Lossy compression error calculator
function [PCMerror, SER] = errorcal(infile, outfile)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Function that calculates SER and   %
% error rate for lossy compression   %
%                                    %
% Ivar Løkken, NTNU 2002             %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% [Error, SER] = errorcal(infile, outfile)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

in_t = wavread(infile);
compfile_t = wavread(outfile);

if length(in_t)<length(compfile_t)
siz = length(in_t);
else
siz = length(compfile_t);
end
in = in_t(1:siz);
compfile = compfile_t(1:siz);

%SER and absolute maximum error


error = in-compfile;
errorones = ones(siz,1);
inpower = (1/(2*siz+1))*(errorones)'*(in.*in);
errorpower = (1/(2*siz+1))*(errorones)'*(error.*error);
SER = 10*log10(inpower/errorpower);
PCMerror = (max(abs(error))/max(abs(in)));

subplot(3,1,1), plot(in);
title('Input (normalised & quantised)');
subplot(3,1,2), plot(compfile);
title('Output');
subplot(3,1,3), plot(error);
title('Error');
s=sprintf('SER = %4.1fdB\n', SER);
text(0.5,-90,s);
s=sprintf('Max absolute error (normalized) = %6.4f\n', PCMerror);
text(0.5,-110,s);

Spectral centroid calculator
function centroid=speccent(path)
%function for calculating spectral centroid of wav-file
%
%Ivar Løkken
%
%centroid=speccent(path)
%where path is wavefile path,
%for instance. 'c:\music\file.wav'

%read signal
[signal, FS, NBITS] = wavread(path);
N=length(signal)+1;

%do FFT and plot it

FT=abs(fft(signal));

%perform antialiasing (energy above FS/2 must be removed)


FTfilt=FT(1:N/2);

%plot filtered FFT from 0 to FS/2


fftDB=db(FTfilt);
freq=[0:FS/N:FS*(1-1/N)/2];
plot(freq, fftDB);
title('FFT and spectral centroid of wavfile');
ylabel('Amplitude, dB');
xlabel('Frequency, HZ');
grid;

%calculate centroid
sumFA=0;
sumA=0;
for i=1:N/2
sumFA=sumFA+i*FTfilt(i);
sumA=sumA+FTfilt(i);
end
cent=sumFA/sumA;

%convert to correct frequency scale


centroid=cent*(FS/(2*N));
s=sprintf('Spectral centroid = %4.1fHZ\n', centroid);
text(4000,40,s);

Appendix 8. Tools Used During Development
Hardware
Apple Powerbook G4, 1GHz/512Mb/40Gb/12”, running Mac OS-X 10.3 ”Panther”
and Windows 2000 SP4 through Virtual PC 6.
Toshiba Satellite 4070CDS Celeron, 366MHz/192Mb/4Gb/13”, running Windows
2000 SP4.
Eizo FlexScan F57 external 17” CRT-monitor.
The WLS hardware developed as part of this thesis.

Software
General programming: Xcode Tools v.1.1 for OS-X.
Schematics design: DesignWorks Lite 4.5 for OS-X.
Calculations, testing: Mathworks MatLab 6.5 for Unix/OS-X.
Analog simulations: AimSpice 3.8 for Windows, MacInit Mi-Sugar 0.5.2 for OS-X.
Chart and diagram drawings: The Omni Group OmniGraffle for OS-X.
MCU-programming: SDCC microcontroller compiler for UNIX/OS-X.
CC2400-setup: Chipcon SmartLink RF for Windows.
General documentation: Microsoft Office-X for OS-X.

