
Sound Synthesis Theory

Introduction
This book covers a sub-field of Music Technology called sound synthesis. Although the book is aimed primarily at musicians and people with little prior knowledge of music systems, it may include some mathematical concepts and programming techniques that are unfamiliar. The book approaches synthesis from a digital perspective rather than an analogue one, since it aims to demonstrate the theory of digital synthesis rather than its application to a specific medium or piece of software.

The Korg MS10 is an example of an early analogue synthesizer.

What is sound synthesis?


Sound synthesis is the technique of generating sound from scratch, using electronic hardware or software. The most common use of synthesis is musical: electronic instruments called synthesizers are used in the performance and recording of music. Sound synthesis has many applications, both academic and artistic, and we commonly use synthesizers and synthesis methods to:

- Generate interesting and unique sounds or timbres incapable of being produced acoustically.
- Recreate or model the sounds of real-world acoustic instruments or sounds.
- Facilitate the automation of systems and processes (text-to-speech software, train station P.A. systems).

Background and history


One of the earliest musical synthesizers was Thaddeus Cahill's Teleharmonium, presented to an audience of about nine hundred in 1906. This massive instrument introduced a fundamental synthesis technique, now called additive synthesis, which used combinations of pure tones to generate its sounds. Other early synthesizers used technology derived from electronic analogue computers, laboratory test equipment, and early electronic musical instruments. However, it was not until the late 1960s that technology had developed far enough for synthesizers to become a commercial success, most notably with Robert Moog's modular and mini-modular analogue synthesizers. Since the late 1980s most new synthesizers have been completely digital, using digital signal processing (DSP) techniques to make musical sounds. At the same time, analogue synthesis has revived in popularity, and in recent years the two trends have combined in virtual analogue synthesizers: digital synthesizers which model analogue synthesis using DSP techniques. Some digital synthesizers now exist as 'softsynth' software that synthesizes sound using conventional PC hardware; others use specialized DSP hardware.

Sound in the Time Domain


The appearance and behaviour of sound waves
Sound is variation in air pressure and density caused by the propagation of waves through a medium. Human hearing systems sense these waves as they cause the ear drum to move; this movement is transduced into other types of energy inside the ear, and is finally sent to the brain as electrical impulses for analysis. Since sound waves are variations in air pressure over time, it is typical to represent the waves as a varying voltage or a stream of data over time in order to capture, analyse, and reproduce the sounds. When visualising the behaviour of sound waves over time, that is, in the time domain, we use the term amplitude to describe the sound level at a point in time. Amplitude is typically represented as a value between 1 and -1, where 1 and -1 represent the maximum amplitude of the signal, and 0 represents zero amplitude.

Figure 1.1. A simple sinusoidal waveform represented as varying amplitude over time.

The waveform in Fig. 1.1 is called a sine wave or sinusoid. Sine waves can be considered the fundamental building blocks of sound and are very smooth-sounding, basic tones. The figure demonstrates that the amplitude varies over time, and that the pattern of variance repeats periodically. This short, constant period gives the sine wave its particular qualities.

Figure 1.2. A more complex waveform.

The waveform in Fig. 1.2 is more complicated than the sinusoid in Fig. 1.1. There are peaks and troughs of different amplitudes and, although the pattern does repeat itself over time (see if you can find it), it is harder to spot. In the same way that a sine wave behaves in a simple way and sounds simple, this sound behaves with greater complexity and also sounds more complex. For this reason, detailed, complex sounds that change over time often have no discernible features when viewed this close up; there may be no repeating pattern or behaviour which we can use to tell us something about the sound. As you can see from Figs. 1.1 and 1.2, we are looking at a section of the sound over a very short time scale; it may be necessary to lengthen the time scale in order to gain some information about it.

Figure 1.3. A time-domain plot of a drum kit over 2 seconds.

In Fig. 1.3 we are given a look at a sound over the course of about 2 seconds rather than 2 milliseconds. From this perspective, we can see the way the overall sound amplitude changes over time; in particular, the parts with high amplitude can easily be identified as drum hits - they appear suddenly and drop in amplitude very quickly, as one would expect from striking a drum head. It would have been very difficult to tell what kind of instrument was being played if this sound were viewed over the range of a few milliseconds. From this, we should conclude that the short and long time-interval perspectives each show different types of information, and that selecting the right perspective to suit one's needs is important.

Sinusoids, frequency and pitch


As indicated in Fig. 1.1, the sine wave has a periodic form that repeats every T seconds; this repeating interval is known as the period, cycle or wavelength. The wave also has a positive maximum amplitude, +A, and a negative maximum amplitude, -A. The frequency, f, of a sine wave is the number of cycles per second and is measured in Hertz (Hz). We can obtain the frequency from the period with the following equation:

f = 1 / T

Furthermore, we can express a sine wave with the following mathematical form (with angles in radians):

y(t) = A × sin(2π × f × t)

This form may be useful to programmers interested in creating their own controllable sine functions in code:
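A minimal sketch of that idea in Python (the function and parameter names here, and the optional phase offset, are our own choices, not from the text):

import math

def sine_sample(t, amplitude=1.0, frequency=440.0, phase=0.0):
    # Evaluate y(t) = A * sin(2*pi*f*t + phase) at time t, in seconds
    return amplitude * math.sin(2.0 * math.pi * frequency * t + phase)

# Example: the first millisecond of a 440 Hz tone sampled at 44.1 kHz
samples = [sine_sample(n / 44100.0) for n in range(44)]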

High frequencies are often associated with words such as 'brightness', whereas low frequencies are often associated with 'depth' or 'bass'. For example, an instrument such as an electric guitar played clean may be called 'bright' or 'sharp', whereas an acoustic double bass may be referred to as 'dark' and 'warm'. Words like these are not objective quantities we can measure precisely, but are often used in describing the timbre of a particular sound. The frequencies present in a sound make a large contribution to its timbre, and there are many different shades of timbre that can be achieved through combinations of the different frequencies that make up a sound. The human hearing system also associates frequency with pitch if a particular frequency is sustained or perceived for a period of time, and we associate particular frequencies with particular notes in the standard Western scale:

Cycle length t (s)    Frequency (Hz)    Note name
0.0045                220.00            A3
0.0040                246.94            B3
0.0038                261.63            C4
0.0034                293.66            D4
0.0030                329.63            E4
0.0028                349.23            F4
0.0025                392.00            G4
0.0022                440.00            A4

Fig. 1.4. The relationship between wavelength, frequency and pitch.

Construction and deconstruction of sinusoids

It has already been mentioned that sine waves can be considered the building blocks of sound. This is possible because a single sine wave represents a single frequency: if we combine a series of different sinusoids, we can theoretically recreate the frequency spectrum of an entire sound, be it real or imagined. In the same way, we can also break a complex sound down into its individual frequency components, allowing us to analyse or even control its minutest characteristics. In practice, both processes are usually simplified, because real-world or "realistic" sounds are incredibly complex, and their full analysis and modification place heavy demands on the systems performing the task.

Fig. 1.5 demonstrates the appearance of two sine waves summed together. The characteristics of both waves are combined in the resultant waveform, which, due to its increased complexity, develops new features. We can continue this process by adding more and more sine waves, each one representing a single frequency component of our desired sound. This technique is the basis of additive synthesis, which is covered later in the book. Furthermore, because we have constructed this sound ourselves, it is possible to filter the two component frequencies back out of the whole; this is typically done by analysing the waveform in the frequency domain, which is covered in the subsequent chapter.

Figure 1.5. The summation of two sine waves of different amplitude and frequency, causing their characteristics to be blended into one combined waveform.
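The summation itself is just sample-by-sample addition. A minimal sketch in Python, with arbitrary example frequencies and amplitudes (our choices, not the figure's exact values):

import math

samplerate = 44100

def summed_wave(n):
    t = n / samplerate
    # A louder 220 Hz sine plus a quieter 660 Hz sine, added point by point
    return 0.7 * math.sin(2 * math.pi * 220 * t) + 0.3 * math.sin(2 * math.pi * 660 * t)

wave = [summed_wave(n) for n in range(samplerate)]  # one second of the combined waveform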

Sound in the Digital Domain


Introduction

Digital systems (e.g. computers) and formats (e.g. CD) are the most popular and commonplace methods of storing and manipulating audio. Since the introduction of the compact disc in the early 1980s, the digital format has provided increasingly greater storage capacity and the ability to store audio information at an acceptable quality. Although analogue formats still exist (vinyl, tape), they typically serve a niche audience. Digital systems are ubiquitous in modern music technology. It must be stressed that there is no argument here as to whether one domain, analogue or digital, is superior; the following are simply some desirable features of working with audio in the digital domain.

Storage. The amount of digital audio data capable of being stored on a modern hard drive is far greater than on a tape system. Furthermore, we can choose the quality of the captured audio data, which relates directly to file size and other factors.

Control. By storing audio information in digital form, we can perform powerful and complex operations on the data that would be extremely difficult to realise otherwise.

Durability. Digital audio can be copied across devices without any loss of information. Furthermore, many systems employ error correction codes to compensate for wear and tear on a physical digital format such as a compact disc.

Digital <-> Analogue Conversion

Acoustic information (sound waves) is treated as a signal. As demonstrated in the previous chapter, we traditionally view these signals as amplitude varying over time. In analogue systems, this generally means that the amplitude is represented by a continuous voltage; inside a digital system, however, the signal must be stored as a stream of discrete values.

Figure 2.1. An overview of the digital <-> analogue conversion process.

Digital data stored in this way has no real physical meaning: one could describe a song on a computer as just an array of numbers, and these numbers are meaningless unless there exists within the system a process that can interpret each number in sequence appropriately. Fig. 2.1 shows an overview of the process of capturing analogue sound and converting it into a digital stream of numbers for storage and manipulation in such a system. The steps are as follows:

1. An input such as a microphone converts acoustic air pressure variations (sound waves) into variations in voltage.
2. An analogue-to-digital converter (ADC) converts the varying voltage into a stream of digital values by taking a 'snapshot' of the voltage at a point in time and assigning it a value depending on its amplitude. It typically takes these 'snapshots' thousands of times a second, at a rate known as the sample rate.
3. The numerical data is stored on the digital system and subsequently manipulated or analysed by the user.
4. The numerical data is re-read and streamed out of the digital system.
5. A digital-to-analogue converter (DAC) converts the stream of digital values back into a varying voltage.
6. A loudspeaker converts the voltage back into variations in air pressure (sound).

Although the signal at each stage comes in a different form (sound energy, digital values, etc.), the information is analogous. However, due to the nature of the conversion process, this data may become manipulated and distorted. For instance, low sample rates or other factors at the ADC might mean that the continuous analogue signal is not represented with enough detail, and the information will subsequently be distorted. There are also imperfections in physical devices such as microphones, which further "colour" the signal in some way. It is for this reason that musicians and engineers aim to use the highest-quality equipment and processes, in order to preserve the integrity of the original sound throughout the process. Musicians and engineers must also consider what other processes their music will go through before consumption (radio transmission, etc.).

Sampling

Sound waves in their natural acoustic form can be considered continuous; that is, their time-domain graphs are smooth lines at all zoom factors, without any breaks or jumps. We cannot have these breaks, or discontinuities, because sound cannot switch instantaneously between two values. An example of this is an idealised waveform like a square wave: on paper, it switches between amplitudes 1 and -1 instantaneously; however, a loudspeaker cannot, by the laws of physics, jump between two points in no time at all - the cone has to travel through a continuous path from one point to the next.

Figure 2.2. Discrete samples (red) of a continuous waveform (grey).

Sampling is the process of taking a continuous, acoustic waveform and converting it into a digital stream of discrete numbers. An ADC measures the amplitude of the input at a regular rate, creating a stream of values which represent the waveform digitally. The output is then created by passing these values to the DAC, which drives a loudspeaker appropriately. By measuring the amplitude many thousands of times a second, we create a "picture" of the sound which is of sufficient quality to human ears. The more we increase this sample rate, the more accurately the waveform is represented and reproduced.

Nyquist-Shannon sampling theorem

The frequency of a signal has implications for its representation, especially at very high frequencies. As discussed in the previous chapter, the frequency of a sine wave is the number of cycles per second. If we have a sample rate of 20000 samples per second (20 kHz), it is clear that a high-frequency sinusoid such as one at 9000 Hz is going to have far fewer "snapshots" per cycle than a sinusoid at 150 Hz. Eventually a point is reached where there are not enough sample points to record the cycle of a waveform, which leads us to the following important requirement:

The sample rate must be greater than twice the maximum frequency represented.

Why is this? The minimum number of sample points required to represent a sine wave is two, but we need at least slightly more than this so that we are not dependent on phase (with samples at exactly twice the sine wave frequency, the samples may all fall on the peaks of the sine wave, or all on the zero crossings). It may seem apparent that using just two points to represent a continuous curve such as a sinusoid would result in a crude approximation - a square wave. And, inside the digital system, this is true. However, both ADCs and DACs have low-pass filters set at half the sample rate (the highest representable frequency). What this means for input and output is that any frequency above the cutoff point is removed, and it follows that the crude sine representation - a square wave in theory - is filtered down to a single frequency (i.e. a sine wave). From this, we have two mathematical results:

f_s > 2 × f_max    and    f_N = f_s / 2

where f_s is the sample rate, f_max is the highest frequency in the signal, and f_N is the Nyquist frequency. Frequencies over the Nyquist frequency are normally blocked by filters before conversion to the digital domain when recording; without such processes there would be frequency component foldover, otherwise known as aliasing.
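Aliasing can be demonstrated numerically. A small sketch, reusing the 20 kHz sample rate from the example above: a 19 kHz sine sampled at 20 kHz produces exactly the same sample values as a (phase-inverted) 1 kHz sine, its alias.

import math

fs = 20000             # sample rate (Hz); the Nyquist frequency is 10000 Hz
f_high = 19000         # frequency well above Nyquist
f_alias = fs - f_high  # foldover ('aliased') frequency: 1000 Hz

for n in range(8):
    t = n / fs
    # The two expressions agree at every sample instant (within float error)
    print(math.sin(2 * math.pi * f_high * t), -math.sin(2 * math.pi * f_alias * t))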

Sampling accuracy and bit depth

It has been established that the higher the sample rate, the more accurate the representation of a waveform in a digital system. However, although there are many reasons and arguments for higher sample rates, there are two general standards: 44100 samples per second and 48000 samples per second, with the former being the more commonplace. The main consideration here is the fact that the human hearing range extends, at maximum, to an approximate limit (which varies from person to person) of 20000 Hz; frequencies above this are inaudible. Considering the example of 44.1 kHz, we find that the Nyquist frequency evaluates to 22050 Hz, which is more than the human hearing system is capable of perceiving. There are other reasons for this particular sample rate, but they are beyond the scope of this book.

Figure 2.3. Effects of increased sample rate and bit depth on representing a continuous analogue signal.

There is one more important factor to consider in the sampling process: bit depth. Bit depth represents the precision with which the amplitude of each sample is measured. In the same way that there are a limited number of samples per second in a conversion process, there are also a limited number of possible amplitude values for a sample point, and the greater the number, the greater the accuracy. A common bit resolution found in most standard digital audio systems (Hi-Fi, Compact Disc) is 16 binary bits, which allows for a range of 65536 (2^16) individual amplitude values at a point in time. Lower bit depths result in greater distortion of the sound: a two-bit system (2^2) only allows for four different amplitudes, which results in a massively inaccurate approximation of the input signal.
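A sketch of the effect of bit depth, assuming simple uniform rounding to the nearest representable level (the text does not prescribe a particular quantisation scheme):

def quantize(x, bits):
    # Map an amplitude in the range -1..1 to the nearest of 2**bits levels
    levels = 2 ** bits
    step = 2.0 / (levels - 1)  # spacing between representable amplitudes
    return round(x / step) * step

print(quantize(0.3, 16))  # barely changed: 65536 levels are available
print(quantize(0.3, 2))   # 0.0 - with only 4 levels, the value is badly distorted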

Oscillators and Wavetables


Oscillators

Figure 5.1 Sine, square, triangle, and sawtooth waveforms

An oscillator is a repeating waveform with a fundamental frequency and peak amplitude, and it forms the basis of most popular synthesis techniques today. Aside from the frequency or pitch of the oscillator and its amplitude, one of its most important features is the shape of its waveform. The time-domain waveforms in Fig. 5.1 show the four most commonly used oscillator waveforms. Although it is possible to use all kinds of unique shapes, these four each serve a range of functions suited to different synthesis techniques, ranging from the smooth, plain sound of a sine wave to the harmonically rich buzz of a sawtooth wave.

Oscillators are generally controlled by a keyboard synthesizer or a MIDI protocol device. A key press produces a MIDI note value, which is converted to a frequency value (Hz) that the oscillator accepts as its input; the waveform period then repeats according to the specified frequency. From here, the sound can be processed or manipulated in a variety of ways in the synthesizer or program to enrich or modify it further; a sketch of the note-to-frequency conversion follows.
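The conversion follows the standard equal-tempered relationship f = 440 × 2^((n - 69) / 12), where MIDI note 69 is A4 (440 Hz). A minimal sketch:

def midi_to_freq(note):
    # A4 = MIDI note 69 = 440 Hz; each semitone is a factor of 2**(1/12)
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

print(midi_to_freq(69))  # 440.0 (A4)
print(midi_to_freq(57))  # 220.0 (A3), one octave down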

Generating oscillator waveforms

Sine wave

As mentioned previously, the sine wave can be considered the most fundamental building block of sound. The best way to generate an oscillator which produces this waveform is to make use of an inbuilt library or function in the system concerned; many programming languages have standard mathematics libraries with many of the trigonometric functions represented. A cycle of a sine wave is 2π radians long and has a peak amplitude of A, as shown in Fig. 5.2. In a digital system, the generated wave will be a series of equally spaced values produced at the sample rate.

Figure 5.2 One cycle of a sine wave with phase 2π (radians).

With a sample rate of 44100 samples per second and a required cycle length of 1 second, it will take 44100 samples to get from 0 to 2π. In other words, we can determine the number of steps per cycle, S, from the cycle length T:

S = f_s × T

where f_s is the sample rate. Each step will therefore advance the phase by the following amount in radians:

Δphase = 2π / S    or    Δphase = (2π × f) / f_s

where f, in the second result, expresses the same result in terms of frequency (f = 1/T). The importance of this is that it can be expanded into an algorithm suitable for generating a sinusoidal wave of a user-specified frequency and amplitude - effectively the simplest synthesizer possible! A sinusoidal wave can be generated by repeatedly incrementing a phase value by the amount required to complete the desired number of cycles per second at the sample rate. This value is passed to a sine function and scaled by the user-specified peak amplitude to create the output value.

Input: Peak amplitude (A), Frequency (f)
Output: Amplitude value (y)

y = A * sin(phase)
phase = phase + ((2 * pi * f) / samplerate)
if phase > (2 * pi) then
      phase = phase - (2 * pi)

The most important thing to note about this algorithm is that when the phase value exceeds 2π it is reduced by one whole period. This ensures that the function "wraps" around to the correct position instead of going straight back to 0; if a phase increment overstepped and simply reset to 0, undesirable discontinuities would occur, causing harmonic distortion in the oscillator's sound.
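The pseudocode above translates almost line for line into a runnable form. A sketch in Python (names are our own):

import math

def sine_oscillator(amplitude, frequency, samplerate=44100):
    # Yields successive samples of A * sin(phase), wrapping phase past 2*pi
    phase = 0.0
    increment = (2 * math.pi * frequency) / samplerate
    while True:
        yield amplitude * math.sin(phase)
        phase += increment
        if phase > 2 * math.pi:
            phase -= 2 * math.pi

osc = sine_oscillator(amplitude=1.0, frequency=440.0)
samples = [next(osc) for _ in range(44100)]  # one second of a 440 Hz tone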

Square wave

The square wave cannot be generated from a mathematical function library quite so easily, but once again the algorithm is particularly straightforward, since the waveform is constructed from straight line segments. Unlike the sine wave, square waves have many harmonics above their fundamental frequency and have a much brighter, sharper timbre. After examining a number of different waveforms, it will start to become apparent that waveforms with steep edges and/or abrupt changes and discontinuities are usually harmonically rich. (Note that the following square, sawtooth, and triangle functions are "naive": they are equivalent to sampling the ideal mathematical functions without first bandlimiting them. In other words, all of the harmonics above the Nyquist frequency will be aliased back into the audible range. This is most obvious when sweeping one of these waveforms into the high frequencies: the aliased harmonics move up and down the frequency spectrum, making "radio tuning" sounds in the background. A better method of producing waveforms for audio would be additive synthesis, or something like MinBLEPs. A properly bandlimited waveform will have "jaggies" as you approach the discontinuities instead of piecewise straight lines.)

Figure 5.3 One cycle of a square wave with phase 2π (radians).

The square wave is constructed in a very similar fashion to the sine wave; we use the same approach of cycling through a pattern with a phase variable and resetting once we exceed 2π radians.

Input: Peak amplitude (A), Frequency (f)
Output: Amplitude value (y)

if phase < pi then
      y = A
else
      y = -A
phase = phase + ((2 * pi * f) / samplerate)
if phase > (2 * pi) then
      phase = phase - (2 * pi)

As is evident, there is no reliance on an external function; the square wave can be defined by simple arithmetic, since it essentially switches between two values per cycle. One can expand on this by introducing a new variable which controls the point in the cycle at which the value switches to its opposite sign; this waveform is known as a pulse wave. Pulse waves are similar in character to square waves, but the ability to modulate the switching point offers greater sonic potential, as the sketch below suggests.
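A sketch of the pulse-wave variant, with a "width" parameter (our own name) setting the switching point; width = 0.5 reproduces the square wave above:

import math

def pulse_oscillator(amplitude, frequency, width=0.5, samplerate=44100):
    # Output is +A while phase < width * 2*pi, and -A for the rest of the cycle
    phase = 0.0
    increment = (2 * math.pi * frequency) / samplerate
    while True:
        yield amplitude if phase < width * 2 * math.pi else -amplitude
        phase += increment
        if phase > 2 * math.pi:
            phase -= 2 * math.pi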

Sawtooth wave

Figure 5.4 One cycle of a sawtooth wave with phase 2π (radians).

The sawtooth wave is similar in sound to the square wave, although it contains both odd and even harmonics (decaying in amplitude as they ascend) and has an appropriately "buzzy" timbre. It is constructed from diagonal, sloping line segments and as such requires a line gradient equation in the algorithm. The mathematical form:

y = A - ((A / π) × phase)

where A represents amplitude and phase is the phase in radians. This can be incorporated into the algorithmic form as follows:

Input: Peak amplitude (A), Frequency (f)
Output: Amplitude value (y)

y = A - ((A / pi) * phase)
phase = phase + ((2 * pi * f) / samplerate)
if phase > (2 * pi) then
      phase = phase - (2 * pi)

Triangle wave

Figure 5.5 One cycle of a triangle wave with phase 2π (radians).

The triangle wave shares many geometric similarities with the sawtooth wave, except that it has two sloping line segments per cycle. The algebra is slightly more complex, and programmers may wish to consider consolidating the line generation into a separate function for ease of reading. Triangle waves contain only odd-integer harmonics of the fundamental and have a far softer timbre than square or sawtooth waves, nearer to that of a sine wave. The mathematical forms of the two line segments are:

For phase = 0 to π radians:    y = -A + ((2A / π) × phase)

For phase = π to 2π radians:    y = 3A - ((2A / π) × phase)

The algorithm, then, is similar to the previous examples, but with the gradient equations incorporated into it. From the example algorithms presented here it is evident that a range of different waveshapes can be designed, with the caveat that the shapes must be describable as mathematical functions; complex shapes may become very demanding due to the increased processing power required for more complicated mathematical statements.

Input: Peak amplitude (A), Frequency (f)
Output: Amplitude value (y)

if phase < pi then
      y = -A + ((2 * A / pi) * phase)
else
      y = 3 * A - ((2 * A / pi) * phase)
phase = phase + ((2 * pi * f) / samplerate)
if phase > (2 * pi) then
      phase = phase - (2 * pi)

Wavetables

There may be a situation or a desire to escape the limitations or complexity of defining an oscillatory waveform using mathematical formulae or line segments. As mentioned before, this could be a concern about processing power, or simply the fact that it would be easier to specify the shape through an intuitive graphical interface. In cases like these, musicians and engineers may use wavetables as their source oscillator. Wavetables are popular in digital synthesis applications because accessing a block of memory is computationally faster than calculating values using mathematical operations.

Figure 5.6 The basic structure of a wavetable oscillator.

The wavetable is in essence an array of N values, with positions 1 through N representing one whole cycle of the oscillator. Each value represents an amplitude at a certain point in the cycle. Wavetables are often displayed graphically, with the option for the user to draw in the waveshape he or she requires, and as such they represent a very powerful tool. There is also the possibility of loading a pre-recorded waveshape; note, however, that a wavetable oscillator is only a reference table for one cycle of a waveform - it is not the same as a sampler. The wavetable has associated with it a read pointer, which cycles through the table at the required speed and outputs each amplitude value in sequence so as to recreate the waveform as a stream of digital values. When the pointer reaches the last value in the table array, it resets to position one and begins a new cycle.

Using wavetables

The size of the wavetable and the sampling rate of the system determine what the fundamental frequency of the wavetable oscillator will be. If we have a wavetable with 1024 individual values and a sampling rate of 44.1 kHz, it will take:

T = 1024 / 44100 ≈ 0.0232 seconds

to complete one cycle. As previously shown, frequency can be determined from the period (f = 1/T), giving us a fundamental frequency of:

f = 44100 / 1024 ≈ 43.07 Hz.

It therefore becomes apparent that, in order to change the frequency of our oscillator, we must change either the size of the wavetable or the sampling rate of the system. There are some real problems with both approaches:

- Changing the wavetable size means switching to a different-sized wavetable holding the same waveform. This would require dozens, hundreds, or even thousands of individual wavetables, one for each pitch, which is obviously inefficient and memory-consuming.
- Digital systems, especially mixers that combine several synthesized or recorded signals, are designed to work at a fixed sampling rate, and making sudden changes to it is once again inefficient and extremely hard to program.
- The sample rate required to play high frequencies with an acceptable level of precision becomes very high and puts great demand on the system.

One of the most practical and widely used approaches to playing a wavetable oscillator at different frequencies is to change the size of the "steps" that the read pointer makes through the table. As in our previous example, our 1024-value wavetable has a fundamental frequency of about 43.07 Hz when every point in the table is output. Now, if we stepped through the table every 5 values, we would have:

f = (44100 × 5) / 1024 ≈ 215.33 Hz.

From this follows a general formula for calculating the required step size, S, for a given frequency, f:

S = (N × f) / f_s

where N is the size of the wavetable and f_s is the sample rate. It is important to note that, because the step size is being altered, the read pointer may not land exactly on the final table value N, and so it must "wrap around" in the same fashion as the functionally generated waveforms in the earlier section. This can be done by subtracting the size of the table from the current pointer value whenever it exceeds N; the algorithmic form of this can easily be gleaned from the examples above, and a sketch follows.
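A sketch of a wavetable oscillator using this step-size formula, here with simple truncation of fractional pointer positions (interpolation is discussed in the next section):

import math

N = 1024
samplerate = 44100
wavetable = [math.sin(2 * math.pi * k / N) for k in range(N)]  # one stored sine cycle

def wavetable_oscillator(frequency):
    # Step through the table by S = (N * f) / fs, wrapping past the end
    step = (N * frequency) / samplerate
    pointer = 0.0
    while True:
        yield wavetable[int(pointer)]  # truncate the fractional position
        pointer += step
        if pointer >= N:
            pointer -= N

osc = wavetable_oscillator(215.33)
samples = [next(osc) for _ in range(1024)]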

Frequency precision and interpolation

We must consider that some frequency values may generate a step size that has a fractional part; that is, it is not an integer but a rational number. In this case the read pointer will be trying to step to locations in the wavetable array that do not exist, since each member of the array has an integer-valued index: there may be a value at position 50, but what about position 50.5? If we desire to play a frequency that uses a fractional step size, we must consider ways to accommodate it:

- Truncation and rounding. By removing the fraction after the decimal point we reduce the step size to an integer - this is truncation. For instance, 1.3 becomes 1, and 4.98 becomes 4. Rounding is similar, but chooses the closest integer: 3.49 becomes 3, and 8.67 becomes 9. For simple rounding, if the value after the decimal point is less than 5, we round down (truncate); otherwise we round up to the next integer. Rounding may be supported by the processor at no cost, or can be done by adding 0.5 to the original value and then truncating to an integer. For wavetable synthesis, the only difference between truncation and rounding is a constant 0.5-sample phase shift in the output. Since that is not detectable, and is neither an improvement nor a detriment, a decision between truncation and rounding comes down to whichever is more convenient or quicker.
- Linear interpolation. This is the method of drawing a straight line between the two integer positions around the step location and using the values at both points to generate an amplitude value that lies between them. This is a more computationally demanding process but introduces greater precision; a sketch is given below.
- Higher-order interpolation. With linear interpolation considered first-order interpolation (and truncation and rounding considered zero-order interpolation), there are many higher-order forms in common use: cubic Hermite, Lagrangian, and others. Just as linear interpolation requires two points for the calculation, higher orders require even more wavetable points, but produce a more accurate, lower-distortion result. Sinc interpolation can be made arbitrarily close to perfect, at the expense of computation time.
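A sketch of the linear (first-order) case:

def read_interpolated(table, pointer):
    # Interpolate between the neighbouring values table[i] and table[i + 1]
    i = int(pointer)            # integer part: the lower neighbour
    frac = pointer - i          # fractional part: position between the two
    j = (i + 1) % len(table)    # upper neighbour, wrapping at the table's end
    return table[i] + frac * (table[j] - table[i])

# Example: the value "at" position 50.5 lies halfway between positions 50 and 51
# value = read_interpolated(wavetable, 50.5)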

By increasing the wavetable size, the precision of the above processes becomes greater and will result in a closer fit to the idealised, intended curve. Naturally, large wavetable sizes result in greater memory requirements. Some wavetable synthesizer hardware designs prefer table sizes that are powers of two (128, 256, 512, 1024, 2048, etc.), due to shortcuts that exploit the way in which digital memory is constructed (binary).

Additive Synthesis
Introduction

As previously discussed in Section 1, sine waves can be considered the building blocks of sound. In fact, it was shown in the 19th century by the mathematician Joseph Fourier that any periodic function can be expressed as a series of sinusoids of varying frequencies and amplitudes. This concept of constructing a complex sound out of sinusoidal terms is the basis for additive synthesis, sometimes called Fourier synthesis for this reason. The concepts of additive synthesis have also existed in practice since the introduction of the organ, where different pipes of varying pitch are combined to create a sound or timbre.

Figure 6.1. Additive synthesis block diagram.

A simple block diagram of the additive form may appear as in Fig. 6.1, which has a simplified mathematical form based on the Fourier series:

y(t) = a_0 + Σ (k = 1 to N) a_k × sin(2π × k × f × t)

where a_0 is an offset value for the whole function (typically 0), a_k are the amplitude weightings for each sine term, and k is the frequency multiplier value. With hundreds of terms, each with its own individual frequency and amplitude weighting, we can design and specify some incredibly complex sounds, especially if we can modulate the parameters over time; one of the key features of natural sounds is that they have a dynamic frequency response that does not remain fixed. A popular approach to the additive synthesis system is to use frequencies that are integer multiples of the fundamental frequency, which is known as harmonic additive synthesis. For example, if the first oscillator's frequency, f, represents the fundamental frequency of the sound at 100 Hz, then the second oscillator's frequency would be 2f = 200 Hz, the third 3f = 300 Hz, and so on. This series of sine waves produces an even "harmonic" sound that can be described as "musical". Oscillator frequency relationships that are not integer-related, on the other hand, are called "inharmonic" and tend to be noisier, taking on the characteristics of bells or other percussive sounds.

Constructing common harmonic waveforms in additive synthesis

Figure 6.2. The first four terms of a square wave constructed from sinusoidal components (partials).

If we know the amplitude weightings and frequency components of the first N sinusoidal components, or partials, of a complex waveform, we can reconstruct that waveform using an additive system with N oscillators. The popular square, sawtooth and triangle waveforms are harmonic waveforms, because their constituent sinusoidal components all have frequencies that are integer multiples of the fundamental. The property that distinguishes them in this form is that each has unique amplitude weightings for its sinusoids. Fig. 6.2 demonstrates the appearance of the time-domain waveform as a set of sines at unique amplitude weightings are added together; in this case the form begins to approximate a square wave, with the accuracy increasing with each added partial. Note that to construct a square wave we only include odd-numbered harmonics - the amplitude weightings for k = 2, 4, 6, etc. are 0. Below is a table that shows the partial amplitude weightings a_1 to a_9 of the common waveshapes:

Waveshape   a_1  a_2  a_3   a_4  a_5   a_6  a_7    a_8  a_9   General rule
Sine        1    0    0     0    0     0    0      0    0     fundamental only
Square      1    0    1/3   0    1/5   0    1/7    0    1/9   a_k = 1/k for odd k
Triangle    1    0    -1/9  0    1/25  0    -1/49  0    1/81  a_k = 1/k^2 for odd k, alternating + and -
Sawtooth    1    1/2  1/3   1/4  1/5   1/6  1/7    1/8  1/9   a_k = 1/k
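A sketch of the square-wave rule from the table, summing the first few odd partials at weightings a_k = 1/k (as in Fig. 6.2):

import math

def additive_square(t, fundamental=100.0, partials=4):
    # Sum odd harmonics k = 1, 3, 5, 7 with amplitude weightings 1/k
    y = 0.0
    for k in range(1, 2 * partials, 2):
        y += (1.0 / k) * math.sin(2 * math.pi * k * fundamental * t)
    return y

samplerate = 44100
samples = [additive_square(n / samplerate) for n in range(441)]  # 10 ms at 44.1 kHz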

A conclusion you may draw from Fig. 6.2 and the table is that it requires a large number of frequency partials to create a waveform that closely approximates the idealised mathematical forms introduced in Section 5. For this reason, additive synthesis techniques are perhaps not the best method for producing these forms. The strength of additive synthesis lies in the fact that we can exert control over every partial component of our sound, which can produce some very intricate and wonderful results; with constant modification of the frequency and amplitude values of each oscillator, the possibilities are endless. Some examples of ways to control the weightings and frequencies of each component oscillator:

- Manual control. The user controls a bank of oscillators with an external control device (typically MIDI), tweaking the values in real time. More than one person can join in and alter the timbre to their whims.
- External data. Digital information from another source is taken and converted into appropriate frequency and amplitude values. The varying data source is then effectively in 'control' of the timbral outcome. Composers have been known to use data from natural sources, or pieces derived from interesting geometric, aleatoric and mathematical models.
- Recursive data. Given a source set of values and a set of algorithmic rules, the control parameters reference the previous value entered into the system to determine the next one. Users may wish to "interfere" with the system to set the process on a new path. See Markov chains.

There is, however, the major consideration of computational power: complex sounds may require many oscillators all operating at once, which will put major demands on the system in question.

Additive resynthesis

In Section 1 it was mentioned that just as it is possible to construct waveforms using additive techniques, we can analyse and deconstruct waveforms as well. It is possible to analyse the frequency partials of a recorded sound and then resynthesize a representation of the sound using a series of sinusoidal partials. By calculating the frequency and amplitude weighting of partials in the frequency domain (typically using a Fast Fourier transform), an additive resynthesis system can construct an equally weighted sinusoid at the same frequency for each partial. Older techniques rely on banks of filters to separate each sinusoid; their varying amplitudes are used as control functions for a new set of oscillators under the user's control. Because the sound is represented by a bank of oscillators inside the system, a user can make adjustments to the frequency and amplitude of any set of partials. The sound can be 'reshaped' - by alterations made to timbre or the overall amplitude envelope, for example. A harmonic sound could be restructured to sound inharmonic, and vice versa.

Subtractive Synthesis
Whereas additive synthesis is the process of combining individual sinusoidal partials to construct a complex sound, subtractive synthesis is essentially the reverse of this process. Starting with a harmonically (or partially) rich sound, a subtractive system filters and modifies the signal to reduce it to a desired form. By doing this, one can use one or two oscillators instead of a bank of ten to achieve similar sonic results. Subtractive synthesis is an extremely popular method that has been employed in hardware and software synthesizers since their popularity skyrocketed in the 1970s; it can be found in traditional modular or compact analogue synthesizers as well as in modern virtual analogue models and software synthesizers. Fig. 7.1 illustrates a simple block diagram of a subtractive system, and a minimal sketch of the idea follows.

Figure 7.1 Block diagram of a simplified subtractive synthesis system.
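A minimal sketch of the idea in code: a harmonically rich (naive, aliased) sawtooth passed through a simple one-pole low-pass filter. The filter here is our own simplification for illustration; real subtractive synthesizers typically use more sophisticated resonant filters.

import math

samplerate = 44100

def naive_saw(frequency, length):
    # The rich source: a naive sawtooth, y = A - (A / pi) * phase, with A = 1
    phase, increment, out = 0.0, (2 * math.pi * frequency) / samplerate, []
    for _ in range(length):
        out.append(1.0 - phase / math.pi)
        phase += increment
        if phase > 2 * math.pi:
            phase -= 2 * math.pi
    return out

def one_pole_lowpass(signal, coeff=0.1):
    # y[n] = y[n-1] + coeff * (x[n] - y[n-1]); smaller coeff removes more harmonics
    y, out = 0.0, []
    for x in signal:
        y += coeff * (x - y)
        out.append(y)
    return out

filtered = one_pole_lowpass(naive_saw(110.0, samplerate))  # one darker-sounding second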

Modulation Synthesis
Introduction

When we talk about modulation from an audio synthesis point of view, we refer to a time-varying signal (the carrier) being affected in some way by another (the modulator). Modulation can be found in a range of different sound effects and synthesis techniques, and some of these effects occur naturally and help us identify certain types of sound; for instance, the common performance styles of tremolo (modulation of amplitude) and vibrato (modulation of frequency) used on many stringed instruments are examples of this. Modulation is typical in synthesis because it enriches the character of the sound and adds the variance in timbre and character over time that is so often found in nature.

Figure 8.1 Unipolar (between 0 and 1) and Bipolar (between -1 and +1) waveforms.

In the two basic methods of modulation synthesis, ring modulation and amplitude modulation, two distinct types of signal occur: bipolar and unipolar signals. A bipolar signal is the type of signal we have been examining in previous chapters: it has both negative and positive amplitude, and the waveform generally "rests" around zero in a time-domain plot. A unipolar signal is a bipolar signal that has been constant-shifted; that is, a constant value has been added to the overall signal to shift it into a range above zero, typically between 0 and 1. The reason for these two different types of signal follows.

Ring Modulation

Ring modulation is the multiplication of two bipolar audio signals by each other. Each value of a carrier signal, C, is multiplied by a modulator signal, M, to create a new ring-modulated signal, R:

R(t) = C(t) × M(t)

There are different ways to implement this; it is most likely suitable to simply multiply the two signals, but alternatively the output of a modulator module can drive the amplitude input of a carrier module. The frequency of the modulator signal also plays an important role in the character of the RM signal. From this, we reach the following important result:

If the frequency of M is under 20 Hz or so, we will generally perceive the tremolo effect, where the amplitude of C will vary at the frequency of M. Periodic signals M with a frequency below 20 Hz are called low-frequency oscillators (LFOs).

When the frequency of M is in the audible range, that is, 20 Hz or more, there is an effect on the timbre of the signal. The variations in amplitude become fast enough that the modulator generates a set of frequency sidebands. With two sine waves as carrier and modulator, RM will generate a frequency spectrum containing two sidebands, at the sum and difference of the carrier and modulator frequencies. When this occurs, the actual carrier frequency is removed from the spectrum, leaving two harmonic sidebands (if the frequencies of C and M are in an integer ratio to one another) or two inharmonic sidebands (otherwise). For instance, if the carrier is 900 Hz and the modulator is 500 Hz, we will get two sidebands: one at 400 Hz (900 - 500) and one at 1400 Hz (900 + 500). If C and M are not sine waves (i.e. their waveforms are more complex), then the resultant signal will contain many different sidebands at different frequencies and amplitudes, indicating a more complex sound. Figure 8.2 illustrates two examples of ring modulation: the original example with frequencies C = 900 and M = 500, and also C = 400 and M = 1000, which introduces negative frequencies into the spectrum. This results in a "wrapping" phenomenon, where the difference sideband of C and M is -600 Hz! As a result, we find that a difference sideband occurs at 600 Hz; this is true for any negative frequency - a sideband will occur at its unsigned (positive) frequency.

Figure 8.2 Frequency-domain spectra of ring modulated signals. a) a C frequency of 900 and M of 500 and b) a C frequency of 400 and M of 1000, showing the emergence of negative frequencies into the audible spectrum.
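A sketch of ring modulation using the first example above (C = 900 Hz, M = 500 Hz); the product signal contains only the 400 Hz and 1400 Hz sidebands:

import math

samplerate = 44100

def ring_mod_sample(n, carrier_freq=900.0, modulator_freq=500.0):
    t = n / samplerate
    carrier = math.sin(2 * math.pi * carrier_freq * t)
    modulator = math.sin(2 * math.pi * modulator_freq * t)
    return carrier * modulator  # sidebands at the sum and difference frequencies

signal = [ring_mod_sample(n) for n in range(samplerate)]  # one second of output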

Amplitude Modulation

Amplitude modulation is similar to ring modulation, except that it works with a unipolar modulator. The amplitude of the carrier signal, C, is modulated by the unipolar modulator, M. At infrasonic frequencies (below 20 Hz), the modulator serves to attenuate or boost the amplitude of the signal; a simple example of this is a typical ADSR envelope, which scales the amplitude of the carrier signal over time from 0 to 1 and back down again. However, for synthesis techniques we generally consider the effect that periodic modulator signals above 20 Hz have on a carrier. Once again, the mathematical form is simply the product of the two signals:

A(t) = C(t) × M(t)

where C is the carrier signal and M is a unipolar modulator, typically set to vary between 0 and 1. Were it not for the unipolar modulator, this technique would appear identical to ring modulation. Like ring modulation, amplitude modulation produces a pair of sidebands for every sinusoidal component in the carrier and modulator, generated at the sum and difference of the two signal frequencies. The difference between the two techniques is highlighted here:

The difference between amplitude modulation and ring modulation is that in AM the carrier frequency is preserved, and the sidebands generated are at half the amplitude of the carrier.

Figure 8.3 The frequency-domain spectrum of an amplitude-modulated signal. The two sidebands are sum and difference frequencies of the carrier and modulator, C and M, and have amplitudes at half the amplitude of the carrier signal.

One of the advantages of AM, like its cousin RM, is that using just two signals or oscillators we can create some partially rich signals. Using a harmonically dense signal such as a square wave oscillator can create a wealth of sidebands from a minimum of control parameters and computation. Control over these generated partials may not, however, be as detailed and straightforward as in techniques such as additive synthesis. As a result, AM and RM are used more often in signal processing than in signal generation.

Figure 8.4 A time-domain plot of an amplitude-modulated signal: M(t), the sinusoidal 10 Hz modulator signal; C(t), the sinusoidal 220 Hz carrier signal; and A(t), the two combined using amplitude modulation.

Expanding on amplitude modulation requires us to introduce more parameters and elements to give the technique some "weight" alongside other, more popular techniques. For instance, we can introduce into the system a unipolar low-frequency oscillator set to control the amplitude of the modulator; by changing the amplitude of the modulator we modify what is known as the modulation index, a factor which controls the strength of the AM sidebands. In addition to modifying the amplitude of the modulator, we can also modify its frequency. As you might expect, this shifts the frequencies of the sidebands generated through the AM process and, carefully controlled, can produce some interesting, dynamic sounds that are hard to achieve with other techniques. Breaking away from sinusoidal oscillators for both carrier and modulator, and even modulating the modulation index itself, is one of the first steps to exploring this technique; try experimenting with the waveshapes introduced in the previous chapters, starting from a sketch like the one below.
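A sketch of amplitude modulation with a modulation index (the parameter names and exact scaling are our own choices): the overall gain swings between 1 - index and 1, so index = 0 leaves the carrier untouched and index = 1 gives full-depth modulation, matching the signals of Fig. 8.4.

import math

samplerate = 44100

def am_sample(n, carrier_freq=220.0, modulator_freq=10.0, index=1.0):
    t = n / samplerate
    carrier = math.sin(2 * math.pi * carrier_freq * t)
    # Unipolar modulator between 0 and 1, with its depth set by the index
    unipolar = 0.5 * (1.0 + math.sin(2 * math.pi * modulator_freq * t))
    return carrier * (1.0 - index + index * unipolar)

signal = [am_sample(n) for n in range(samplerate)]  # one second of 10 Hz tremolo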

Physical Modelling Synthesis


Introduction

Physical modelling synthesis is not confined to a particular technique, but rather represents a family of approaches to synthesis. Physical modelling systems attempt to model the propagation of sound energy in a system, typically starting with a mathematical model or algorithm that is often recursive. Physical modelling techniques often start by attempting to replicate the basic structure of an acoustic instrument and hence mimic the sound it makes when 'excited'. This excitation normally comes in the form of an initial impulse, typically a short burst of noise. The main advantage of this approach is that one can generate convincing acoustic sounds, such as a plucked string or a drum hit. Other advantages include the ability to tweak parameters (e.g. instrument body density, string length) in order to create specific types of instrument from a generic model; in the extreme, one can specify strange and unrealistic parameters in order to model instruments that are impossible to realise physically! Physical modelling systems typically employ delay lines in their structure, which can mean that the output of the system goes out of control and creates unwanted feedback.

The Karplus-Strong Algorithm

Alexander Strong and Kevin Karplus developed software and hardware implementations of this algorithm, naming it "Digitar" synthesis. In its classic form, a short burst of noise is fed into a delay line and repeatedly averaged with its neighbouring sample, producing a plucked-string tone; a sketch follows.
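A sketch of that classic form (our own minimal rendering): a buffer of random noise acts as the excitation, and repeatedly averaging adjacent samples acts as a gentle low-pass filter that damps the "string" over time.

import random

def karplus_strong(frequency, duration, samplerate=44100):
    # Delay-line length sets the pitch: fs / f samples per round trip
    n = int(samplerate / frequency)
    buf = [random.uniform(-1.0, 1.0) for _ in range(n)]  # the noise burst
    out = []
    for _ in range(int(duration * samplerate)):
        first = buf.pop(0)
        buf.append(0.5 * (first + buf[0]))  # average with the next sample
        out.append(first)
    return out

pluck = karplus_strong(220.0, 1.0)  # one second of a 220 Hz plucked 'string'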

Synthesis software / tools


Analysis

Audacity - Wave editor with analysis and basic synthesis / modification tools. Free
SPEAR - Spectrum analysis and resynthesis tool. Free

Hosts (VST, MIDI etc.)

VSTHost - Host for VST plugins. Windows. Free

Modular / environment

Kyma - Sound design environment. Commercial
Max / MSP - Audio processing, synthesis and analysis environment. Commercial
Reaktor - Audio processing, synthesis and analysis environment. Commercial
Synthedit - Modular synthesizer / VST plugin builder. Commercial
Puredata - Audio processing, synthesis and analysis environment. Free

Programming

SuperCollider - Object-oriented audio programming language. Free
CSound - Audio programming language. Free
Nyquist - Audio programming language. Free

Synthesizers

SynFactory - Modular software synthesizer. Free

Links and Bibliography


Links

Sound On Sound Magazine - Many articles on synthesis available to view online.
The Theory and Techniques of Electronic Music - Electronic music manual written by Miller Puckette, inventor of Puredata.
Mathematics Of The Discrete Fourier Transform - Electronic manual on the application of mathematics to many areas of digital sound synthesis / processing, by Julius O. Smith, inventor of digital waveguide synthesis.

Related / Useful Wikibooks

Control Systems - Inter-disciplinary engineering text that analyzes the effects and interactions of mathematical systems.

Bibliography

Boulanger, R. 2000. The CSound Book. London: MIT Press. ISBN 978-0262522618
Loy, G. 2006. Musimathics: Mathematical Foundations Of Music, Vol. 1. MIT Press. ISBN 978-0262122825
Loy, G. 2007. Musimathics: Mathematical Foundations Of Music, Vol. 2. MIT Press. ISBN 978-0262122856
Roads, C. 1996. The Computer Music Tutorial. London: MIT Press. ISBN 978-0262680820
Roads, C. 2002. Microsound. London: MIT Press. ISBN 978-0262182157
