Efficient root-mean calculations

By Brian Neunaber, Digital Systems Architect of Software and Firmware, QSC Audio Products

Real-time digital systems often require the calculation of a root-mean, such as a root-mean square (RMS) level or average magnitude of a complex signal. While averaging can be efficiently implemented by most microprocessors, the square root may not be--especially with low-cost hardware. If the processor doesn't implement a fast square-root function, it must be implemented in software; although this yields accurate results, it may not be efficient.

One common method for computing the square root is Newton's method, which iteratively converges on a solution using an initial estimate. Since we're computing the square root of a slowly varying average value, the previous root-mean value makes a good estimate. Furthermore, we can combine the iterative Newton's method with a first-order recursive averager, resulting in a super-efficient method for computing the root-mean of a signal.

In this article, I'll develop and present three efficient recursive algorithms for computing the root-mean, illustrating each method with signal flow diagrams and example code. To some degree, each of these methods trades hardware complexity for error. I'll compare the computational performance and error of each method and suggest suitable hardware for each implementation.

Root-Mean
The root-mean is computed as the square root of the average over time of its input. This average may be recursive or non-recursive, and I'll briefly review the general case for both.

Non-recursive average
The non-recursive average, or moving average, is the weighted sum of N inputs: the current input and N-1 previous inputs. In digital filtering terminology, this is called a finite impulse response, or FIR, filter (Equation 1):

y(n) = Σ a_k·x(n-k),  k = 0 … N-1

The most common use of the moving average sets the weights such that a_n = 1/N. If we were to plot these weights versus time, we would see the "window" of the input signal that is averaged at a given point in time. This 1/N window is called a rectangular window because its shape is an N-by-1/N rectangle.

There is a trick for computing the 1/N average so that all N samples need not be weighted and summed with each output calculation. Since the weights don't change, you can simply add the newest weighted input and subtract the Nth weighted input from the previous sum (Equation 2):

y(n) = y(n-1) + [x(n) - x(n-N)]/N

While this technique is computationally efficient, it requires storage and circular-buffer management of N samples.

Of course, many other window shapes are commonly used. Typically, these window shapes resemble, or are a variation of, a raised cosine between -π/2 and π/2. These windows weight the samples in the centre more than the samples near the edges. Generally speaking, you should only use one of these windows when there is a specific need to, such as applying a specific filter to the signal. The disadvantage of these windows is that computational complexity and storage requirements increase with N.

Recursive average
The recursive average is the weighted sum of the input, N previous inputs, and M previous outputs (Equation 3):

y(n) = Σ a_k·x(n-k) (k = 0 … N) + Σ b_k·y(n-k) (k = 1 … M)

In digital filtering terminology, this is known as an infinite impulse response, or IIR, filter.

The simplest of these in terms of computational complexity and storage (while still being useful) is the first-order recursive average. In this case, the average is computed as the weighted sum of the current input and the previous output. The recursive average's window is a decaying exponential (Figure 1). Technically, the recursive average has an infinite window, since it never decays all the way to zero.

From Figure 1, we see that earlier samples are weighted more than later samples, allowing us to somewhat arbitrarily define an averaging time for the recursive average. For the first-order case, we define the averaging time as the time at which the impulse response has decayed to a factor of 1/e, or approximately 37%, of its initial value. An equivalent definition is the time at which the step response reaches 1-(1/e), or approximately 63%, of its final value. Other definitions are possible but will not be covered here.

The weighting of the sum determines this averaging time; to ensure unity gain, the sum of the weights must equal one. As a consequence, only one coefficient needs to be specified to describe the averaging time. In contrast to the non-recursive average, the first-order recursive average requires storage of only the previous output. It also lends itself to an optimisation when combined with the computation of the square root, which we'll discuss shortly.
eetindia.com | February 2006 | EE Times-India
Therefore, for first-order recursive averaging, we compute the mean level as (Equation 4):

m(n) = m(n-1) + a·[x(n) - m(n-1)]

where x(n) is the input, m(n) is the mean value, and a is the averaging coefficient. The averaging coefficient is defined as (Equation 5):

a = 1 - e^(-1/(t·fS))

where t is the averaging time and fS is the sampling frequency. The root-mean may then be calculated by taking the square root of Equation 4 (Equation 6, where y(n) is the root-mean):

y(n) = √m(n)

This seems simple enough, but we can actually improve the computational efficiency, which will be discussed in one of the following sections.

Efficient computation methods
Googling "fast square root" will get you a plethora of information and code snippets on implementing fast square-root algorithms. While these methods may work just fine, they don't take into account the application in which the square root is required. Oftentimes, you may not need exact precision to the last bit, or the algorithm itself can be manipulated to optimise the computation of the square root. I present a few basic approaches here.

Only calculate it when you need it
Probably the simplest optimisation is to only calculate the square root when you absolutely need it. Although this may seem obvious, it can be easily overlooked when computing the root-mean on every input sample. When you don't need an output value for every input sample, it makes more sense to compute the square root only when you read the output value. One example of an application when this technique can be used is RMS metering of a signal. A meter value that is displayed visually may only require an update every 50 to 100ms, which may be far less often than the input signal is sampled. Keep in mind, however, that the recursive averaging should still be computed at the Nyquist rate.

Logarithms
Recall that (Equation 7):

log(√m) = (1/2)·log(m)

If you'll be computing the logarithm of a square root, it's far less computationally expensive to simply halve the result instead. A common example of this optimisation is the calculation of an RMS level in dB, which may be simplified as follows (Equation 8):

20·log10(y(n)) = 10·log10(m(n))

Newton's Method
Newton's Method (also called the Newton-Raphson Method) is a well-known iterative method for estimating the root of an equation.¹ Newton's Method can be quite efficient when you have a reasonable estimate of the result. Furthermore, if accuracy to the last bit is not required, the number of iterations can be fixed to keep the algorithm deterministic. We may approximate the root of f(x) by iteratively calculating (Equation 9):

x(n) = x(n-1) - f(x(n-1))/f'(x(n-1))

Root-mean using Newton's Method
If we wish to find √m, then we need to find the root of the equation f(y) = y² - m. Substituting f(y) into Equation 9, we get (Equation 10):

y(n) = y(n-1) - [y(n-1)² - m]/[2·y(n-1)]

Rearranging Equation 10, we get (Equation 11):

y(n) = (1/2)·[y(n-1) + m(n)/y(n-1)]

where y(n) is the approximation of the square root of m(n). A subtle difference between Equations 10 and 11 is that m becomes m(n), meaning that we're attempting to find the square root of a moving target. However, since m(n) is a mean value, or slowly varying, it can be viewed as nearly constant between iterations. Since y(n) will also be slowly varying, y(n-1) will be a good approximation to y(n) and require fewer iterations--one, we hope--to achieve a good estimate.

To calculate the root-mean, one may simply apply Newton's Method for calculating the square root to the mean value. As long as the averaging time is long compared to the sample period (t >> 1/fS), one iteration of the square-root calculation should suffice for reasonable accuracy.

Using reciprocal square root
Equation 11 requires a divide operation, which may be inconvenient for some processors. As an alternative, we can calculate the reciprocal square root and multiply the result by m to get √m. Again using Newton's Method, we find that we may iteratively calculate the reciprocal square root as (Equation 12):

yr(n) = yr(n-1)·(1/2)·[3 - yr(n-1)²·m(n)]

and calculate the square root as (Equation 13):

y(n) = yr(n)·m(n)

Unlike the iterative square-root method, however, the iterative reciprocal square root requires no divide. Although Newton's Method for the reciprocal square root eliminates the divide operation, it can be problematic for fixed-point processors. Assuming that m(n) is a positive integer greater than 1, yr(n) will be a positive number less than one--beyond the range of representation for integer numbers. Implementation must be accomplished using floating-point or mixed integer/fractional number representation. This implementation is best suited for floating-point processing, which can efficiently handle numbers both greater and less than one. We present this implementation as a signal flow diagram in Figure 2. The averaging coefficient, a, is defined by Equation 5, and z⁻¹ represents a unit sample delay.
A code listing for a C++ class that implements the computation in Figure 2 is presented in Listing 1. In this example class, initialisation is performed in the class constructor, and each call to CalcRootMean() performs one iteration of averaging and square-root computation.

Listing 1. C++ class that computes the root-mean using Newton's Method for the reciprocal square root

#include <math.h>

static const double Fs = 48000.0;  // sample rate
static double AvgTime = 0.1;       // averaging time

class RecipRootMean
{
public:
    double Mean;
    double RecipRoot;   // reciprocal square-root estimate
    double AvgCoeff;

    RecipRootMean()
    {
        AvgCoeff = 1.0 - exp( -1.0 / (Fs * AvgTime) );
        Mean = 0.0;
        RecipRoot = 1.0e-10;  // 1 > initial RecipRoot > 0
    }
    ~RecipRootMean() {}

    double CalcRootMean(double x)
    {
        Mean += AvgCoeff * (x - Mean);
        RecipRoot *= 0.5 * ( 3.0 - (RecipRoot * RecipRoot * Mean) );
        return RecipRoot * Mean;
    }
};

Using direct square root
Let's go back and take a closer look at Equation 11. Newton's method converges on the solution as quickly as possible without oscillating around it, but if we slow this rate of convergence, the iterative equation will converge on the square root of the average of its inputs. Adding the averaging coefficient results in the following root-mean equation (Equation 14):

y(n) = y(n-1) + (a/2)·[x(n)/y(n-1) - y(n-1)]

where a is defined by Equation 5. Now y(n) converges to the square root of the average of x(n). An equivalent signal-flow representation of Equation 14 is presented in Figure 3. Here, an additional y(n-1) term is summed so that only one averaging coefficient is required. Note that x(n) and y(n-1) must be greater than zero.

A code listing for a C++ class that implements the computation shown in Figure 3 is presented in Listing 2. As in the previous example, initialisation is performed in the class constructor, and each call to CalcRootMean() performs one iteration of averaging/square-root computation.

Listing 2. C++ class that implements the floating-point version of Figure 3

#include <math.h>

static const double Fs = 48000.0;  // sample rate
static double AvgTime = 0.1;       // averaging time

class NewtonRootMean
{
public:
    double RootMean;
    double AvgCoeff;

    NewtonRootMean()
    {
        RootMean = 1.0;  // > 0 or divide will fail
        AvgCoeff = 0.5 * ( 1.0 - exp( -1.0 / (Fs * AvgTime) ) );
    }
    ~NewtonRootMean() {}

    double CalcRootMean(double x)
    {
        RootMean += AvgCoeff * ( ( x / RootMean ) - RootMean );
        return RootMean;
    }
};

With some care, Figure 3 may also be implemented in fixed-point arithmetic as shown in Listing 3. In this example, scaling is implemented to ensure valid results. When sufficient word size is present, x is scaled by nAvgCoeff prior to division to maximise the precision of the result.

Listing 3. C++ class that implements the fixed-point version of Figure 3

#include <math.h>

static const double Fs = 48000.0;  // sample rate
static double AvgTime = 0.1;       // averaging time

class IntNewtonRootMean
{
    static const unsigned int sknNumIntBits = 32;  // # bits in int
    static const unsigned int sknPrecisionBits = sknNumIntBits / 2;
    unsigned int sknRoundOffset;

public:
    unsigned int nRootMean;
    unsigned int nScaledRootMean;
    unsigned int nAvgCoeff;
    unsigned int nMaxVal;

    IntNewtonRootMean()
    {
        double AvgCoeff = 0.5 * ( 1.0 - exp( -1.0 / (Fs * AvgTime) ) );
        double skScaleFactor = pow( 2.0, (double)sknPrecisionBits );
        nRootMean = 1;  // > 0 or divide will fail
        nScaledRootMean = 0;
        nAvgCoeff = (unsigned int)floor( ( skScaleFactor * AvgCoeff ) + 0.5 );
        nMaxVal = (unsigned int)floor( ( skScaleFactor / AvgCoeff ) + 0.5 );
        sknRoundOffset = (unsigned int)floor( 0.5 * skScaleFactor );
    }
    ~IntNewtonRootMean() {}

    unsigned int CalcRootMean(unsigned int x)
    {
        if ( x < nMaxVal )
        {
            // x is small enough to scale before the divide for extra precision
            nScaledRootMean += ( ( nAvgCoeff * x ) / nRootMean ) - ( nAvgCoeff * nRootMean );
        }
        else
        {
            nScaledRootMean += nAvgCoeff * ( ( x / nRootMean ) - nRootMean );
        }
        nRootMean = ( nScaledRootMean + sknRoundOffset ) >> sknPrecisionBits;
        return nRootMean;
    }
};
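As a quick sanity check of the Equation 14 update, independent of the listings above, here is a standalone restatement using the same 48kHz sample rate and 100ms averaging time (the function name is my own): for a constant input x, y(n) should settle at √x.

```cpp
#include <cmath>

// Standalone restatement of the Equation 14 update (function name is mine):
//   y(n) = y(n-1) + (a/2)*[x(n)/y(n-1) - y(n-1)]
// With a constant input x, y converges to sqrt(x).
double RunEq14(double x, int iterations)
{
    const double Fs = 48000.0;   // sample rate, as in the listings
    const double AvgTime = 0.1;  // 100ms averaging time
    const double AvgCoeff = 0.5 * (1.0 - std::exp(-1.0 / (Fs * AvgTime)));
    double y = 1.0;              // must start > 0 or the divide fails
    for (int i = 0; i < iterations; ++i)
        y += AvgCoeff * ((x / y) - y);
    return y;
}
```

Convergence takes on the order of the averaging time (about 4,800 samples per time constant here), which matches the article's point that the square root tracks a slowly moving target.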
Divide-free RMS using normalisation
Now we'll look at the special case of computing an RMS value on fixed-point hardware that does not have a fast divide operation, which is typical for low-cost embedded processors. Although many of these processors can perform division, they do so one bit at a time, requiring at least one cycle for each bit of word length. Furthermore, care must be taken to ensure that the RMS calculation is implemented with sufficient numerical precision. With fixed-point hardware, the square of a value requires twice the number of bits to retain the original data's precision.

With this in mind, we manipulate Equation 14 into the following (Equation 15):

y(n) = y(n-1) + [a/(2·y(n-1))]·[x(n)² - y(n-1)²]

Although the expression x(n)²-y(n-1)² must be calculated with double precision, this implementation lends itself to a significant optimisation. Note that a/2y(n-1) acts like a level-dependent averaging coefficient. If a slight time-dependent variance in the averaging time can be tolerated--which is often the case--1/y(n-1) can be grossly approximated. On a floating-point processor, shifting the averaging coefficient to the left by the negative of the exponent approximates the divide operation. This process is commonly referred to as normalisation. Some fixed-point DSPs can perform normalisation by counting the leading bits of the accumulator and shifting the accumulator by that number of bits.² In both cases, the averaging coefficient will be truncated to the nearest power of two, so the coefficient must be multiplied by 3/2 to round the result. This implementation is shown in Equation 16.

Figure 4 is the signal-flow diagram that represents Equation 16. Just as in Figure 3, x(n) and y(n-1) must be greater than zero. A sample code listing that implements Figure 4 is shown in Listing 4. This assembly-language implementation is for the Freescale (formerly Motorola) DSP563xx 24-bit fixed-point processor.

Listing 4. Freescale DSP563xx assembly implementation of divide-free RMS using normalisation

; r4:   address of output bits 24-47 [y_msw(n)]
; r4+1: address of output bits 0-23 [y_lsw(n)]
; x0:   input [x(n)]

FS        equ 48000.0                   ; sampling rate in Hz
AVG_TIME  equ 0.1                       ; averaging time in seconds
AVG_COEFF equ @XPN(-1.0/(FS*AVG_TIME))  ; calculate avg_coeff

RMS
    move #>AVG_COEFF,x1    ; load avg_coeff
    move y:(r4)+,a         ; get y_msw(n-1)
    move y:(r4),a0         ; get y_lsw(n-1)
    clb a,b                ; b = number of leading bits in y(n-1)
    mpy x0,x0,a a,x0       ; a = x(n)^2, x0 = y_msw(n-1)
    mac -x0,x0,a x0,y1     ; a = x(n)^2-y_msw(n-1)^2, y1 = y_msw(n-1)
    normf b1,a             ; normalize x(n)^2-y_msw(n-1)^2 by y_msw(n-1)
    move a,x0              ; x0 = [x(n)^2-y_msw(n-1)^2]norm(y_msw(n-1))
    mpy x1,x0,a y:(r4),y0  ; a = AVG_COEFF*[x(n)^2-y_msw(n-1)^2]norm(y_msw(n-1)), y0 = y_lsw(n-1)
    add y,a                ; a = y(n-1)+avg_coeff*[x(n)^2-y_msw(n-1)^2]norm(y_msw(n-1))
    move a0,y:(r4)-        ; save y_lsw(n)
    move a,y:(r4)          ; save y_msw(n)
    rts

Of course, this method can be implemented even without fast normalisation. You can implement a loop to shift x(n)²-y(n-1)² to the left for each leading bit in y(n-1). This will be slower but can be implemented with even the simplest of processors.

Higher Order Averaging
Higher-order recursive averaging may be accomplished by inserting additional averaging filters before the iterative square root. These filters may simply be one or more cascaded first-order recursive sections. First-order sections have the advantage of producing no overshoot in the step response. In addition, there is only one coefficient to adjust, and quantisation effects (primarily of concern for fixed-point implementation) are far less than those of higher-order filters.

The implementer should be aware that cascading first-order sections changes the definition of averaging time. A simple but gross approximation that maintains the earlier definition of step response is to simply divide the averaging time of each first-order section by the total number of sections. However, it is the implementer's responsibility to verify that this approximation is suitable for the application.

Second-order sections may also be used, if you want (for example) a Bessel-Thomson filter response. If second-order sections are used, it's best to choose an odd-order composite response, since the averaging square-root filter approximates the final first-order filter with Q=0.5. Care must be taken to minimise the overshoot of this averaging filter. Adjusting the averaging time of this filter in real time will prove more difficult, since there are a number of coefficients that must be adjusted in unison to ensure stability.

Results
Three methods of calculating the RMS level are compared in Figure 5. The averaging time is set to 100ms, and the input is one
second of 1/f noise with a 48kHz sampling frequency. The first trace is the true RMS value calculated using Equation 6. The second trace is the RMS calculation using Equation 14. The third trace is the no-divide calculation of Equation 16. The fourth trace is the RMS value using the reciprocal square-root method of Equation 13.

For the most part, the four traces line up nicely. All four approximations appear to converge at the same rate as the true RMS value. As expected, the largest deviation from the true RMS value is the approximation of Equation 16. This approximation will have the greatest error during large changes in the level of the input signal, although this error is temporary: the optimised approximation will converge upon the true RMS value when the level of the input signal is constant.

The errors between the three approximations and the true RMS value are shown in Figure 6. The error of the RMS approximation using Equation 14 slowly decreases until it is below 1E-7, which is sufficient for 24-bit accuracy. The optimised approximation of Equation 16 is substantially worse, at about 1E-4, but still good enough for many applications. The approximation that uses the reciprocal square root is "in the noise"--less than 1E-9. For highly critical floating-point applications, this is the efficient method of choice.

As you would expect, the errors discussed above will be worse with shorter averaging times and better with longer averaging times. Table 1 summarises the approximate error versus averaging time of these three methods, along with suitable hardware architecture requirements.

By combining recursive averaging with Newton's method for calculating the square root, you'll gain a very efficient method for computing the root-mean. Although the three methods I presented here are developed for different hardware and each, to some degree, trades off hardware capabilities for error, most of you should find one of these methods suitable for your application.

Endnotes:
1. D. G. Zill. Calculus with Analytic Geometry, 2nd ed., PWS-Kent, Boston, pp. 170-176, 1988.
2. Motorola. DSP56300 Family Manual, Rev. 3, Motorola Literature Distribution, Denver, 2000.