XAudio2 Performance Tips

Tom Mathews
Lead Developer, Advanced Technology Group
Microsoft

Overview
XAudio2 overview
Voice & graph optimization
xAPO optimization
Voice reuse
Compression
Streaming
Debugging / performance analysis

What Is XAudio2
Low-level cross-platform game audio API
Play hundreds of sounds at once
Loop, start, stop, adjust sounds at any time
Volume, pitch, filter, reverb, DSP
Identical code on both platforms (Windows and Xbox 360)

Building block for higher-level sound design tools such as the XACT3 engine
Replaced XAudio1
Replaced DirectSound for gaming purposes

Features
Flexible channel routing
Any channel can be sent to any other channel with attenuation/amplification

Multistage submixing
For example, each car can have a submix (exhaust, transmission, engine, etc.), and each car's mix can then be fed into another submix for environmental effects

Advanced Features
Deferred commands
Most operations (Start, SetParameter, SetOutputVoice, SetEffectChain) can be grouped and applied as atomic, sample-accurate operations
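
The batching idea can be sketched with a small mock. This is not the XAudio2 implementation: `DeferredCommandQueue`, `defer`, and `commit` are hypothetical names used only for illustration. In the shipping API the same pattern is exposed through the `OperationSet` argument that most voice methods accept, together with `IXAudio2::CommitChanges`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical mock, not the XAudio2 implementation: operations tagged with
// the same set ID are queued and later applied together.
class DeferredCommandQueue {
public:
    using Command = std::function<void()>;

    // Queue a command under an operation-set ID instead of running it now.
    void defer(std::uint32_t operationSet, Command command) {
        pending_[operationSet].push_back(std::move(command));
    }

    // Apply every command queued under `operationSet`, in submission order.
    // Returns the number of commands that ran.
    std::size_t commit(std::uint32_t operationSet) {
        auto it = pending_.find(operationSet);
        if (it == pending_.end()) return 0;
        std::vector<Command> batch = std::move(it->second);
        pending_.erase(it);
        for (auto& command : batch) command();
        return batch.size();
    }

private:
    std::map<std::uint32_t, std::vector<Command>> pending_;
};
```

Queuing two "start" commands under set 1 and committing once runs both in the same step, which is the property that lets a group of voices start together.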

xAPOs (DSPs)
In-box APOs (reverb, notch, etc.)
Create custom equalizers, compressors, limiters, monitors, phase shifters, attenuators, delays, ...
Custom APOs can be cross-platform, like the in-box APOs.

XAudio2: Minimum CPU

XAudio2 requires at least SSE
Available since 1999 for PCs
XAudio2 makes extensive use of it in its processing code (vectorized signal processing)
Your processing code may do the same

XAudio2 also makes use of SSE2/FTZ/DAZ
Available since 2001 for PCs

On Xbox 360, XAudio2 makes use of hardware-accelerated XMA decode and VMX instructions

Audio Flow
[Diagram: source voices (32k mono XMA2, 24k mono xWMA, 44k 5.1 XMA2) each pass through pitch/SRC + filter and an effect chain (Effect1 ... EffectN). They feed submix voices at 32k (filter, effect chain, sample rate conversion), which feed the 48k 5.1 mastering voice and its own effect chain.]

Graph Optimization
Apply FX to many voices at once for the price of one
SUBMIX!

Make use of lower-rate sub-graphs
Lower rate == fewer samples == less CPU
Run expensive global send FX at a lower sample rate and channel count than the final mix

Provides for more detailed control of performance characteristics
Allows for smooth crossfades between disparate FX (e.g. an environmental reverb crossfade)
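
As a rough worked example of "lower rate == fewer samples", the snippet below counts the samples an effect has to touch per second of audio. The two formats are illustrative assumptions (a 48 kHz 5.1 final mix and a 32 kHz mono submix), and the comparison assumes effect cost scales roughly with sample count, which will not hold exactly for every effect.

```cpp
#include <cstdint>

// Samples an effect must touch per second of audio at a given format.
constexpr std::uint64_t samplesPerSecond(std::uint32_t sampleRateHz,
                                         std::uint32_t channels) {
    return static_cast<std::uint64_t>(sampleRateHz) * channels;
}

// Illustrative formats only (assumptions, not measurements from the talk).
constexpr std::uint64_t kFinalMix   = samplesPerSecond(48000, 6);  // 48 kHz, 5.1
constexpr std::uint64_t kMonoSubmix = samplesPerSecond(32000, 1);  // 32 kHz, mono
```

Under these assumptions an effect placed on the mono submix reads 32,000 input samples per second instead of 288,000.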

Source Voices

Setting up for best performance


Use XAUDIO2_VOICE_NOPITCH and XAUDIO2_VOICE_NOSRC when possible
Minimize MaxFrequencyRatio when pitch shifting is used

Stopped voices are not touched by the real-time processing thread

Voice pooling
Much faster than repeated allocation/free
SetFrequencyRatio may be applied to reuse voices for data of a different sampling rate

Voice Pooling
Create pools of Voices

Each pool is unique on source content (xWMA, XMA, ADPCM) and channel count

When you need a new voice
Identify a lower-priority voice in the pool
Call Stop(), then FlushSourceBuffers()
With the February XDK, you no longer have to wait for the next Process() before reusing the voice
If needed, call SetSourceSampleRate()

Remember: Stopped voices are CPU-free
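
The pooling steps above can be sketched as follows. This is a minimal model under stated assumptions: `PooledVoice`, `VoicePool`, and the integer priority scheme are hypothetical stand-ins for a title's own bookkeeping around `IXAudio2SourceVoice`, and the real `Stop()` / `FlushSourceBuffers()` / `SetSourceSampleRate()` calls are only marked by a comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a pooled IXAudio2SourceVoice and its bookkeeping.
enum class SourceFormat { PCM, ADPCM, XMA, xWMA };

struct PooledVoice {
    int id = 0;
    bool playing = false;
    int priority = 0;  // higher value == more important
};

class VoicePool {
public:
    // One pool per (source format, channel count). Add all voices up front:
    // adding later can invalidate pointers returned by acquire().
    void add(SourceFormat format, std::uint32_t channels, int voiceId) {
        pools_[{format, channels}].push_back(PooledVoice{voiceId, false, 0});
    }

    // Returns a voice to play on, or nullptr if every voice in the pool is
    // busy with something at least as important as the request.
    PooledVoice* acquire(SourceFormat format, std::uint32_t channels, int priority) {
        auto it = pools_.find({format, channels});
        if (it == pools_.end()) return nullptr;
        PooledVoice* candidate = nullptr;
        for (auto& voice : it->second) {
            if (!voice.playing) { candidate = &voice; break; }  // free voice: take it
            if (voice.priority < priority &&
                (candidate == nullptr || voice.priority < candidate->priority)) {
                candidate = &voice;  // lowest-priority victim seen so far
            }
        }
        if (candidate == nullptr) return nullptr;
        // Real code would call Stop() and FlushSourceBuffers() on a stolen
        // voice here, and SetSourceSampleRate() if the new data's rate differs.
        candidate->playing = true;
        candidate->priority = priority;
        return candidate;
    }

    void release(PooledVoice& voice) { voice.playing = false; }

private:
    std::map<std::pair<SourceFormat, std::uint32_t>, std::vector<PooledVoice>> pools_;
};
```

A request only steals a voice whose current sound has strictly lower priority, so equally important sounds never interrupt each other in this sketch.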

FX Optimization
XAPO_BUFFER_SILENT

Indicates silent data should be assumed
Actual memory may be uninitialized

Buffers are 16-byte aligned and interleaved per channel
Use VMX128 instructions

Use in-place processing
In-place: input buffer == output buffer
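
A minimal sketch of how an effect might honor the silent flag while processing in place. `BufferFlags`, `ProcessBuffer`, and `applyGainInPlace` are hypothetical stand-ins for the XAPO process parameters, not the real structures, and the scalar loop stands in for the vectorized (VMX128 or SSE) code a production effect would use.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the XAPO buffer flags and process parameters.
enum class BufferFlags { Silent, Valid };

struct ProcessBuffer {
    float* samples;                 // interleaved: one float per channel per frame
    BufferFlags flags;
    std::uint32_t validFrameCount;
};

// In-place gain: `io` is both the input and the output buffer.
// Returns the number of samples actually processed.
std::size_t applyGainInPlace(ProcessBuffer& io, std::uint32_t channels, float gain) {
    if (io.flags == BufferFlags::Silent) {
        // Silent input: the memory may be uninitialized, so do not read it.
        // Gain applied to silence is still silence, so the flag stays Silent.
        return 0;
    }
    const std::size_t count = static_cast<std::size_t>(io.validFrameCount) * channels;
    for (std::size_t i = 0; i < count; ++i) io.samples[i] *= gain;
    return count;
}
```

An effect with a tail, such as a reverb, cannot simply pass silence through like this; it has to keep producing output until its tail has decayed.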

Use EnableEffect/DisableEffect
More convenient than destroying and recreating the

XAudio2 Memory Pool


All internal XAudio2 allocations are pooled
Allows for efficient parameter passing without imposing cumbersome parameter scope requirements
XAudio2 allocates sooner, rather than later

Pool is reset when the last IXAudio2 instance is released
Gives applications control of memory pool lifespan
Possible uses include reclaiming memory between levels

Remember this?

Memory is pooled for many things, including SRCs and Pitch Shifting

Compression

Always use compression to minimize disk/memory/cache footprint
Reduce XMA/xWMA quality per sound for optimal quality/size tradeoff
Seek tables allow the caller to skip past unwanted packets without having to load the data itself

Compression - Tradeoffs
PCM
Not compressed, so highest fidelity

ADPCM (Windows only)
Slight compression (~4:1, lossy)

XMA (360 only)
Hardware-accelerated decode (316 concurrent streams)
Good compression (~6+:1)

xWMA
Software decode (mono/stereo ~= 0.6-1.2% of a 360 core)
Excellent compression (~20+:1)
Good for voices/music, no seamless looping

Streaming

Cycle a circular queue of buffers to submit new data to XAudio2
Submit new data within the voice's OnBufferEnd callback
Increasing read-ahead before starting the voice decreases the chance of glitching, but can increase perceptible latency depending on implementation
Consider streaming several buffers into the engine before throttling

XMA2 block size should be a multiple of 32 KB to mirror DVD I/O patterns
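
The circular queue can be sketched like this. `StreamingQueue`, `ReadFn`, and `SubmitFn` are hypothetical names: in a real title `submit` would wrap `IXAudio2SourceVoice::SubmitSourceBuffer` and `onBufferEnd` would be driven by the voice's `OnBufferEnd` callback. The sketch assumes at least one buffer, and that `read` does not block (for example, because another thread has already pulled the data off disk), since blocking inside a callback stalls the audio thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

class StreamingQueue {
public:
    using ReadFn = std::function<std::size_t(std::uint8_t* dst, std::size_t capacity)>;
    using SubmitFn =
        std::function<void(const std::uint8_t* data, std::size_t bytes, bool endOfStream)>;

    StreamingQueue(std::size_t bufferCount, std::size_t bufferBytes,
                   ReadFn read, SubmitFn submit)
        : buffers_(bufferCount, std::vector<std::uint8_t>(bufferBytes)),
          read_(std::move(read)), submit_(std::move(submit)) {}

    // Read-ahead: fill and submit every buffer before the voice is started.
    void prime() {
        for (std::size_t i = 0; i < buffers_.size() && !finished_; ++i) fillAndSubmitNext();
    }

    // Call when the engine reports a buffer finished: the oldest buffer in
    // the ring is free again, so refill it and resubmit.
    void onBufferEnd() {
        if (!finished_) fillAndSubmitNext();
    }

    bool finished() const { return finished_; }

private:
    void fillAndSubmitNext() {
        std::vector<std::uint8_t>& buffer = buffers_[next_];
        const std::size_t bytes = read_(buffer.data(), buffer.size());
        if (bytes == 0) { finished_ = true; return; }
        const bool endOfStream = bytes < buffer.size();  // short read == last chunk
        submit_(buffer.data(), bytes, endOfStream);
        finished_ = endOfStream;
        next_ = (next_ + 1) % buffers_.size();
    }

    std::vector<std::vector<std::uint8_t>> buffers_;
    ReadFn read_;
    SubmitFn submit_;
    std::size_t next_ = 0;
    bool finished_ = false;
};
```

More buffers in the ring means more read-ahead and fewer glitches, at the cost of memory and, depending on how the voice is started, latency.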

xWMA Streaming

Each xWMA file contains a list of offsets (DPDS chunk)
Each submit needs a modified form of this list

DPDS chunk:   1000   2000   3000   5000   7000   12000
1st submit:   1000   2000   3000
2nd submit:   2000 (5000 - 3000)   4000 (7000 - 3000)   9000 (12000 - 3000)
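
The per-submit table in the example (1000, 2000, 3000, then 2000, 4000, 9000) can be derived by subtracting the offset that precedes the first packet of each submit. A minimal sketch, assuming the DPDS entries are cumulative decoded byte counts; `rebasePacketTable` is a hypothetical helper, and in the shipping API the result would be passed via `XAUDIO2_BUFFER_WMA::pDecodedPacketCumulativeBytes`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the table for packets [firstPacket, firstPacket + packetCount) from
// the file-wide cumulative table by subtracting the decoded byte count that
// precedes the first packet of the submit.
std::vector<std::uint32_t> rebasePacketTable(const std::vector<std::uint32_t>& fileTable,
                                             std::size_t firstPacket,
                                             std::size_t packetCount) {
    std::vector<std::uint32_t> submitTable;
    if (firstPacket >= fileTable.size()) return submitTable;
    // Clamp the request to the packets that actually exist.
    if (packetCount > fileTable.size() - firstPacket) {
        packetCount = fileTable.size() - firstPacket;
    }
    const std::uint32_t base = (firstPacket == 0) ? 0u : fileTable[firstPacket - 1];
    submitTable.reserve(packetCount);
    for (std::size_t i = 0; i < packetCount; ++i) {
        submitTable.push_back(fileTable[firstPacket + i] - base);
    }
    return submitTable;
}
```

The first submit starts at packet 0, so its table is copied unchanged; every later submit is shifted down by the last offset of the previous one.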

Blocking Calls XAudio2 Thread


The XAudio2 realtime thread can be blocked by:
StopEngine() and IXAudio2::Release()
DestroyVoice()
Thus, the need for voice reuse

XAudio2 callbacks

Check time spent in effect chain

Your code can be blocked by any XAudio2 API call, waiting on internal realtime thread locks.

Debugging
Use the debug versions of XAudio2, X3DAudio, XAPOBase, etc.
SetDebugConfiguration may be used to control debug behavior for XAudio2
The VolumeMeter xAPO is useful for detecting clipping
PIX counters are available to track CPU, memory, and voice statistics
Similar data is available via IXAudio2::GetPerformanceData

Watch for other threads on the core that may be slowing down XAudio2

Audio performance analysis with PIX

A Case Study
Sample Rate Conversion
[Diagram: mono, stereo, quad, and 5.1 source voices, each with pitch/SRC + filter and an effect chain (Effect1 ... EffectN), feeding a stage with filter, reverb, an effect chain, and sample rate conversion.]

PIX

Timing Capture

OnProcessingPassEnd callback
Use callbacks to notify Hardware Thread 5 that it can resume execution

xbPerfView
w/ Sampling Capture

A Case Study
Adding submixes
[Diagram: the case-study graph (mono, stereo, quad, and 5.1 source voices; filter, reverb, effect chain, sample rate conversion), shown with submixes added.]

xbPerfView
w/ Submixing

A Case Study
SRC & Reverb
Change to Mono->5.1 Reverb

[Diagram: the case-study graph (mono, stereo, quad, and 5.1 source voices; filter, reverb, effect chain, sample rate conversion), with 32k and 48k rate labels.]

xbPerfView
Final Numbers
Component     Start CPU%   Final CPU%   % Freed
MatrixMix     17.48%       4.25%        13.23%
Reverb        6.37%        4.94%        1.43%
Resampling    14.74%       11.41%       3.33%
Total         38.59%       20.60%       17.99%
Idle          27.95%       48.47%       20.52%

With Processing to Spare

Summary
SUBMIX!
Use OnBufferEnd callbacks to stream data
Intentionally choose your compression methods
Carefully manage your voice interactions
Watch for blocking calls
Pool voices where possible
Use EnableEffect/DisableEffect
Profile your title to focus your efforts

www.microsoftgamefest.com

2009-2010 Microsoft Corporation. All rights reserved.


This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.