
Firaxis LORE: Low Overhead Rendering Engine
And other uses of D3D11

Or, how I learned to render 15,000+ batches at 60 FPS

Overview
Civ 5 is a big game that covers 6000 years of history
The entire map can be populated (or polluted) with all sorts of things the user creates
Need to be able to render a huge number of possibly disparate types

Early Goals
Build a brand new engine for Civilization V
Like the game, we wanted the graphics engine to be able to stand the test of time
Decided, while D3D11 was in alpha, to build the engine natively for the D3D11 architecture and map backwards to DX9

Step 1: Cutting the overhead down

Shaders start in Firaxis Shading Language (FSL)
FSL is a superset of HLSL
Compiles into a CPP and header file: all shader constants are mapped to structs, grouped into packages where all packages have the same bindings
Model code is templated: the FSL-generated header is then bound with template code
Result is a tiny amount of code that fills out required shading

[Diagram: FSL Files, CPP / H, Template Code, Compile Time Glue Code]
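
The slides do not show what the FSL compiler emits, so the following is only a minimal sketch of the idea of binding a generated constant struct to hand-written template code. Every name here (`UnitShaderConstants`, `UnitShader`, `PackConstants`, `PackUnitExample`) is hypothetical and not taken from the Firaxis engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a header generated from an FSL file:
// the shader's constants are exposed as a plain struct.
struct UnitShaderConstants {
    float worldViewProj[16];
    float tintColor[4];
};

// Hypothetical generated description tying a shader to its constant struct.
struct UnitShader {
    using Constants = UnitShaderConstants;
};

// Hand-written template code. It works for any generated shader description,
// and the payload size and layout are resolved at compile time.
template <typename Shader>
std::vector<unsigned char> PackConstants(const typename Shader::Constants& c) {
    static_assert(std::is_trivially_copyable<typename Shader::Constants>::value,
                  "constant structs must be plain data");
    std::vector<unsigned char> payload(sizeof(typename Shader::Constants));
    std::memcpy(payload.data(), &c, payload.size());
    return payload;
}

// Example model code: it only fills in the struct, the template does the rest.
inline std::size_t PackUnitExample() {
    UnitShaderConstants c{};
    c.tintColor[0] = 1.0f;
    std::vector<unsigned char> payload = PackConstants<UnitShader>(c);
    assert(payload.size() == sizeof(UnitShaderConstants));
    return payload.size();
}
```

Because the struct type is known to the template at compile time, no per-constant lookup by name is needed at run time, which is one plausible reading of "compile time glue code".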

Step 2: Abstracting the Rendering

Still have to support DX9, might have to support consoles in the future
Might have to write a driver
Our solution: make DX9 look like DX11
Started with as restricted a design as possible, and expanded as we needed to

Packetized Rendering

Stateless rendering, much simpler than D3D
Command based: all rendering is performed by self-contained commands
A command set may contain a list of surfaces to render, each with a shader constant payload
A surface is an immutable bundle of an IB, VB, textures, shader def, etc.
All state is bundled into packages: Alpha State, Z State, etc.
Commands reference one of these state packages
Entire frame is queued up
Minimal per-frame allocation

Only 5 Types of Commands

COMMAND_RENDER_BATCHES
A list of surfaces to render into one or more render targets, with alpha and Z state bundles
Surfaces have IB, VB, sampler, and texture bundles
All required state is specified

COMMAND_GENERATE_MIPS
COMMAND_RESOLVE_RENDERTEXTURE
COMMAND_COPY_RENDERTEXTURE
COMMAND_COPY_RESOURCE
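
As an illustration of the packet layout described above, here is a minimal sketch of how such commands might be represented. Only the five `COMMAND_*` names come from the slides. The `Handle`, `Surface`, `Batch`, `Command`, and `FrameQueue` types and all their fields are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// The five command types named on the slide.
enum CommandType {
    COMMAND_RENDER_BATCHES,
    COMMAND_GENERATE_MIPS,
    COMMAND_RESOLVE_RENDERTEXTURE,
    COMMAND_COPY_RENDERTEXTURE,
    COMMAND_COPY_RESOURCE
};

// Hypothetical opaque handle to an engine-side resource or state bundle.
using Handle = std::uint32_t;

// A surface: a bundle of everything needed to draw one piece of geometry.
struct Surface {
    Handle indexBuffer;
    Handle vertexBuffer;
    Handle textureBundle;
    Handle samplerBundle;
    Handle shaderDef;
};

// One batch: a surface (never modified through this pointer) plus its payload.
struct Batch {
    const Surface* surface;
    std::vector<unsigned char> constants;  // shader constant payload
};

// A self-contained command: it carries or references all state it needs.
struct Command {
    CommandType type;
    std::vector<Handle> renderTargets;  // one or more targets
    Handle alphaState;                  // alpha state bundle
    Handle zState;                      // Z state bundle
    std::vector<Batch> batches;         // used by COMMAND_RENDER_BATCHES
};

// The whole frame is queued before anything is submitted.
struct FrameQueue {
    std::vector<Command> commands;

    void Push(Command c) { commands.push_back(std::move(c)); }

    std::size_t BatchCount() const {
        std::size_t n = 0;
        for (const Command& c : commands) n += c.batches.size();
        return n;
    }
};

// Example: one render command with two batches, then a mip generation command.
inline FrameQueue BuildExampleFrame() {
    static const Surface kSurface{1, 2, 3, 4, 5};

    Command render{};
    render.type = COMMAND_RENDER_BATCHES;
    render.renderTargets.push_back(10);
    render.alphaState = 20;
    render.zState = 30;
    render.batches.push_back(Batch{&kSurface, std::vector<unsigned char>(16, 0)});
    render.batches.push_back(Batch{&kSurface, std::vector<unsigned char>(16, 0)});

    Command mips{};
    mips.type = COMMAND_GENERATE_MIPS;
    mips.renderTargets.push_back(10);

    FrameQueue frame;
    frame.Push(std::move(render));
    frame.Push(std::move(mips));
    return frame;
}
```

Since each command names its own state bundles, nothing depends on what the previous command left bound, which is the property the slides call stateless rendering.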

Packetized Rendering

[Diagram: Rendering System, Rendering Engine, D3D/Driver]

Step 3: Threading

[Diagram: Jobs, Job Manager, Rendering System]

Why do we queue up the entire frame?

Would seem like additional overhead, but perf analysis shows it is a net win
Internal command setup is super cheap, just some memory copies
Engine cache coherency is vastly better
D3D driver cache coherency is much better with one giant dump
Very low % of total CPU time spent in submission
Allows us to filter redundant D3D calls; call overhead adds up
Fast even in DX9
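
To make the point about filtering redundant calls concrete, the sketch below walks a queued frame and counts a state change only when a packet's state differs from the previous packet's. It is an illustration under assumptions, not the engine's code: `DrawPacket`, `SubmitStats`, `SubmitFiltered`, and `FilterExample` are hypothetical, and the API calls that a real submitter would issue are only named in comments.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical packet: each draw names all the state it needs.
struct DrawPacket {
    std::uint32_t alphaState;
    std::uint32_t zState;
    std::uint32_t shader;
};

struct SubmitStats {
    int stateChanges = 0;  // API calls that actually had to be issued
    int draws = 0;
};

// Walk the queued frame and skip any state set that matches the last one.
inline SubmitStats SubmitFiltered(const std::vector<DrawPacket>& frame) {
    SubmitStats stats;
    bool first = true;
    DrawPacket last{};
    for (const DrawPacket& p : frame) {
        if (first || p.alphaState != last.alphaState)
            ++stats.stateChanges;  // e.g. OMSetBlendState in D3D11
        if (first || p.zState != last.zState)
            ++stats.stateChanges;  // e.g. OMSetDepthStencilState in D3D11
        if (first || p.shader != last.shader)
            ++stats.stateChanges;  // e.g. VSSetShader / PSSetShader in D3D11
        ++stats.draws;
        last = p;
        first = false;
    }
    return stats;
}

// Example frame: four draws that mostly share state.
inline SubmitStats FilterExample() {
    std::vector<DrawPacket> frame = {
        {1, 1, 1},  // first packet: all three states are set
        {1, 1, 1},  // identical: nothing to set
        {1, 2, 1},  // only the Z state changes
        {2, 2, 1},  // only the alpha state changes
    };
    return SubmitFiltered(frame);
}
```

In this example an unfiltered submitter would issue 12 state sets for the 4 draws, while the filtered walk issues 5. Filtering like this is only straightforward because the whole frame is already queued in one place.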

Implementation advantages
Once the stateless concept is grasped, code maintenance is easy
Next to no state leaking (flickering alpha, textures, etc.)
Because rendering is packetized, individual jobs need little or no communication with each other

NO THREADING BUGS
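
The claim that packetized jobs need no communication can be sketched as follows: each job records into its own command stream, and the streams are concatenated in a fixed order once all jobs finish. This is a simplified illustration using `std::thread`. The `Cmd`, `CommandStream`, `RecordJob`, and `BuildFrame` names are hypothetical, and the slides do not describe the real job manager at this level of detail.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical minimal command: just the id of the batch it would draw.
struct Cmd {
    int batchId;
};
using CommandStream = std::vector<Cmd>;

// One job records commands for its own slice of batches into its own stream.
// It touches no shared state, so it needs no locks.
inline void RecordJob(int firstBatch, int count, CommandStream& out) {
    out.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) out.push_back(Cmd{firstBatch + i});
}

// Run all jobs in parallel, then merge their streams in job order.
inline std::vector<Cmd> BuildFrame(int jobCount, int batchesPerJob) {
    assert(jobCount >= 0 && batchesPerJob >= 0);
    // All streams exist before any thread starts, so references stay valid.
    std::vector<CommandStream> streams(static_cast<std::size_t>(jobCount));
    std::vector<std::thread> workers;
    for (int j = 0; j < jobCount; ++j) {
        workers.emplace_back(RecordJob, j * batchesPerJob, batchesPerJob,
                             std::ref(streams[static_cast<std::size_t>(j)]));
    }
    for (std::thread& w : workers) w.join();

    // Single-threaded merge: the frame order is the same on every run.
    std::vector<Cmd> frame;
    for (const CommandStream& s : streams)
        frame.insert(frame.end(), s.begin(), s.end());
    return frame;
}
```

The only synchronization point is the join before the merge, so the resulting frame does not depend on how the threads were scheduled.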

Threaded D3D11 submission

Top issues:
Generally high driver overhead for batch submission
But: D3D11 has multithreaded submission
Command streams do not necessarily map 1:1 to CommandLists
Civilization V can change how it submits via settings in the config files

Step 4: Gloating over results

Wildly surpassed commonly held beliefs on # of batches possible, especially with threading

Test        Driver with native CL support    Driver without CL support
Units       1686*                            931
Landmarks   1152*                            673
Lategame    3616*                            2052

*Believed to be GPU limited

Conclusions
High throughput rendering is possible IF:
Care is taken to reduce application overhead
Job based, payload based rendering
Redundant state and calls are filtered
Use D3D11 command lists
Engine can peg 12 threads at 97% (sans driver)

D3D11 Features: Tessellation

Major addition to the D3D11 API

[Screenshot]

Terrain
Civ5 contains one of the most complex terrain systems ever made
Complete procedural process
Use GPU to raytrace and anti-alias shadows
Caching system to deal with cases where terrain is too big

Tessellation
Terrain is very high detail, roughly 64x64 heightmap data per hex
Triangle count, when zoomed out, can be in the millions
Used tessellation as a drop-in

Tessellation Cont.
Simple bicubic Beta Spline patches
Adjusted global tessellation as camera moved in and out
A strict performance increase: 10%-40% faster, on both AMD and Nvidia hardware
More adaptive techniques would work even better, but didn't have time to implement them
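
The slides do not say how the global tessellation level was derived from the camera, so the following is only one possible sketch: a single factor that falls off linearly with camera distance and is clamped to the range D3D11 hardware accepts (1 to 64). The function name `GlobalTessFactor` and the `nearDist` / `farDist` parameters are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>

// D3D11 tessellation factors are limited to the range [1, 64].
constexpr float kMinTess = 1.0f;
constexpr float kMaxTess = 64.0f;

// Hypothetical mapping: full tessellation at or inside nearDist,
// none (factor 1) at or beyond farDist, linear in between.
inline float GlobalTessFactor(float cameraDist, float nearDist, float farDist) {
    assert(farDist > nearDist);
    float t = (cameraDist - nearDist) / (farDist - nearDist);
    t = std::min(std::max(t, 0.0f), 1.0f);  // clamp to [0, 1]
    return kMaxTess + t * (kMinTess - kMaxTess);
}
```

A single global factor like this is the non-adaptive approach. Per-patch factors based on screen-space size would be one of the more adaptive techniques the slide alludes to.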

Leaders

Leader Rendering
Largely done with DX10.1 rendering tech
New variable bit rate compression technology implemented for D3D11
2.5 GB of texture data reduced to 150 MB, can be decompressed on the GPU
Details forthcoming, research is in the publication submission process
Extensive use of UAVs

Future Stuff, NO AO

Future Stuff (CS), AO

Q&A
