Chloride - Data Centre Tier Classifications

Data Centre Tier Classifications
Developed from The Uptime Institute & TIA-942 Single Line Diagrams
Benchmarking Data Centre ‘Quality’
There has long been the need to be able to ‘measure’ the ‘quality’ of a critical facility
The ‘quality’ is usually expressed as Availability of the IT
functionality of the facility in terms of ‘number-of-nines’
- e.g. Three Nines = 99.9% Availability
- Note that several engineered and human systems have to
contribute to the whole facility and its IT functionality, including the
IT hardware and software itself
At this top-level it can be applied over a short timeframe
- e.g. 99.9% over 1 year = one failure event lasting 8.76 hours
- It should never be assumed to cover multiple failure events
For further treatment of Availability and MTBF/MTTR see the Appendix A slides
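As a minimal sketch of the 'number-of-nines' arithmetic above, the snippet below converts an availability percentage into total downtime over a period. The function name `downtime_hours` and the 8,760-hour year (leap years ignored) are assumptions made for illustration.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 h, leap years ignored (assumption)


def downtime_hours(availability_pct: float, years: float = 1.0) -> float:
    """Total downtime (hours) implied by an availability figure over a period."""
    unavailability = 1.0 - availability_pct / 100.0
    return unavailability * HOURS_PER_YEAR * years


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% over 1 year -> {downtime_hours(pct):.2f} h of downtime")
```

The result is a total only: it says nothing about whether the downtime arrives as one long event or many short ones, which is why the figure should not be assumed to cover multiple failure events.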
IT functionality – Three-Nines?
Your mission critical hardware can only deliver its maximum potential if the whole facility ‘works’
- IT Hardware & Connectivity
- Power & Cooling
- Fire detection, alarm and suppression
- Maintenance and emergency intervention
- Security, internal and external, physical and software attack
- Human Error, EPO, Systems Training & Facility Management
- External disasters – earthquake, hurricane, flood, fire .. air-crash
- 20+ systems are critical to the success of the IT functionality
(99.99%)^20 = 99.8% = 17.5h downtime per year
(99.999%)^20 = 99.98% = 1.75h downtime per year
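The two figures above follow from multiplying the availabilities of 20 systems that must all work. A short sketch, assuming the systems fail independently and share the same availability (the function names are illustrative):

```python
def series_availability(per_system_pct: float, n_systems: int) -> float:
    """Availability (%) of n independent systems that must ALL be working."""
    return (per_system_pct / 100.0) ** n_systems * 100.0


def annual_downtime_hours(availability_pct: float) -> float:
    """Downtime per 8,760-hour year implied by an availability percentage."""
    return (1.0 - availability_pct / 100.0) * 8760


for pct in (99.99, 99.999):
    total = series_availability(pct, 20)
    print(f"20 systems at {pct}% -> {total:.2f}% "
          f"({annual_downtime_hours(total):.2f} h/year)")
```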
The Uptime Institute
The Uptime Institute [1] has, for more than 10 years, sponsored
research and practical studies into data centre design, operation and
resultant resilience and developed a Tier Classification to describe and
differentiate facilities from an availability standpoint
A White Paper [2] from the Institute (authors of which include the
originator of dual power supplies in IT equipment and the Tier system
itself) is the basis of this review of the facility and operational concepts
The Uptime Institute is a commercial organisation and the guidelines it
created are not in the form of a technical standard. However, many of
the principles and details have been incorporated in TIA-942 (see next
slide)
[1] The Uptime Institute, Building 100, 2904 Rodeo Park Drive East, Santa Fe, NM 87505, USA
[2] Title: Industry Standard Tier Classifications Define Site Infrastructure Performance, Turner, Seader &
Brill, © 2001-2005 The Uptime Institute, Inc
American ANSI/TIA Standard
ANSI/TIA-942-2005
- Telecommunications Infrastructure Standard for Data Centers
Telecommunications Industry Association
- Standards and Technology Dept, 2500 Wilson Boulevard, Arlington, VA 22201, USA
- www.tiaonline.org/standards/search_n_order.cfm
Follows the same Tier I-IV format and draws heavily on The Uptime
Institute publications but extends the detail, especially in connectivity
An entirely USA-centric ANSI specification, so it can only be used as a
guide in EN territories
Specifically for telecom-related data-centre environments and
<2700W/m²
Tier Classification – Tier I to IV
The classification system takes into account that at least 20
major M&E systems contribute to the overall IT availability
(such as fire alarms, EPO etc) and that Tier IV represents
99.995% site availability (over a five year period) with the
critical systems loaded to a maximum of 90%
Each and every system has to meet this table:
(99.9999%)^20 = 99.998% = 1h downtime every 5 years
20 discrete systems, each of Six-Nines Availability!
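The same series arithmetic can be run in reverse to see what each system must achieve for a given site target. This is a sketch only, assuming 20 independent systems of equal availability; `required_per_system` and `downtime_hours` are illustrative names.

```python
def series_availability(per_system_pct: float, n_systems: int) -> float:
    """Availability (%) of n independent systems that must all be working."""
    return (per_system_pct / 100.0) ** n_systems * 100.0


def required_per_system(target_pct: float, n_systems: int) -> float:
    """Per-system availability (%) needed for n systems to meet a site target."""
    return (target_pct / 100.0) ** (1.0 / n_systems) * 100.0


def downtime_hours(availability_pct: float, years: float) -> float:
    """Total downtime (hours) over the period, using 8,760-hour years."""
    return (1.0 - availability_pct / 100.0) * 8760 * years


site = series_availability(99.9999, 20)
print(f"20 systems at six nines -> {site:.3f}% site availability")
print(f"Downtime over 5 years   -> {downtime_hours(site, 5):.2f} h")
print(f"Per-system figure needed for a 99.995% site -> "
      f"{required_per_system(99.995, 20):.5f}%")
```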
Tier IV – the ultimate in resilience?
Fault Tolerant – defined as ‘a site that can sustain at least one
unplanned worst-case infrastructure failure with no critical load impact’
Concurrently Maintainable – defined as ‘a site that is able to perform
planned maintenance activity without shutting down the critical load’.
Note that it is acceptable that the fault tolerance level will be reduced
during maintenance or after the first fault
Tier IV Classification only applies to dual power supply loads where
complete functionality is obtained with either power supply fed and
where the two inputs, in normal operation, share the power demand, as
defined by The Uptime Institute’s own specification [1]
A technical and philosophical argument rages about Static Transfer
Switches for single-cord loads in Tier IV designs
- Is that Tier III.5 or IV.5?
[1] Title: Fault Tolerant Power Compliance Specifications, v2.0, see www.uptimeinstitute.org
Electrical Single Line Diagrams
There is no compulsion on the designer to strictly follow
the designs derived from the Tier Classifications. In many
cases compromises will have to be made
- The benchmarking function of the Tier system then provides a
useful yardstick to measure against
In the rest of this presentation we only refer to the Electrical
systems, just one of the 20+ engineered systems that are
required to gain a Classification rating
A particular facility’s Tier rating is the lowest of all its
system Tier Classifications
- Tier IV power + Tier III ‘all other’ + Tier II cooling = Tier II Facility
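The 'lowest system wins' rule can be written down in a few lines. The sketch below is illustrative only: the system names and the `facility_tier` helper are assumptions, not part of either classification document.

```python
from typing import Mapping

ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV"}


def facility_tier(system_tiers: Mapping[str, int]) -> int:
    """A facility is rated at the lowest Tier achieved by any of its systems."""
    if not system_tiers:
        raise ValueError("at least one system rating is required")
    return min(system_tiers.values())


# Hypothetical ratings matching the example above
ratings = {"power": 4, "all other": 3, "cooling": 2}
print(f"Facility rating: Tier {ROMAN[facility_tier(ratings)]}")
```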
Tier I – most critical power systems
A basic single-bus critical power system suitable for single-corded IT loads
There is no specific redundancy called for, although it can be argued that the standby generator set is redundant for the grid supply
Although only ‘N’ is specified, the designer should avoid multiple components in power-parallel configuration as it drastically reduces the potential Availability, i.e. N=1 is best
Maintenance generally involves supplying the load with non-UPS power and an annual load shut-down
Availability of Power at load typically 99.95%*
*Over 5 years operation
Tier II – increasing levels of redundancy
A single-bus power system suitable for both single and dual-corded loads
Redundancy is called for in the standby generator installation to reduce the chance of ‘failure-to-start’, but not in the mains supply
N+1 is specified for the UPS so a high degree of maintenance can be concurrent
Load bank connections are mandatory
Dual-corded loads (expected to be the minority) should be fed by separate A+B PDU’s whilst only the single-corded loads should be fed via STS’s (performing a maintenance function rather than Availability enhancement)
Note the option of a ‘B’ UPS, practical when dual-cord loads are few
Availability at load typically 99.98%*
*Over 5 years operation
Tier III – more redundancy + segregation
A dual-bus power system suitable for both single and dual-corded loads
Redundancy is called for in the mains supply and the standby generator sets. These must be compartmentalised to reduce common mode failure, fire etc
N+1 is specified for the UPS so a high degree of maintenance can be concurrent
Dual-corded loads should be fed by separate A+B PDU’s whilst only the single-corded loads should be fed via STS’s (performing a maintenance function rather than Availability enhancement)
Note the scope for a rapid upgrade to a ‘B’ UPS and Tier IV (but don’t forget the other systems)
An important ‘extra’ here is the Load Bus Synchronisation. When the STS’s can have UPS power on one input and the generator supply on the other it is essential (for the load) to keep the two supplies within 30° of each other in phase
Availability of Power at the load typically 99.99%
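To illustrate the 30° window quoted for Load Bus Synchronisation, the sketch below compares two phase angles with wrap-around at 360°. It is not a description of how any real LBS or STS controller works; the function names and the idea of a simple 'transfer permitted' check are assumptions for illustration.

```python
def phase_difference_deg(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two phase angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def transfer_permitted(a_deg: float, b_deg: float, limit_deg: float = 30.0) -> bool:
    """True when the two supplies are within the allowed phase window."""
    return phase_difference_deg(a_deg, b_deg) <= limit_deg


# 350 deg and 10 deg are only 20 deg apart once wrap-around is considered
print(phase_difference_deg(350.0, 10.0), transfer_permitted(350.0, 10.0))
print(phase_difference_deg(0.0, 45.0), transfer_permitted(0.0, 45.0))
```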

Tier IV – the Uptime purist’s configuration
For a further dialogue on this circuit-breaker see Appendix B

Tier IV – The Uptime Institute original
Complete physical segregation of the two power supplies from the grid
to the dual-corded load – a true Dual-Bus system
- 2x(N+1) in every system, maximum 90% load
- Concurrent maintenance possible without load shut down and without
losing N+1 redundancy
- Needs two grid sub-stations (they will be on the same MV ring) and
diverse cable routes into the site
- Two mechanical load power switchboards in dual-bus
- Note! Many engineers question having N+1 on both A & B buses
ONLY dual-corded loads
- No STS’s, no common point of failure except the load
- Simple to operate, hence reliable
With care in design, installation, operation and maintenance, 99.999%
power Availability possible
Tier IV + STS’s + LBS = Tier how many?
Tier III.5 or IV.5? That is the question!
Not all loads are dual-corded

- Load transparent switching via STS’s is a great maintenance tool
Classic Tier IV but with STS’s for single-corded loads
- Absolutely needs the addition of Load Bus Synchronisation
Insist on three PDU’s in the data-room
- ‘A’ fed from UPS-A for dual-cord loads
- ‘B’ fed from UPS-B for (the other feed of) dual-cord loads
- ‘A/B’ with STS fed from UPS-A & B for single-cord loads
- Feeding dual-corded loads via STS’s reduces Availability to that of
the STS itself and negates the principle of dual-bus
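The three-PDU rule above can be summarised as a simple lookup from the number of power cords on a load to the PDU(s) that should feed it. The labels and the `pdus_for_load` helper are hypothetical, used only to make the rule explicit.

```python
from typing import List


def pdus_for_load(cords: int) -> List[str]:
    """Return the PDU(s) that should feed a load with the given number of cords."""
    if cords == 1:
        # Single-cord loads go through the STS-fed 'A/B' PDU
        return ["A/B (STS fed from UPS-A & UPS-B)"]
    if cords == 2:
        # Dual-cord loads take one feed from each bus and never pass through an STS
        return ["A (UPS-A)", "B (UPS-B)"]
    raise ValueError("only single-corded and dual-corded loads are covered")


for cords in (1, 2):
    print(cords, "cord(s) ->", pdus_for_load(cords))
```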
Tier IV + STS’s + detail from TIA-942
Unique to TIA-942 - in the detail
Tier IV has to have impedance based battery monitoring systems

TIA-942 says that when a system (A or B) is shut down for routine
maintenance then the maintenance bypass should be energised by a
UPS supply
- Not to rely on the dual-corded loads to operate with one feed dead?
- TIA-942, Page 123, RH column ‘UPS Maintenance Bypass Arrangement’
A third UPS (C) system? Uneconomic, space hungry, utilisation 0.05%
and a poor return on investment
- Chloride solution (red-line on diagram)
• Cross-feed the output of each UPS system to the maintenance bypass of the
alternate system
• Manual control, padlocked and interlocked isolators, break-before-make, no
hot-transfer, no point of common coupling in an auto-mode, sync-check
blocking relays across breakers = safe
Secure Power Always
Ian F Bitterlin
International Sales Director
Contact details
Tel: +44 (0) 7717 467 579
E mail: ian.bitterlin@chloridepower.com
Web: www.chloridepower.com
Appendix A – Understanding ‘Nines’
Your mission critical hardware can only deliver its maximum potential if it has continuous voltage available at the input terminals and the rest of the facility ‘works’.
Power Availability = 99.9..?..9%
Cooling Availability = 99.9..?..9%
Security & Fire = 99.9..?..9%
Human Error, Training & Management = 99.9..?..9%
Natural & Unnatural Disaster = 99.9..?..9%
Availability: How good is 99.9%?
44 minutes of unsafe drinking water per month

3 crash-landings per week at Heathrow
3,000 letters lost by The Post Office, every hour
2,000 surgical mistakes in the NHS, every week
9,000 incorrect banking debits per hour
32,000 missed heartbeats, per person, per year
- Not all in one go, please.
UK numbers
“Availability”
The most abused piece of “data” in the industry
The term, and concept of, “Availability” is often misused to
describe the “Quality” of a data center facility power supply
It is generally assumed (but rarely achieved) that this also
applies to the associated cooling system
Expressed in terms of the percentage uptime, better known
as a ‘number-of-nines’
- 99.999% = “Five-Nines”
The calculation is based upon:
- MTBF = the Mean Time Between Failure (hours)
- MDT = the Mean Down Time (hours)
Simple Arithmetic
Availability = [MTBF ÷ (MTBF + MDT)] x 100%
Example
- MTBF = 25,000 hours
- MDT = 1 hour
- Availability = (25,000 ÷ 25,001) x100 = 99.996%
- Note that a 100% “reliable” system that needs annual
maintenance cannot have A=100%!
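The example above can be checked with a one-line function. This is a minimal sketch of the MTBF/MDT formula; the name `availability_pct` is an assumption.

```python
def availability_pct(mtbf_hours: float, mdt_hours: float) -> float:
    """Availability (%) = MTBF / (MTBF + MDT) x 100."""
    return mtbf_hours / (mtbf_hours + mdt_hours) * 100.0


print(f"MTBF 25,000 h, MDT 1 h -> {availability_pct(25_000, 1):.3f}%")

# Any non-zero down time, planned or not, keeps the result below 100%
print(availability_pct(8756, 4) < 100.0)
```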
Availability Nines: A measure of “quality”?
MTBF            10 years     1 month      1 day
MDT             1 hour       30 seconds   1 second
Availability    99.99885%    99.99885%    99.99884%
Four-Nines = OK? But do you really want a failure every day?
In reality it’s worse. Assuming the system recovery time is 6 hours:
MDT             6+1 hours    6h+30s       6h+1s
Availability    99.992%      99.17%       74.99%
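The comparison above can be reproduced approximately with the sketch below, which assumes a 30-day month and 8,760-hour years. Two conventions are shown because the results differ when the down time is large relative to the MTBF: the MTBF ÷ (MTBF + MDT) formula gives about 80% for the daily-failure case, while counting the down time inside each MTBF period (1 − MDT ÷ MTBF) gives about 75%, which appears to be the basis of the last figure in the table. Either way the conclusion is the same.

```python
HOUR_S = 3600.0
RECOVERY_S = 6 * HOUR_S  # system recovery time quoted above

# (MTBF, power-event MDT) in seconds; "1 month" is assumed to be 30 days
CASES = {
    "10 years": (10 * 8760 * HOUR_S, 1 * HOUR_S),
    "1 month": (30 * 24 * HOUR_S, 30.0),
    "1 day": (24 * HOUR_S, 1.0),
}


def availability_pct(mtbf_s: float, mdt_s: float) -> float:
    """MTBF / (MTBF + MDT) x 100."""
    return mtbf_s / (mtbf_s + mdt_s) * 100.0


def uptime_share_pct(period_s: float, mdt_s: float) -> float:
    """Alternative convention: one failure per period, down time inside it."""
    return (1.0 - mdt_s / period_s) * 100.0


for label, (mtbf_s, mdt_s) in CASES.items():
    total_mdt_s = mdt_s + RECOVERY_S
    print(f"{label:>8}: event only {availability_pct(mtbf_s, mdt_s):.5f}%  "
          f"with recovery {availability_pct(mtbf_s, total_mdt_s):.3f}%  "
          f"(uptime share {uptime_share_pct(mtbf_s, total_mdt_s):.3f}%)")
```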
20ms power events in 12 months?
How many computer crashes will you accept?
Availability    “Nines”   MDT per year   20ms failures
99.0%           2         87.6 hrs       15,768,000
99.9%           3         8.76 hrs       1,576,800
99.99%          4         53 min         157,680
99.999%         5         5.3 min        15,768
99.9999%        6         31.5 sec       1,577
99.99999%       7         3.15 sec       158
99.999999%      8         315 ms         15
99.9999999%     9         31.5 ms        2
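The table can be regenerated from first principles: the annual down time for a given number of nines, divided by a 20 ms event length. A sketch assuming an 8,760-hour year (function names are illustrative; counts are approximate and depend on rounding):

```python
MS_PER_YEAR = 8760 * 3600 * 1000  # milliseconds in a 365-day year


def annual_downtime_ms(nines: int) -> float:
    """Down time per year (ms) for an availability with the given number of nines."""
    return MS_PER_YEAR / 10 ** nines


def events_per_year(nines: int, event_ms: float = 20.0) -> float:
    """How many events of event_ms fit into the annual down time."""
    return annual_downtime_ms(nines) / event_ms


for nines in range(2, 10):
    hours = annual_downtime_ms(nines) / 3_600_000
    print(f"{nines} nines: {hours:.6f} h/year, "
          f"about {events_per_year(nines):,.0f} events of 20 ms")
```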
The “Nines” cannot be applied to power over a single year!

Better to use MTBF/MDT for ‘one’ failure event
There are three common abuses
99.9% - sounds good but needs to be looked at carefully
99.999..9% (even 100%) with no obvious technical foundation or caveats about time-frame, events or maintenance for the claim
Hardly anyone ever mentions if there is increased risk during essential maintenance – or if maintenance can be carried out at all without a load shutdown
- The single most important factor in achieving high MTBF figures
in the field is correct, expert and timely maintenance
“High-Nines” Availability?
Beware systems that cannot be maintained!
A UPS system, cooling system and distribution scheme
require intrusive maintenance every year for four hours
- therefore the maximum Availability of “Single-Bus”
space is 99.95%
Can you afford the risk of doing M&E at the same time?
- If not, it could be two shutdowns and only about 99.9%
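The ceiling that planned shutdowns place on a single-bus system can be sketched as follows, assuming an 8,760-hour year and, for the two-shutdown case, that each shutdown also lasts four hours (that duration is an assumption).

```python
def max_availability_pct(shutdown_hours_per_year: float) -> float:
    """Best possible availability (%) if the load is shut down for maintenance."""
    return (1.0 - shutdown_hours_per_year / 8760) * 100.0


one_shutdown = max_availability_pct(4)       # combined M&E maintenance
two_shutdowns = max_availability_pct(2 * 4)  # separate M and E shutdowns

print(f"One 4 h shutdown per year  -> {one_shutdown:.3f}%")
print(f"Two 4 h shutdowns per year -> {two_shutdowns:.3f}%")
```

This is a best case: it counts planned shutdowns only and assumes no unplanned failures at all.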
Only Dual-Bus systems can be applied and achieve
“high nines” with Maintenance
Most critical systems installed today have a “weaker”
cooling design than electrical
- Even though the mechanical system has a much wider fault
tolerance the designer should ensure a balance is achieved
between the electrical and the mechanical designs, including
the “maintenance without shutdown” scenario
Appendix B – Load isolation breaker and N+?
To be able to run the load via the bypass and test the UPS
system as a parallel group is a very attractive and useful
operational/maintenance feature
- The load isolation breaker enables that function
Generally that means that between the PDU and the output
bus of the UPS system there are at least two MCCB’s or
ACB’s in series
- Typical MTBF published at 250,000h (28.5y) with maintenance
This negates the advantage of applying any reliability
enhancement strategy using N+(more than 1)
Distribution limits the UPS Availability
[Diagram labels: Utility/Generator Feed, Input Switchboard, Maintenance Bypass, Output Switchboard, Critical Load Bus]
Output Switchboard breakers: typically 250,000h MTBF each
Two in series = 125,000h MTBF
• N+X UPS does not improve things
• Bus-voltage Availability depends upon these two switches
• Single-bus maximum MTBF = 125,000h (14 years)
• e.g. with an 8h MDT that is 99.99% Availability
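The series figures above follow from adding failure rates. A minimal sketch, assuming independent components with constant failure rates (so that rate = 1 ÷ MTBF); the function names are illustrative.

```python
from typing import Iterable


def series_mtbf(mtbfs_hours: Iterable[float]) -> float:
    """MTBF of components in series: failure rates (1/MTBF) add."""
    return 1.0 / sum(1.0 / m for m in mtbfs_hours)


def availability_pct(mtbf_hours: float, mdt_hours: float) -> float:
    """Availability (%) = MTBF / (MTBF + MDT) x 100."""
    return mtbf_hours / (mtbf_hours + mdt_hours) * 100.0


bus_mtbf = series_mtbf([250_000, 250_000])  # two breakers in series
print(f"Series MTBF: {bus_mtbf:,.0f} h ({bus_mtbf / 8760:.1f} years)")
print(f"Availability with 8 h MDT: {availability_pct(bus_mtbf, 8):.4f}%")
```

Whatever the UPS redundancy upstream, the load bus cannot do better than this pair of breakers.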
N+1 redundant UPS architecture: N?
600kVA Load
1+1                        2+1                           3+1
100% Redundancy            50% Redundancy                33% Redundancy
2x 600kVA modules          3x 300kVA modules             4x 200kVA modules
R = 10*                    R = 9                         R = 8
Day One only               Day One to Two                Day One to Three
Highest UPS CapEx          Scope for load shrink         High scope for load shrink
High risk of partial load  Medium risk of partial load   Low risk of partial load
High load step             Medium load step              Low load step
1200kVA of batteries       900kVA of batteries           800kVA of batteries
-                          25% space saving              33% space saving
Lower battery CapEx etc
*Based on Reliability (R) of a single module = 1
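The module sizes and battery totals in the comparison can be derived from the load and N. The sketch below is illustrative: redundancy is taken as the spare module's rating divided by the load (an assumed definition), battery capacity is assumed to scale with installed kVA, and `n_plus_one` is a hypothetical helper.

```python
from typing import NamedTuple


class Sizing(NamedTuple):
    module_kva: float      # rating of each UPS module
    installed_kva: float   # total installed rating, N+1 modules
    redundancy_pct: float  # spare module rating as a share of the load


def n_plus_one(load_kva: float, n: int) -> Sizing:
    """Size an N+1 UPS group in which N modules carry the full load."""
    module = load_kva / n
    return Sizing(module, module * (n + 1), 100.0 / n)


baseline = n_plus_one(600, 1).installed_kva  # 1+1 is the reference case
for n in (1, 2, 3):
    s = n_plus_one(600, n)
    saving = (1.0 - s.installed_kva / baseline) * 100.0
    print(f"{n}+1: {n + 1} x {s.module_kva:.0f} kVA, "
          f"{s.installed_kva:.0f} kVA installed, "
          f"{s.redundancy_pct:.0f}% redundancy, {saving:.0f}% saving vs 1+1")
```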
