
Data Centre Tier Classifications

Developed from The Uptime Institute & TIA-942 Single Line Diagrams
Benchmarking Data Centre ‘Quality’

 There has long been the need to be able to ‘measure’ the
‘quality’ of a critical facility
 The ‘quality’ is usually expressed as Availability of the IT
functionality of the facility in terms of ‘number-of-nines’
- e.g. Three Nines = 99.9% Availability
- Note that several engineered and human systems have to
contribute to the whole facility and its IT functionality, including the
IT hardware and software itself
 At this top-level it can be applied over a short timeframe
- e.g. 99.9% over 1 year = one failure event lasting 8.76 hours
- It should never be assumed to cover multiple failure events
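The ‘number-of-nines’ to downtime conversion above can be sketched as follows. This is a minimal illustration only: the function name is hypothetical and a year is assumed to be 8,760 hours, which is the figure implied by the 8.76 hour example.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days x 24 hours, matching the 8.76 h example


def downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Total downtime (hours) implied by an availability percentage over a period."""
    return (1.0 - availability_percent / 100.0) * period_hours


# Print the yearly downtime implied by three, four and five nines
for percent in (99.9, 99.99, 99.999):
    print(f"{percent}% over 1 year = {downtime_hours(percent):.3f} h of downtime")
```

Note that the result is a total only; it says nothing about whether the downtime arrives as one event or many.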

For further treatment of Availability and MTBF/MTTR see the Appendix A slides
IT functionality – Three-Nines?

 Your mission critical hardware can only deliver its
maximum potential if the whole facility ‘works’
- IT Hardware & Connectivity
- Power & Cooling
- Fire detection, alarm and suppression
- Maintenance and emergency intervention
- Security, internal and external, physical and software attack
- Human Error, EPO, Systems Training & Facility Management
- External disasters – earthquake, hurricane, flood, fire .. air-crash
- 20+ systems are critical to the success of the IT functionality
(99.99%)^20 = 99.8% = 17.5h downtime per year
(99.999%)^20 = 99.98% = 1.75h downtime per year
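A rough sketch of the arithmetic behind these two figures, assuming the 20 systems fail independently and that the IT function needs all of them at once (a simple series model). The function names and the 8,760-hour year are illustrative assumptions.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def series_availability(per_system_percent, systems=20):
    """Availability (%) of `systems` independent systems that must all work together."""
    return (per_system_percent / 100.0) ** systems * 100.0


def annual_downtime_hours(availability_percent):
    """Yearly downtime (hours) implied by an availability percentage."""
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


# Four nines and five nines per system, combined over 20 systems
for per_system in (99.99, 99.999):
    combined = series_availability(per_system)
    print(f"{per_system}% x 20 systems = {combined:.2f}% "
          f"= {annual_downtime_hours(combined):.2f} h downtime per year")
```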
The Uptime Institute

 The Uptime Institute [1] has, for more than 10 years, sponsored
research and practical studies into data centre design, operation and
resultant resilience and developed a Tier Classification to describe and
differentiate facilities from an availability standpoint
 A White Paper [2] from the Institute (authors of which include the
originator of dual power supplies in IT equipment and the Tier system
itself) is the basis of this review of the facility and operational concepts
 The Uptime Institute is a commercial organisation and the guidelines it
created are not in the form of a technical standard. However, many of
the principles and details have been incorporated in TIA-942 (see next
slide)

[1] The Uptime Institute, Building 100, 2904 Rodeo Park Drive East, Santa Fe, NM 87505, USA
[2] Title: Industry Standard Tier Classifications Define Site Infrastructure Performance, Turner, Seader &
Brill, © 2001-2005 The Uptime Institute, Inc
American ANSI/TIA Standard

 ANSI/TIA-942-2005
- Telecommunications Infrastructure Standard for Data Centers
 Telecommunications Industry Association
- Standards and Technology Dept, 2500 Wilson Boulevard, Arlington, VA 22201, USA
- www.tiaonline.org/standards/search_n_order.cfm

 Follows the same Tier I-IV format and draws heavily on The Uptime
Institute publications but extends the detail, especially in connectivity
 Entirely a USA-centric ANSI specification, so it can only be used as a
guide in EN territories
 Specifically for telecom-related data-centre environments and
<2700W/m²
Tier Classification – Tier I to IV

 The classification system takes into account that at least 20
major M&E systems contribute to the overall IT availability
(such as fire alarms, EPO etc) and that Tier IV represents
99.995% site availability (over a five-year period) with the
critical systems loaded to a maximum of 90%
 Each and every system has to meet this table:

(99.9999%)^20 = 99.998% = 1h downtime every 5 years

20 discrete systems, each of Six-Nines Availability!
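The same calculation can be run in reverse: given a site target, what must each of the 20 systems achieve? The sketch below assumes independent systems in a simple series model and a 5-year period of 5 x 8,760 hours. The function names are hypothetical.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def required_per_system(site_percent, systems=20):
    """Per-system availability (%) needed for `systems` independent systems in series."""
    return (site_percent / 100.0) ** (1.0 / systems) * 100.0


def downtime_hours(availability_percent, years=5):
    """Total downtime (hours) implied by an availability percentage over `years`."""
    return (1.0 - availability_percent / 100.0) * years * HOURS_PER_YEAR


site_target = 99.998  # the combined figure quoted on this slide
print(f"Each system needs about {required_per_system(site_target):.4f}%")
print(f"Site downtime over 5 years: about {downtime_hours(site_target):.2f} h")
```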
Tier IV – the ultimate in resilience?

 Fault Tolerant – defined as ‘a site that can sustain at least one
unplanned worst-case infrastructure failure with no critical load impact’
 Concurrently Maintainable – defined as ‘a site that is able to perform
planned maintenance activity without shutting down the critical load’.
Note that it is acceptable that the fault tolerance level will be reduced
during maintenance or after the first fault
 Tier IV Classification only applies to dual power supply loads where
complete functionality is obtained with either power supply fed and
where the two inputs, in normal operation, share the power demand, as
defined by The Uptime Institute’s own specification [1]
 A technical and philosophical argument rages about Static Transfer
Switches for single-cord loads in Tier IV designs
- Is that Tier III.5 or IV.5?

[1] Title: Fault Tolerant Power Compliance Specifications, v2.0, see www.uptimeinstitute.org
Electrical Single Line Diagrams

 There is no compulsion on the designer to strictly follow
the designs derived from the Tier Classifications. In many
cases compromises will have to be made
- The benchmarking function of the Tier system then provides a
useful yardstick to measure against
 In the rest of this presentation we only refer to the Electrical
systems, just one of the 20+ engineered systems that are
required to gain a Classification rating
 A particular facility’s Tier rating is the lowest of all its
system Tier Classifications
- Tier IV power + Tier III ‘all other’ + Tier II cooling = Tier II Facility
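The ‘lowest system wins’ rule can be sketched in a few lines. The dictionary keys and function name below are illustrative only; a real assessment would cover all 20+ systems.

```python
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV"}


def facility_tier(system_tiers):
    """The facility rating is the lowest Tier achieved by any one of its systems."""
    return min(system_tiers.values())


# The example from the slide: strong power, weaker cooling
example = {"power": 4, "all other": 3, "cooling": 2}
print(f"Facility rating: Tier {ROMAN[facility_tier(example)]}")
```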
Tier I – most critical power systems

A basic single-bus critical power system suitable
for single-corded IT loads

There is no specific redundancy called for,
although it can be argued that the standby
generator set is redundant for the grid supply

Although only ‘N’ is specified, the designer
should avoid multiple components in power-parallel
configuration as it drastically reduces
the potential Availability, i.e. N=1 is best

Maintenance generally involves supplying the
load with non-UPS power and an annual load
shut-down

Availability of Power at load typically 99.95%*

*Over 5 years operation
Tier II – increasing levels of redundancy

A single-bus power system suitable for both single
and dual-corded loads

Redundancy is called for in the standby generator
installation to reduce the chance of ‘failure-to-start’,
but not the mains supply

N+1 is specified for the UPS so a high degree of
maintenance can be concurrent

Load bank connections are mandatory

Dual-corded loads (expected minority) should be
fed by separate A+B PDU’s whilst only the
single-corded loads should be fed via STS’s (performing a
maintenance function rather than Availability
enhancement)

Note the option of a ‘B’ UPS, practical when
dual-cord loads are few

Availability at load typically 99.98%*

*Over 5 years operation
Tier III – more redundancy + segregation

A dual-bus power system suitable for both single
and dual-corded loads

Redundancy is called for in the mains supply and
the standby generator sets. These must be
compartmentalised to reduce the risk of common-mode
failure, fire etc

N+1 is specified for the UPS so a high degree of
maintenance can be concurrent

Dual-corded loads should be fed by separate A+B
PDU’s whilst only the single-corded loads should
be fed via STS’s (performing a maintenance
function rather than Availability enhancement)

Note the scope for a rapid upgrade to a ‘B’ UPS and
Tier IV (but don’t forget the other systems)

An important ‘extra’ here is the Load Bus
Synchronisation. When the STS’s can have UPS
power on one input and the generator supply on
the other it is essential (for the load) to keep the
two supplies within 30° of each other
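As a purely illustrative sketch of the 30° criterion, reading it as the phase angle between the two STS inputs: the function names are hypothetical and this is not how any particular LBS or STS controller is implemented.

```python
def phase_difference_deg(angle_a_deg, angle_b_deg):
    """Smallest angle (0 to 180 degrees) between two phase angles, handling wrap-around."""
    diff = abs(angle_a_deg - angle_b_deg) % 360.0
    return min(diff, 360.0 - diff)


def within_sync_window(angle_a_deg, angle_b_deg, limit_deg=30.0):
    """True if the two supplies are close enough in phase for a transfer."""
    return phase_difference_deg(angle_a_deg, angle_b_deg) <= limit_deg


# 350 degrees and 10 degrees are only 20 degrees apart once wrap-around is considered
print(within_sync_window(350.0, 10.0))
print(within_sync_window(0.0, 45.0))
```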

Availability of Power at the load typically 99.99%


Tier IV – the Uptime purist’s configuration

For further discussion of this circuit-breaker see Appendix B


Tier IV – The Uptime Institute original

 Complete physical segregation of the two power supplies from the grid
to the dual-corded load – a true Dual-Bus system
- 2x(N+1) in every system, maximum 90% load
- Concurrent maintenance possible without load shut down and without
losing N+1 redundancy
- Needs two grid sub-stations (they will be on the same MV ring) and
diverse cable routes into the site
- Two mechanical load power switchboards in dual-bus
- Note! Many engineers question having N+1 on both A & B buses
 ONLY dual-corded loads
- No STS’s, no common point of failure except the load
- Simple to operate, hence reliable
 With care in design, installation, operation and maintenance, 99.999%
power Availability possible
Tier IV + STS’s + LBS = Tier how many?
Tier III.5 or IV.5? That is the question!

 Not all loads are dual-corded
- Load-transparent switching via STS’s is a great maintenance tool
 Classic Tier IV but with STS’s for single-corded loads
- Absolutely needs the addition of Load Bus Synchronisation
 Insist on three PDU’s in the data-room
- ‘A’ fed from UPS-A for dual-cord loads
- ‘B’ fed from UPS-B for (the other feed of) dual-cord loads
- ‘A/B’ with STS fed from UPS-A & B for single-cord loads
- Feeding dual-corded loads via STS’s reduces Availability to that of
the STS itself and negates the principle of dual-bus
Tier IV + STS’s + detail from TIA-942
Unique to TIA-942 - in the detail

 Tier IV has to have impedance-based battery monitoring systems
 TIA-942 says that when a system (A or B) is shut down for routine
maintenance then the maintenance bypass should be energised by a
UPS supply
- Not to rely on the dual-corded loads to operate with one feed dead?
- TIA-942, Page 123, RH column ‘UPS Maintenance Bypass Arrangement’
 A third UPS (C) system? Uneconomic, space hungry, utilisation 0.05%
and a poor return on investment
- Chloride solution (red-line on diagram)
• Cross-feed the output of each UPS system to the maintenance bypass of the
alternate system
• Manual control, padlocked and interlocked isolators, break-before-make, no
hot-transfer, no point of common coupling in an auto-mode, sync-check
blocking relays across breakers = safe
Secure Power Always

Ian F Bitterlin
International Sales Director

Contact details
Tel: +44 (0) 7717 467 579
E mail: ian.bitterlin@chloridepower.com
Web: www.chloridepower.com
Appendix A – Understanding ‘Nines’

Your mission critical hardware can only
deliver its maximum potential if it has
continuous voltage available at the input
terminals and the rest of the facility ‘works’.

Power Availability = 99.9..?..9%
Cooling Availability = 99.9..?..9%
Security & Fire = 99.9..?..9%
Human Error, Training & Management = 99.9..?..9%
Natural & Unnatural Disaster = 99.9..?..9%
Availability: How good is 99.9%?

 44 minutes of unsafe drinking water per month
 3 crash-landings per week at Heathrow
 3,000 letters lost by The Post Office, every hour
 2,000 surgical mistakes in the NHS, every week
 9,000 incorrect banking debits per hour
 32,000 missed heartbeats, per person, per year
- Not all in one go, please.

UK numbers
“Availability”
The most abused piece of “data” in the industry

 The term, and concept of, “Availability” is often misused to
describe the “Quality” of a data centre facility power supply
 It is generally assumed (but rarely achieved) that this also
applies to the associated cooling system
 Expressed in terms of the percentage uptime, better known
as a ‘number-of-nines’
- 99.999% = “Five-Nines”
 The calculation is based upon:
- MTBF = the Mean Time Between Failure (hours)
- MDT = the Mean Down Time (hours)
Simple Arithmetic

 Availability = [MTBF ÷ (MTBF + MDT)] x 100%

 Example
- MTBF = 25,000 hours
- MDT = 1 hour
- Availability = (25,000 ÷ 25,001) x100 = 99.996%
- Note that a 100% “reliable” system that needs annual
maintenance cannot have A=100%!
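The example above, as a minimal sketch. The function name is illustrative and both inputs are taken to be in hours.

```python
def availability_percent(mtbf_hours, mdt_hours):
    """Availability (%) = MTBF / (MTBF + MDT) x 100."""
    return mtbf_hours / (mtbf_hours + mdt_hours) * 100.0


# MTBF = 25,000 hours, MDT = 1 hour
print(f"Availability = {availability_percent(25_000, 1):.3f}%")
```

Any non-zero MDT, including planned maintenance downtime, keeps the result below 100%.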
Availability Nines: A measure of “quality”?

MTBF           10 years     1 month      1 day
MDT            1 hour       30 seconds   1 second
Availability   99.99885%    99.99885%    99.99884%

Four-Nines = OK? But do you really want a failure every day?

In reality it’s worse. Assuming the system recovery time is 6 hours:

MDT            6+1 hours    6h+30s       6h+1s
Availability   99.992%      99.17%       74.99%
20ms power events in 12 months?
How many computer crashes will you accept?

Availability    “Nines”   MDT per year   20ms failures per year
99.0%           2         87.6 hrs       15,768,000
99.9%           3         8.76 hrs       1,576,800
99.99%          4         53 min         157,680
99.999%         5         5.3 min        15,768
99.9999%        6         31.5 sec       1,577
99.99999%       7         3.15 sec       158
99.999999%      8         315 ms         15
99.9999999%     9         31.5 ms        2
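A short sketch of how the table can be generated, assuming a 365-day year and treating the whole yearly downtime as a run of 20ms events. The names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year
EVENT_SECONDS = 0.020               # one 20ms power event


def yearly_downtime_seconds(nines):
    """Downtime per year (seconds) for an availability with the given number of nines."""
    unavailability = 10.0 ** (-nines)  # e.g. 3 nines = 99.9% = 0.001 unavailable
    return unavailability * SECONDS_PER_YEAR


def events_per_year(nines):
    """How many 20ms events fit into the yearly downtime allowance."""
    return yearly_downtime_seconds(nines) / EVENT_SECONDS


# One row per number of nines, from two to nine
for nines in range(2, 10):
    downtime = yearly_downtime_seconds(nines)
    print(f"{nines} nines: {downtime:,.4f} s downtime, "
          f"about {events_per_year(nines):,.1f} events of 20ms")
```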

The “Nines” cannot be applied to power over a single year!


Better to use MTBF/MDT for ‘one’ failure event
There are three common abuses

 99.9% - sounds good but needs to be looked at carefully
 99.999..9% (even 100%) with no obvious technical
foundation or caveats about time-frame, events or
maintenance for the claim
 Hardly anyone ever mentions if there is increased risk
during essential maintenance – or if maintenance can be
carried out at all without a load shutdown
- The single most important factor in achieving high MTBF figures
in the field is correct, expert and timely maintenance
“High-Nines” Availability?
Beware systems that cannot be maintained!

 A UPS system, cooling system and distribution scheme
requires intrusive maintenance every year for four hours
- therefore the maximum Availability of “Single-Bus”
space is 99.95%
 Can you afford the risk of doing M&E at the same time?
- If not, it could be two shutdowns and only about 99.9%
 Only Dual-Bus systems can be applied and achieve
“high nines” with Maintenance
 Most critical systems installed today have a “weaker”
cooling design than electrical
- Even though the mechanical system has a much wider fault
tolerance the designer should ensure a balance is achieved
between the electrical and the mechanical designs, including
the “maintenance without shutdown” scenario
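The single-bus ceiling quoted above follows directly from the yearly shutdown. A minimal sketch, assuming an 8,760-hour year and no downtime other than the planned shutdown (the function name is illustrative):

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def max_availability_percent(shutdown_hours_per_year):
    """Best-case availability (%) when the load must be shut down for maintenance."""
    return (1.0 - shutdown_hours_per_year / HOURS_PER_YEAR) * 100.0


# Four hours of intrusive maintenance per year on a single-bus system
print(f"Maximum availability = {max_availability_percent(4):.2f}%")
```

Any unplanned failure only lowers this figure further, which is why only dual-bus designs can reach “high nines” with maintenance.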
Appendix B – Load isolation breaker and N+?

 To be able to run the load via the bypass and test the UPS
system as a parallel group is a very attractive and useful
operational/maintenance feature
- The load isolation breaker enables that function
 Generally that means that between the PDU and the output
bus of the UPS system there are at least two MCCB’s or
ACB’s in series
- Typical MTBF published at 250,000h (28.5y) with maintenance
 This negates the advantage of applying any reliability
enhancement strategy using N+(more than 1)
Distribution limits the UPS Availability

[Single line diagram: Utility/Generator Feed, Input Switchboard, Maintenance Bypass, Output Switchboard, Critical Load Bus]

Typically 250,000h MTBF each
Two in series = 125,000h MTBF

• N+X UPS does not improve things
• Bus-voltage Availability depends upon these two switches
• Single-bus maximum MTBF = 125,000h (14 years)
• e.g. With an 8h MDT that is 99.99%
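The series figures above can be reproduced with the sketch below. It assumes independent components with constant failure rates, so that failure rates (1/MTBF) simply add in series. The function names are illustrative.

```python
def series_mtbf(mtbf_hours_list):
    """MTBF (hours) of components in series: the failure rates (1/MTBF) add up."""
    return 1.0 / sum(1.0 / mtbf for mtbf in mtbf_hours_list)


def availability_percent(mtbf_hours, mdt_hours):
    """Availability (%) = MTBF / (MTBF + MDT) x 100."""
    return mtbf_hours / (mtbf_hours + mdt_hours) * 100.0


# Two breakers in series, each with a published MTBF of 250,000h
bus_mtbf = series_mtbf([250_000, 250_000])
print(f"Two in series = {bus_mtbf:,.0f}h MTBF ({bus_mtbf / 8760:.1f} years)")
print(f"With an 8h MDT = {availability_percent(bus_mtbf, 8):.2f}%")
```

However many UPS modules sit upstream, the bus cannot do better than the two breakers in series with it.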
N+1 redundant UPS architecture: N?

600kVA Load

1+1                         2+1                           3+1
100% Redundancy             50% Redundancy                33% Redundancy
2x 600kVA modules           3x 300kVA modules             4x 200kVA modules
R = 10*                     R = 9                         R = 8
Day One only                Day One to Two                Day One to Three
Highest UPS CapEx           Scope for load shrink         High scope for load shrink
High risk of partial load   Medium risk of partial load   Low risk of partial load
High load step              Medium load step              Low load step
1200kVA of batteries        900kVA of batteries           800kVA of batteries
                            25% space saving              33% space saving
Lower battery CapEx etc

*Based on Reliability (R) of a single module = 1
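The module and battery sizes in the comparison follow from simple N+1 arithmetic. A minimal sketch, assuming each module is rated at load ÷ N and that battery capacity scales with installed module kVA (the function name is hypothetical):

```python
def n_plus_1_sizing(load_kva, n):
    """Return (module rating, total installed kVA) for an N+1 system serving load_kva."""
    module_kva = load_kva / n           # N modules together carry the full load
    installed_kva = module_kva * (n + 1)  # plus one redundant module
    return module_kva, installed_kva


LOAD_KVA = 600
baseline_kva = n_plus_1_sizing(LOAD_KVA, 1)[1]  # the 1+1 case is the reference

for n in (1, 2, 3):
    module, installed = n_plus_1_sizing(LOAD_KVA, n)
    saving = (1.0 - installed / baseline_kva) * 100.0
    print(f"{n}+1: {n + 1}x {module:.0f}kVA modules, {installed:.0f}kVA installed, "
          f"{saving:.0f}% less than 1+1")
```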
