
The following is intended to outline our general product direction. It
is intended for information purposes only, and may not be
incorporated into any contract. It is not a commitment to deliver any
material, code, or functionality, and should not be relied upon in
making purchasing decisions. The development, release, and timing
of any features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
Andrew Holdsworth
Director, Real World Performance
A Real-World Engineering Approach to Performance
• Real World Performance Group Role
  • Escalated real performance problems
  • Work with pre-production software
  • Work with large ISVs
  • Work with leading-edge products
• In many cases, poor engineering decisions make system
performance/availability impossible to achieve
Back To Engineering Basics
• The speaker initially trained as a naval architect, designing ships
• It has become increasingly obvious that basic engineering
design principles are being ignored in the
design/construction/rollout phases of system
deployment.
• Let's look at some “heavy” engineering failures in
history and contrast them with today’s system
design errors.
• All these examples were high-tech in their day!
The RMS Titanic
• The RMS Titanic struck an iceberg at 11:40pm on April 14 1912
and sank 2 hours 40 minutes later at 2:20am, resulting in the
loss of 1523 lives.
• The Titanic was marketed as unsinkable
• So what happened?
The RMS Titanic
• The RMS Titanic was deemed unsinkable
because the ship had been subdivided into
16 watertight compartments by 15
transverse bulkheads
The RMS Titanic
• The front 6 compartments were damaged
and flooded.
• The bulkheads were not high enough to prevent
the water coming over the 6th bulkhead and
causing the ship to sink.
The RMS Titanic
• In summary, the design errors were
  • The level of subdivision or the height of the watertight
  bulkheads was wrong
  • Insufficient lifeboats to prevent massive loss of
  life
• In summary, the management errors were
  • Overconfidence in a concept
  • Lack of contingency planning/testing
So What Can We Learn From the Titanic?
• Availability is more than one innovative feature!
  • Do you have more answers for availability than just one
  feature?
• There can be no replacement for testing and
revalidation of calculations
  • The importance of testing and good production metrics
  • Is there an independent review process?
• Backup systems need the same attention to detail and
testing as the primary defense mechanisms.
  • Are your procedures for various scenarios in place?
The Comet Aircraft
• First jet airliner in service, in 1952
• Novel design involving engines at the wing roots
and new alloys.
The Comet Aircraft
• Then there were 3 crashes, resulting in the
grounding of the Comet
• Using the wreckage, the cause was
determined to be cracks caused by fatigue
and high stress concentrations around a
window.
• Note that today's aeroplanes do not have
windows with sharp corners.
The Comet Aircraft
• In summary, the design errors were
  • Misunderstood failure modes for new types of structures
  and materials
• In summary, the management errors were
  • Insufficient testing of new concepts and technologies
  • Being too willing to blame the non-standard parts of the design
  delayed the correct root cause analysis.
• The results
  • A huge number of lessons learnt
  • Effectively killed the British commercial jet aircraft business
So What Do We Learn?
• Learn to find the root cause of problems and
do not simply blame the unique, non-standard
components.
  • How often do I see RAC and the CBO blamed when
  the problem has nothing to do with either!
• The small details really do matter
  • Good execution plans
  • Cursor sharing
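As a rough illustration of why cursor sharing is one of the small details that matter: a shared cursor can only be reused when the statement text is identical, which is what bind placeholders give you. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for a database (it is not Oracle and has no shared pool), and the `orders` table and its contents are hypothetical. It only shows how literals produce a new statement text per value while a bind placeholder keeps the text constant.

```python
import sqlite3

# In-memory stand-in database; the "orders" table is a made-up example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, "OPEN") for i in range(1, 4)])

order_ids = [1, 2, 3]

# Literal values: every lookup produces a different statement text,
# so a database has nothing identical to share between executions.
literal_texts = {f"SELECT status FROM orders WHERE id = {i}" for i in order_ids}

# Bind placeholder: one statement text, with values supplied separately.
BOUND_SQL = "SELECT status FROM orders WHERE id = ?"
bound_results = [conn.execute(BOUND_SQL, (i,)).fetchone()[0] for i in order_ids]

print(len(literal_texts), "distinct statement texts with literals")
print("1 statement text with a bind placeholder, results:", bound_results)
```

On Oracle the same idea is normally written with named binds such as `:id`; the placeholder syntax above is specific to sqlite3.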
Tay Rail Bridge
• Built in 1878 entirely from experience and human
judgement, before any engineering codes
existed
Tay Rail Bridge
• On 28th December 1879 the bridge collapsed
while a train was crossing it in a
Force 10-11 gale. Approximately 75 lives were lost.
Tay Rail Bridge
• On inquiry, the bridge design was found to
be inadequate for the wind loads on the bridge.
• In summary, the design team had massively
underestimated the wind loads, and the bridge
components were overstressed, leading to
catastrophic failure.
• The bridge was rebuilt to a similar design with
upgraded specs and stands today.
So What Do We Learn?
• The spec of the bridge was sized by judgment rather than
analysis
  • How many people size hardware this way?
  • How many people really understand the loads placed
  by various transactions on the hardware/software
  infrastructure?
  • If you do not know the loads (types and sizes), how is
  it possible to even anticipate the correct type/size of
  hardware?
• Inferior materials can produce a great result if sized
correctly
  • So why get hung up on the latest high-speed hardware?
  Focus instead on balanced, scalable designs.
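One way to size by analysis rather than judgment is to multiply each transaction type's arrival rate by its measured resource cost and add up the results. The sketch below is illustrative only: the transaction names, rates, per-transaction CPU costs, and the 70% utilization target are all assumptions, not measurements from any real system.

```python
import math

# Hypothetical workload: (transactions per second, CPU seconds per transaction).
# All figures are made-up placeholders; real sizing needs measured values.
WORKLOAD = {
    "order_entry": (200.0, 0.010),
    "stock_query": (500.0, 0.002),
    "batch_report": (2.0, 1.500),
}

TARGET_UTILIZATION = 0.70  # assumed headroom policy, not a recommendation


def cpu_demand(workload):
    """Total CPU seconds consumed per wall-clock second (busy cores)."""
    return sum(rate * cost for rate, cost in workload.values())


def cores_required(workload, target_utilization=TARGET_UTILIZATION):
    """Smallest whole number of cores that keeps utilization at or below target."""
    return math.ceil(cpu_demand(workload) / target_utilization)


demand = cpu_demand(WORKLOAD)
print(f"CPU demand: {demand:.1f} busy cores")
print(f"Cores required at {TARGET_UTILIZATION:.0%} target: {cores_required(WORKLOAD)}")
```

The same arithmetic applies to I/O operations or network bytes per transaction; what matters is that the inputs are measured loads rather than guesses.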
Tacoma Narrows Bridge
• On Nov 7, 1940 the first Tacoma Narrows suspension
bridge collapsed due to wind-induced vibrations
Tacoma Narrows Bridge
• In summary, the bridge designers had
neglected the aerodynamic loading on the
structure
So What Do We Learn?
• See what happens when you are completely
out of your depth with new technology
• Learn where the edge of the design envelope
is.
  • Determine acceptable CPU, disk and network
  loads.
  • Determine acceptable O/S loads: processes,
  threads, sockets etc.
  • Determine acceptable Oracle loads:
  connections, parses, executions, scans etc.
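Once the acceptable loads have been determined, they can be written down and checked against what the system is actually doing. The following is a minimal sketch under stated assumptions: the metric names and limits are hypothetical placeholders, not Oracle guidance, and in practice the observed values would come from your own monitoring.

```python
# Hypothetical envelope: the highest load at which testing showed acceptable
# behaviour. Metric names and limits are placeholders, not Oracle guidance.
ENVELOPE = {
    "cpu_utilization_pct": 70,
    "os_processes": 2000,
    "db_connections": 500,
    "hard_parses_per_sec": 50,
}


def outside_envelope(observed, envelope=ENVELOPE):
    """Return {metric: (observed, limit)} for every metric above its limit.

    Metrics missing from `observed` are reported too, because an unmeasured
    load cannot be shown to be inside the envelope.
    """
    breaches = {}
    for metric, limit in envelope.items():
        value = observed.get(metric)
        if value is None or value > limit:
            breaches[metric] = (value, limit)
    return breaches


sample = {"cpu_utilization_pct": 85, "os_processes": 900, "db_connections": 120}
for metric, (value, limit) in outside_envelope(sample).items():
    print(f"{metric}: observed={value}, limit={limit}")
```

Treating an unmeasured metric as a breach is a deliberate choice here: it surfaces the loads nobody is watching.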
Ronan Point Flats
• On May 16th 1968, a tenant in a block of flats lit
her gas cooker. There was a gas leak, and the
resulting explosion blew out the pre-cast
concrete walls. This caused the entire
end of the building to collapse. 4 tenants
were killed in this incident.
Ronan Point Flats
• In summary, the design was seen as
inadequate to prevent what should have been
a minor issue from escalating into a major structural
failure.
So What Do We Learn?
• This is a pathological race condition
  • The failure of one small component led to a major
  structural failure
  • The need to prevent race conditions through resilience in
  the design.
• Is your application server/middle tier configured to protect or
destroy your database?
  • How do you remove runaway SQL from your system?
  • How do you prevent runaway SQL from ever being
  executed?
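One common way for a middle tier to protect the database is to cap how many requests may reach it at once and to reject the excess quickly, so that a surge fails at the edge instead of cascading inward. The sketch below is an assumption-laden illustration, not a technique taken from the talk: `DatabaseGate` and its parameters are hypothetical names, and `work` stands in for whatever call actually runs the SQL.

```python
import threading


class DatabaseGate:
    """Caps concurrent database calls; excess callers fail fast instead of piling up."""

    def __init__(self, max_concurrent, wait_seconds):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._wait_seconds = wait_seconds

    def run(self, work):
        # Wait briefly for a free slot; refuse the request if none frees up.
        if not self._slots.acquire(timeout=self._wait_seconds):
            raise RuntimeError("database gate full: request rejected")
        try:
            return work()
        finally:
            # Always give the slot back, even if the work raised.
            self._slots.release()


gate = DatabaseGate(max_concurrent=2, wait_seconds=0.01)
print(gate.run(lambda: "query result"))
```

A gate like this limits how many statements run at once; it does not stop a single runaway statement, which needs a separate control such as a statement timeout or a resource limit on the database side.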
Hyatt Kansas City
• In 1980 the Hyatt Hotel opened in Kansas City
• The design included an atrium with 3 aerial
walkways suspended by steel rods to create an
impressive interior to the hotel
• On July 17th 1981, between 1500 and 2000 people
inundated the hotel's atrium and walkways for a tea
dance.
• At 7:05 there was a loud crack and the 2nd and 4th floor
walkways crashed to the ground, killing 114 people
and injuring over 200 others.
• It was the worst structural failure in the history of the
United States
Hyatt Kansas City
• In summary, the contractor did not build
what was specified.
• This meant that the attachments and bolts were
overloaded.
So What Do We Learn?
• Validate your design by stress testing!
• Beware of shortcuts or specification changes
and their implications
• Structures/systems are only as strong as the
weakest link.
  • This applies to hardware
    • CPU, I/O, network bandwidths
  • And to software
    • Backup systems, single-threaded processes etc.
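The weakest-link point can be made concrete once each component's ceiling is expressed in the same unit: the system as a whole can go no faster than its slowest component. The sketch below uses hypothetical component names and placeholder numbers; real ceilings would come from stress testing each component.

```python
# Hypothetical per-component ceilings, all expressed in transactions/second.
# Placeholder numbers; real values come from stress testing each component.
CAPACITY_TPS = {
    "cpu": 1200,
    "storage_io": 450,
    "network": 3000,
    "single_threaded_batch_loader": 80,
}


def weakest_link(capacity):
    """Return (component, capacity) for the component with the lowest ceiling."""
    name = min(capacity, key=capacity.get)
    return name, capacity[name]


component, ceiling = weakest_link(CAPACITY_TPS)
print(f"System ceiling is about {ceiling} tps, set by {component}")
```

In this made-up example, adding CPU or network capacity changes nothing until the single-threaded process is addressed.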
Summary of Design/Management Flaws
• Lack of knowledge/ignorance of new
technologies
• Lack of testing
• Misunderstanding of actual loads
• Weak links in the designs
• Did not build what was specified
• Did not consider pathological race
conditions
A Note on Testing
QUESTIONS
ANSWERS
