
Discussing Foreign Keys and their usage in the Database

Marcel Kratochvil – August 2009 (Version 1.1) : http://foreignkeys.blogspot.com/

You know you are treading on hallowed turf when you dare to raise the specter that foreign keys aren't that
great. Someone in the room turns around, points a pretend handgun at you and pulls the trigger.
So entrenched is the thinking about foreign keys that you wonder if there is any hope in this industry
of changing how we tune and manage the database. The perception is that foreign keys
equate to data integrity, and this is now so ingrained in thinking patterns that merely questioning the
concept raises defensiveness.

My goal is to show that focusing on the misuse of foreign keys, and on what they actually
mean, exposes fundamental flaws in the methodologies IT uses today for
tuning, data management and application development. I will also show that just by
looking at foreign keys our whole world view can be challenged and changed, revealing
a new one containing concepts that will challenge your thinking. I will be demonizing
foreign keys, but merely to show that if a core, unquestioned tenet of the
database world can be exposed, then everything we cherish and take for granted needs
to be reviewed.

The paper is broken down into numerous discussion points, covering the issues from different angles. Some
of the points raised are addressed from different viewpoints in each discussion. The aim is to show why the
world view covering foreign keys and the relational model needs to change. I will discuss why foreign keys
represent a thinking strategy that typifies what is holding the IT industry back from establishing efficient
practices, and I will aim to show how this relates to core issues found in database tuning. In the appendix I
cover two scenarios showing why we need to challenge existing world views.

Discussion 1: The Definitions

When I raised the notion at a recent conference that foreign keys might not be needed, the response that came back was "so you don't want data integrity then?". So ingrained is this notion that having foreign keys equates to having data integrity that it forms the core of many of the discussion points raised.

The logical or conceptual model is separate from the physical model.

It's therefore worthwhile looking at some of the definitions:

i. The relational model is based on first-order predicate calculus. "A major aim of relational database design is to group attributes into relations to minimize data redundancy and thereby reduce the file storage space required by the implemented base relations" 1

The relational model stated thus does not include data semantics or whether the meaning behind the data is correct. It's also important to highlight at this point "reduce the file storage space"; this will become important later.

ii. "The conceptual view is a representation of the entire information content of the database, again in a form that is somewhat abstract in comparison with the way in which the data is physically stored" 2

Now let's cover some of the varying definitions of what data integrity or database integrity is:

iii. Database integrity "refers to the validity and consistency of stored data. Integrity is usually expressed in terms of constraints" 3

iv. "Integrity refers to the accuracy and validity of the data ... Integrity involves ensuring that the things they are trying to do are correct" 4

vi. So what then is data integrity? "The standard features to ensure data integrity are (1) data validation (2) automatic computations (3) verification of totals (4) control of user access (5) transaction integrity and (6) backup and recovery" 5

1 Database Systems. Thomas Connolly. 3rd Edition.
2 Database Systems. C.J. Date. 4th Edition. Chapter 2.
3 Database Systems. Thomas Connolly. 3rd Edition. Chapter 1.
4 Database Systems. C.J. Date. 4th Edition. Chapter 19.
5 Database Management Systems. Gerald V. Post.
Page 1 / 37
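Definition iii above ties integrity to constraints. A minimal sketch of what "integrity expressed in terms of constraints" looks like in practice, using Python's sqlite3 as a stand-in database (the paper's context is Oracle; all table and column names here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.execute("""CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,       -- entity integrity (primary key rule)
    email      TEXT NOT NULL UNIQUE)""")
conn.execute("""CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    account_id  INTEGER NOT NULL
                REFERENCES account(account_id),  -- referential integrity
    amount      REAL CHECK (amount >= 0))""")

conn.execute("INSERT INTO account VALUES (1, 'alice@example.com')")
conn.execute("INSERT INTO purchase VALUES (10, 1, 9.99)")  # satisfies every constraint

try:
    # An orphaned child row violates the foreign key and is rejected.
    conn.execute("INSERT INTO purchase VALUES (11, 99, 5.00)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

The point the later discussions build on is that these declarations guarantee structural validity, not that the stored values mean anything correct.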
One could deduce that foreign keys are covered in part under point (1), data validation. I would conclude that data validation has more to do with data semantics than with relationship rules.

Integrity does try to ensure that queries, when performed, are correct. When you look more at the relational model and review the use of normalization, you begin to see that a key function is to provide protection from anomalies:

vii. "The process of normalization is a formal method … so that a relational schema can be normalized to a specific form to prevent the possible occurrence of update anomalies" 6

So when we configure our schema to conform to the relational model we are in fact establishing an environment so that when we query the data we can be sure no anomalies arise. We are assured that the response from the query is mathematically exact. This is important because the relational model is based on predicate calculus. We can use this to mathematically prove a SQL statement (which I recall painfully doing in my final year of university).

Let's now look at constraints:

viii. "Why are primary keys important? The answer is that they provide the sole tuple-level addressing mechanism within the relational model... it follows, therefore, that primary keys are absolutely fundamental to the operation of the overall relational model" 7

Date then importantly states:

ix. "Foreign-to-primary-key matches represent references from one relation to another; they are the glue that holds the database together … Note carefully, however, that not all such relationships are represented by foreign-to-primary-key matches." 8

Date doesn't use the same strong words about foreign keys as he does about the primary key. He clearly points out that primary keys are fundamental to the relational model. He then implies that foreign keys are useful but can be represented using other methods.

Date refers to the two core integrity rules: Entity Integrity (which is the primary key definition) and Referential Integrity (which is the foreign key definition). He refers to the rules as states.

So having a foreign key does not mean you have data integrity. All you are doing is describing a mathematical relation between two entities. The mathematical relationship thus ensures that when queries are performed there are no anomalies.

The first key point I am raising is that what a foreign key actually is and does is being misrepresented. The statement about foreign keys implies that by using them we are ensuring we have data integrity. Examples abound in the English language where the meanings behind words change over time, especially when they meet political or marketing aims. Examples include free trade, globalization, proactive, empowerment and, more recently, detox. In the case of foreign keys I am suggesting that the meaning of the term has changed over time due to misuse and overuse, and, as I have seen, by uttering the term you command unquestioning attention. No one will dare question the authority of what the foreign key is.

So the crux is that a foreign key, when used, ensures that when a query is performed the results can be mathematically shown to be correct. Nowhere does it claim that the actual data that comes back is accurate. Nowhere does it claim that the data entered is correct. Correctness is covered by data semantics.

This doesn't dispel the notion that we have no need for foreign keys; all I have shown is that a foreign key and real data integrity are two independent concepts. Can I have data integrity without foreign keys? The answer is yes. Do I have data integrity with foreign keys? The answer is no - remember that foreign keys do not give you backup, security or data validation. Also, as I will show later, the whole notion of referential integrity is an outdated concept.

Are there situations where the relational logical model says we must have a foreign key relationship and it must be managed, but the real-life situation says it's fine to break it? The following example highlights one such case, and as you will read later on, the use of applications on the Internet changes the rules.

In this case we have an e-commerce application. We have an account which has many purchases. The two tables are linked together by a primary/foreign key relationship. An account can make many purchases, and it does not make sense to have purchases without an account.

6 Database Systems. Thomas Connolly. 3rd Edition. Chapter 13.
7 Database Systems. C.J. Date. 4th Edition. Chapter 12.
8 Database Systems. C.J. Date. 4th Edition. Chapter 12.
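The distinction just drawn - a foreign key guarantees a mathematically consistent relationship, not accurate data - can be sketched in a few lines. Again sqlite3 stands in for the database and the names are invented: the row below satisfies the foreign key perfectly while the value it carries is semantic garbage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    account_id  INTEGER NOT NULL REFERENCES account(account_id),
    card_type   TEXT)""")

conn.execute("INSERT INTO account VALUES (1, 'Alice')")
# Referentially flawless, semantically wrong: the constraint cannot see
# that 'Vsia' is a mistyped card type entered by a user.
conn.execute("INSERT INTO purchase VALUES (100, 1, 'Vsia')")

print(conn.execute("SELECT card_type FROM purchase").fetchone()[0])  # prints: Vsia
```

Whether "Vsia" should have been rejected is a data-validation (semantics) question, which sits outside what the foreign key can express.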

But let's now throw a spanner in the works. The accounts are created by the general public, and as per a Government privacy ruling, the owner of an account has the right to insist on their account being deleted. It is thus a legal requirement that, on request, the account is deleted. In addition, because of auditing requirements, purchase order information cannot be deleted. This too is a legal requirement, introduced by the tax department: deleting purchase records can be seen as a fraudulent activity, because the government then cannot get an accurate figure for tax purposes.

So now we have an interesting situation. If a customer requests that their account be deleted, the actual record must be deleted. The parent record must be destroyed. But we cannot delete the child records, also for legal reasons. The solution requires either warping our definition of a foreign key (possibly by nullifying the parent record data but keeping it), re-architecting the database to somehow get around this conflict, or just not getting hung up on the notion that foreign key rules need to be rigidly maintained, and living with the orphaned records. We then factor into reports that the parent might not exist, use different join conditions and just generally work around it.

The physical model does not have to match the logical model.

And now the big question: is my data accurate if I follow the relational model to the extreme? The answer is a firm no. Is it the end of the world if I don't? No. We break the rules all the time, yet we have this perception that relational is the way to go. And yet by daring to attack a core relational concept, the foreign key, I dare to attack the core belief system.

Why have the relational rules, and what do these rules actually achieve? This is going to be a crucial question to answer.

By this time in the discussion I hope you have been furiously scribbling notes in the margins highlighting some of the possible logical inconsistencies in the arguments raised above. You might even be thinking that it's going to be easy to dismiss the discussion points based on pure deductive reasoning and thus prove that foreign keys are the best thing ever.

It's best then that I highlight what might seem like a small and inconsequential point, and that is: Oracle is not a relational database. It is, and has been since Oracle 8.0, classed as an object/relational database.

When Oracle expanded the database to include objects, the rules changed; the way we work with data, tuning and our understanding of the database should have changed also. What I have observed is that the notion of being object/relational was anathema to most, and rather than embracing and using the object features, data architects, DBAs and developers have steadfastly refused to, and stuck to the relational model, hoping that by ignoring the object interloper or not dealing with it, it would somehow go away. What compounded this view was that most of the Oracle tools did not support objects, and the only interaction with them was by hiding them behind object views.

So if you have been thinking of having a debate based on the rules supporting the relational model, remember that the rules have changed. The arguments are irrelevant - possibly fun to have, but in the greater picture they will not achieve anything worthwhile. Subsequent discussion points will refer to the relational model, but more to highlight how the use of the model has been misinterpreted and misused over the years, and how the core idea of separating the logical from the physical model was forgotten.

The introduction of objects breaks most of the fundamental rules of the relational model, and this is seen when it comes to normalization. 1NF: "A relation in which the intersection of each row and column contains one and only one value". A UNF table contains one or more repeating groups. 9

So why introduce objects in the database when they break this cherished relational rule? A variety of reasons, but one of the main ones is performance. Using objects you can achieve order-of-magnitude improvements in performance over relational designs. This has been proven in the real world, where telecommunication systems use objects to achieve performance. The use of objects also enables easier translation to real-world objects in the design. It's a lot easier to design a car as an object than as a relation.

So if objects are so great, why aren't we all using them? Performance and complexity. I see object usage in the database as one where, used correctly, you can achieve great performance improvements, but at a cost in other areas, in particular the ability to do cross-reference queries, usability, minor updates and maintenance. Also, it's very easy to misuse objects.

9 Database Systems. Thomas Connolly. 3rd Edition.
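One way out of the delete-the-account/keep-the-purchases conflict described earlier on this page is the option the text calls "nullifying the parent record data but keeping it". A sketch, with invented names and sqlite3 as a stand-in: the privacy request is honoured by stripping the personal data, while the row itself, and therefore referential integrity and the audit trail, survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        name  TEXT,
        email TEXT,
        erased INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES account(account_id),
        amount REAL);
    INSERT INTO account VALUES (1, 'Alice', 'a@example.com', 0);
    INSERT INTO purchase VALUES (10, 1, 9.99), (11, 1, 5.00);
""")

def erase_account(conn, account_id):
    # Strip the personal data instead of deleting the row the purchases point at.
    conn.execute("UPDATE account SET name = NULL, email = NULL, erased = 1 "
                 "WHERE account_id = ?", (account_id,))

erase_account(conn, 1)
print(conn.execute("SELECT name, email, erased FROM account "
                   "WHERE account_id = 1").fetchone())               # (None, None, 1)
print(conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0])   # 2
```

Whether an anonymized shell row legally counts as "deleted" is exactly the kind of question the foreign key cannot answer for you.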

The core tenet behind objects, with its layered structure (hiding complexity), lends itself to inefficient programming practices, resulting in compounding performance problems. So object usage can go from great performance to abysmal, whereas relational tends to promote more consistent performance. In relational, breaking the data apart and then reassembling it per query lends itself to consistent performance. If you think the performance of relational systems can vary, you have never worked with objects.

The powerful capability of an object/relational database is that you can have both models working simultaneously, achieving the performance required. In this case the complexity arises in data management and in replicating the data between the structures.

But back to foreign keys. In the object world they are not required to achieve data integrity. In the object world UNF is allowed. We can repeat fields to improve performance; we sacrifice data redundancy for performance; we replicate data, warp it and change it to achieve better performance or to allow us to perform more complex queries against it.

The same holds true when looking at XML. XML is hierarchical, and foreign keys just do not fit into it. Spatial data is different again.

When asking about foreign keys and data integrity, one comment I received was "don't you want your data secure?" As if having a foreign key is the be-all and end-all. As if I have perfect data if I have a foreign key. As the British might say - bollocks to that. When discussing the topic of data integrity and foreign keys, the following real-life example was given to me: "don't you want correct values entered for whether a credit card is Visa or MC?". This represents a typical comment I have heard many a time, where people are confusing data integrity and misrepresenting what the relational model was trying to achieve. The answer to the above question has nothing to do with foreign keys, but it highlights how most people see the foreign key as the protector of the database, its guardian. There is a difference between data integrity and data semantics.

Discussion 2: The Multimedia World View

When you deal in the object/relational world and then move into the multimedia world, you find you are working in a niche. But this is a niche so new that the rules are defined retrospectively, only years after you build and design applications. In this world it's a different game. The methodology for development, the way you tackle development, the speed and management of development show that you are in a new paradigm.

Traditional logic and traditional methodologies just do not work in this world view. When you look at images, audio and video you enter a world of fuzzy logic and thinking. You have to leave the safe world of binary logic behind. If you try to apply traditional methodologies, failure is certain. This is one reason why there has been a slow adoption rate in the use of multimedia, especially multimedia in the database.

When you deal with multimedia you enter a world where the concept of exactness changes. Ask two photographers if a photo is award-winning and they will disagree. Just like judging wine and whether a Shiraz is better than a Cabernet, comparing photos or digital images is a fuzzy science.

Multimedia data is not accurate. It can change over time. A video can be edited, its soundtracks changed; even the metadata embedded in it can change. It can be compressed, resized or adjusted. With multimedia you move into a world of uncertainties. Data accuracy isn't an achievable goal or the holy grail you aim for; it's just not in the equation. Welcome to the world of fuzzy data. In this world, foreign keys are left at the front door along with other relational concepts that do not fit.

The problem most people working with relational data have is that they perceive multimedia data as just another subtype, an obscure or special case of relational data - just binary data. This is not the case. I will show in later discussions that multimedia is a superset of relational. Once you understand this, the way you treat multimedia data is different, and you will start to see why relational concepts fall apart.

Once you abandon the safe, traditional world of relational and realize that attempting to obtain the nirvana of creating a database with perfect data integrity is not possible, you move into the world of fuzzy data management. This is the world where imperfection is not your enemy but just part of the equation, where you realize that having 100% data integrity means either having no data or no users.
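The "UNF is allowed" style of the object world described above - repeating groups kept inside the one record to avoid joins - can be sketched without any object features at all. Here a JSON document column plays the role of the object type (an assumption for illustration, not the paper's Oracle object types), and every name is invented:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collection_doc (collection_id INTEGER PRIMARY KEY, doc TEXT)")

doc = {
    "name": "Museum Wing A",
    "images": [                      # repeating group: deliberately violates 1NF
        {"file": "vase.jpg", "width": 800},
        {"file": "coin.jpg", "width": 640},
    ],
}
conn.execute("INSERT INTO collection_doc VALUES (1, ?)", (json.dumps(doc),))

# One fetch returns the whole object -- no join -- at the cost of redundancy
# and of losing any declared referential integrity between collection and image.
loaded = json.loads(conn.execute(
    "SELECT doc FROM collection_doc WHERE collection_id = 1").fetchone()[0])
print(len(loaded["images"]))  # 2
```

The trade-off is the one the text names: fewer joins and a shape closer to the real-world object, in exchange for redundancy and the loss of declarative constraints.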

Putting users into the equation means you have to deal with data imperfection, or bad data on entry. The mentality programmers have today is to write programs that minimize or completely eradicate any chance of bad data.

When you move away from the relational and logical model towards the physical model, trying to match your business to the real world, you will start to move towards object relational. In doing so, some core relational concepts and beliefs get in the way of development. Add a web front end and security, and every cherished relational concept needs to be rethought.

Let's highlight this by looking at two multimedia examples. The first is the delayed delete, and the second concerns Internet replication. Let's set up a multimedia application. In it I have a collection, and a collection contains images and video. Let's say that one collection contains hundreds of thousands of images. There is a one-to-many relationship between the collection and image tables: a primary key in collection maps to a foreign key in the image table. Now let's say we need to delete the collection (in a real-life scenario the collection could represent an area in a museum that has been made redundant). To delete the collection is simple, but to then delete all the hundreds of thousands of images isn't (one could state that with Securefiles in Oracle 11g it is faster, but let's assume it's an expensive I/O). Should the whole delete be done in one transaction, immediately? If it were relational data it would be easy to do in one transaction, but with multimedia the attempt to delete such a large volume of data will impact disk, logs and rollback. On a single disk, or a single I/O system, such a delete could result in major performance issues for other customers using the site. A solution, and a fair solution (but not the only solution), is to perform a delayed delete. In this case the collection parent record is deleted, then a batch job runs and deletes the records in the image table with a delay in between. For the duration of the delete (which could be days depending on performance), the foreign key integrity rule is broken. We have orphaned child records.

We could adjust the logical design and include a physical change - mark the parent as deleted, but only delete it when all child records are deleted. This raises the concept that our logical design needs to be changed to adapt to physical conditions (which is covered later). It also introduces the fuzzy concept of a record being marked as deleted and not actually being deleted. For those who believe that constraints are crucial for maintaining consistency when multiple applications potentially access the data, it means each application has to be modified to deal with this partially deleted record; otherwise it could reference it when it shouldn't.

Let's extend the example and say that some of the images have been replicated to other sites on the Internet using an Internet distributed database (a concept discussed in greater detail in another discussion).
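The delayed delete described above can be sketched as follows, again with sqlite3 standing in and all names invented. The parent row is removed at once; the children are deliberately left orphaned (no foreign key is declared on the child, so the temporary violation is permitted), and a batch job later removes them in small committed chunks so no single transaction hammers the logs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection (collection_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE image (image_id INTEGER PRIMARY KEY, collection_id INTEGER);
    INSERT INTO collection VALUES (1, 'Old wing');
""")
conn.executemany("INSERT INTO image VALUES (?, 1)", [(i,) for i in range(10_000)])

conn.execute("DELETE FROM collection WHERE collection_id = 1")
conn.commit()   # parent gone immediately: the image rows are now deliberate orphans

def batch_delete_orphans(conn, chunk=1000):
    # Remove images whose parent collection no longer exists, a chunk at a time.
    while True:
        cur = conn.execute("""
            DELETE FROM image WHERE rowid IN (
                SELECT i.rowid FROM image i
                LEFT JOIN collection c ON c.collection_id = i.collection_id
                WHERE c.collection_id IS NULL LIMIT ?)""", (chunk,))
        conn.commit()   # short transactions; a real job would also pause here
        if cur.rowcount < chunk:
            break

batch_delete_orphans(conn)
print(conn.execute("SELECT COUNT(*) FROM image").fetchone()[0])  # 0
```

Between the parent delete and the last batch, any report joining image to collection has to tolerate the orphans - which is precisely the fuzzy state the text argues we already live with.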

In this case, the delete of the child images in the collection involves going through security and remote procedure calls. Even if the calls fail, a job could easily run and clean up orphaned records at a later time. The need to follow foreign key relationship rules just isn't required.

The delayed delete raises the issue: can we have a foreign key but have its implementation delayed? So rather than doing a cascade update in the same transaction, do the update that night as part of a batch job? Or have a batch job run daily that removes orphaned records? A relational purist would say no, it has to be in the same transaction, but in reality this is done all the time. It might not be with foreign keys, but it is done with other relationships or with the enforcement of constraints within the database. For example, if a new user is created in the database via public means (let's say a self-service kiosk), are their details validated immediately, or is there a process, which could be manual, to validate them? And what happens on failure to validate? In this case we have invalid user records which are cleaned up at a later time. The developer will attach a status to the record to indicate, in fuzzy terms, whether the user record is valid or pending; that way, when reports are run they can ignore the pending records. That way they can achieve data integrity.

For those who would like to dismiss fuzzy logic as just a fad (or, some might even say, a failed trend from AI development), keep in mind that fuzzy logic is a mathematical model which has shown that binary logic is just a subset (or special case) within it. See: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/sbaa/article1.html

There are numerous books on the market covering fuzzy logic. A very good one is Fuzzy Logic by Daniel McNeill and Paul Freiberger. In addition, on the Internet you should be able to find discussions covering the notion that fuzzy logic is a superset of binary logic. This supports the case that one can state that multimedia is a superset of relational.

From http://www.economicexpert.com/a/Fuzzy:logic.htm: "Another common misconception is that fuzzy logic is a new way of expressing probability. However, Bart Kosko has shown that probability is a subset of fuzzy logic, as probability only handles one kind of uncertainty. He also proved a theorem demonstrating that Bayes' theorem can be derived from the concept of fuzzy subsethood. This should not by any means suggest that all those who study probability accept or even understand fuzzy logic, however: to many, fuzzy logic is still a curiosity."

How many times have you thought of using a primary key/foreign key and didn't, for whatever reason? It's the old exception to the rule. Do you have primary keys on every table without any exceptions? If you don't, then you have already started on the fuzzy path. And if you do, do you follow 3NF? 4NF? 5NF? Do you even know what they are? At what point do you stop, or is it after 1NF? There are different grades and levels of relational, and are you just making arbitrary decisions as to which one is best suited to your environment?
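The kiosk example earlier in this discussion - records admitted first, validated later - can be sketched with a status column and a clean-up job. A sketch under invented names; the validation check itself is a stand-in for whatever real process (manual or automatic) would apply:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE app_user (
    user_id INTEGER PRIMARY KEY,
    email   TEXT,
    status  TEXT NOT NULL DEFAULT 'pending'
            CHECK (status IN ('pending', 'valid', 'rejected')))""")

# Self-service kiosk inserts: accepted immediately, validated later.
conn.execute("INSERT INTO app_user (email) VALUES ('good@example.com')")
conn.execute("INSERT INTO app_user (email) VALUES ('not-an-email')")

def validation_job(conn):
    # Later batch job: promote or reject pending rows. The check here is a
    # trivial stand-in for a real validation process.
    rows = conn.execute(
        "SELECT user_id, email FROM app_user WHERE status = 'pending'").fetchall()
    for user_id, email in rows:
        ok = "@" in (email or "")
        conn.execute("UPDATE app_user SET status = ? WHERE user_id = ?",
                     ("valid" if ok else "rejected", user_id))
    conn.commit()

validation_job(conn)
# Reports consider only validated rows and simply ignore the rest:
print(conn.execute(
    "SELECT COUNT(*) FROM app_user WHERE status = 'valid'").fetchone()[0])  # 1
```

Integrity here is achieved eventually, by convention and a batch job, rather than inside the creating transaction - the pattern the text argues is already commonplace.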

Discussion 3: Reactive Thinking

Conventional thinking still focuses on reactive thinking. We are presented with a problem, then come up with the solution. In our environment we keep working until we hit a problem. Problem solving, like puzzle solving, can be fun, and as the industry supports this thinking pattern it has become ingrained in our culture. We don't anticipate. The work culture does not reward proactive thinking; solving problems before they occur is simply expected. "We expect it to work until something happens."

The culture of an organization can control whether abrasive or dominant personalities win arguments. Have you ever been involved in a discussion where a strong personality pushes the use of a tool because they think it's the best thing ever? You might encounter developers who swear by PHP, Perl or Ruby. In the Oracle community I have witnessed similar personalities pushing Apex, ADF or Java. Strong personalities who adopt a righteous attitude can be even more dangerous in an environment, as they will push their agenda without listening to the issues and usually end arguments using intimidation. The arguments might be along the lines of "I am the manager", "I have a degree", "We will use this tool unless you can show me your tool is better", "Your tool doesn't work/is slow/is expensive/has issues, or I just hate it".

Society has this habit of creating reactionary rules, or rules without expiry dates. This is typically seen in IT when it comes to database usage, tuning and application tuning. The rules are created as a reaction to some bad practice seen, or to technology limitations. We hear about the need to use bind variables because dynamic SQL is bad, or not to modify this database parameter because it will cause full table scans. All these rules are valid at the time of writing, but are stated without looking at a more general framework, or at the conditions under which the rules become invalid or expire.

What actually happens is these rules get introduced then written in stone: "thou shalt not do this". After some time their original meaning gets lost as different interpretations are applied to them. As technology changes, installs change, business practices change and staff change, these rules stay on and become ingrained in the business practice. And when asked why, the answer is invariably: that's how we have always done it.

We now have a scenario where a rule is being blindly followed and, because of technological changes, could be redundant. It could still be valid. It could be a major cause of tuning issues. The problem is that nobody knows, and until someone stands up and dares to ask "why are we following this rule?", we will never know.

This is what I believe has happened with the relational model and in particular foreign keys. One major benefit of the relational model when it was introduced was that it saved on data redundancy: it forced into the design that only one value needed to be stored. This was crucial, as it saved on disk space, which was a precious resource in the 1970s, 80s and early 1990s. Now the rules have changed. Disk is incredibly cheap and plentiful, and DBAs are happy to spend huge amounts of disk space to achieve better performance. For data warehouses the standard rule is 8-to-1: we are now duplicating our data via indexes, materialized views, replication and data warehouses, to the point where for every byte of actual data entered we have at least 8 additional supporting bytes. And now we see foreign keys being recruited as a new strategy for improving performance. This is not what foreign keys are about, and we should note that the equivalent performance gains could be achieved by implementing new constructs better suited for the purpose (my initial thought was to suggest better adherence to the SQL3 standard, but this requires further investigation).

So now we have proponents of foreign keys who think they are the best thing because with them you can get better performance, but that's not why they were created. We now have the potential scenario where in 10 years' time DBAs could be creating foreign key structures as readily as indexes, without understanding why they are there, except that we might get better performance if they are. Date was very careful to separate the physical design from the logical and to ensure the two are kept apart. This was done for the key reason of keeping the logical model correct. Date states that "The DBA must also decide how the data is to be represented in the database – this process is referred to as the physical design" 10

10 Database Systems. C.J. Date. 4th Edition. Chapter 2.

If you are trying to follow the relational model then this is an important point; it's critical to this discussion. You should do your database design, normalize it to a realistic point, then take that design and transpose it to the physical. Just because the logical design imposes foreign keys does not

mean the physical design has to. The mentality of developers is to take the logical design straight into the database without considering the physical limitations. Date stresses time after time that there is a fundamental difference between the logical and physical design. It's a point we have forgotten in the mists of time.

This leads into another interesting issue caused by bad thinking strategies, and that is superstitious behavior. When something works and works well, the steps required to achieve it are usually noted, but rarely do we review those steps to see if there is an optimal way of achieving it. A DBA doing maintenance on a table might follow a well-worn path to perform that maintenance without ever looking at improving each step, or at times even asking why they are bothering with the whole maintenance routine at all. This can be seen in DBAs who routinely rebuild indexes, thinking this will balance them and improve performance, not realizing this is a redundant step. It can also be seen in developers who write their SQL statements or code routines a certain way. If it works, why change it?

It's an issue we are always getting ourselves into, and it's why it's important to listen to peers, see how others program, attend conferences, or just generally review coding practices to see if they are redundant. I am constantly having to review code I wrote with every minor release, to see if its mere existence should be questioned. I have rewritten routines three or more times (tens of thousands of lines of code) because the database architecture in a new release changed the rules, meaning the code was dated. I have thrown out code when the database was enhanced to do the feature I had once built (e.g. encryption). I have routines written that I hope to throw out when Oracle finally listens and introduces the features I am after. It's a struggle, it's a constant learning exercise; it's very, very hard to keep up to date with the rapidly changing database. Leave the game for six months and a lot of concepts learnt can become dated. And yet what is seen in most organizations? Resistance to change, resistance to new ideas, resistance to reviewing old concepts. Let's keep the status quo.

In the organization I am in, I try to encourage new staff to review coding practices and architectures, and to challenge me to justify why they are there. Sometimes they put forward views that are so far out of left field that I am forced to review core components of the code base. I have even been known to rewrite or abandon key code areas overnight because of this.

And what ultimately causes this behavior, where we do not question why, do not review old decisions, or are afraid to embrace change? The culture of an organization controls this, and this will be discussed in a later section as the core reason why there are database performance issues.

I am a DBA and developer; not a DBA who became a developer, or a developer who became a DBA - I am both, and very active in both areas. I am also active in system administration, network administration, UI design and security, but I do not class myself as sufficiently experienced in them to claim the title, even though I have extensive experience in all those fields. My focus for the last twelve years has been on multimedia and the Internet.
Page 8 / 37
Discussion 4: The Rules Change

When you have worked in the multimedia world you really do start to appreciate that the rules change. Everything you have taken for granted, all your preconceptions, everything which forms your core foundation of knowledge, your safety net - it all changes. It's scary, it's fun, it's a new world. Every core concept that was taken for granted needs to be reviewed, rethought and adapted for this new world view.

It's one of the main reasons why I believe the uptake of multimedia has been so slow: it has been so frightening for DBAs and Developers to deal with this new world view. It's scary. It's such a radically new environment to work in, because you are so far out of your comfort zone, that most would find it safer to stay in the current safe, cozy bed.

Some examples to highlight. Let's look at tuning. With multimedia you don't deal with small, simple rows and nice, safe, atomic transactions. You can deal with records that are gigabytes in size. Ever try to index an image? What does indexing an image actually mean? In the multimedia world you are not just dealing with binary, but a whole new range of types, and interacting with those types changes. If you think changing from Varchar to Date is complex, try switching from a video to a photo. Try converting audio to text, or creating a graphic from spatial co-ordinates. Now try searching on them. Think a where clause can get complex? Try doing a search on a video for a chair in it. Just trying to answer that one is so complex and difficult it's beyond current technology, so you need to employ workarounds.

Now onto more typical multimedia issues and tuning. How much CPU does it take to create a video snippet? What about rotating an image, or compressing it? Does zipping a JPEG actually work? It's at this point you realize how many features in the database are built for relational and not multimedia. Oracle pushes compression in the database, but they refer to compression of relational data. Multimedia compression is a science in its own right. Compression with RMAN and backups just doesn't achieve anything of note.

Working with multimedia is like being a left-hander in a right-handed world. All Oracle features are geared around relational. Enter multimedia and you feel like the poor second cousin. Try using multimedia with Streams, replication, partitions, SGA management or clusters and you begin to see the limitations inherent in them. And yet the Oracle database is a generation ahead of all other databases when it comes to multimedia support. The 11g use of SecureFiles shows a new mentality for binary management. It's a start, but multimedia management goes beyond SecureFiles into new areas.

Oracle has a very powerful story for multimedia and transactional support, and gets close with the ability to support a commit/rollback of multimedia within the one transaction. I am not aware of other databases being able to handle this. Most have blob inserts treated separately. A number can't even handle a blob being larger than their equivalent of a redo log.

And yet after ten years of trying to convince people that having images in the database is the safest, most secure way to go, I am still constantly encountering developers, DBAs and managers who insist on storing them outside the database, as if that were the only possible way. It's frightening, it's frustrating, but it shows how ingrained into people's psyches relational is. It's a worry, it's a concern.

With Oracle, backup/recovery is mature. Yet in the multimedia world it's still in its early infancy. So many features are needed to make backups more efficient and faster. Move to the Internet and there are no features in Oracle designed to enable the efficient replication of images between sites, because the mentality just does not exist for understanding that with multimedia the slow point is the network connection, not CPU, I/O or memory.

Let's say you have an audio file and want to allow thousands of users around the world to download it.
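Before going on, it is worth putting a rough number on why the network dominates here. This is a back-of-the-envelope sketch in Python; the file size and bandwidth figures are made up for illustration, not taken from any real site.

```python
def delivery_time_seconds(file_bytes: int, users: int, uplink_bits_per_sec: float) -> float:
    """Lower bound on the time for one site to serve `users` downloads of a file,
    assuming the site's uplink is the only bottleneck (CPU, I/O and memory are idle)."""
    total_bits = file_bytes * 8 * users
    return total_bits / uplink_bits_per_sec

# A 5 MB audio file served to 10,000 users over a 100 Mbit/s uplink:
t = delivery_time_seconds(5_000_000, 10_000, 100e6)
print(t)  # 4000.0 seconds - over an hour, no matter how much CPU the server has
```

Doubling the CPU, or adding cluster nodes behind the same connection, leaves this number unchanged; replicating the file to ten sites in different countries divides it by ten.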

Do you put it on one central site and give the server a lot of CPU to allow retrieval? No, because it will not scale: downloading a large number of audio files will result in the Internet connection becoming clogged. How about caching the file? No, because once more the Internet connection becomes clogged. What about introducing RAC and allowing the file to be delivered from different instances? The answer is still that the network will become clogged, as all instances still deliver through the same connection. Traditional tuning methods are just not focusing on the right area. The solution is to replicate the audio files, securely via the Internet, to multiple sites in different countries around the world. Load balancing can then be achieved, but it's achieved using a different mindset. No one supports this architectural concept, yet it is core to scaling multimedia delivery to large numbers of users.

To re-emphasize: with multimedia and the Internet, the key performance dimension is not CPU, I/O, memory or internal networking performance, but rather the speed of the Internet connection to the outside world. It is a dimension forgotten by all database tuners, as it's beyond their supposed control, because in the current world DBAs and network managers do not work together. In most organizations they have probably never met. Which leads to a key discussion point, covered later, as to why the culture of an organization is crucial for tuning.

So to get around these limitations you have to live in a world where the rules change. For starters, realize this - multimedia is a superset of relational; i.e. relational data is just another multimedia type. Fuzzy logic is a superset of boolean; it's not a replacement. When you begin to appreciate that, you start to see that the multimedia world view is not a threat, not a new construct, but rather the next level in the database world.

Discussion 5: Feel threatened

Taking on Foreign Keys and critiquing them is so threatening a concept for nearly everyone that all the arguments I have raised, especially those about the illusion of data integrity and consistency, are dismissed out of hand. No thought is dared given to this radical concept, because if it were true, the fundamental pillar of their belief system would collapse. If such a belief system could collapse so easily, maybe the core on which it is founded is flawed. Maybe there is just too much dependent on it. As I am trying to show, the relational model does not match the real world.

Or maybe, if you look at the original definition of relational, you see that the logical model was clearly separated from the physical model. Date was very clear on this point.

To that point, I happily state that foreign keys are fine in the relational world when used in the logical model. It is when moving to the physical model that their usage should be reviewed. When dealing with multimedia they should be reviewed. Foreign keys can be defined in the Oracle database as just definitions, which can prove useful when using certain tools. It's when you enable them and force the relationship that you move into a different world.

So to emphasize: the real world is different, and will not always - or cannot - match our logical model. There are no data models so far in existence that work in the real world. None. It's not like you can create UML or an E/R diagram and know that when you put your model together it addresses all the database issues. Some give the hope they are close, but so far all models assume infinite CPU, network bandwidth and storage. None take into account performance, none factor in user mistakes on data entry or the entry of inaccurate data. None take into account availability, cross-platform issues or the changing nature of technology. None. Not even close. Yet we adhere to these models as if they were biblical.

If such blind adherence to these models were adopted in the car industry for safety tests, the models would be thrown out. In the car and plane industries they create theoretical models which they then test in real-world scenarios. They might test a crash between two cars to see if the air bags deploy, or push an airplane wing to its limits to see at what point it will crack. They then check their theoretical models against the real world and ensure they match. If they don't, they adjust, adapt or restart. And thank goodness they do this, because would you want to fly a plane where the model used for wing

structure and failure tolerance was never updated for new materials?

Yet in IT we try to fit our developers and users to the relational model. So in IT, do we take our world view and mash it to fit the model? How many times would we like to change our users to fit the model? Our view is that it's not the model's fault but the user's fault.

If our database design fails, then it must be because our design wasn't right. The relational model is flexible and enables change, but how many times do DBAs have to take the model and then change it to achieve scalability or reliability? In the 1990s, when relational really started to appear, this was common practice and well understood. It was just par for the course.

Now think of how many new structures are in the Oracle database that have been put there because the model is flawed and can't represent the real world. When we look at foreign keys I see a model that is out of date and unrealistic for most scenarios. It doesn't work with multimedia, and its gaping holes are exposed in Internet applications. It needs to be updated, replaced or just abandoned. The object-oriented model is an improvement but still fails miserably in most of the areas raised above. Blind adherence to the wonderment of Foreign Keys, to me, says you are blinkered, uncompromising and a dinosaur.

The Internet introduced more users, a new market, and a larger number of ways of highlighting the flaws in the model. When you build applications for the Internet, you soon realize how smart your internal users (intranet users) are in comparison to the general public.

In the general population there are users at such radically different skill levels that it pushes the concept of UI design to new levels. You start to understand what true data accuracy is, and what the real meaning of data integrity is.

This was highlighted to me just recently, when a customer changed direction on their interface to one which takes into account the changing nature of the Internet. They noted that to get accurate data, "don't force users".

Traditional thinking, when applied to Internet development, used models based on computer usage in an internal environment. To allow a user to make a purchase, you get them to create an account and then buy an item. We can now track the user, keep them up to date, inform them and in effect "control them".

Internet users are getting fed up with being controlled and being asked to create accounts to buy something. When we go to the store we aren't forced to fill in a company membership; we just buy the item and leave. So the mentality changes, to the point where users enter dummy data or just do not go to the site. Meaning: if you force users to enter data, the data you get is likely not accurate. If you don't force customers to enter it, but still give them the option, they will put in the data if they want to. If they do, it's likely to be more accurate.

Another important point to consider is that as more and more people use browsers and get comfortable with the Internet and the simplicity of most screens, this becomes the standard they will expect. The notion for screen development is that if I have a basic idea of what I want to do, then the interface should be intelligent and well designed enough to allow me to do what I need without help. This concept is now making its way back into internal applications, and within five to ten years it will start to drive and dominate all application design. It's incredibly hard to do, but most users will expect it. "If it's so easy to do on the Internet, why can't it be easy to do the same on my internal system?"

The only way of achieving this is by building applications that run in a web browser. As browser technology improves, browsers will be able to do more things easily (e.g. field validation, asynchronous updates, easier updates). So the outside world is going to be making its way into the intranet world.

It's a different world out there on the Internet - a different mindset - and all your current notions, models, assumptions and knowledge base will likely not work when you apply them to the real world. And you can't force the Internet population to adapt to your relational model; as the saying goes, it's easier to herd cats.

When you adjust your thinking patterns to be fuzzy, a lot changes. World perceptions change. On the technical side, a lot of core concepts once taken for granted are made irrelevant. Database tuning, security and application debugging are all different - from the way you trap errors to the reliance on accurate user feedback when an issue occurs. All of this changes.
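The "don't force users" idea reduces to one simple server-side rule: never require the optional field, but validate whatever the user chooses to supply. A minimal sketch of that rule - the email field and its validator are my own illustration, not something from this paper:

```python
def clean_optional(value, is_valid):
    """Treat a blank optional field as 'not supplied' (store NULL),
    but reject values the user did choose to supply that fail validation."""
    if value is None or not value.strip():
        return None                          # absence is fine - don't force the user
    value = value.strip()
    if not is_valid(value):
        raise ValueError(f"invalid value: {value!r}")
    return value

# Hypothetical optional email field on a purchase form.
looks_like_email = lambda s: "@" in s and "." in s.rsplit("@", 1)[-1]

print(clean_optional("", looks_like_email))                   # None - no dummy data forced in
print(clean_optional(" bob@example.com ", looks_like_email))  # bob@example.com
```

Users who leave the field blank contribute nothing, which is far better than contributing a fake address; users who fill it in have chosen to, so the value is worth validating and is more likely to be accurate.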

Discussion 6: The practical side

Now from a practical side: as a DBA, what are some common issues seen with Foreign Keys?

Any DBA who has to manage a large database, and who has to do its maintenance in the early hours of the morning, will understand how painful having foreign keys is. There are other object structures in the database that are also painful to deal with, like materialized views, triggers and replication, but because Foreign Keys link tables together they enter a new level of maintenance torture. As Developers and Managers are usually oblivious to this, they just attribute the rantings of the DBA who encounters these issues to the DBA's usual negative attitude.

If I were a vindictive developer, I would put self-referencing constraints on tables, or constraints between tables in different schemas, and then ask the DBAs to perform a basic maintenance operation. The satisfaction that would bring.

Foreign Key usage slows down maintenance and can cause major issues, especially if you forget the keys are there or, in most cases, are not aware they are there. In today's environment there are a lot of third-party applications, and the DBA cannot deal with the developers of the application. Yet they might have to perform maintenance on applications for which they have no documentation or knowledge of the internals (most people tend to forget this little point: DBAs are responsible for all database applications, not just the ones built internally). So if a schema has foreign keys, a database import now takes longer because it has to be run multiple times. (Note: if you don't understand why, then you must be a developer or manager, and enough said.)

The response is to document the keys and their usage, or just disable them before maintenance. Yes, if only that were done. So I ask developers: how many times do you warn the DBAs about Foreign Keys? Or do you expect them to use some fancy WYSIWYG tool to detect them? And how do DBAs deal with third-party applications that use them? Walking through a minefield is simpler. Basically, the DBA becomes a sapper who has to hunt down and defuse the foreign key mines.

Let's now focus on the terror of all foreign keys: the cascade delete. It is such a powerful feature, such a core component of foreign keys (once more, refer to Date and his strong views on them), that not only do you link tables together, you hard-code deletes between the two. In theory it's a wonderful construct and achieves exactly what it aims to: it removes the concept of having orphaned records (which really isn't all that bad). I am aware of this feature being so powerful and useful that sites have banned its use, primarily because they have learnt the hard way that when doing maintenance they might accidentally lose data.

The relational model never factors in data maintenance.

This requirement by sites to ban such a core feature must surely spark a light bulb in people's thinking that there is something fundamentally wrong with its usage? But no, because our thinking patterns prevent this. If we don't use Foreign Keys and cascade deletes, it's not because of the model, it's because of us.

The classic breakup line: "I am breaking up this relationship not because of you, it's me". Sure. It's really you, but we haven't the stomach to say so.

The cascade delete scenario:

A DBA deletes from a table that has the cascade delete constraint. Let's say they are running a maintenance script provided by the developers, with the aim of upgrading the table. The foreign key kicks in and deletes the child records (let's make it fun and say this delete cascades and causes deletes in multiple tables). All well and good, except that the delete was part of maintenance and the rows in the original table need to go back in. Oh dear, we now have a very bad scenario - actual corruption. The child records are deleted, they are lost, they can't go back in. This is a very serious issue, so serious that I have heard most sites just don't allow cascade delete because of it.

Table management scenario:

Want to drop a table to do some storage management? I am sorry, you can't, because there is a Foreign Key on it (don't even think of doing a truncate). Well, the knowledgeable DBA, or the developer who is reading the manuals, counters by saying you should do the drop with cascade. Wonderful idea. But wait, what was just dropped? How do I put it back, and did I put it back correctly? The developers at this point are saying that this is all part of the DBA's job and they should just live with it. The DBAs should be extracting the definitions before they do maintenance, and yes, there are some wonderful tools for doing this. The manuals make it look all simple, especially as their examples work on

2 or 3 tables. Try then working on a schema, or schemas, with 4000 or more tables. Then try to take into consideration tables with Foreign Keys linked to tables in another schema. It now becomes a maintenance nightmare. Now try doing this maintenance in a time window which is only open for a couple of hours, usually at 2am, and you soon appreciate the pain DBAs suffer when it comes to working with the keys. But the DBAs are forced to do it, all because without foreign keys we have no data integrity. The developers had better hope the DBAs don't wise up and realize this is just a myth, and that integrity can be achieved equally well using other techniques.

Triggers are equally bad, but at least triggers can be disabled and enabled quickly. A trigger doesn't suffer from the locking and scalability issues on DML that Foreign Keys are prone to. They are, though, prone to mutation, which is being addressed by Oracle. Triggers do offer advantages when it comes to working on third-party applications where you don't have access to the source code. In this case triggers become an indispensable tool.

This raises a new topic worth covering. Most tuning books, manuals and presentations focus on tuning as if you have access to the source code. There are a lot of third-party applications that companies are buying, resulting in the DBA having to manage and tune them, as well as integrate them into the environment. When looking at third-party applications in your environment, the rules for tuning are different.

Which scenario is worse - the potential of having an orphaned record, or the frightening specter that a DBA doing maintenance at two in the morning neglects to notice a cascade delete constraint and inadvertently corrupts the data by issuing it?

The greatest cause of downtime is that caused by the DBA.
Corollary: to ensure high availability, keep the DBA away from the database.

Please note this doesn't imply the DBA is at fault and to be blamed for the downtime.

How many safeguards need to be put in place to prevent this scenario? How much double checking and additional wasted time has to be put in place to protect the database from foreign keys? Why not just put the rules into the application? Get the application right. It's not a hard concept to come to terms with.

And as you scream out and say, "we don't allow cascade deletes on Foreign Keys at our site", then you have as good as admitted that the Foreign Key model is flawed, because if it were correct there would be no need to prevent usage of this fundamental feature.

Hang on, you now ask, isn't that point pure black/white thinking? Yes it is, and I hope it highlights how flawed this binary thinking is when it comes to Foreign Keys. The real world is gray; it doesn't conform to the relational model, and trying to squeeze the real world to make it match the relational model just will not work.

ENABLE NOVALIDATE constraints:

Now we get to my favorite: a constraint enabled with NOVALIDATE. One major reason for introducing it was that it takes too long to enable a constraint. So it's just turned on (there is the more frightening thought that we have data that doesn't conform but it would take too long to fix, so let's just ignore it and hope no one notices).

Let's now forget about the fact that the table may or may not be corrupted; it doesn't matter, because we now have our precious foreign key enabled and checking all new data going in. The concept of a "novalidated" Foreign Key is absurd when one blindly adheres to the relational model. But from a fuzzy view it's quite a sane concept. I'll repeat that because it's important: when you change your world view to the world of fuzzy logic, it's quite OK to adopt this concept.

Now ask the question "is the table correct?" when you enable a constraint without validation. The analogy here is to look at quantum theory and whether the spin of a particle is either up or down. When looking at a particle before measurement, quantum theory says the particle is in both an up and a down state (it is both, not one or the other). Only when it is measured does it collapse to become either up or down. When looking at this type of constraint, the table is both correct and not correct; it's only when we observe the table (look at the keys and check them) that we see what state it falls into. This is a fuzzy scenario, akin to Schroedinger's cat, where we can create a third state from two seemingly binary states. The table isn't either correct or not correct; it's in a new state. The relational model has no hope of dealing with it, yet this new concept had to be invented to deal with the realities of bad data, or with the fact that we don't have enough time to properly enable the constraint. Wow, Foreign Keys can be fun, but that's the geek in me talking.

I would have thought by the time this feature came

in, this blind adherence to the binary, black/white thinking of Foreign Keys would have made everyone rethink their position on them. But no, the opposite happened. People just blocked it out, denied its existence. The model is fine, it's our use of it which is wrong - if we could only change the world to fit our model, it would prove to be perfect. So now we see the relational purists put their heads in the sand and mutter "they do not exist, they do not exist". Those that accepted this feature's existence are the ones that have started to go down the fuzzy path of thinking: the path for real-world usage, the path to a newer model, one that works in the real world.

So back to maintenance: what do we have for DBAs? Blind adherence, usage of the model as dictated by developers and data architects, and the thinking that having their data under a Foreign Key makes it safe? The DBAs are having to spend a lot of time managing and dealing with these constructs, and they suffer because of them.

Let's now do a thought exercise. Think about this. Imagine a world where Foreign Keys never existed (for the "relationists" that is hard, but please try). Imagine then that someone said to a DBA: let's introduce this feature. It will slow down inserts and updates, require a new method for breaking up transactions, and double or triple your maintenance time, and there is still no guarantee that the data is safe - but you must put it on, because I want to maintain an ad hoc relationship between my tables. The DBAs would say one of the following:

• go away (but not that nicely)
• build it into your application
• you're nuts
• focus instead on getting the data quality correct
• get the interface better built
• write a batch job to clean up orphaned records

After much bickering, Oracle would then introduce an optimizer directive to define a relationship between tables, and that would be it. Maybe it could interpret the FROM clause join statements to work this out (such a novel concept). Queries would then be able to use this to get some new performance. If this happened, no one would introduce the foreign key concept.

We have Foreign Keys embedded in our databases because of a relational purist's view, and a focus on a model that has been superseded by object relational and has been proven time and time again not to work in real-world situations. We have held the belief that the logical model must match the physical model, and that if it doesn't fit, we can just make the real world fit to match it. We need to abandon the notion that the relational logical model can match the physical one.

Scare tactics: "If I remove foreign keys then my database will become a mess." Translation: I don't trust my developers to do the right thing, or I couldn't be bothered checking code and programs, so I'll do the easiest thing and put in Foreign Keys. My translation: the DBA works in a culture that prevents them from working with the Developers and ensuring they understand the relationships between tables better.

The culture of the organization encourages haphazard programming techniques. The implementation of Foreign Keys is there as a band-aid solution covering an otherwise gaping wound. Focus on healing the wound, not on applying a band-aid which will not help.
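As an aside, the cascade delete scenario from Discussion 6 is easy to reproduce outside Oracle. The sketch below uses SQLite through Python's sqlite3 module (the table names are invented for illustration), but the ON DELETE CASCADE semantics are the same: once the maintenance delete fires, the child rows are gone, and re-inserting the parent row does not bring them back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
                    id INTEGER PRIMARY KEY,
                    parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE)""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.executemany("INSERT INTO child VALUES (?, 1)", [(10,), (11,), (12,)])

# The 2am "maintenance" step: empty the parent table, intending to reload it.
conn.execute("DELETE FROM parent")
conn.execute("INSERT INTO parent VALUES (1)")  # reload the parent row

# The cascade already destroyed the children - reloading the parent cannot restore them.
remaining = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
print(remaining)  # 0
```

Nothing in the script raised an error; the constraint did exactly what it was defined to do, which is precisely the problem described above.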

Paper Titles

In putting together this paper I came up with 15 possible paper titles. It's worthwhile looking at each one in greater detail, as that in itself raises interesting issues concerning foreign keys.

1. Tackling the definition of what is data accuracy

As covered in the first discussion point, the idea is to try to address what data integrity actually means, and what it really means when looking at the physical model.

2. Exposing the inconsistencies of foreign keys

A simple title, but not catchy enough to get any serious interest in the underlying topic.

3. A skeptic's view of foreign keys

A nice simple title, non-threatening, designed to appeal to those with a skeptical nature, without alluding to anything dangerous.

4. Ban foreign keys

De Bono released a great book titled "I am right you are wrong". The title was a deliberate attention getter, as it represented everything De Bono was trying to highlight as being flawed in our thinking practices. It's there to provoke and annoy. DBAs like to do that. Hence this title.

5. Daring to tread on the Holy Grail of Relational - the Foreign Key

By elevating the Foreign Key to an unrealistic level and then debunking its use, this title shows the dangers of creating a straw-man argument.

6. Revealing the curtain behind the wizard of foreign keys

This title was designed to show that behind the wonderment of foreign keys was actually something very different.

7. Are you a foreign key dinosaur?

A reference to one of the discussion points raised. Those that refer to foreign keys and use them blindly are living in the past.

8. Leaving the foreign key affair at the altar

So many people react so strongly when you question foreign keys, you would think they were married to them.

9. Using fuzzy logic to show why foreign keys are a legacy concept

Though this paper only touches on fuzzy logic and why we need to start seriously embracing it, especially when working with multimedia, this title tried to unite the two concepts and show that the relational model is possibly dated.

10. Moving to a realistic data model by breaking the stranglehold of foreign keys

Again, trying to highlight that current thinking practices lock people into believing that relational is the only way to go.

11. Foreign keys - sweet innocent child or slobbering, smelly ogre?

This title was put together to try to get you to think that possibly foreign keys are not as simple as we think.

12. I love foreign keys, I also love eating broken glass and rotten fish

This title was put together to elicit a sympathetic response from DBAs who are suffering from constantly having to do a lot of extra work because of them.

13. Moving to a better workplace by releasing the stranglehold of foreign keys

This title alludes to the locked-in thinking practices being seen in the workplace. By targeting foreign keys, the aim is to move the thinking to a different world view.

14. Switching from theoretical to practical by critiquing foreign keys

This title was designed to show that the physical model doesn't always need foreign keys, even though we push hard on their usage in the logical one.

15. A sane development environment by trashing foreign keys

Another attention-getting title, designed to equate a healthy working environment with countering the arguments given for why we need foreign keys.

The final title for this paper was picked because it was bland and boring. It's non-threatening and implies a possibly interesting discussion could ensue. The choice was deliberate, because it was felt that any form of threatening title might be counterproductive.

Discussion Point 7: Logical Thinking

I attended a presentation at OpenWorld last year (2008) by a well known and respected company. Everything that was said was correct. If you looked at each statement individually, it was correct. It was all factual and proven to be logically right and consistent. It was well received and everyone attending loved it. My only comment as I listened was to scribble in the margin that it was wrong. My world view, my experiences and perceptions could see that what was said was right but also wrong. The reason is that when you change your world view, the intent behind the arguments raised can be seen as false.

Let's say you attended a conference on mobile communication and all the speakers and attendees had analogue phones. All the discussions and papers would center on analogue phones. The technology discussed would be correct; the concepts raised would all be valid. If you had a digital phone, you would be sitting in these presentations going "what the?" You would know that with a digital phone the rules change. You can stream video, send SMS, access your email and do so much more. The discussions and papers are all technically correct but still wrong, because they are based on a different (and outdated) world view.

Too much specializing encourages binary thinking, which prevents us from breaking out of the thinking patterns we get stuck in, and prevents us from seeing a different world view or even appreciating it is there. We usually have to be dragged into it kicking and screaming.

Another analogy. In the 2D geometric world the fundamental rule is that parallel lines never meet. It can be mathematically proven. Extend your world view to 3D and now we have a situation where parallel lines meet, and yet they still conform to the 2D definition of what a parallel line is. Now replace "parallel lines" with foreign keys, and using them (data integrity) is the rule that parallel lines never meet. Now go to the 3D world, the real world, and introduce Multimedia. Parallel lines can meet, the use of foreign keys changes, and we can break the 2D world and yet still maintain the core rule. Add a web based Multimedia world view and the rules change yet again. When you can grasp the implications of this analogy you begin to see why foreign keys are a symptom, not a cure, of bad thinking. Let's look at this point further by daring to go into the fuzzy world of Multimedia, a world where the concept of exactness doesn't hold up.

Let's look at an interesting quirk in the Multimedia world; it's a simple case. Get two photographs (which can be referred to as digital images) and ask the question: is Image A the same as Image B? The relationalist would say they are the same if each byte matches exactly (binary thinking: do a byte by byte comparison). It's technically correct, but in the Multimedia world absolutely useless. Don't think binary, think fuzzy, and try this definition: "two images are classed as observably identical when they are perceived to be identical".

I have to use a number of analogies, hoping one will allow you to see through the looking glass. If I can't get you to my world view, if I can't break you out of the 2D world you are in, it is impossible to convince you of the underlying argument concerning foreign key usage. Failure can be easily judged by the type of questions that will be argued in response to this paper.

Let's look at the issues that happen in Multimedia. Get a person in a room and get them to look at both images. Ask them if they are identical. Now change one bit in the image (not a byte, but a single bit). The relationalist says they are different because a bit to bit, byte to byte comparison says they are not identical. Even a checksum would say they are different (apply an MD5 checksum to each). Now ask the person again if they are the same. The answer is yes, because it is highly unlikely that a person can differentiate a one bit change in the image.

It gets even more challenging when you introduce error correction, in which case the pixel representation is identical. In fact, changing a bit in a JPEG image might not change the pixel representation at all. Print out two images, one a 10Mb TIFF and the other a 1Mb JPEG extracted from the original TIFF. Print them out on A4 and ask a person to see if they are different. Two fundamentally different images, different sizes, different types, yet as far as the person comparing them is concerned, they are identical, because they are perceived as identical. In the Multimedia world, the real world, in the end comparisons like this have to be subjective. And with subjective comparison you get issues with personal views, external lighting factors and a whole range of other issues that can result in two different people stating that the images are the same or not the same. There is no right or wrong answer. It is a similar situation for training neural nets: the training determines the outcome. So to ask the question "are these two images identical?" is not the right question to ask. Asking it just alludes to binary thinking.
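The one-bit experiment above can be sketched in a few lines of code. This is a deliberately naive illustration: the toy "image", the difference metric and the threshold are all made up for the example, and real perceptual comparison would use something like perceptual hashing rather than an average byte difference.

```python
import hashlib

def byte_identical(a: bytes, b: bytes) -> bool:
    """Relational-style comparison: exact, via an MD5 checksum of each image."""
    return hashlib.md5(a).digest() == hashlib.md5(b).digest()

def observably_identical(a: bytes, b: bytes, threshold: float = 0.01) -> bool:
    """Fuzzy comparison: images whose average byte difference falls below a
    perceptual threshold are classed as observably identical."""
    if len(a) != len(b):
        return False
    diff = sum(abs(x - y) for x, y in zip(a, b)) / (255.0 * len(a))
    return diff < threshold

# A toy 8-bit grayscale "image" and a copy with a single bit flipped.
image_a = bytes(range(256)) * 16
flipped = bytearray(image_a)
flipped[100] ^= 0x01           # change one bit, not even a whole byte
image_b = bytes(flipped)

print(byte_identical(image_a, image_b))        # False: the checksums differ
print(observably_identical(image_a, image_b))  # True: no person could see it
```

The same pair of images gets two different answers depending on which world view the comparison function encodes.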
You need to start asking different questions and taking a completely different view when answering them, and this means moving down the fuzzy path.

This scenario gets to the heart of data integrity and the argument over foreign keys. The relational world wants mathematical purity: the ability to prove a SQL statement, to remove anomalies because they upset our mathematical model. Now there is nothing wrong with this, but everything is wrong with this statement. We need the mathematical model to ensure the optimizer can work well and that SQL statements return correct results. The real world doesn't work that way. You need to switch between the 2D and 3D worlds, just like we switch from drawing sketches on paper (2D) to making a model (3D). Both can be used, both are right, both even overlap, yet both are very different, with different rules.

Move into the Multimedia world and you need to abandon traditional binary logic, embrace fuzzy systems and deal with neural networks. You deal with modeling to produce answers to complex questions, but like chaotic systems, slight changes in the initial starting points can produce dramatically different results.

The die hard, foreign key loving relationalist would say binary is just different. Let's then look at something closer to home, let's only look at relational data, and let's see some flaws in this model. What is more important: having the address field correct for a client, or having an orphaned record? Let's say a client has 3 records, but because of a bad program a record is orphaned, so now we only have 2 records that we can see (let's say for argument's sake the foreign key points to a non existent primary key). Which is the worse situation: having an invalid customer address, a mistyped email address for a client, or an orphaned record? The answer depends on your business priority. There is no right answer. In an Internet business dependent on emailing customers, an invalid email address means lost business. An orphaned record, depending on where it's orphaned, could mean a variety of things. There is a good chance I can fix the foreign key; I might be able to use flashback to read an old value and determine where it should go, or I might just delete it. It's not the end of the world, but if the email is wrong a customer could be lost, business lost and money lost.

In the real world, banking systems rely on data accuracy: the figures must add up, the books must tally. But in reality they don't, so most bank systems deal with figures not adding up 100%, provided they are accurate to the 99.9% mark. If they are out more, flags might be raised. Banks round cents up or down or not at all. Two banks talking to each other might have totals mismatching by a cent. The programs factor this in, because they know they cannot achieve full data integrity. Does this mean banks should throw out foreign keys? No. As I am stating, foreign keys should be implemented case by case and used where they make sense and where the implications and cost of using them are understood.

Specialization vs cross pollination. Specializing produces improvements within the world view; cross pollination allows us to produce radical change in direction and achieve new directions. For example, Einstein only achieved his theory of relativity (physics) when he saw a particular mathematical solution to a geometric issue (cross pollination). Specializing in physics would never have allowed him to achieve this.

Relational databases attempted to use semantic management as a way of dealing with issues of address, but they never came close to dealing with the issue of ensuring an email address is correct (think how you would achieve it; using regular expressions is not the full solution, it involves a very complex methodology). The technology today doesn't really deal with correctly validating an email address, one which can't be overcome by robots or crooks. This example on its own shows a flaw in relational, a flaw so big that warning signs should be flashing, warning us about the dangers.
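The point about regular expressions can be shown in a few lines: a pattern check can only ever establish that an address is well formed, never that the mailbox exists or is deliverable. The pattern below is a deliberately simplified sketch (the full RFC 5322 grammar is far messier), and the address used is made up for the example.

```python
import re

# Simplified address pattern: local part, "@", dotted domain with a TLD.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def looks_like_email(address: str) -> bool:
    """Syntactic check only: says nothing about whether the mailbox is real."""
    return EMAIL_RE.match(address) is not None

# Syntactically perfect, but the domain and mailbox need not exist at all.
print(looks_like_email("no.such.person@example-that-does-not-exist.com"))  # True
print(looks_like_email("not-an-address"))                                  # False
```

Actually confirming an address would need MX lookups, an SMTP conversation, or a confirmation email: exactly the kind of "very complex methodology" that no declarative constraint can express.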
For more on this issue please refer to: http://www.eternal-donut.com/downloadp/004_intro.pdf (a podcast is also available, including on iTunes). This paper discusses the Multimedia terminology, covers Multimedia definitions, and then goes into greater detail about how to link Multimedia types.

See the site http://www.eternal-donut.com/ and, to learn more about core Multimedia concepts, go to http://eternal-donut.blogspot.com/ and look at the historic notes. I go into great detail about Multimedia basics from a technical view.

So is there another world view beyond Multimedia? The answer is most definitely yes, and it's one which challenges our thinking patterns and moves us in a completely different direction. The concept is way too threatening to most, and just getting people to understand Multimedia is the first stage. To try and go beyond that is too challenging.

Discussion Point 8: Logical and Physical Design

The traditional strategy for application design has been one where a data architect comes in and, based on user requirements, designs a logical or conceptual model of the application. This is then given to the DBAs and converted to a physical design.

With improvements in technology, and more due to lack of time and implementation pressures, the physical design step is often skipped and the logical model is put straight on the database (possibly with some storage parameters thrown in).

Starting with a logical design and then moving to physical intuitively seems to make sense. It's a natural progression. The problem is that the real world does not work this way. In fact, I am happy to state that starting with logical and moving to physical, or starting with logical and applying it straight to the database, is flawed and is not the right direction to go in. It encourages reactive tuning, and I'll cover this shortly, but first, where do foreign keys fit in with this?

Foreign keys come from the process of doing the logical design of the relational model. As previously mentioned, the Oracle database is Object/Relational, and numerous other models can be used for design that do not require foreign keys. The relational model is dated, and even though it's based on sound mathematical principles, those principles are not relevant in the current technology environment. They are part of the 2D world when we are in the 3D world. They are relevant in their world, but as the playing field has changed, they represent obsolete thinking and concepts.

By starting with a logical model (and, if we are fortunate, moving to a physical model), the DBAs are always put on the back foot, forced to react to the design given to them, and constrained by that design when tuning. Reactive tuning is the mindset most DBAs are trained to understand. Most DBAs would not even be aware that there are better ways of doing things. They are forced to firefight and deal with performance issues as they appear. They are not in a position to prevent them.

The physical model dictates the logical. The two must be done together. The hardware the application runs on, the limitations of the database and the programming language, the finite capacity of disks, memory, CPU and the network are all critical for the design. They control it, they influence it. Maintaining performance might involve core changes in how
constraints are managed, as covered earlier with the delayed delete. The core design was changed to ensure efficient performance with deletes.

There are no models at the moment that can efficiently deal with a combined physical and logical model. There should be, and that model should also deal with the multitude of data types, including XML, Spatial, Objects and Multimedia. So in the meantime we deal with the models we have, and because we do, we implement architectures that are inherently inefficient from the very beginning.

So I propose the following as an intermediate step to resolving it.

i. All Data Architects should have active DBA skills that are no more than six months old. This will ensure that when they design, they are aware of scalability and performance issues and can factor them into the design. The data architect will also be aware of database features that can be used, work well, and can be factored into the design.

ii. All DBAs need programming experience. They need to be skilled in programming in all languages used by the developers. That way they are in a position to review code and know when it is not going to be used efficiently.

iii. All Developers need active DBA experience. They need security, scalability and performance techniques ingrained in their core thinking, so that when they build applications these three crucial elements are not ignored.

iv. DBAs need to be involved in the design of any new application from the very beginning. They need to constantly review the design for any performance, security and scalability flaws.

A Data Architect who proposes that the data design should be database independent is committing a design flaw that will result in the worst possible design coming about. The notion that the design has to be separate from the database was pushed during the 90's as a commercial ploy to try to even the playing field across all database vendors. It pushed the notion that SQL is SQL and all databases are now identical.

Nothing could be further from the truth. The idea that the database is just a SQL engine has resulted in very bad application designs appearing across a number of vendors. Most have learnt the hard way and now provide port specific solutions for each database. This myth is exposed in: http://www.eternal-donut.com/downloadp/003_intro.pdf

The belief that the database is just a commodity SQL engine has also resulted in a lot of middleware applications appearing. These applications split the data (the database, or back end tier) from the application logic (the middleware tier).

In the current environment of 2009, there is justification for this architecture. It works well with legacy systems and on paper it looks good. Unfortunately, unless it is designed correctly it can result in applications that will never be able to scale (see Appendix).

One of the strengths of having a 3 tier application was that it simplified development. The data was separate from the application logic. It also offered scalability. Within 5 years the rules that formed this argument will be redundant because of the changes in technology. Today a server with 8Gb of memory, 8 cores and gigabit ethernet is a common standard for hardware. It has natural limitations, so for sites that wish to scale above this, a middleware layer looks attractive.

In five years, the common server standard will be 128Gb of memory and 64 cores (maybe 128). It might be at 10Gb ethernet, but likely still at a gigabit. So the server will have improved in performance by a factor of 8 (or greater), but the network connection might not have. To scale, the bottleneck will not be the hardware but the interaction between the application and the database. It will make sense to store these two close to each other, which means moving the application logic into the database. For Oracle this means building in Java or PL/SQL. With PL/SQL now being able to be compiled to C, it will run faster than Java and scale better.

So within five years we will start to see the move away from 3 tier and back to a 2 tier model (but not the client/server model) for most applications. It will prove to be cheaper, easier to manage and scalable. The middleware layer will move into the database and the database server will become more sophisticated.

The data model used will again need to change to deal with this concept, and physical design will become even more important.
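The scaling argument (servers improving while the network stays flat) can be put into rough numbers. The figures below are purely illustrative assumptions, not measurements, and the model is a deliberately crude sketch: total time is network round trips plus per-row CPU work.

```python
def total_time_ms(rows: int, round_trips: int,
                  rtt_ms: float, cpu_ms_per_row: float) -> float:
    """Crude cost model: time spent on round trips plus per-row CPU work."""
    return round_trips * rtt_ms + rows * cpu_ms_per_row

rows = 10_000
rtt_ms = 0.5     # assumed LAN round trip between middleware and database
cpu_ms = 0.01    # assumed per-row processing cost

# Chatty middleware: one round trip per row processed.
chatty = total_time_ms(rows, round_trips=rows, rtt_ms=rtt_ms, cpu_ms_per_row=cpu_ms)
# Logic in the database: one round trip for the whole batch.
in_db = total_time_ms(rows, round_trips=1, rtt_ms=rtt_ms, cpu_ms_per_row=cpu_ms)

print(f"chatty: {chatty:.0f} ms, in-database: {in_db:.1f} ms")
# Making the CPU 8x faster barely helps the chatty case: the network term dominates.
```

Under these assumptions the chatty design is dominated by the round-trip term, which is exactly the term that faster servers do nothing about.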
Discussion Point 9: Looking at the arguments for having Foreign Keys

All rules are meant to be broken, including this one. Rules exist to serve, not enslave.
- software programmer's axiom

1. "Your data won't be accurate" - or, as it is also commonly stated, "don't you want data integrity?"

The first one is a threat; the second comment is an implied threat designed to intimidate. The answer they want you to give is "Of course I want data integrity, I would be mad not to want it". And that is the end of the argument.

As previously discussed, foreign keys do not guarantee data integrity, but let's look at this issue from a different angle.

If I turn off the constraint, is my data accurate? Could I adopt the Java model of garbage collection and remove orphaned records at my own leisure? It's a fuzzy concept, a bit radical in thinking. So now, to answer the question, I can state: "Yes, I want data integrity, but I don't need to be anal (slang) about having it consistent all the time. Are you a difficult person? You don't want to be difficult, do you?"

Foreign Keys enforce this concept and the mentality that by having them, I have data integrity.

Of course we can find exceptions and cases where it is absolutely crucial to maintain the relationship; I can name a number of cases. Though from my perspective, I would rather the programs be written correctly to ensure there were no breaks in the relationship.

Having the Foreign Key is the simple, quick and reactive way of enforcing the relationship. Remove the Foreign Key and you are now forced to check programs, ensure correct programming, and maybe even get the DBAs to review the security and performance of the code at the same time. Again, the culture of the organization prevents the correct methodology from being used.

The counter argument is that we don't have time to do this: it's quicker and simpler to put in the Foreign Key. So then, would you rather be in an elevator which only used an emergency brake, or one where the actual up/down cables were made from steel and not rope? Some would just take the stairs.

2. Optimizer works better with constraints

Does the definition of a constraint result in better performance, or does having the constraint enabled make it work? I mean really, is this more an instruction or a guide for the optimizer? This is a point worth further discussion.

If I preface a Foreign Key with the phrase "these two tables are linked together in a consistent manner", is that the same thing? Let's call it a linked index. What I am saying is that the performance for the optimizer comes not because you have a foreign key, but because there is a relationship between the tables. In the object world I could have a REF. Oracle replication works better and is easier to manage if you use a Primary Key. Previously it used rowids, but Primary Keys are easier to work with, so we put a Primary Key on the table to make it easier to manage replication, not because we are following relational to the letter. I put a Primary Key on my spatial and text tables, not because I think relational is the best thing ever, but because these types require their usage.

The relational model doesn't require you to put indexes on a table (in the physical model that is a different story). The logical model ignores performance (hence why it's called logical). In other words, we use relational constructs such as Primary Keys and Foreign Keys for performance, and not because they are relational and the perceived correct way to go. This is a complete reversal in thinking about their usage.

Having a Primary Key/Foreign Key enabled is as good as having an index or optimizer hint. It's just a tool to give the database better performance. But remember, the relational logical model doesn't give me better performance. The point I am trying to make is that when you are using Foreign Keys to
seemingly obtain better performance, you are using them for the wrong reason. It would be better if Oracle used a different construct to achieve this.

It's this core change in thinking that drives home the point I am trying to make about Foreign Keys and our use of them. They are just a component of the logical relational model, and a component that loses its significance as we move to a model that uses objects and multimedia.

Another analogy: Newton's law of universal gravitation. It works well; it has been used in physics and taught in schools as a base principle. So it must be right. Yes, it is right if we apply it to the world we are in (relational), but once we look at the general universe (multimedia) the rule fails. We now have to use Einstein's theory of relativity. Newton's law only works as an approximation, but it gives us the illusion that it is exact.

Foreign Key adoption, and the blinkered, unrelenting, blind adherence to their usage, is the representative case for all this. It's why I focus on it. People who unquestioningly stick to using foreign keys in the physical model reinforce the view that the IT world is full of close minded individuals unable to perceive change. It shows how the ability to question, review and change our perceptions as technology and the rules change is an unknown concept. It shows how locked in DBAs and developers can become. Foreign Key usage represents a wake up call. If I can highlight and shake such a core concept as Foreign Keys, if I can critique it and prove it's not core, not as central as it's made out to be, then think: what else in the database should we focus on and review?

The answer is everything. When we step back and move into the more general case world of Multimedia, the rules change and everything needs to be reviewed.

3. Maintenance

This is a section of concern to DBAs, and it must be said that most Developers and Managers would like to turn a blind eye to it. The perception is that maintenance isn't part of the application management equation. So what do we have for DBAs? A methodology pushed by developers and data architects trying to take a logical model and apply it to the physical without change, under the illusion that by having foreign keys, their data is safe? Any DBA knows that safety and integrity cover a whole range of issues. Having had to spend a lot of time managing and dealing with Foreign Keys, I realize that. DBAs suffer a lot because of this strategy and have had to suffer in silence.

We have Foreign Keys because of a relational purist's point of view, and too much effort is spent focusing on a model that has been superseded by the object/relational model and has been proven time and time again not to work in real world situations.

I am sure you can come up with a scenario where it is crucial that the PK/FK relationship must exist. Great. With my DBA hat on, I will say wonderful, I am happy for you. I can also come up with situations where it's critical, but I don't say - and this is important - that all relationships must be PK/FK. To me a PK/FK is a rare exception, like having a partition or a materialized view. It's only introduced when it is crucial that it be implemented. It's a tool that is used as required and in conjunction with the DBAs. It's not a be all and end all; it's not blind adherence.

A worthwhile discussion point is to cover why there is the thinking that DBAs have to put in foreign keys to protect the database from the developer. This view is wrong on so many fronts.

"Many companies employ under-skilled and lazy developers that will always take the quick route." - anon

When I saw this, my view was that by hiring the worst possible developers, the DBAs then have to spend a lot of time protecting the database from them. They are reactive, and the result will usually be a poorly written system. My view is that rather than hiring five under skilled developers who might build an application that will end up costing the company more, because what they built will not scale and will be expensive to maintain, it's better to spend more and hire one competent developer who can do it right.

This highlights a point I make a lot of the time, and that is that management are the cause of most performance issues in the database (see Appendix A).
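The garbage-collection idea floated earlier (removing orphaned records at our own leisure, rather than enforcing the constraint inside every transaction) can be sketched as a periodic sweep. SQLite is used here purely for illustration, and the table names and data are invented for the example; the same pattern applies to any database.

```python
import sqlite3

# Two related tables with NO foreign key constraint declared between them.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO orders   VALUES (10, 1, 'widget'), (11, 1, 'gadget'),
                                (12, 99, 'orphan');  -- points at no customer
""")

def sweep_orphans(conn):
    """Deferred integrity: delete child rows whose parent no longer exists,
    at a time of our choosing, instead of checking on every transaction."""
    cur = conn.execute("""
        DELETE FROM orders
        WHERE customer_id NOT IN (SELECT id FROM customer)
    """)
    return cur.rowcount

removed = sweep_orphans(db)
print(f"orphans removed: {removed}")   # orphans removed: 1
```

Whether this trade-off is acceptable is exactly the case-by-case business decision argued for above: the orphan sits in the table until the sweep runs, but no insert or delete pays the constraint-checking cost.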
A Personal History

Working in the multimedia world has given me a different perspective on things. I quickly learned that Multimedia is a paradigm shift involving tuning, database management, scoping, creating objects and user interaction. Even screen design and how you tackle working in a multimedia environment is different. I have learnt that multimedia is a threatening concept to all DBAs (not most). They just don't want to know about it. Developers are nearly as bad, but most treat multimedia as just an external file that needs to be accessed. Dealing with multimedia breaks all the rules. You are taken out of your comfort zone. To make it even harder, most Oracle tools and features either ignore or just do not properly support multimedia and the object types around it. It is improving, but since first introduced in Oracle8, it has been improving at a snail's pace. It's also worthwhile noting that XML and Spatial can be considered subtypes of multimedia.

When dealing with multimedia, you deal with objects, abstract data types and XML. I have also dealt with RAC, heterogeneous gateways, PHP and a host of open source tools. On the PL/SQL side I have pushed Mod PL/SQL to its limits and am trying to take it further. Yet go to any Oracle Conference and, outside of Apex, all talk on PL/SQL is on DML code constructs and triggers. The focus is not on the multimedia environment.

Since 1995 I have been promoting Mod PL/SQL as a viable and powerful tool for web development. Integrate it with multimedia and you have a platform that makes it very efficient to access the database and deliver dynamic pages. I have had to build ftp clients, servers, email management routines and raw HTTP requests. I have had to learn how to build networks, enable the application to work over dial-up, configure routers and firewalls, install SANs and work with all versions of Windows, most Linux variations and Solaris. I have had to integrate with LDAP and SSO and configure Apache and the embedded gateway. I have built, and rebuilt three times, an e-commerce engine, each time learning from the previous one and trying to adapt to the rapidly changing Internet environment without becoming locked into a model or trend that will become outdated.

I have built a search engine, and then rebuilt it more than four times, that is flexible, uses dynamic SQL with dynamic bind variables, and can access all object types as well as query external databases and Google. I have created an architecture that has had to be flexible and easy to maintain, so easy that maintaining it can be done from anywhere in the world (and I have done this on many occasions).

I have had to do compilation and integration of open source code across platforms, and yet am constantly made to feel inadequate because I haven't yet done a complete recompile of a Linux kernel. I have learnt that the Internet and multimedia are still the wild west, and there are no right ways, just new technologies that one needs to adapt to very quickly. I have worked in a team, had many ideas scrutinized and crushed, and have had to make many design decisions to meet sales objectives and be able to switch direction overnight. I have written parsing engines, compilers in PL/SQL and Ajax client interfaces, integrated with over ten different merchant banks, and linked with iTunes, Google and a host of relational and non relational databases. I have described the syntax for the circa data type and the fuzzy logic required to search on it. And I have dealt with so many Virtualizations that I am building a new type for Oracle so Virtualizations can be stored inside a database.

I have worked in industries that Oracle has never heard of or even looked at, and done pioneering work in areas that are only now being seen. I have written papers on topics that are so new most users don't understand or care about them. If you are in multimedia you are in a niche. The thing is, when you look at the big picture, it's not a niche but an industry.
Discussion Point 10: My top ten reasons for not using foreign key constraints

This is an extract from a paper I wrote two years ago addressing the issue of Foreign Keys. I have included it here because it provides a useful summary.

#10 Relational might say we need foreign keys, but it doesn't say when they should be enforced. If you say immediately, then you must enforce them always and immediately, meaning:
- You can never use constraints enabled without validation
- Warehouse applications, bulk loading and data management are all compromised

#9 It's a hidden layer. Just like triggers, when they are enabled they add a hidden layer that can make it hard to diagnose and debug when problems occur. Let's say you add a foreign key to an existing application and problems occur inside the application. Is it because the foreign key has returned errors to the program in an area of code where it was never expecting them? The same goes for tuning. When looking at a complex update statement, you have to look at the constraints around it as well as the actual SQL to tune it.

Triggers are one of those blessings and curses. They can solve a lot of problems because they are transparent and hidden, but they can cause a lot of problems because of this. If you are not aware they are there, tuning can become very difficult and turns into a multi-layered approach. In this case tuning isn't simple; it's very complex, takes longer and is more frustrating to resolve.

#8 Don't you trust your developers? Are they there as a safeguard because developers can't write efficient code? And if developers think they are so great, just check to see if they use them in their development environment. I bet they are turned off because they get in the way of development. A key database feature disabled because it gets in the way? They mustn't think highly of it then. If the developers want it so much, enforce them to the letter in all environments. Then, when the screaming stops, review their use.

#7 Performance problems. Inserts, updates, deletes, checking integrity, updating indexes: all to be done within the transaction. It can slow down transactions. In some cases, I begrudgingly accept that having the constraint there can help the optimizer for some types of queries (to me that is the exception rather than the rule).

#6 Are they value for money? If they never existed and then someone invented them, would we use them? I think not. I think it would be a hard sell to convince anyone they were needed. And can we really trust them if we do have them? Look at constraints enabled without validation. Just by allowing them, it means we can create a constraint that might not be valid. It says, "I can live with possible invalid data, just enforce anything new".

#5 The whole notion of having data accuracy is the "golden fleece" we are all chasing. This is not correct. When you start looking at the concept of fuzzy logic you begin to realize that perfect data accuracy is not only a myth, it can be inefficient trying to achieve it. Let's look at the humble date value.

When we enter a date, should we store the seconds, the sub seconds, the milliseconds? How accurate do we need it? In theory it should be as accurate as the computer allows, because that is exact.

When we store someone's name, do we enforce it to make sure it is spelt correctly? What about the street? Can we live with typing errors in comment fields made by users? If we want perfect data, then we have to be as anal about keeping the data accurate here as we are about keeping foreign keys accurate.

There is no difference. The foreign key concept is one which we are locked into thinking of as having to be perfect, whereas spell check errors in comment fields are ones that we can live with, because it is what we are used to. My view is that spelling mistakes in a comment field are more dangerous than an orphaned record. There is a balance that needs to be achieved when enforcing data accuracy, and it depends on the application. For a financial system, data accuracy might be seen as crucial, no mistakes allowed, so we should pull out all stops to maintain it, even at the expense of application management.

#4 Maintenance. Once foreign keys are enabled, the flexibility of the design of the application is weakened. Try using transportable tablespaces, Oracle Streams (Data Guard), replication, materialized views and RAC. With some blood, sweat and many tears you might get foreign keys working, but they will always be a point of failure, which will add additional costs to the maintenance of an application and increase the length of time to do maintenance.
Page 23 / 37
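The maintenance pain described above has a concrete shape. Below is an illustrative sketch (not the author's own example) using Python's built-in sqlite3 module, where SQLite's per-connection foreign_keys pragma stands in for the disable/enable constraint step an Oracle DBA would use; the dept/emp tables are invented for the demo. Relaxing enforcement for a bulk load is fast, but re-enabling it does not re-validate existing rows, so orphans survive unless you go looking for them:

```python
import sqlite3

# Autocommit mode so each statement (including PRAGMA) applies immediately.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, "
             "dept_id INTEGER REFERENCES dept(id))")
conn.execute("INSERT INTO dept VALUES (1)")

# Maintenance window: enforcement is switched off so the bulk load runs fast...
conn.execute("PRAGMA foreign_keys = OFF")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 99)])   # dept 99 does not exist

# ...and switched back on afterwards. SQLite does NOT re-validate existing
# rows at this point, so the orphan survives unless you check for it yourself.
conn.execute("PRAGMA foreign_keys = ON")
orphans = conn.execute("PRAGMA foreign_key_check('emp')").fetchall()
print(len(orphans))  # 1
```

In Oracle, the analogous re-validation is roughly the ENABLE VALIDATE step, which is exactly the part that eats the time in a short maintenance window.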
And on the issue of maintenance: let's say a DBA has to do maintenance on an application. They want to move data out from a table to another area. They delete the data first (they can't use a truncate because of the constraints), and unknown to them a cascade fires. They then do their maintenance and re-insert the data. They have just caused a major data failure without even knowing it. To prevent this from happening, safeguards have to be put in place, but all DBAs are now living in a minefield, where the wrong move when doing legitimate maintenance can result in the destruction of the application.

Even moving a table between tablespaces (dropping it, or even setting up partitioning) can be hampered by constraints.

#3 Those that want them are usually not the same group of people as those that have to manage them. Try doing database maintenance on a 300-table application with foreign keys. Try reorganizing a table, maintaining changes to it, or even trying to move copies of its data. You soon realize that you have to disable the keys. In some cases disabling the keys means dropping or disabling the indexes, which causes problems when they are re-enabled. Time. And do this at 3am, when you only have a two-hour window to do the work and the pressure is on. Then let's see how keen on foreign keys you are.

#2 Locking. Has anyone seriously looked at how expensive it is to manage a locking strategy with cascade deletes? And for a RAC environment, how inefficient are multiple locks across multiple blocks? This is a complex and lengthy topic and I will not go into it further.

#1 They encourage relational thinking, when the database is object relational. I mean honestly, using object-oriented structures in the database can be more efficient than using a traditional relational structure. We should be developing applications that use the best of both relational and object features, and use each where it works well.

And that brings up another side issue: why are people so resistant to using the object-oriented features inside the Oracle database? It's not a battle between relational and OO; it's not that one is better than the other, or that one is right and the other wrong, and one should not just be using relational because you don't understand or are not comfortable with the OO features. Ignorance is dangerous, especially from a performance perspective.

Other constructs that need a good review, and suggested paper titles:

1. The use of documentation in code

2. Developers who think they can program and yet cannot write a sort routine

3. Debugging code techniques

4. Reviewing code for security holes

5. Low-fidelity design, and building an application that can adapt to the changing skill set of a user

6. Middleware vs two-tier architectures

7. Treating developers as just coders, and why automatic code generation tools have not lived up to their hype

8. PL/SQL features needed to turbocharge it for Internet usage
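The silent-cascade maintenance scenario above is easy to reproduce. Here is a minimal sketch using Python's sqlite3 module (the table names and the ON DELETE CASCADE rule are invented for illustration; the author's context is Oracle, but the behavior is the same): deleting the parent rows in order to "move" them fires the cascade, and re-inserting the parents does not bring the children back.

```python
import sqlite3

# Autocommit mode so each statement (including PRAGMA) applies immediately.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, "
             "dept_id INTEGER REFERENCES dept(id) ON DELETE CASCADE)")
conn.execute("INSERT INTO dept VALUES (1, 'IT')")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(10, 'Alice', 1), (11, 'Bob', 1)])

# "Maintenance": copy the parent rows aside, delete them, then re-insert them.
saved = conn.execute("SELECT * FROM dept").fetchall()
conn.execute("DELETE FROM dept")   # unknown to the DBA, the cascade fires here
conn.executemany("INSERT INTO dept VALUES (?, ?)", saved)

# The parents are back, but every child row has silently vanished.
remaining = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(remaining)  # 0
```

The delete succeeds without a single error or warning, which is precisely why the paper calls this a minefield: the data loss only surfaces when someone later goes looking for the child rows.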

Page 24 / 37
Discussion Point 11: Constraints in the database or in the client?

Since constraints were first introduced this question has always been asked, and it is worth a discussion in its own right. Do we let the database enforce integrity or do we let the client enforce it?

This question has been asked over and over again for the last 15 years; I know because I can recall discussions on it in the mid 1990s. My viewpoint on the answer has changed over time, to the point where my only answer was that the question should be changed from an "or" to an "and": constraints should be enforced in both, because each side can't trust the other.

I have since changed my view on this again in light of the Multimedia world view.

I now believe the underlying question isn't correct, and we need to look at this from a completely different angle.

Let's review some of the arguments. By having constraints enforced in the client application we:

• Minimize CPU usage for this check on the server.
• Can create client-friendly applications, because the client gets an instant response that they have made a mistake, rather than being told on insert.

This is even more pronounced if you have a 200-column table: it's more efficient to get the errors fixed as they are entered, and not at insert/update/commit time, mainly because some errors can compound, causing errors in subsequent columns.

By having them enforced in the database we:

• Protect the data from all applications trying to access it. We are not relying on every application being correct and passing down valid data.
• Allow Oracle features that rely on these constraints to work.
• Only need to do it once, and not for every application accessing the data.

Both sides of the argument are valid and raise very powerful points. The problem occurs with the question being asked, in that it tries to apply a logical rule to real-world situations.

I might be in an environment where there is only one application accessing the data (and there will only ever be one), and it's a Mod PL/SQL application. Now all the arguments for having constraints enforced in the database are weakened. Creating a friendly interface is more important, so I am going to build JavaScript to enforce key constraints.

But I could equally be in an environment where there is no client interface. It could be a batch-processing environment, in which case the notion of constraints in the client is meaningless. Or what if I am in a data warehouse? What if I am getting data from XML DB?

The initial question raised actually implies we need to enforce all constraints. This would be primary key, foreign key and check constraints, as well as database triggers and a host of other structures required to ensure data integrity. We should also consider constraints imposed by object-relational structures (think XML, which is actually hierarchical). On top of this we need to add security checks. We also need to put in checks to ensure database calls do not cripple the database because they are badly written or return too much data (see the Appendix topic on SOA and Web Services).

By thinking a little fuzzily, and realizing that depending on the architecture maybe not all constraints are needed, we can now review the actual question.

The question is better reframed as: if I use this architecture (fill in the gap with your architecture of choice), which constraints, if any, are needed to achieve data integrity?

With so many different architectures in use, the answer is now more complex and will be different for each architecture chosen. This question also lets us review the physical model in comparison to the logical one, and lets us check which constraints make sense to use.

And remember, when you start working with Multimedia the notion of data accuracy and integrity changes.

The initial question is dated and can only be asked when working in the relational model. Even then it is still hard to consider it a valid question. When this question is asked, it highlights that the world view of most DBAs and Developers is still caught in the limited relational world, and that we now need to escape from it.
Page 25 / 37
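The "and, not or" position above can be sketched in a few lines. This is a hypothetical illustration using SQLite and an invented person table, not a prescription: the same rule lives twice, once in a client-side function for instant, friendly feedback, and once as a database CHECK constraint that stops any application that skips the client path.

```python
import sqlite3

def client_validate(age):
    # Client-side copy of the rule: instant feedback, but only for THIS client.
    return 0 <= age <= 150

conn = sqlite3.connect(":memory:")
# Database-side copy of the same rule: the last line of defence for ALL clients.
conn.execute("CREATE TABLE person (name TEXT, "
             "age INTEGER CHECK (age BETWEEN 0 AND 150))")

ok = client_validate(-5)   # the friendly client catches the mistake first

# A second application that skips client validation is still stopped.
rejected = False
try:
    conn.execute("INSERT INTO person VALUES ('Mallory', -5)")
except sqlite3.IntegrityError:
    rejected = True

print(ok, rejected)  # False True
```

Which of the two copies (if either) is worth keeping is exactly the architecture-dependent question the discussion point reframes: a batch-only system has no use for client_validate, and a single-client system may get little from the CHECK.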
Discussion Point 12: Escaping the world view

Edward De Bono wasn't just known for the phrase "lateral thinking"; he proposed a whole new methodology for thinking patterns. Unfortunately the term was attached to his name and it stuck. Too many times I hear people say, when it comes to problem solving and tuning, that you need to think laterally. The problem is, no one ever explained how to think laterally. How is it achieved? How can some DBAs come up with radically different ideas and concepts whilst others are struggling to work out the time of day? The solution to this covers the foreign key debate and highlights why it is so hard to change your world view and think "outside the box".

The general thinking is that truth comes through debate, or finding truth through confrontation. This is highlighted by looking at the legal system: you get two sides, one for, one against, and let them battle it out. In the end the truth emerges. Except it doesn't, as debates do not allow us to escape the box and think outside it.

Debates are part of the thinking process; they are not the thinking process. The analogy De Bono cited was that water is part of soup, but it is not soup. It has become all too easy to land in the world of debates and confrontation when talking about and resolving issues, because that is our comfort zone; it is what we are used to. De Bono also raised the Six Thinking Hats concept as a way of challenging people to view discussions as more than confrontations. Unfortunately the perception was that wearing colored hats in meetings was a silly idea, and it lost favor. The Six Thinking Hats was never about that, but it didn't come up with strategies for how to think laterally; it just covered how to manage ideas and ensure meetings and discussions didn't degenerate into debates.

In the well-known TV show House, you might be forgiven for thinking that if you surround yourself with antagonists you would end up solving the problem. By surrounding yourself with "no" people (rather than "yes" men), you are able to discover and get to the truth of the problem. If you believe this, then you will have missed the point of the show. A person who is used to debating as the only method for getting to the truth of a matter might rightly believe this strategy is a good one, but if you recall, in the show, having a group of people around him only helped House resolve the facts; at no stage did they help him come to a resolution of the problem. The antagonists ensured that all the obvious issues were covered and that nothing was missed. The heart of the show focused on the "House moment": a moment where some random event, some comment, allowed House to escape his world view, see the problem anew and come up with the solution. He could escape his box. How did House do it, then? What was the methodology he used to allow his character to achieve lateral thinking? It is the same one which, when used effectively, can also be used for debugging, tuning and coming up with completely new methodologies.

Unfortunately it is a technique that is so chaotic, so foreign to most people, that they cannot do it, and they mostly achieve it by sheer accidental luck rather than through a controlled thinking process.

So yes, in addition to discussing foreign keys I also raise the issue that the debating methodology is fundamentally flawed, which in my opinion typifies the thinking strategies and patterns of those in IT at all levels. To understand why I constantly pursue the issue of foreign keys and why they are bad is to realize that I am tackling the issue of the thinking patterns of everyone in IT, and why new strategies need to be adopted. For the last 30 years, the same issues confronting IT in tuning, in design and in development have not been resolved, not been addressed, just painted over and rebranded. Things are not changing because the world view has not changed. What we are seeing is the same methodologies being rehashed, and even though computers are running faster and improving each year, our own development philosophy and mentality has not changed to keep pace.

My experience has shown that the presenters, academics and proponents have come up with their powerful insights not through good thinking habits or a well-proven methodology, but rather through reactive means: issues they have had to face and address forced them to view the problems in a new light. The well-known presenters have had the pleasure of working in a large number of environments, which has given them access to more problems and thus more solutions. This isn't a bad thing to have happen; it is just that currently it seems to be the only methodology being used to come up with new directions and resolutions to problems.

I am equally guilty; it has happened to me many a time. I encounter a tuning problem, realize that feature x is bad, or feature y solves it, then write a paper on it. It produces results, and if you attend conferences,
Page 26 / 37
you will see most papers presented cover these results.

So to reiterate, this thinking strategy isn't necessarily bad; it is just reactive, unstructured and random in nature, which typifies the chaotic architectures we see today. The religious arguments which center around which product is better, or the rehashing of old ideas as new, show this is constantly happening.

So how did House come up with his out-of-the-box solutions to problems? He would have solved more problems, more quickly, if he had realized where his insights came from and used a better thinking methodology to resolve them, but then the TV show would have been over in 5 minutes, and where would the fun have been in that? Let's just say that the solution to coming up with out-of-the-box ideas consistently, and turning your environment from reactive to proactive, is beyond the scope of this paper and worth further discussion in another forum. Some ways of doing this are mentioned in the appendix, but they are not the only ways. So treat this discussion point as a teaser, and if you are eager to understand what the answer is, first focus on why I am so constantly headstrong in always talking about foreign keys even though it has consistently raised the ire of all those who hear it. In doing so, you will understand why focusing on Multimedia allows you to escape from the current relational-model world view.

House was right, though, in pointing out that all people lie at different levels, but addressing that and how it relates to tuning is a completely different paper.

For more on lateral thinking methods see:
http://www.scribd.com/doc/7545855/Creativity-Lateral-Thinking

Conclusion

For those who have read this paper and believe it is just a debate on the relative merits of foreign keys, I will leave you with this conclusion.

So by all means use foreign keys, but don't use them blindly. Use them when it makes sense, when doing so solves a tuning issue, or when maintaining a relationship between tables is crucial to the application running. Just don't use them because that is what you were taught and because the relational model says you need them. And don't think you have achieved data integrity in your application if you do use them. You never will achieve it; all you will achieve is different grades of adherence. As you move to the Multimedia world, be prepared to adopt a new world view, a fuzzy view, one that will challenge your core concepts.
My view is that the best way to tune and ensure optimal database performance is for management to create a culture that promotes excellent Developer/DBA practices.

For those who recognize that this is more than just a paper on foreign keys, I invite you to read the appendix, where you will hopefully get a better understanding of some of the real reasons why this paper was written.

If through this paper I can get you to question and rethink a pillar of the computing world, then hopefully I can get you to break out of the flat world you are in and adopt a new thinking strategy. I also hope that you will see Multimedia in a new light and realize that there is a whole new world out there awaiting discovery.

This paper, with a discussion blog, can be found at: http://foreignkeys.blogspot.com/


Links to further papers can be found at: http://www.eternal-donut.com/

Page 27 / 37
Appendix A: Why donuts are the solution to the tuning problem.

If this hasn't got you thinking, then nothing will. I'll answer the "what the?" question at the end of this discussion, but first let's see how one can come to this conclusion.

Moving to a new world view: step back and look at tuning. Keep stepping back and see if we can get to the root cause of tuning issues. DBAs, when they step back, might end up saying "write better SQL". Managers would say "get a faster box". Developers would say "improve the design". There are many dimensions to resolving tuning, but when you really look at it, when you take the big step back and ask what fundamentally causes all tuning issues, then in my experience, and this has been shown and proven to me time and time again, database performance issues are caused by management. The best way to tune is to educate management.

This is a fundamental principle of proactive tuning (a concept I have been advocating for over 15 years, and have written papers on). Yet it is such a foreign concept for most DBAs and Developers that it is inconceivable. Their thought processes do not even allow them to contemplate it, so ingrained are they in their world view. And yet they all know it, they all talk about it, and they allude to it as the core problem in papers and discussions.

Management are responsible for the culture of an organization. That is their primary goal. Forget about management being there to sign forms, be a leader and buy things. Establishing the culture is really the only true reason management exist. Management are not productive. They don't build anything, write code, do design work, answer problems, work with users, debug or tune. They make decisions: decisions to hire, fire, buy and manipulate. In short, they control the culture of an organization through the decisions they make. In turn, the culture dictates how people work and how they interact.

Let's try to highlight this by looking at some cases. All these cases are based on real-world scenarios, ones you have likely encountered. The goal is to highlight how bad practices have led to a culture promoting inefficiency. This is similar to foreign key usage.

Case 1: Separate Buildings


DBAs and Developers just do not get on. A good DBA will act as an auditor and review what the developers are doing, with the aim of exposing tuning, design and security flaws. A good DBA can do this in a way that doesn't make the developer feel inadequate. The problem is, this rarely happens, and conflict ensues. Given time, something nasty will happen. The management mentality is that it is best to keep the two groups apart. This just makes communication harder and slower. Developers will avoid dealing with the DBAs, and some will hope they can sneak code through without it being reviewed. Performance and security issues occur, the DBAs are forced to tune reactively, and the two groups are now in conflict. Management at this point will move the DBAs to separate floors, maybe different buildings, in some cases different cities. The smart-thinking manager might even outsource one or both groups. Management set the culture. They don't see what is happening from a technology viewpoint; they see two groups fighting that need to be kept separate. Change management is then put in place to allow them to communicate. The culture is now set in place, ingrained in the psyche of everyone in the organization: DBAs and Developers are never to speak or meet unless there is legal representation. So many sites have gone down this path, and yet this cultural setup is a leading cause of performance problems. Why?

When DBAs and Developers can work with each other, they can solve performance and security problems before they occur. DBAs can review code quickly, identify security holes immediately and ensure they are fixed before they become part of the core code. The developers, if they know a DBA is reviewing their code, and reviewing it well, will more than likely attempt to write efficient SQL and use efficient practices. If you know you are being audited, and audited in a good way, you are likely to do good work. Such a concept of having the DBAs and Developers working together leads to an environment of proactive performance and tuning.

So why doesn't this happen? As mentioned, the culture imposed by management has insisted it cannot. If you are in a site where the DBAs belong to one department and the developers to another, then

Page 28 / 37
it is nearly impossible to arrange for them to work together. Also, DBAs and Developers are not trained to understand how working together is in their best interest. They are not trained in dealing with personalities and handling clashes when they occur. They are not trained in the basics of psychology and personality management. They are not trained in conflict management and basic communication skills. Management tend to view all DBAs and Developers as the same: usually just coders, some with bigger egos than others. See Case 5.

How many environments outsource solutions because it is less hassle than dealing with their own internal structure? Most. And why? Because of the culture.

Case 2: Nice Testers


Testers who are nice are dangerous. Testers who are afraid of hurting the developers' feelings, because they can see how much time and effort went into a program, are not doing their job. Management who hire nice testers, and who do not set up a culture where the testers can make honest comments without fear of retribution or of causing hurt, are ensuring that code is not properly tested. Testers should be able to comment on poor screen design and potentially inefficient application navigation, and should be able to test outside the scope of what they are told to look at.

Also, at some point in the testing, DBAs and Developers should be able to review how the testers use the application: how they do searches, how they move around the application, and whether there are potential bottlenecks or tuning holes caused by a bad interface.

This culture of testers, DBAs and Developers all working together, and yet being able to critique inefficient areas of the application, is a concept so foreign to most sites that the notion cannot even be raised for fear of being laughed at (which itself highlights another management cultural issue).

Case 3: The cheap solution (or the expensive solution)


When management buy a hardware or software solution, either because it is cheap and can just do the job, or because it is very expensive and therefore must be good, they put in place the ground rules for how the Developers and DBAs have to work. If the DBAs and Developers cannot both be involved in the decision from the beginning (and not the situation where management say "please pick Option A or B"), then management are putting the DBAs in an environment where they now have to reactively tune. A good DBA and a good Developer will know where the slow points in the hardware are. These are likely based on the architecture and model they are working in. They should know what works and what will not. They should be advising management on the hardware that should be bought and giving them the choice, not the other way around.

[Figure: The Tuning Scale. A spectrum running from Psychology through Fuzzy Logic to Binary Logic, spanning Users, Management, Developers, DBAs, the Database and the Hardware, against the layers User Interface, Code, SQL, Security and Configuration.]

Case 4: The next trend


How many times has this happened in your environment? Management are sold on a new technology because it is seen as the next trend or the saviour of all their problems. SOA is the way to go; all screens must be carousels; ADF will solve all our issues; Apex will solve all our issues; Java will solve all our issues. You might hear the justification from management (or from developers/DBAs influencing management) by way of the adage "everyone is using it, so should we". We should use PHP, we should use Middleware, we should use Ruby. Or

Page 29 / 37
maybe the negative situation happens: let's not use PL/SQL because we don't want to be locked into Oracle. Or maybe they justify a trend based on an impractical concept: let's build our application so it works against any database, because we don't want to be locked in.

Has it also ever happened that management have adopted a new technology, usually on the basis that it is cheaper, requires fewer developers or is simpler, purely so they can be seen as smart in their own managers' eyes and thus hope to get promoted and move up the corporate chain? I have seen this happen on a number of occasions, and the results have always been disastrous.

Managers who establish a culture where they determine the direction are creating an environment where the DBAs and Developers are forced to react. The development tool chosen might not work efficiently with the database; the methodology chosen might encourage inefficient programming habits. In the end the DBA will likely be put in a situation where they have to spend a lot of time doing reactive tuning, or jumping through hoops to get the application to scale (if that is even possible), all because of poor management decisions, decisions that resulted because the culture of the environment prevented the correct decision from being made.

Case 5: Personality driven environments


Have you been in a work environment where the Developers are in control? In this case, it is an environment where a dominant developer has been given control. In the worst-case scenarios they will take the developer they like the least and make them the DBA. This is referred to as a puppet DBA. The developer then assigns the DBA the role of doing backups and taking their table definitions and adding storage parameters to them.

In this case the DBA is not a real DBA. The developer then has free rein to impose whatever trend or design methodology they want. What will happen is that the developers will push through their own architectures and designs, and impose their own methodologies, without looking at the tuning or security ramifications. They will also likely use their puppet DBA to sign off on all designs, so that when performance and scalability are not reached they can blame the DBAs, because they have a proven paper trail.

The reverse can also happen, where an obnoxious DBA, one with a persecution complex, one who cannot handle being criticized, grabs the reins of power and then ruthlessly imposes a strict regime of terror over the developers.

This DBA normally holds developers in contempt and views them as coders who have no idea how to write efficient code. The DBA then imposes stringent practices and change management procedures, ensuring that requests come in at a trickle and that any perceived ignorance on the developers' part is met with anger or heavy-handed tones.

The developers are then hamstrung and cannot work efficiently. The DBA makes them tune statements which either do not require it or are run so infrequently that it is not worth focusing tuning resources on them.

In both these cases it can seem that the Developer or the DBA is at fault, so what has this to do with management? Management are responsible for the culture, and they are responsible for the personalities and how the teams work with each other. If they have just sat back and watched these scenarios unfold, then they are guilty of incompetence through neglect. In these situations management have been intimidated by the dominant players, and rather than addressing the conflict have more than likely supported it by ensuring the dominant personalities get their way. Nowhere have they tried to train, teach or advise them to change their behavior. Nowhere have they tried to create a culture where the DBAs and Developers work with each other rather than against each other.

Case 6: User database creep


Have you been in an environment where users sneak in their own databases because it is too hard to work within their own organization? You end up with data stored in a variety of open-source and free databases which are not backed up or secured, and which prevent the organization from creating a global view of all its data. All because the users who need to store the data cannot deal with the players in their own

Page 30 / 37
internal structure. All because the culture imposed by management prevents it from happening. Change the culture, change the habits, and turn things around so that users want to put their data in a controlled, central data store.

DBAs and Developers keep looking at a granular focus to find the solutions to performance issues: if I improve the locks or adjust the latches, the tuning issue will be solved. It is the mentality that fundamental tuning improvements can be made by going ever more granular.

All that will happen is that you get inversely geometric improvements: it can take 10 hours of effort to tune a system from 80% to 90% optimal, but 100 hours to get it from 90% to 95%. No one is encouraged to think the other way, to look at the bigger picture. And why not? The thinking is that it is management's job, but management have no idea; simply put, they are clueless (Scott Adams really has this right). We are so used to reactive tuning that it has become part of the core culture of most organizations, and as mentioned, management are responsible for the culture. So DBAs and Developers should start to go the other way and tune management, more so to encourage them to establish the correct culture so that the correct habits are put in place. The sites I have seen with efficient and well-tuned systems actually do this. In most cases they might not even realize they have done it, but a good, competent manager will establish the culture and then watch as good habits, practices and an efficient tuning architecture naturally form.

Let's now look at one of the most insidious methodologies created, one whose presence highlights that the culture is suffering: change management. Which brings things back to foreign keys. Seeing them in use is just like seeing change management in an environment: it is symptomatic of a bigger issue.

It is very rare to see a site using change management correctly. A number of times I have heard people comment that they feel they invented change management, because it solved a core communication issue in their environment. For them, change management was the solution.
Why pick on change management here? All it achieves is slowing the whole process down to its slowest point. It converts everyone to a protocol, and an inefficient one at that. Seriously, how many people think their change management architecture can run faster? Change management has a place when used intelligently. In most cases it highlights that the culture of an organization is so bad that DBAs and Developers can only communicate with each other via official protocols.

Now let's bring foreign keys into this analogy. They represent a stage in an otherwise inefficient process. They represent a failure in our tuning architecture.
Let's go back to change management. It might be seen as slow and cumbersome, so what is the solution most people will come up with? They will look at each step and see where improvements can be made: maybe introduce an automated change management system, maybe make better use of Service Level Agreements to get faster throughput, maybe even let the DBAs meet the developers in an officially controlled meeting. All you are doing is spending more time making the steps more efficient in a process that is flawed, a process created as a result of poor management who did not set up a culture that encourages better communication.

"In my environment, I threaten people and say that if you/we don't change our behavior I'll be forced to bring in change management. Ooh, the bogeyman cometh."

Management are usually so ignorant about the culture in their environment that when one gets it right, it is an achievement worth celebrating. But so often all that is made are excuses: yes, we could do it that way, but my manager won't let me, or they have their own agenda.

So in the end, because of bad culture (bad decisions made), the DBAs and Developers are put on the back foot and asked to solve issues that do not lend themselves well to tuning and performing well. The environment becomes reactive.

So what is the solution to this? This is where the donuts come in. Donuts are the universal communication device. Everyone knows about them. Most know they are bad. Most love to eat them, most feel guilty for eating them, and nearly everyone will cut a donut into halves or quarters and then eat each piece thinking that it doesn't count towards calorie intake. They are equally loved and loathed, and they are a great way of breaking

Page 31 / 37
down barriers. Walk into a room with donuts and everyone can be your friend. DBAs
and Developers should be using this device as a way of re-establishing
communication and contact and solving their issues in a more friendly environment,
rather than through cumbersome change management. Managers should remove all
barriers and get the groups working with each other, only then will an effective and
efficient environment be seen.

So when tuning, if you do not factor in the culture of an organization (the psychology and relationships between the DBAs, Developers, Managers, Users and other groups), if you do not factor in the tools and methodologies used, and instead insist on focusing only on the obvious performance issues and looking only at the database, then you are not tuning; you are just applying a band-aid.

Can one go more general than management? Potentially one can, but that depends on the organization
type. Government employees can generalize to politically appointed ministers, private companies to the
CEO, board or even shareholders.

Appendix B: From Bind Variables to Middleware

After sitting in a presentation run by a well-known presenter, I came out of it very disappointed. Not with the presenter, not with how it was presented, nor with the technical nature of the topic. Its accuracy wasn't to be questioned. What was disappointing was that after 15 years we haven't changed.

The topic was on why it's crucial to use bind variables, both for performance and as a safeguard against SQL code injection. I distinctly recall when Mod PL/SQL first came out in the 1995 time frame, the notion of URL modification and SQL code injection was raised. In fact at the time it was an obvious problem, because the support tools that Oracle provided all suffered from it. The tools provided were designed to be user friendly and fast to develop with. It took some convincing, but eventually the security holes in the packages were cleaned up and patched. At the time, the notion of using bind variables was sensible and obvious: you just did it, and PL/SQL encouraged their use.

So after 15 years, to sit in a conference where this issue is seen as so serious that it deserves a topic in its own right makes me wonder: how did this happen, and why are we still having to drill home such obvious constructs? Is it because there is a huge turnover of programmers? Is it because developers are not being trained properly, or because most programmers just don't care?

Let's now look at why, from a few angles. During the “which is the best database” wars, a very dangerous notion was formed. The notion centered around the idea that databases are commodity items and all perform the same. Programmers were taught that if you knew SQL you could program against any database. It didn't matter how you programmed or how you wrote SQL; the database optimizer would sort it out. What we now know is that this notion is dangerous, and when adopted it leads to poorly performing applications with gaping security holes in them. The idea that if the database isn't smart enough we should just get a faster box or a different database has led to all sorts of problems. It has basically held back efficient application development.

When developers started to program without understanding tuning and scalability, the PC Mentality became ingrained in the common culture. This is when a developer thinks that if my program runs OK on my computer, it will run fine with ten users or one hundred users. The PC mentality thinks that the database should be hidden and treated as just a data store. The PC mentality has resulted in simple applications like a word processor still not being able to handle more than 60 pages, and taking 60 seconds to start up. The PC mentality results in it taking 60 seconds to log in to an internal network over a high speed line, and over five minutes to produce a directory listing of a folder with over ten thousand files in it. The PC mentality is a serious issue that has been reinforced by a poor culture.

Let's now try to look at this without focusing on the developers, and ask: could it be the tools that they use? Do the programming languages used enforce efficient use of the database, and encourage good performance and security practices? It's well established and proven that PL/SQL has this built into it. It can take more effort to use dynamic SQL than to use a cursor with bind variables (try retrieving a large set of rows using the execute immediate statement).

But what about other tools like Java, PHP, Perl, C and VB? Do they have built-in constructs that make it easy to use bind variables? Do they encourage good access to the database? I don't believe so. I have programmed in all these environments, and each has great strengths and capabilities, but all lack any smarts for dealing with a database. Most are database agnostic and all encourage inefficient database access. If you are not convinced of this point, there is a large amount of information on the features built into PL/SQL which tightly integrate it with the database. Once you see these features, you realize what the other programming languages are really missing.

Is this article just a propaganda tool pushing PL/SQL? It might look that way, but I would rather focus on the core issue raised: why do we still have to drill home bind variable usage? Why aren't we focusing on the culture, and changing it to ensure that these tools, when used, are used efficiently? They can be. It takes more effort, but it is possible.

Are bind variables and SQL code injection the only issue? Interestingly, no. 15 years ago the issue of SQL code injection was seen as a trivial one to address. A good DBA would spot the issue when doing a code review and get the developer to fix it. In today's environment, this level of code review rarely happens. Most DBAs wouldn't even know where to look. Once again, a development culture that is not established correctly is leading to security and performance issues.

So are there other issues? Yes, and there are more serious ones. In particular, how do you stop someone who is correctly authorized from modifying the URL (or changing a POST) and accessing data they are not allowed to? This isn't SQL code injection but standard URL modification.

How do you stop someone from invoking a procedure call from the URL line that they are not meant to be able to call? How do we secure Web Service calls and ensure that all data and requests passed to the database are valid? How do we prevent denial of service attacks, or stop data flooding (when a large number of legitimate requests are made which fill the database with useless data, e.g. someone adds a million items to their shopping basket)?
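The shopping-basket case is the easiest of these to defend against: validate the request volume server-side before it ever reaches the database. A minimal sketch follows; the limit and all names here are hypothetical, not from any real system:

```python
MAX_BASKET_ITEMS = 100  # hypothetical per-user business limit

class BasketLimitError(Exception):
    """Raised when a request would flood the basket with useless rows."""

def add_to_basket(basket: list, item: str, quantity: int = 1) -> None:
    # Never trust the quantity arriving in the URL or POST body.
    if quantity < 1:
        raise ValueError("quantity must be positive")
    if len(basket) + quantity > MAX_BASKET_ITEMS:
        raise BasketLimitError("request rejected before touching the database")
    basket.extend([item] * quantity)

basket = []
add_to_basket(basket, "book", 3)                 # legitimate request
try:
    add_to_basket(basket, "book", 1_000_000)     # data-flooding attempt
except BasketLimitError:
    pass                                         # nothing was inserted
```

The point is where the check lives: the flood is rejected in the service layer, so no useless data is ever written or transmitted.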

There are more important and more serious issues that need to be addressed, and being forced to keep reiterating the basics is very disappointing. It shows that the development community is locked into a world view that prevents them from seeing out. It needs to change.

As for Middleware, I will finish with the observation that most sites do not require Middleware. Scalability can be achieved by efficient database programming and by ensuring the database logic and objects reside next to each other. Middleware only offers solutions for sites that require serious scalability or have legacy systems. Generally, sites using a Middleware layer will produce a culture that encourages inefficient programming habits, because they insist on hiding the database from the programmer (who might not even know what the database is) and adopt the mentality that the bandwidth between the middle tier and the database is infinite. This will not lead to a scalable application, but rather the opposite. Only when used correctly and intelligently will a 3-tier architecture scale well. This topic is worth further discussion, which I am sure will happen.

Oh, and yes, there are really good solutions to the issues raised above, but that is for another time.

Appendix C: A very dangerous direction: Reviewing SOA with Web Services.

This is such a potentially serious issue regarding performance that it deserves a mention in its own right. Like foreign keys, the move to SOA is now progressing down a path where it is considered a trend and the new direction.

SOA is a double-edged sword. Its theoretical side is a good direction to be in. By using Web Services it becomes possible to integrate systems in different companies and in different parts of the world. It allows a new direction where data can be shared between companies (whether that is good is another topic), and I believe it is a healthy and good way to go.

The potential for catastrophic failure is there if the SOA architecture is misused, and misused it will be. This is guaranteed to happen because the lessons of history say it will happen. Within five years we will be reading papers and going to conferences where the talk is about how to tune SOA and get it working, because it runs so slowly and doesn't scale. Consultants from major companies will be brought into sites to explain how to develop a scalable SOA. It will be a new industry.

The reason why I highlight the potential for catastrophic failure, and not just your run of the mill failure, is that in SOA the DBAs are really out of the loop on the design side. They can only be brought in when it is shown not to scale, and all they will be able to say is, “I can't fix it”. The reason is that, compared to other architectures, scalability and performance with SOA hinge on network bandwidth.

The bleak picture that I am going to paint is that the Web Services used by SOA are going to be automatically generated by tools that have no notion of performance and assume infinite bandwidth. When you factor in that bandwidth on the Internet is not only not infinite but rather slow, the methodology required to implement SOA requires a complete change in thinking. Tools that automatically generate Web Services should be banned from an organization. Do not use them. Avoid them. Unfortunately, managers and developers will fall for the allure of tools claiming they can generate these services quicker than it takes to ask the DBA if they are any good. This will happen because it has happened in the past. How many times have we heard companies claim that the tool they have allows rapid development? I recall this claim was made with Forms 2.0, then 2.3, and a complete set of tools after that, some of which don't exist any more. When you see product managers showing how quick it is to create a one to many relationship on a screen using point and click, and then claiming it's the new best thing, you realize there is no hope; it will just happen. No amount of warnings, no amount of shouting and going don't, don't, will help. The big companies will push it and they will not care about performance. (I must acknowledge a glimmer of hope that exists with the Google philosophy, where they acknowledge the performance issue and have made performance a core part of their user strategy. This is seen in Chrome and the new Google OS.)

The potential for achieving abysmal performance is now assured as we move down this path. The ray of sunshine is that the companies providing early Web Services have now realized this and are working to build intelligent Web Services that scale. They have to; their business depends on it.

Now here is the important point. Scaling with Web Services doesn't involve getting a faster box, putting more memory in it, or even moving to a cluster. As mentioned, the bottleneck, the slow point, is the network. Even though performance on the Internet is improving, the rise in spam and multimedia downloads is going to grow to saturate this growth. Achieving scalability with Web Services involves a multi-pronged attack, and it involves building smart, compact Web Services. It's a very simple solution, but it requires hard work to achieve.

The current thinking strategy is that if we have a schema with tables, all I need to do is just put Web Services over them, and that's that. The first round of automatic Web Service generating tools will do just this. The end result will be a set of Web Services that are hard to use and understand, and which will require a lot of calls to achieve simple requests. The next round of automatic generation tools might move up a layer and allow a PL/SQL or Java program call to be exposed as a Web Service. This is still going to fail in achieving scalability. It's also going to fail in usability and in writing efficient Web Service calls. At this point we will start
to see automatic generation tools using WSDL to talk automatically with each other. With the automatic generation of WSDL we can be assured that the performance problem is going to be hidden behind the scenes and near impossible to fix. The solution will not be to get a better, faster network; the result will be a core component failing abysmally because it cannot be accessed due to insufficient network bandwidth. With Internet applications, you can't demand that the customer get a faster network to use your interface. It's naive thinking and will result in lost business.

Why this is a problem can be explained by looking at a scenario. Let's say a Web Service is built, and because it's automatically generated, there is a lot of junk surrounding it, as well as embedded comments and obtuse words. A snippet of the call might look like:

<mylargenamespace:averylongtagname>T
<mylargenamespace:moredatarepeated>value</mylargenamespace:moredatarepeated>
<mylargenamespace:moredatarepeated>value</mylargenamespace:moredatarepeated>
</mylargenamespace:averylongtagname>

What we will see happening is programs invoking Web Service calls that return a large amount of information when only a single value was needed. In the above example, let's say all I was interested in was the first value.

If a Web Service call is 100K in size, and one person requests it, then the amount of data to return is 100K plus overhead. Retrieval time grows to handle TCP/IP packet management. Additional overhead will likely be incurred as packets go through firewalls and are inspected.

Let's say 100 people do this call. That is then 100K x 100. But what is more realistic is that the program will not be doing one call, but might be doing 10 to 100 in the one page display, with each request doing a different call. It might get more complex if there is a one to many relationship, and a call is requested for every result in the many relationship. In Forms, when this happens, the DBA might have a chance of detecting and possibly tuning this situation, but in Web Service calls it's going to be really difficult. So we now have 100K x 100 x 100, and our network has reached capacity with only 100 users. What happens when we get to 1,000 or 10,000 or more? The architecture fails, and fails abysmally.
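The arithmetic above can be checked directly. The figures below are the illustrative ones from the text, not measurements, and they ignore the TCP/IP and firewall overhead already mentioned:

```python
def total_transfer_bytes(payload_kb: float, calls_per_page: int, users: int) -> float:
    """Raw payload moved for one page view by every user."""
    return payload_kb * 1024 * calls_per_page * users

one_user = total_transfer_bytes(100, 1, 1)          # 102,400 bytes
full_load = total_transfer_bytes(100, 100, 100)     # the 100K x 100 x 100 case

# On a fully dedicated 100 Mbit/s link (about 12.5 MB/s), one round of
# page views for just 100 users already takes over a minute to move.
seconds = full_load / (12.5 * 1024 * 1024)          # 78.125 s
```

At 1,000 users the same link needs over 13 minutes per round of page views, which is the abysmal failure described above.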

So to achieve efficient and scalable Web Services there are some core concepts we have to adopt and
remember:

1. Do not use automatic Web Service generation tools. Never use them. Ever.

2. Parsing XML does not scale linearly. A 10K Web Service could take 1ms to parse. A 100K Web Service will not take 10 x 1ms, but will take longer and consume more memory; it might take 15ms, and a 1Mb Web Service will take longer again. The growth is likely super-linear. This is when all the XML is parsed up front. If you point out that the DOM parser is different to the SAX parser, keep in mind the total time to parse: one parser uses the CPU cycles up front, the other as you navigate, but parsing the whole structure still takes time. The solution is to keep the data volume of the Web Services as small as possible.

In the above example, if the Web Service was written as <m a="value" b="value">T</m>, the total size is dramatically reduced. Add an intelligent layer to it, and it could be possible to shorten it to just <m v="T"/>, but only if the application requested a compact version.

Enable calling programs to request compact or detailed data views.
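The saving is easy to quantify. Using the (made-up) tag names from the snippet above, the three representations of the same data compare as follows:

```python
# Verbose, auto-generated style: long namespaced tags repeated per value.
verbose = (
    "<mylargenamespace:averylongtagname>T"
    "<mylargenamespace:moredatarepeated>value</mylargenamespace:moredatarepeated>"
    "<mylargenamespace:moredatarepeated>value</mylargenamespace:moredatarepeated>"
    "</mylargenamespace:averylongtagname>"
)

# Hand-designed: the same two values carried as attributes on a short element.
compact = '<m a="value" b="value">T</m>'

# Compact-on-request: when the caller only needs the first value.
minimal = '<m v="T"/>'

ratio = len(verbose) / len(compact)   # payload shrinks by a factor of 8
```

An 8x reduction per call, multiplied by the 100 calls x 100 users scenario above, is the difference between a saturated network and a comfortable one.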

3. If there are two Web Service calls done in the one global page call, then combine both Web Services. If a program has to make a call to retrieve data, and then a second call to retrieve another set of data, it is more efficient to get both sets of data in the one call.

This is where automatic Web Service generation tools fail. They cannot factor this in. It's just too complex to deal with, and it requires the skill of an intelligent programmer.
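A sketch of the difference, with all names hypothetical: the two fine-grained functions stand in for generated per-table services, and the combined one is the hand-written call a page should make instead.

```python
def get_customer(customer_id: int) -> dict:
    # Stand-in for one network round trip to a generated per-table service.
    return {"id": customer_id, "name": "ACME"}

def get_open_orders(customer_id: int) -> list:
    # A second round trip for the same page display.
    return [{"order_no": 1}, {"order_no": 2}]

def get_customer_page(customer_id: int) -> dict:
    """One hand-written service: both data sets, one round trip."""
    return {
        "customer": get_customer(customer_id),       # server-side, no extra hop
        "open_orders": get_open_orders(customer_id),
    }

page = get_customer_page(42)
```

Each fine-grained call the page avoids is one less request paying the full network, firewall and parsing overhead described earlier.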

4. If one value (or a small subset) is required to be returned, then that is all that should be returned. Do not
send back junk data unless the calling program requests it.

5. If the server can do the smarts efficiently, then let it do them. Is it best to return data to the program unsorted, because sorting is seen as an extravagant waste of CPU cycles, or is it better to send it back sorted? It's better for it to be returned sorted, because scalability occurs at both the client and server end. If the program has to do a call and then run its own sort to deal with the result, it's just not efficient. This moves into the area where the Web Services generated must take into consideration the performance and usability aspects of the client calling program.

The mentality of SOA drives home the point that we are providing a service, and you either live with it or else.

6. Prevent large volumes of data from being returned unless requested. This is equivalent to the optimizer's ALL_ROWS and FIRST_ROWS hints. If the calling program only wants an initial set of rows, then return a small set. Don't employ the strategy of “here's all the data, you deal with it”. This mentality lends itself to massive volumes of data being returned and saturating the network.

Sometimes it's efficient to return a large set of rows and avoid repeated calls. In this case, compact the data.

Review these rules every 12 months.

These concepts might be new to some, but are obvious to anyone programming against Web Services and
are a blessing when sites support these concepts.

Adopt these concepts and you will achieve scalability of your application on the Internet using SOA. And remember, SOA Web Service usage is a double-edged sword. You can either get great performance results by doing the hard work, or suffer catastrophic failure by adopting the easy option and using automatic generation tools.

Methods that can be used to create an environment that encourages efficient Web Service creation (ones that can be used to establish a healthy culture focused on network efficiency):

A. Set up the environment so all developers are forced to work over a modem-speed connection for all Web Service calls. Follow the rule that ten concurrent users must be able to run the application over a shared modem line and achieve good response times. This will ensure that if inefficient calls are built, the developer will immediately know about it and feel the pain.

B. Set up a virtual charge system for the developers. Allocate a virtual salary to the developers. For every Web Service call, deduct $100 from their salary. For every 1K of data beyond a 10K call, charge $10. Treat multimedia data as a separate category. At the end of the project, compensate the developers using the virtual salary as a guide. Use a reward system: if they build a proven, functional Web Service call that is under 1K in size, they achieve a bonus.
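The charging scheme can be expressed as a small function. The dollar figures are the ones proposed above; treat them as tunable parameters of the sketch:

```python
def virtual_charge(kb_per_call: list) -> float:
    """Charge $100 per Web Service call, plus $10 per KB beyond 10K each."""
    charge = 0.0
    for kb in kb_per_call:
        charge += 100.0                  # flat cost for every call made
        if kb > 10:
            charge += (kb - 10) * 10.0   # penalty for bloated payloads
    return charge

lean = virtual_charge([1, 1])      # two tight 1K calls: $200
bloated = virtual_charge([100])    # one 100K call: $100 + 90 * $10 = $1000
```

Note how the scheme prices the trade-off in point 3: combining two lean calls into one saves a flat $100, but only if the combined payload stays small.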

These two simple methods will change the culture of development and will ensure that scalable Web Services are built. They will motivate the developers to see that the network is not an infinite resource, but a limiting one that must be respected.

-------- 00 --------
