
Fundamentals of Relational Database Design and Database Planning
J.Trumbo
Fermilab
CSS-DSG

Outline
Definitions
Selecting a DBMS
Selecting an application layer
Relational Design
Planning
A very few words about Replication
Space
Definitions
What is a database?
A database is the implementation of freeware or
commercial software that provides a means to
organize and retrieve data. The database is the
set of physical files in which all the objects and
database metadata are stored. These files can
usually be seen at the operating system level.
This talk will focus on the organizational aspect of
data storage and retrieval.
Commercial vendors include Microsoft and Oracle.
Freeware products include MySQL and PostgreSQL.
For this discussion, all points/issues apply to both
commercial and freeware products.

Definitions
Instance
A database instance, or simply an instance, is
made up of the background processes
needed by the database software.
These processes usually include a
process monitor, session monitor, lock
monitor, etc. They will vary from
database vendor to database vendor.

Definitions
What is a schema?
A SCHEMA IS NOT A DATABASE, AND A DATABASE IS NOT
A SCHEMA.
A database instance controls 0 or more databases.
A database contains 0 or more database application
schemas.
A database application schema is the set of database
objects that apply to a specific application. These
objects are relational in nature, and are related to each
other, within a database to serve a specific
functionality. For example payroll, purchasing,
calibration, trigger, etc. A database application schema
is not a database. Usually several schemas coexist in a
database.
A database application is the code base to manipulate
and retrieve the data stored in the database
application schema.
Definitions Cont.
Primary Definitions
Table, a set of columns that contain data. In
the old days, a table was called a file.
Row, a set of columns from a table reflecting a
record.
Index, an object that allows for fast retrieval of
table rows. Every primary key and foreign key
should have an index for retrieval speed.
Primary key, often designated pk, is 1 or more
columns in a table that makes a record unique.

Definitions Cont.
Primary Definitions
Foreign key, often designated fk, is a
column common to 2 tables
that defines the relationship between those 2
tables.
Foreign keys are either mandatory or optional.
Mandatory forces a child to have a parent by
creating a not null column at the child.
Optional allows a child to exist without a
parent, allowing a nullable column at the child
table (not a common circumstance).
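A minimal sketch of the mandatory vs. optional foreign key, using Python's built-in sqlite3 module as a stand-in for any database product. The table and index names (parent, mandatory_child, optional_child) are invented for illustration, and SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE parent (
    parent_id INTEGER PRIMARY KEY
);
-- Mandatory foreign key: NOT NULL, so every child row must name a parent.
CREATE TABLE mandatory_child (
    child_id  INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(parent_id)
);
-- Optional foreign key: nullable, so a child row may exist without a parent.
CREATE TABLE optional_child (
    child_id  INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent(parent_id)
);
-- Index the foreign key columns for retrieval speed.
CREATE INDEX mandatory_child_parent_fk ON mandatory_child(parent_id);
CREATE INDEX optional_child_parent_fk ON optional_child(parent_id);
""")

conn.execute("INSERT INTO parent (parent_id) VALUES (1)")
conn.execute("INSERT INTO mandatory_child (child_id, parent_id) VALUES (10, 1)")
conn.execute("INSERT INTO optional_child (child_id, parent_id) VALUES (20, NULL)")  # allowed


def insert_is_rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A mandatory child without a parent is refused by the NOT NULL column.
orphan_rejected = insert_is_rejected(
    "INSERT INTO mandatory_child (child_id, parent_id) VALUES (11, NULL)")
# An optional child may have no parent, but it may not name a parent that does not exist.
unknown_parent_rejected = insert_is_rejected(
    "INSERT INTO optional_child (child_id, parent_id) VALUES (21, 999)")
```

Note that the optional key still protects the data: a nullable column can be empty, but it cannot point at a missing parent.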

Definitions Cont.
Primary Definitions
Entity Relationship Diagram or ER
is a pictorial representation of the
application schema.

ER Example
[Figure: entity relationship diagram of an example application schema with the entities STATUS, MODULE, REVISION, OWNER, PARAMETER, UNIT and HISTORY. Each entity lists its attributes (for example # MODULE_ID, * MODULE_NAME, * CREATE_DATE, * CREATE_USER), and the entities are joined by named relationships such as "may have", "describes", "associated with", "own", "part of", "creates" and "has".]
Definitions Cont.
Primary Definitions
Constraints are rules residing in the
database's data dictionary governing
relationships and dictating the ways
records are manipulated, what is a
legal move vs. what is an illegal
move. These are of the utmost
importance for a secure and
consistent set of data.
Definitions Cont.
Primary Definitions
Data Manipulation Language or
DML, SQL statements that insert,
update or delete data in a
database.
Data Definition Language or DDL,
sql used to create and modify
database objects used in an
application schema.
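To make the DDL/DML split concrete, here is a small sketch using Python's built-in sqlite3 module. The unit table is only an illustration; the exact DDL syntax will differ between database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create the objects that make up the (hypothetical) schema.
conn.execute("CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, unit_name TEXT NOT NULL)")
conn.execute("CREATE INDEX unit_name_idx ON unit(unit_name)")

# DML: insert, update and delete the data held in those objects.
conn.execute("INSERT INTO unit (unit_id, unit_name) VALUES (1, 'volt')")
conn.execute("INSERT INTO unit (unit_id, unit_name) VALUES (2, 'amp')")
conn.execute("UPDATE unit SET unit_name = 'ampere' WHERE unit_id = 2")
conn.execute("DELETE FROM unit WHERE unit_id = 1")
conn.commit()

remaining = conn.execute("SELECT unit_id, unit_name FROM unit").fetchall()
print(remaining)
```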
Definitions Cont.
Primary Definitions
A transaction is a logical unit of work
that contains one or more SQL
statements. A transaction is an atomic
unit. The effects of all the SQL
statements in a transaction can be either
all committed (applied to the database)
or all rolled back (undone from the
database), ensuring data consistency.
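The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module. The account table, its CHECK constraint and the transfer function are assumptions made up for this example; the point is only that a failure in the second statement also undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(amount):
    """Move `amount` from a to b as one unit of work: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'b'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'a'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Calling transfer(30) commits both updates. Calling transfer(500) credits b, then fails the CHECK constraint on a, so the credit to b is rolled back as well.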

Definitions Cont.
Primary Definitions
A view is a selective presentation of
the structure of, and data in, one or
more tables (or other views). A view
is a virtual table, having predefined
columns and joins to one or more
tables, reflecting a specific facet of
information.
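As a sketch of a view, assume two hypothetical tables, run and event, where 1 run has 0 or more events. The view run_summary (also an invented name) predefines the join and the columns, using Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE run (
    run_id     INTEGER PRIMARY KEY,
    start_time TEXT NOT NULL
);
CREATE TABLE event (
    event_id INTEGER PRIMARY KEY,
    run_id   INTEGER NOT NULL REFERENCES run(run_id)
);
-- The view predefines the columns and the join; it stores no rows itself.
CREATE VIEW run_summary AS
    SELECT r.run_id, r.start_time, COUNT(e.event_id) AS n_events
    FROM run r
    LEFT JOIN event e ON e.run_id = r.run_id
    GROUP BY r.run_id, r.start_time;

INSERT INTO run VALUES (1, '2004-02-01 08:00:00'), (2, '2004-02-01 09:00:00');
INSERT INTO event VALUES (100, 1), (101, 1), (102, 1);
""")

# The view is queried exactly like a table.
summary = conn.execute("SELECT * FROM run_summary ORDER BY run_id").fetchall()
```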

Definitions Cont.
Primary Definitions
Database triggers are PL/SQL, Java, or C
procedures that run implicitly whenever a
table or view is modified or when some user
actions or database system actions occur.
Database triggers can be used in a variety of
ways for managing your database. For
example, they can automate data generation,
audit data modifications, enforce complex
integrity constraints, and customize complex
security authorizations. Trigger methodology
differs between databases.
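A sketch of the "audit data modifications" use, written with SQLite's trigger syntax through Python's built-in sqlite3 module rather than PL/SQL, Java or C. The parameter and history tables borrow names from the ER example, but the old_value and new_value columns and the trigger name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parameter (
    par_id INTEGER PRIMARY KEY,
    value  REAL NOT NULL
);
CREATE TABLE history (
    hist_id      INTEGER PRIMARY KEY,
    par_id       INTEGER NOT NULL,
    old_value    REAL NOT NULL,
    new_value    REAL NOT NULL,
    date_changed TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Runs implicitly after every update of parameter.value.
CREATE TRIGGER parameter_audit
AFTER UPDATE OF value ON parameter
FOR EACH ROW
BEGIN
    INSERT INTO history (par_id, old_value, new_value)
    VALUES (OLD.par_id, OLD.value, NEW.value);
END;
""")

conn.execute("INSERT INTO parameter (par_id, value) VALUES (1, 1.5)")
conn.execute("UPDATE parameter SET value = 2.5 WHERE par_id = 1")
conn.commit()

# The application never wrote to history; the trigger did.
audit_rows = conn.execute("SELECT par_id, old_value, new_value FROM history").fetchall()
```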

Definitions Cont.
Primary Definitions
Replication is the process of copying and
maintaining database objects, such as tables, in
multiple databases that make up a distributed
database system.
Backups are copies of the database data in a format
specific to the database. Backups are used to
recover one or more files that have been physically
damaged as the result of a disk failure. Media
recovery requires the restoration of the damaged
files from the most recent operating system backup
of a database. It is of the utmost importance to
perform regularly scheduled backups.

Definitions Cont.
Mission Critical Applications
An application is defined as mission critical,
imho, if
1. there are legal implications or financial loss to
the institution if the data is lost or unavailable.
2. there are safety issues if the data is lost or
unavailable.
3. no data loss can be tolerated.
4. uptime must be maximized (98%+).

Definitions Cont.
large or very large or a lot
Seems odd, but large is a hard definition to
determine. VLDB is an acronym for very large
database. Its definition varies depending on
the database software one selects. Very large
normally indicates data that is reaching the
limits of capacity for the database software, or
data that needs extraordinary measures to be
taken for operations such as backup,
recovery, storage, etc.

Definitions Cont.
Commercial databases do not have a practical
limit to the size of the load. The issues will be
backup strategies for large databases.
Freeware does limit the size of the databases,
and the number of users. Documentation on
these issues varies widely from the freeware
sites to the user sites. MySQL supposedly can
support 8 TB and 100 users. However, you will
find arguments on the users lists that these
numbers cannot be met.

Selecting a DBMS

Many options, many decisions, planning, costs, criticality.
For lots of good information, please
refer to the urls on the last slides.
Many examples of people choosing a
product.

Selecting a DBMS
How do I Choose?
Which database product is appropriate for my
application? You must make a requirements
assessment.
Does your database need 24x7 availability?
Is your database mission critical, and no data loss
can be tolerated?
Is your database large? (backup recovery methods)
What data types do I need? (binary, large objects?)
Do I need replication? What level of replication is
required? Read only? Read/Write? Read/Write is
very expensive, so can I justify it?

Selecting a DBMS
How do I Choose? Cont.
If your answer to any of the above is yes, I would
strongly suggest purchasing and using a commercial
database with support. Support includes:
24x7 assistance with technical issues
Patches for bugs and security
The ability to report bugs, and get them resolved in a timely manner.
Priority for production issues
Upgrades/new releases
Assistance with and use of proven backup/recovery methods

Selecting a DBMS
The Freeware Choice
Freeware is an alternative for applications.
However, be forewarned, support for
these databases is done via email to an ad
hoc support group. The level of support
via these groups may vary over the life of
your database. Be prepared. Also expect
less functionality than any commercial
product. See
http://www-css.fnal.gov/dsg/external/freeware/
Selecting a DBMS
The Freeware Choice
Freeware is free.
Freeware is open source.
Freeware functionality is improving.
Freeware is good for smaller non-mission
critical applications.

Selecting an Application
Layer
Again, planning takes center stage. In the
end you want stability and dependability.
How many users need access?
What will the security requirements be?
Are there software licensing issues that
need consideration?
Is platform portability a requirement?
Two tier or three tier architecture?

Selecting an Application
Layer
Direct access to the database layer?
(probably should be avoided)
Are you replicating? How? Where? With
what?
There are no utilities that will port data
from 1 database to another (e.g., postgres
to mysql). If database portability is a
requirement, independent code must
be written to satisfy this requirement.
Selecting an Application Layer
Cont.
Application maintenance issues
People availability, working with users as a team, talent, and
turnover? (historically a huge issue)
A known or common language?
Freeware? Bug fixes, patches: are they important and timely?
Documentation? Set standards, procedures, code reviews making
sure the documentation exists and is clear.
Is the application flexible enough to easily accommodate business
rule changes that mandate modifications?
The availability of an ER diagram at this stage is invaluable. We
consider it a must have.
There are no utilities to port data from 1 type of db to another.
This lack of portability means a method to move data between
databases must be written independently.

Selecting an Application
Layer
Misc. application definitions
This presentation is not an application
presentation, but I will mention a few terms you
may hear.
SQL, the query language for relational databases.
A must learn.
ODBC, open database connectivity. The software
that allows a database to talk to an application.
JDBC, java database connectivity.

Relational Design

The design of the application schema
will determine the usability and
queryability of the application. Done
incorrectly, the application and users
will suffer until someone else is
forced to rewrite it.

Relational Design
The Setup
The database group has a standard 3 tier
infrastructure for developing and deploying
production databases and applications. This
infrastructure provides 3 database instances,
development, integration and production.
This infrastructure is applicable to any
application schema, mission critical or not. It
is designed to ensure development, testing,
feedback, signoff, and a protected
production environment.
Each of these instances contains 1 or more
applications.

Relational Design
The Setup
The 3 instances are used as follows:
1. Development instance. Developers'
playground. Small in size compared
to production. Much of the data is
invented and input by the
developers. Usually there is not
enough disk space to ever refresh
with production data.
Relational Design Cont.
The Setup
2. The integration instance is used for
moving what is thought to be complete
functionality to a pre production
implementation. Power users and
developers work in concert in integration
to make sure the specs were followed.
The users should use integration as their
sign off area. Cuts from dev to int are
frequent and common to maintain the
newest releases in int for user testing.
Relational Design Cont.
The Setup
3. The production instance, real data. Needs
to be kept pure. NO testing allowed. Very
few logons. The optimal setup of a
production database server machine has ~3
operating system logons, root, the database
logon (e.g. oracle), and a monitoring tool. In a
critical 24x7 supported database,
developers, development tools, web servers,
log files, all should be kept off the
production database server.

Relational Design Cont.
The Setup
Let's talk about mission critical & 24x7 a bit.
1. To optimize a mission critical 24/7 database, the
database server machine should be dedicated to
running the database, nothing else.
2. All software products need maintenance and
downtime. Resist putting software products on the
db server machine so that their maintenance does
not inhibit the running of the database. Further, if
the product breaks, it could inhibit access to the
database for a long period. For example, a logging
application monitoring users on the db goes wild,
fills all available space and halts the database. If
this logging app were not on the db server machine,
the db would be unaffected by the malfunction.

Relational Design Cont.
The Setup
3. All database applications and database software
require modifications. Most times these
modifications require down time because the
schema or data modifications need to lock entire
tables exclusively. If you are sharing your database
instance with many other applications, and 1
of those applications needs the database for an
upgrade, all apps may have to take the down time.
Avoid this by ensuring your 24/7 database
application is segregated from all other software
that is not absolutely needed. In that way you
ensure any down times are specific to your cause.

Our 1st relational example
A cpu can house 1 or more databases.
A database can accommodate 1 or more instances.
An instance may contain 1 or more application schemas.

CPU (d0ora2)
  Databases on d0ora2: d0ofprd1, d0ofint1
    Schema applications in d0ofprd1: sam, runs, calib
    Schema applications in d0ofint1: sam, runs, calib
What is a schema?
It is:
Tables (columns/datatypes) having
  Constraints (not null, unique, foreign & primary keys)
  Triggers
  Indexes
  etc.
Accounts
Privileges & Roles
Server side processes

It is not:
The environment (servers, OS)
The results of queries, i.e. objects
Application Code

One implements a schema by running
scripts. These scripts can be run against
multiple servers and should be archived.
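A sketch of "one script, many servers", with three in-memory SQLite databases (via Python's built-in sqlite3 module) standing in for the development, integration and production instances. The run and event tables and the names dev, int and prd are assumptions for illustration; in practice the script would live in a code library and be run with the database vendor's own tool.

```python
import sqlite3

# One archived script defines the schema (table and index names are hypothetical).
SCHEMA_SCRIPT = """
CREATE TABLE run (
    run_id     INTEGER PRIMARY KEY,
    start_time TEXT NOT NULL
);
CREATE TABLE event (
    event_id INTEGER PRIMARY KEY,
    run_id   INTEGER NOT NULL REFERENCES run(run_id)
);
CREATE INDEX event_run_fk ON event(run_id);
"""


def implement_schema(conn):
    """Run the schema script against one database and list the objects it created."""
    conn.executescript(SCHEMA_SCRIPT)
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"
    )
    return rows.fetchall()


# Stand-ins for the development, integration and production instances.
instances = {name: sqlite3.connect(":memory:") for name in ("dev", "int", "prd")}
objects = {name: implement_schema(conn) for name, conn in instances.items()}
```

Because every instance is built from the same script, the three object lists are identical, which is what makes testing in integration meaningful for production.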

Relational Design
Getting Started
Using your design tool, you will begin by relating objects that
will eventually become tables. All the other schema objects
will fall out of this design.
You will spend LOADS of time in your design tool, honing,
redoing, reacting to modifications, etc.
The end users and the designers need to be working almost at
the same desk for this process. If the end user is the
designer, the end user should involve additional users to
ensure an unbiased and general design.
It is highly suggested that the design be kept up to date for
future documentation and maintainers.
Tables are related, most frequently in a 0 to many relationship.
Example, 1 run will result in 0 or more events. Analyzing
and defining these relationships results in an application
schema.

What will a good schema
design buy you?
I am afraid the 80% planning 20% implementation
rule applies. Gather requirements.
Discovery of data that needs to be gathered.
Fast query results
Limited application code maintenance
Data flexibility
Less painful turnover of application to new maintainers.
Fewer long term maintenance issues.

Relational Design
Let's get started
Write a requirements document.
You will not be able to anticipate all
requirements, but a document will be a start. A
well designed schema naturally allows for
additional functionality.
Who are the users? What is their mission?
Identify objects that need to be stored/tracked.
Think about how objects relate to each other.
Do not be afraid to argue/debate the
relationships with others.

Relational Design
So how do you get there?
Design tools are available, however, they do not think
for you. They will give you a clue that you are doing
something stupid, but they won't stop you. It is highly
recommended you use a design tool.
A picture says 1000 words. Create ER (entity
relationship) diagrams.
Get a commitment from the developer(s) to see the
application through to implementation. We have
seen several applications redone multiple times. A
string of developers tried, left the project, and left a
mess. A new developer started from scratch
because there was no documentation or design.

Relational Design
How do I get there?
Adhere to the recommendations of your database
vendor for setup and architecture.
Don't be afraid to ask for help or to see other
examples.
Don't be afraid to pilfer others' design work. If it is
good, and it closely fits your requirements, then use it.
Ask questions, schedule reviews with experts and
users.
Work with your hardware system administrators to
ensure you have the hardware you need for the
proposed job to be done.

Relational Design
Common Mistakes
Mistakes we see ALL the time
Do not design your schema around your
favorite query. A relational design will
enable all queries to be speedy, not only
your favorite.
Don't design the schema around your
narrow view of the application. Get other
users involved from the start, ask for input
and review.
Relational Design
Common Mistakes
Create a relational structure, not a
hierarchical structure. The ER diagram
should not necessarily resemble a tree
or a circle. It is the logical building of
relationships between data.
Relationships flow between subsets of
data. The resulting ER diagram's
look is not a standard by which one
can judge the quality of the design.
Relational Design
Common Mistakes
Do not create 1 huge table to hold 99% of
the data. We have seen a table with 1100+
columns: unusable and unqueryable. It required
an entire application rewrite, which took over a
year and made 80 tables from the 1 table.
Do not create separate schemas for the
same application or functions within an
application.
Use indices and constraints, this is a
MUST!

Relational Design
Examples of Common Mistakes
Using timestamp as the primary key assumes
that within a second, no other record will be
inserted. Actually this was not the case, and an
insert operation failed. Use database
generated sequences as primary keys and
NON-UNIQUE index on timestamp.
A table with more than 900 columns. Such
design will cause chaining since each record is
not going to fit in one block. One record
spanning many blocks, thus chaining, hence
bad performance.
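The timestamp advice can be sketched as follows, with SQLite's AUTOINCREMENT (through Python's built-in sqlite3 module) standing in for a database generated sequence. The reading table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reading (
    reading_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated by the database
    taken_at   TEXT NOT NULL,                      -- timestamp, duplicates allowed
    value      REAL NOT NULL
);
CREATE INDEX reading_taken_at_idx ON reading(taken_at);  -- non-unique index
""")

# Two records arriving within the same second: both inserts succeed.
same_second = "2004-02-01 12:00:00"
conn.execute("INSERT INTO reading (taken_at, value) VALUES (?, ?)", (same_second, 1.0))
conn.execute("INSERT INTO reading (taken_at, value) VALUES (?, ?)", (same_second, 2.0))
conn.commit()

ids = [row[0] for row in conn.execute("SELECT reading_id FROM reading ORDER BY reading_id")]
```

Had taken_at been the primary key, the second insert would have failed on the duplicate value.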

Relational Design
Examples of Common Mistakes
Do not let the application control a generated
sequence. Have seen locking issues, and
duplicate values issues when the application
increments the sequence. Have the database
increment/lock/constrain the sequence/primary
key. That is why the databases have sequence
mechanisms, use them.
Use indices! An Atlas table with 200,000 rows,
halted during a query. Reason? No indices.
Added a primary key index, instantaneous
query response. Indices are not wasted space!
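One way to see what an index buys you is to ask the database for its query plan before and after creating it. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module; the hit table is hypothetical, and other products have their own explain facilities with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hit (hit_id INTEGER PRIMARY KEY, channel INTEGER NOT NULL, charge REAL)"
)
conn.executemany(
    "INSERT INTO hit (channel, charge) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(200_000)],
)
conn.commit()

QUERY = "SELECT charge FROM hit WHERE channel = 42"


def plan(sql):
    """Return the human-readable steps of SQLite's query plan for `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(QUERY)  # the whole table is scanned
conn.execute("CREATE INDEX hit_channel_idx ON hit(channel)")
plan_with_index = plan(QUERY)     # the index is used to find the rows
```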

Relational Design
Examples of Common Mistakes
USE DATABASE CONSTRAINTS!!!!!!
Have examples where constraints were not
used, but implemented via the api. Bugs in
the api allowed data to be deleted that should
not have been deleted, and constraints would
have prevented the error. Have also seen apis
fail with "cannot delete" errors. They were
trying to force an invalid delete; luckily the
database constraints saved the data.
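A sketch of the "constraints saved the data" case, using Python's built-in sqlite3 module. The module and revision tables borrow names from the ER example, but the relationship between them and the buggy_api_delete_module function are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE module (
    module_id   INTEGER PRIMARY KEY,
    module_name TEXT NOT NULL UNIQUE
);
CREATE TABLE revision (
    rev_id    INTEGER PRIMARY KEY,
    module_id INTEGER NOT NULL REFERENCES module(module_id)
);
INSERT INTO module VALUES (1, 'tracker');
INSERT INTO revision VALUES (100, 1);
""")


def buggy_api_delete_module(module_id):
    """A hypothetical API call that forgets to check for dependent revisions."""
    try:
        with conn:
            conn.execute("DELETE FROM module WHERE module_id = ?", (module_id,))
        return "deleted"
    except sqlite3.IntegrityError as exc:
        return f"cannot delete: {exc}"


outcome = buggy_api_delete_module(1)
modules_left = conn.execute("SELECT COUNT(*) FROM module").fetchone()[0]
```

The api bug is still there, but the foreign key turns it into a visible error instead of silent data loss.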

Entity Relationship Diagrams
1 to many
PARENT (# PARENT_ID) --have / belong to-- CHILD (# CHILD_ID)

A (# A_ID) --have / belong to-- B (# B_ID)

C (# C_ID) --have / belong to-- D (# D_ID)

E (# E_ID) --have / belong to-- F (# F_ID)

Entity Relationship Diagrams
many to many
G (# G_ID) --define / owned by-- H (# H_ID)

G2 (# G2_ID) --define / map to-- G2H2 --map to / define-- H2 (# H2_ID)

I (# I_ID) --define / relate to-- J (# J_ID)

I2 (# I2_ID) --define / map to-- I2J2 --map to / define-- J2 (# J2_ID)
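The G2, G2H2, H2 arrangement in the diagram can be read as a many to many relationship resolved through a mapping table. A minimal sketch with Python's built-in sqlite3 module follows; the composite primary key and the extra index are my assumptions, not something the diagram specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE g2 (g2_id INTEGER PRIMARY KEY);
CREATE TABLE h2 (h2_id INTEGER PRIMARY KEY);
-- The mapping table holds one row per (g2, h2) pairing.
CREATE TABLE g2h2 (
    g2_id INTEGER NOT NULL REFERENCES g2(g2_id),
    h2_id INTEGER NOT NULL REFERENCES h2(h2_id),
    PRIMARY KEY (g2_id, h2_id)
);
CREATE INDEX g2h2_h2_fk ON g2h2(h2_id);

INSERT INTO g2 VALUES (1), (2);
INSERT INTO h2 VALUES (10), (20);
INSERT INTO g2h2 VALUES (1, 10), (1, 20), (2, 20);
""")


def h2_for(g2_id):
    """All h2 rows mapped to one g2 row."""
    rows = conn.execute("SELECT h2_id FROM g2h2 WHERE g2_id = ? ORDER BY h2_id", (g2_id,))
    return [row[0] for row in rows]


def g2_for(h2_id):
    """All g2 rows mapped to one h2 row."""
    rows = conn.execute("SELECT g2_id FROM g2h2 WHERE h2_id = ? ORDER BY g2_id", (h2_id,))
    return [row[0] for row in rows]
```

Each side sees a plain 1 to many relationship with the mapping table, so the same constraints and indices apply as for any parent and child.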

Entity Relationship Diagrams
1 to 1

K (# K_ID) --define / relate to-- L (# L_ID)

M (# M_ID) --define / relate to-- N (# N_ID)

O (# O_ID) --define / relate to-- P (# P_ID)

Relational Design
The Good

CALIB_TYPE
# CALIB_TYPE_ID
* DESCRIPTION

CALIBRATION
# CALIBRATION_ID
* TSTART
o TEND

CALIB_TYPE --define / be defined by-- CALIBRATION

Calibration type might have 3 rows: drift, pedestal, & gain.
This is a parent table.

Each calibration record will be defined by drift, pedestal or gain,
in addition to start and end times.
This is a child table.
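The two table design above might be implemented roughly like this, using Python's built-in sqlite3 module. The calib_type_id column in calibration is my assumption of how the relationship becomes a foreign key; the dates and the calibrations_of function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE calib_type (
    calib_type_id INTEGER PRIMARY KEY,
    description   TEXT NOT NULL UNIQUE
);
CREATE TABLE calibration (
    calibration_id INTEGER PRIMARY KEY,
    calib_type_id  INTEGER NOT NULL REFERENCES calib_type(calib_type_id),
    tstart         TEXT NOT NULL,
    tend           TEXT
);
CREATE INDEX calibration_type_fk ON calibration(calib_type_id);

INSERT INTO calib_type (calib_type_id, description)
VALUES (1, 'drift'), (2, 'pedestal'), (3, 'gain');
INSERT INTO calibration (calib_type_id, tstart, tend)
VALUES (1, '2004-01-01', '2004-01-31'),
       (2, '2004-01-01', NULL),
       (3, '2004-01-15', NULL);
""")


def calibrations_of(description):
    """One query serves every calibration type; no per-type table or code."""
    rows = conn.execute(
        """SELECT c.calibration_id, c.tstart, c.tend
           FROM calibration c
           JOIN calib_type t ON t.calib_type_id = c.calib_type_id
           WHERE t.description = ?
           ORDER BY c.calibration_id""",
        (description,),
    )
    return rows.fetchall()
```

A new kind of calibration is one more row in calib_type, not a new table and not new code.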

Relational Design
The Bad

CALIBRATION
# CALIBRATION_ID
* TSTART
o TEND

CALIBRATION --define / relate to-- DRIFT_CALIB (# DRIFT_CALIB_ID, * TSTART, o TEND)
CALIBRATION --define / relate to-- PEDESTAL_CALIB (# PEDESTAL_CALIB_ID, * TSTART, o TEND)
CALIBRATION --define / relate to-- GAIN_CALIB (# GAIN_CALIB_ID, * TSTART, o TEND)

You have now created 3 different children, all reporting the same information, when 1 child would
suffice. Code will have to be written, tested, and maintained for 4 tables now instead of 2.
Relational Design
The Ugly

CALIBRATION (# CALIBRATION_ID, * TSTART, o TEND) --defines / relate to-- PEDESTAL_CALIB (# PEDESTAL_CALIB_ID, * TSTART, o TEND)
CALIBRATION(2) (# CALIBRATION_ID, * TSTART, o TEND) --defines / relate to-- DRIFT_CALIB (# DRIFT_CALIB_ID, * TSTART, o TEND)
CALIBRATION(3) (# CALIBRATION_ID, * TSTART, o TEND) --defines / relate to-- GAIN_CALIB (# GAIN_CALIB_ID, * TSTART, o TEND)

Now you have created 3 different applications, using 6 tables. All of which could be managed with 2 tables.
Extra code, extra testing, extra maintenance.

Relational Design
The Good: let's recap

CALIB_TYPE (# CALIB_TYPE_ID, * DESCRIPTION) --define / be defined by-- CALIBRATION (# CALIBRATION_ID, * TSTART, o TEND)

AHHH, back to normal, or normalization as we refer to it.

Relational Design
What to expect from a design
tool
An entity relationship diagram
The ability to create the ddl (data
definition language) needed
The ability to project disk space usage
Ddl in a format to allow you to enter
the code into a code library (cvs), and
that will allow you to run against your
database
Relational Design Why bother?
Experience from RunII
TO SAVE TIME AND PRECIOUS PEOPLE RESOURCES!
Personnel consistency does not exist. Application
developers come and go regularly. The
documentation that a design product provides will
give the next developer an immediate understanding of
the application in picture format.
Application sharing is enhanced when others can
look at your design and determine whether the
application is reusable in their environment. Sam
is a good example of an application that 3
experiments are now using.

Relational Design
Why bother? Cont.
When an application is under construction,
the ER diagram goes to every application
meeting, and quite possibly the wallet of
the application leader. It is the pictorial
answer to many issues.
Planning for disk space has been an issue,
the designer tool should assist with this
task.

Planning
Overall
What do I need to plan for?
People, hardware, software, obsolescence,
maintenance, emergencies.
How far out do I need to plan?
Initially 2-4 years.
How often do I need to review the plans?
Annually.
What if my plan fails or looks undoable?
Nip it in the bud, be proactive, come up
with options.
Planning
Overall
Disk space requirements. My experience is all the
wags, (wild guesses) fall short of what is needed.
It is hard to predict the number of rows in a table.
It would be easier if we knew the amount and
results of the science ahead of time! Remember,
10x what you think the data will take.
Hardware requirements. Experience tells us that
the database machine should serve 1 master (if it
is a large database or mission critical), the
database, nothing else. Ideally there will be root,
a database monitor user and a database user,
oracle for example. No apache, no log file areas,
no applications, etc .

Planning
Overall
Growth and obsolescence. Plan for 3-4 years before
needing to replace hardware. Hardware and
software become obsolete. New/upgraded software
gives additional functionality that you will want/need.
Maintenance. Do you change the oil in your car?
Plan on 1 morning per month downtime for caring for
the hardware and software. Security patches could
mandate additional stoppages. I cannot stress how
important this is. Fire walling will not protect you
from bugs and obsolescence. If the downtime is not
needed, it will not be taken. Planning maintenance
time is as important as planning to buy disks.

Planning
User Requirements
Will user requirements influence your
hardware & software decisions?
Do you need replication?
What architecture is your api going to
be?
How many users will be loading the
database and hardware?

Planning
Maintenance
Database/Operating system software
needs upgrades. One always hopes
one can get on a stable version of
something and not upgrade. That is a
fallacy. Major version upgrades
provide needed and new functionality.
Bug patches and security patches
are a never ending fact of life.

Planning
Backup and Recovery
Backup and recovery procedures of vldb (very
large databases) are difficult at best. Vldb is
normally defined as multiple gigabyte or terabyte
databases. This is probably the most
sensitive area when choosing a freeware
database.
Hardware plays a part here as well. Ensure when
planning for hardware that there is a plan for backup
and recovery. Disk and tape may be needed.

Planning
Good Practices with a Hammer
Make a standards document and enforce
its use. When dbas and developers
are always on the same page, life is
easier for both. Expectations are clear
and defined. Anger and
disappointment are lessened.
System as well as database standards
need to be followed and enforced.

Planning
Failover

Yikes, we are down!
Everyone always wants 24x7 scheduled
uptime. Until they see the cost.
Make anyone who insists on real 100% uptime
justify it (and pay for it?). 98-99% uptime
can be realized at a much lower cost.
Uptime requirements will influence, possibly
dictate, database choices, hardware
choices, fte requirements.
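To put the uptime percentages in perspective, a quick bit of arithmetic helps. This sketch simply assumes a 365 day year; the function name is made up.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(uptime_percent):
    """Hours per year the database may be down at a given uptime percentage."""
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100


for pct in (98, 99, 100):
    print(pct, downtime_hours_per_year(pct))
```

At 98% that is roughly 175 hours a year, a bit over 7 days. At 99% it is roughly 88 hours. Real 100% leaves no room at all for patches or hardware maintenance, which is why it costs so much.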

Planning
Failover

The cheapest method of addressing a failure is
proactive planning.
Make sure your database and database software are
backed up. Unless you are using a commercial
database with roll forward recovery, assume you will
lose all dml since your last backup if you need to
recover. This should dictate your backup schedule.
Do not forget tape backups as a catastrophic recovery
method.
Practice recovery on your integration and development
databases. Practice different scenarios, delete a
datafile, delete the entire database.

Replication

Replication is the process of copying
and maintaining database objects in
multiple databases that make up a
distributed database system.
Replication can improve the
performance and protect the
availability of applications because
alternate data access options exist.
Replication Cont.
Oracle supports 3 types of replication: READ ONLY Snapshots
(Materialized views), Advanced Replication and streams based
replication.
Streams allows ddl modifications made to the master to be
applied automatically.
Streams can be configured as uni-directional (a single source and
one or more targets) or master to master, where updates
can happen to any participant database.
Advanced replication also supports master to master, but
streams based replication is recommended.

READ ONLY Snapshots replication from a Sun box to a Sun & Linux
box(s) is being done in CDF. When a replica is under
maintenance there is failover to another replica. The replicas
are up and running in read only mode if the master is down for
maintenance.

Replication cont.
Oracle master to master replication allows for
updates on both the master and replica sides.
Master to master is a complex and high
maintenance replication. It seems to be the 1st
option the unwitting opt for. Both Cern and Fermi
dbas have requested firm justification before
considering this type of replication request.
Every link in the multi master would be required to
be fully staffed, as downtime will be critical.

Replication cont.
1. Disk Space for Archives. If the receiving site is down for an extended
period of time, the source db should be tuned enough to hold the
archive logs; otherwise, one has to reinstantiate the replication.
Reasonable downtime for the target depends upon the archive area being
generated on the source. Space, space and more space.
2. Conflict Resolution. In Master to Master, conflict resolution may be a
challenge. Rules should be well defined to resolve the data
conflicts.
3. Design of Data Model. If Primary Keys are populated by
sequences, there is a high chance of the sequences overlapping,
which will cause integrity constraint violations. The Data Model should
be designed very carefully.
4. DB Support. In Master to Master Replication, all master sites should
be in 24x7 support mode. Otherwise, sync up of data will be a
challenge or may lead to reinstantiation of replication.
Reinstantiation is not an unplug and play type of situation.

Freeware Replication

MySQL has replication in the last stable
version (3.23.32, v4.1 is out). It is
master-slave replication using a binary
log of operations on the server side.
It is possible to build star or chain
type structures.
There is a PostgreSQL replication tool.
We have not tested it yet.
Lost in Space
Space is the 1 area consistently underestimated
in every application I have seen. Imho,
consistently, data volume initial estimates were
undersized by a factor of 2 or 3. For example,
RunII events were estimated at 1 billion rows.
This estimate was surpassed Feb. 2004. We will
probably end up with 4-5 billion event rows.
That is a lot of disk space.
Disk hardware becomes unsupported, and
obsolete in what seems to be a blink of an eye.

Lost In Space cont.
All databases use disk to store data.
[Figure: N Gb of data becomes 8 x N Gb of disk. Unexpected? Components shown: Data, Index, Redo, Rollback, Data mirror, Index mirror, Backup, Replication.]

Good rule of thumb:
You need 10x the disk to hold a given amount of data in an RDB.
Operate in 2 year cycles:
First 2 years storage available on day 1.
Evaluate growth at end of year 1, begin prep of next 2 yr.
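The rule of thumb and the 2 year cycle can be turned into a back of the envelope calculation. The function names and the yearly_growth parameter are my own assumptions; only the 10x factor and the 2 year horizon come from the slide.

```python
def disk_needed_gb(data_gb, factor=10):
    """Disk to provision for `data_gb` of raw data under the 10x rule of thumb."""
    return data_gb * factor


def two_year_plan(year1_data_gb, yearly_growth=1.0):
    """Raw data and disk for one 2 year cycle.

    `yearly_growth` is the assumed fractional growth of the yearly data
    volume (0.5 means year 2 adds 50% more data than year 1 did).
    """
    year2_data_gb = year1_data_gb * (1 + yearly_growth)
    total_data_gb = year1_data_gb + year2_data_gb
    return {"data_gb": total_data_gb, "disk_gb": disk_needed_gb(total_data_gb)}


plan = two_year_plan(50, yearly_growth=0.5)
print(plan)
```

With 50 Gb expected in year 1 and 50% growth, the plan asks for 1250 Gb of disk on day 1. The growth figure is the number to revisit at the end of year 1.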
Lost in Space, cont.

You will use as much disk space as you
purchase, and then some.
Database indices will MINIMALLY take
as much space as the tables,
probably considerably more.
Give WIDE lead time to purchase disk
storage. New disks are not installed
and configured overnight. They
require planning, downtime and $.
Additional References
**WARNING some of these may be database specific.
Intro to database design
http://www.cc.gatech.edu/classes/AY2000/cs4400_spring/cs4400a/
Intro to Oracle tutorial
http://w2.syronex.com/jmr/edu/db/
Evolutionary Database Design (mentions 1 dba for atlas)
http://www.martinfowler.com/articles/evodb.html
Sql course
http://sqlcourse.com/

Additional References

***Highly recommended reading, db comparatives
http://www-css.fnal.gov/dsg/external/freeware/
db infrastructure standard, support levels, etc. for fermi computing
http://www-css.fnal.gov/dsg/external/oracle_admin/
Additional References
Oracle Designer tutorial
http://www-css.fnal.gov/dsg/internal/ora_adm/index.htm#designer
(choose Oracle Designer tutorial or Oracle Designer Short Cuts and Lessons Learned)
Btev specific additional information
http://www-css.fnal.gov/dsg/external/BTeV/index.html
