Você está na página 1de 7

Joining Files

Programming - General
Written by Ted Holt
Saturday, 30 June 1990 18:00

Access information from more than one file at a time with this modern database technique
by Ted Holt
If you moved into the System/38-AS/400 world after working with the System/34-36, as I did,
you may feel overwhelmed at the difference in the two environments. Many of the differences are
simply extensions of what you already know, such as using data areas. Other differences just
involve using new techniques, such as treating RPG indicators as data fields.
But the AS/400 is a database machine, built upon new concepts, so the majority of the differences
from your old environment proceed from a change in philosophy. If we consider the topic of this
article on the join file to simply be a new technique, we fail to understand not only what joining is
all about, but also what the AS/400 is all about. Joining files is not a new tool to add to our
coder's toolbox - it is part of the philosophy of relational database management systems.
What is Joining?
Joining is combining data from two or more files into one record format, based on common
fields. In other words, to the user (a program, or a human using a query tool) each record appears
to have come from one file although it really contains data from two or more files.
Joining doesn't retrieve any information that the programmer couldn't get just by reading multiple
input files. It is common to read a primary file sequentially, and then use the RPG CHAIN
operation or the COBOL random READ to get additional information from other files. Reading
multiple input files doesn't increase system overhead by adding another logical file for the system
to maintain. So why join?
To help us understand the rationale behind joining, let's review two relational database concepts
you've probably heard of before: normalization and view.
Normalization is the process of breaking up some types of records into smaller records, without
losing the ability to retrieve the data that was contained in the original record. Normalization will
eliminate data redundancies and maximize the flexibility and useability of the data. With
normalization, you'll find the number of physical files increasing - one file may become two or
three. It may seem like quite a mess, but nothing you can't clean up with views.
The term "view" refers to the fact that a user doesn't have to see the data in exactly the way it is
stored in the database. A logical file which includes only certain fields is a view. A logical file
which includes only certain records (through select and omit criteria) is also a view. Logicals
which concatenate fields into other fields are views. Views allow each user to see the data in the
way that he wants to see it.
Your users don't care if you store their data in one big file or a dozen small ones. They do care if
you don't give them all the information they need. If you give them too much information, you
may lose your job. Views let your database appear differently to different people.
You should join files for two reasons: one, to be able to normalize your database; and two, to
present data to users (both programs and humans) in the way that they wish to see it.
Types of Joins
Relational database textbooks are full of pages explaining many types of joins. For our purposes,
we need to consider only three: the inner join, the partial outer join, and the exception join. These
are distinguished by what happens when a record from the primary join file has no matching
records in the secondary files. Figures 1 and 2 contain data from two files, customer master and
sales history, which we'll use in our illustrations.
If you have joined files with Query/36, you're already familiar with the inner join. Inner joins
create resultant records for records with matching values in both files. Let's join the sales history
file to the customer master file with an inner join. The records we get are shown in 3. Notice that
order 50013 is not in the resultant file because the customer master file has no records with
customer number 106.
If you have joined files with Query/36, you're already familiar with the inner join. Inner joins
create resultant records for records with matching values in both files. Let's join the sales history
file to the customer master file with an inner join. The records we get are shown in Figure 3.
Notice that order 50013 is not in the resultant file because the customer master file has no records
with customer number 106.
If we want to see all history records, we'll have to use a partial outer join. Like the inner join, the
partial outer join retrieves records with matching values. Unlike the inner join, the records which
have no matching value in the secondary file will be joined to a null record, a nonexistent record
of default values.
4 illustrates the same join as before, but using a partial outer join instead of an inner join. Order
50013 has been joined to a nonexistent blank customer record, which has a blank customer name
and zero sales rep number. Default values for character fields are blanks and for numeric fields,
zero. To give a field a different default value, use the DFT keyword in the DDS.
Figure 4 illustrates the same join as before, but using a partial outer join instead of an inner join.
Order 50013 has been joined to a nonexistent blank customer record, which has a blank customer
name and zero sales rep number. Default values for character fields are blanks and for numeric
fields, zero. To give a field a different default value, use the DFT keyword in the DDS.
The e 5 shows the results of an exception join. Since order 50013 is the only one with no
matching customer number in the customer master file, it is the only order returned by the join.
Another way of looking at it is thus: inner + exception = partial outer. Those that join
successfully, plus those that don't, yield all records.
Join Logical Files
Join logical files are the easiest way to define joins on the S/38 and AS/400. Once you define and
compile a join logical, your programs can use it as they do any file, with a few restrictions.
Before we get into the limitations of join logicals, let's study the DDS required to join the sales
history file and the customer master file (shown in 6).
Before we get into the limitations of join logicals, let's study the DDS required to join the sales
history file and the customer master file (shown in Figure 6).
The first keyword in this file, even before the record format definition, is JDFTVAL (Join Default
Value). If this keyword is present in the DDS, the join will be a partial outer join. Otherwise, the
join is an inner join.
The JFILE keyword in the record format definition tells which files to join. The first file listed, in
this case the SLSHIST file, is the primary file. All other files are secondary.
A join specification, coded with a J in column 17, tells how the files are to be joined. One join
specification is required for every pair of files. (Joining three files would require two join specs;
four files would require three, and so on.) The JOIN keyword, which is optional if only two files
are being joined, tells which pair of files this specification is joining. The JFLD (Join Field)
keyword tells which fields to match. Here, the customer number has the same name in both files.
The fields to be included in the join logical must be listed next. If an included field is found in
more than one file, you must indicate which file's value is to be used. JREF(1) (Join Reference)
means that the value of CUSNBR is to be taken from the first file, SLSHIST.
Defining a key field lets us read it in a sorted sequence. All key fields must be from the primary
file.
You may also include select/omit specs, but if you do, you must add the DYNSLT (Dynamic
Select) keyword at the file level (before the R spec).
Join logicals have a few limitations which you need to keep in mind. One, you can't do exception
joins with join logicals. Two, the key fields must all come from the primary file. Three, join
logicals can only join physical files. (Ideally, you shouldn't have to worry about whether a file is
a physical or a logical.) Four, you can't update through a join logical file. Instead, you must
update through a physical or a non-join logical. You can get around all of these limitations by
using Open Query File (OPNQRYF), except for the last one.
Joining files with OPNQRYF
If you want to get around limitations one and three, you'll have to let OPNQRYF join the files.
OPNQRYF handles inner, partial outer, and exception joins with the JDFTVAL parameter, which
looks like its DDS counterpart, but differs in that it takes a parameter value. JDFTVAL(*NO), the
default, is for inner joins. You can get a partial outer join with JDFTVAL(*YES). If you want an
exception join, specify JDFTVAL(*ONLYDFT). If any of the files being joined is itself a join
logical, it must agree with this parameter.
To overcome the second limitation, when you need to order a join logical by a field from the
secondary file, specify the key field on an OPNQRYF command. 7 illustrates how we could sort
the CUSHIST file on sales rep number, which comes from the secondary file.
To overcome the second limitation, when you need to order a join logical by a field from the
secondary file, specify the key field on an OPNQRYF command. Figure 7 illustrates how we
could sort the CUSHIST file on sales rep number, which comes from the secondary file.
You'll also have to work with the FILE, JFLD, FORMAT, and MAPFLD parameters when
joining with OPNQRYF. List the files you are joining, primary first, in the FILE parameter. Use
the FORMAT parameter to tell OPNQRYF how the high-level language expects the data to look.
Put the name of any file here, and if the file is a multi-format file, add the name of the format to
use. (If no file has exactly the fields you need, create a physical file with no members in it.) Be
sure to code the same file in your HLL program as well. When you override the format file to the
primary join file, the program will not read the actual data, but the query.
The JFLD parameter, like the JFLD keyword in DDS, tells which fields to use to join the files. Its
syntax is different, in that one JFLD parameter is used in place of several JFLD keywords.
If the same field name is found in more than one file, use the MAPFLD parameter to indicate
from which file to retrieve the field value.
8 illustrates these keywords. WORK is a file which has all of the fields needed for the HLL
program. The actual data will come from the sales rep master file and the join logical called
CUSHIST. Since CUSHIST is a partial outer join logical, OPNQRYF must also indicate a partial
outer join with JDFTVAL(*YES). The SLSREP field is used to make the join. The MAPFLD
tells OPNQRYF to get the value of SLSREP from the sales rep master file. (If we told it to get the
value of SLSREP from the CUSHIST file, it would return a zero sales rep number for sales reps
with no sales.)
Figure 8 illustrates these keywords. WORK is a file which has all of the fields needed for the
HLL program. The actual data will come from the sales rep master file and the join logical called
CUSHIST. Since CUSHIST is a partial outer join logical, OPNQRYF must also indicate a partial
outer join with JDFTVAL(*YES). The SLSREP field is used to make the join. The MAPFLD
tells OPNQRYF to get the value of SLSREP from the sales rep master file. (If we told it to get the
value of SLSREP from the CUSHIST file, it would return a zero sales rep number for sales reps
with no sales.)
For more examples of joining with OPNQRYF, see the CL Reference manual and my articles in
the May and June issues of Midrange Computing.
Be aware that OPNQRYF does have its performance problems. It does tend to hog resources
(files, memory, and CPU cycles). OPNQRYF creates temporary access paths as needed, which
slows the job as well as the system.
Use join logicals whenever possible for jobs that run frequently. For those jobs that run
infrequently, use OPNQRYF so the system doesn't have to maintain a rarely used logical file.
Some frequently run jobs may require features not available with join logicals, so you will have
to use OPNQRYF. If you do, build supporting logicals so that OPNQRYF won't have to create
access paths every time these jobs run.
Try It, You'll Like It!
If you can code a join with DDS and OPNQRYF, you'll have no problem with joins in other
products, such as SQL/400, AS/400 Query, or ASC's SEQUEL. Actually, you'll find that learning
how to join files is easy. The hard part is making yourself do it. We RPG programmers love the
CHAIN op code, and the idea of writing programs that read one (and only one) input file is, well,
different. But that is the shape of things to come.
You'll never be convinced of the importance of views, including joins, until you see what they
can do for you, so here is an experiment you can try. Pick a user who bugs you to death with
requests for new reports. Build a join logical which includes every piece of information he needs.
Give him a query package, or write a generic RPG report program with a prompt screen and
OPNQRYF over it, to let him pick the sort sequence and the selection of records. Each time I've
used this approach, the users ended up generating more of their reports, leaving me with more
time for more important things.
So on top of learning about data queues, subsystems, and job descriptions, you've got a different
way to write applications. But you didn't pay all that money for an AS/400 so you could keep
doing everything the same way.
A QUICK REVIEW OF THE FIRST THREE NORMAL FORMS
Normalization is a set of guidelines for file design. These guidelines, known as normal forms,
help prevent redundant data in your database.
Suppose you must store some employee data for a personnel department. For each employee you
must record employee number, name, the number and name of the department in which he works,
the state in which he works, the code for the plant where he works, the numbers and names of
training courses he has completed, and the dates he completed them.
You might try to store all the data in one file, like this:
File 1: EMP#,EMP_NAME,DEPT#,DEPT_NAME,STATE,
PLANT,COURSE#1,COURSE_NAME1,COURSE_
DATE1,COURSE#2,COURSE_NAME2,COURSE_
DATE2,COURSE#3,COURSE_NAME3,COURSE_ DATE3
The underlined field is the key. (This should not be confused with access path. In relational
database terminology, the key is the group of one or more fields which distinguishes one record
from another. On the S/38 and AS/400, relational keys are implemented as access paths with the
UNIQUE keyword.)
You can already see some problems with this approach. For how many courses should you leave
room? If management changes the name of Department 4 from Personnel to Human Resources,
will all Department 4 records get updated? Where do you store the name of a new course no one
has yet completed?
Let's look at how normalization takes care of such problems.
FIRST NORMAL FORM
A file is in first normal form (1NF) if it doesn't have repeating groups. The obvious violation of
1NF in this example is the course information. We can convert this file to first normal form by
placing each completed course into a record of another file, whose key will be a composite of
employee number and course number.
In 1NF, the data in our example is stored in two files:
File 1: EMP#,EMP_NAME,DEPT#,DEPT_NAME,STATE,PLANT
File 2: #EMP#,COURSE#,COURSE_NAME,COURSE_DATE
This is better than before for two reasons. One, we can store as many or as few courses per
employee as we like. Two, to find out which employees have completed course 434, we search
one field, not three.
SECOND NORMAL FORM
Our new database design still isn't perfect. If someone enters the wrong course name in a record,
there will be more than one course name for the same course number, and the database will be
contaminated. There really is no need to store the course name for every employee who has
completed it. Because course name is functionally dependent on (can be determined from, is a
fact about) only part of a composite key (a key made up of two or more fields), this file is not in
second normal form. We need to put the course name in another file, with course number as the
key.
In 2NF, the data in our example is stored in three files:
File 1: EMP#,EMP_NAME,DEPT#,DEPT_NAME,STATE,PLANT
File 2: #EMP#,COURSE#,COURSE_DATE
File 3: COURSE#,COURSE_NAME
THIRD NORMAL FORM
The design of our employee database is improving, but still has one problem. The department
name in the first file is functionally dependent on a non-key field, department number. We need
to move department name to another file to achieve third normal form.
In 3NF, the data in our example is stored in four files:
File 1: EMP#,EMP_NAME,DEPT#,STATE,PLANT
File 2: #DEPT#,DEPT_NAME
File 3: EMP#,COURSE#,COURSE_DATE
File 4: COURSE#,COURSE_NAME
This database design allows us to store each non-key data value one time. Key values are stored
redundantly in order to join files. We can retrieve any information in the database from a single
file or from a join file of two or more files.
OTHER NORMAL FORMS
There are other normal forms, but these are the most important. If your database is consistent
with 3NF, you should be able to store and retrieve anything with no problem. And your database
probably will be in fourth or even fifth normal form without any effort on your part.
IT'S COMMON SENSE
Some people wonder why the big fuss over normalization. "It's just common sense," they argue.
Well, maybe. I've worked on systems that violated all three of these rules, and you probably have,
too. With the technology we had at the time, we often had to violate these rules.
Most of us would have probably designed a course master file and a department master file
before writing the first line of code for our fictitious personnel system. Let that encourage you. It
means that normalization is not really as new or as difficult as it sounds.
DDS Reference (SC21-9620) Data Base Guide (SC21-9659) Control Language Reference (SC21-
9778)
Joining Files
Figure 1 Customer master file CUSMAST
Figure 1. Customer master file CUSMAST Cust Cust Sales No. Name Rep 100 BAKER
41 101 JONES 85 102 SMITH 7 103 DOE 41 104 BLACK 7

Joining Files

Figure 2 Sales history file SLSHIST


Figure 2. Sales history file SLSHIST Order Cust Date Item Unit No. No. Of
Sale No. Qty Price 50011 104 90/06/04 T310 4 100.00 50012 100 90/06/04 M099 1
500.00 50013 106 90/06/04 T314 3 250.00 50014 104 90/06/05 T310 8 95.00

Joining Files

Figure 3 Inner join with sales history and customer


master
Figure 3. Inner join with sales history and customer master Order Cust Date
Item Unit Cust Sales No. No. Of Sale No. Qty Price Name Rep 50011 104 90/06/04
T310 4 100.00 BLACK 7 50012 100 90/06/04 M099 1 500.00 BAKER 41 50014 104
90/06/05 T310 8 95.00 BLACK 7

Joining Files

Figure 4 Partial outer join, sales history and customer


mst
Figure 4. Partial outer join with sales history and customer master Order
Cust Date Item Unit Cust Sales No. No. Of Sale No. Qty Price Name Rep 50011
104 90/06/04 T310 4 100.00 BLACK 7 50012 100 90/06/04 M099 1 500.00 BAKER 41
50013 106 90/06/04 T314 3 250.00 0 50014 104 90/06/05 T310 8 95.00 BLACK 7

Joining Files

Figure 5 Exception join with sales history and customer


mst
Figure 5. Exception join with sales history and customer master Order Cust
Date Item Unit Cust Sales No. No. Of Sale No. Qty Price Name Rep 50013 106
90/06/04 T314 3 250.00 0

Joining Files

Figure 6 DDS for join logical file CUSHIST


Figure 6. DDS for join logical file CUSHIST A* JOIN SALES HISTORY & CUSTOMER
MASTER FILE A A JDFTVAL A R CUSHISTR JFILE(SLSHIST CUSMAST) A J JOIN(SLSHIST
CUSMAST) A JFLD(CUSNBR CUSNBR) A ORDNBR A CUSNBR JREF(1) A SLSDAT A ITMNBR A
QTY A UPRICE A CUSNAM A SLSREP A K ORDNBR

Joining Files

Figure 7 Ordering a join LF on field from secondary file


Figure 7. Ordering a join logical on a field from a secondary file OVRDBF
FILE(CUSHIST) SHARE(*YES) OPNQRYF FILE((CUSHIST)) KEYFLD((SLSREP)) CALL HLLPGM
CLOF OPNID(CUSHIST) DLTOVR FILE(CUSHISTR)

Joining Files

Figure 8 Sample join with OPNQRYF


Figure 8. Sample join with OPNQRYF OVRDBF FILE(WORK) TOFILE(REPMAST)
SHARE(*YES) OPNQRYF FILE((REPMAST) (CUSHIST)) + FORMAT(WORK) + JFLD((SLSREP
SLSREP)) + JDFTVAL(*YES) + MAPFLD(REPMAST/SLSREP) CALL HLLPGM CLOF
OPNID(REPMAST) DLTOVR FILE(WORK)

Você também pode gostar