question did not themselves develop. This, itself, constitutes a new fundamental
technical skill that is one of the six that you will now find in this book.
And So, And Without Further Ado: Here Are The Six
As I said in the Introduction, the following is a list of what I consider to be six
fundamental technical skills that every computer programmer needs to know.
Technical means that these are things which you need to know and which you will apply
when crafting (and troubleshooting) the computer software that you write, and/or that you
(and your team) maintain, for your client or employer. These skills are not particular to
any single size, type, or brand of computer hardware, and for the most part also are not
limited to any computer programming language or tool. These also are not social or
organizational skills. (That would be a separate list, entirely.)
The List
1. Internal memory management and data structures.
2. Objects.
3. SQL Database Queries and Concepts.
4. Precise specification, strategy, and implementation.
5. Front-End, Back-End. User-interfaces and frameworks.
6. Pragmatic Debugging skills.
This list is not in any particular order, although I will choose to address it in the
sequence given. My treatment of each topic will also not be extremely detailed. Please
understand that I am seeking to provide you with a 30,000-foot view, and to point you in
specific directions from which you can pursue additional research on your own.
This list is also not a primer, and I emphatically do not, by using this phrase, intend
any negative slight to you. The topics that I will present here might well, section-by-section,
require re-reading. (And they might require clarification. Since this is an e-book,
we can do that.)
David Intersimone, the original director of development at the (now long-defunct)
Borland International, referred to this experience as "a sip from the fire-hose." I must
acknowledge that your initial experience with the forthcoming material might well be the
same. However, as your Gentle Author, I hope that you will not in fact expect anything
less from the text that you are about to consume.
And so, with all that now said: Let us begin.
programs must, for their own purposes (whatever those are), construct a
foundation which is sufficient for whatever it is that they are supposed to do.
This means, essentially, two things:
1. Any incoming (or computed) data used by the program must be stored in such
a way that the program, during its execution, can (of course) obtain it
again.
2. The program must be capable of handling a variable and unpredictable
quantity of data. There must be no pre-conceived limits as to just how
many copies of data might be stored (or storable).
Ordinarily, these chores are delegated to pre-existing storage strategies which are an
intrinsic part of whatever language system is being used. There are usually two types:
those which store a value under a particular key (requiring that exact key to be provided
in order to retrieve it again), and those which store an arbitrarily-sized list of (zero or
more) values. These two are frequently used in combination, to allow zero or more
values to be referenced by any unique key: each element of the keyed data-store (such as
a hash or tree), refers to a separate list.
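The combination just described can be sketched in Python (the names here are illustrative, not from any particular system): a keyed store, in which each key refers to its own arbitrarily-sized list of values.

```python
# A keyed store (a dict) in which each key refers to a separate,
# arbitrarily-sized list of values.
orders_by_customer = {}

def add_order(customer_id, order):
    # Create the per-key list on first use, then append to it.
    orders_by_customer.setdefault(customer_id, []).append(order)

add_order("C-1", "widget")
add_order("C-1", "gadget")
add_order("C-2", "sprocket")

print(orders_by_customer["C-1"])          # ['widget', 'gadget']
print(orders_by_customer.get("C-3", []))  # [] -- nothing stored under that key
```

Providing the exact key retrieves the list stored under it; a key that was never used simply yields nothing, and there is no pre-conceived limit on how many values any one list may hold.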
Let us now consider what sort of things can go wrong with these arrangements.
What sort of things can cause a program to misbehave, or fail? Here are the most common culprits:
1. Stack Overrun, caused by endless recursion: As I said earlier, the
stack is the portion of memory that's used to manage subroutine calls. When
a subroutine is called, information is stored in the stack to facilitate returning
from the subroutine, and the subroutine instance's local variables are also
stored there. The stack is of a limited size. Therefore, it is possible to overrun
the boundaries of the stack if too many nested subroutine calls are made.
Pragmatically, this means that subroutines are calling themselves, so-called
"recursively," without ever returning from any of those calls. The effect of
this sort of bug on a program is instantly fatal, but relatively easy to debug.
2. Heap Corruption (the infamous "Double Free"): This always-fatal
problem is caused by corruption of the internal data structures which manage
the allocation and return of memory in the Heap. There are two routines: a
malloc() routine, which requests a block of memory of a specified size
(returning an address), and a free() routine, which releases a block of memory
at a specified address (which must have previously been obtained from
malloc() ). Programs are required to free() only addresses that they obtained
from malloc(), and to free() any particular address only once. They are also
required to constrain their memory-modifications only to the range of
addresses given, never modifying any adjacent bytes. Most modern
programming languages protect from these types of problems by managing
the low-level malloc() and free() calls themselves.
3. Heap Exhaustion ("Memory Leaks"): Programs are required to release, in a
timely fashion, any storage that they are no longer using. Most modern programming
languages take care of this chore through some mechanism which detects
automatically when a particular storage block is no longer being referenced,
but so-called "leaks" can occur when, for instance, a series of storage-blocks
all contain references to one another, but there remain no other references
elsewhere to any of those blocks. (Since all of the blocks are still
referenced, they never get released.) Heap exhaustion can also be caused
by inefficient program design.
4. Failure to detect when a storage-allocation request could not be satisfied:
When a malloc() request cannot, for whatever reason, obtain the amount of
storage requested, it will typically return zero, a special value also known
as NULL. Programs should detect if this occurs, and respond accordingly,
but they rarely do.
5. Exhaustion of fixed-size storage arrays: Some early programming
languages do not allow storage to be dynamically allocated from the heap.
Instead, the programmer must specify a fixed size for the structure. Programs
are supposed to determine if the space within these fixed structures has been
exhausted, but they rarely do. Programming languages also usually do not
detect that a reference has been attempted which lies outside of the prescribed
boundaries of the structure. The usual consequence is stack or heap
corruption.
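Culprit #1 is easy to demonstrate. Here is a minimal sketch in Python, whose runtime guards its own stack with a recursion limit and raises RecursionError rather than letting the stack overrun silently (the function name is, of course, illustrative):

```python
import sys

# A subroutine that calls itself without ever returning: each call
# consumes stack space until the runtime's guard limit is reached.
def runaway(n):
    return runaway(n + 1)   # no base case: this never returns

try:
    runaway(0)
except RecursionError:
    print("stack limit reached; the configured limit was", sys.getrecursionlimit())
```

In a language without such a guard (C, for instance), the same bug simply smashes the stack, which is why the effect is instantly fatal.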
Memory issues are a common source of problems in software that is in the process of
being developed, but they are much less common in programs that are in production.
Two: Objects
Many of the influential books in computer science literature share a common
characteristic: they are small. Certainly one of the most important of these was entitled
Algorithms + Data Structures = Programs, by Dr. Niklaus Wirth.8
Truly, the title of this book says it all. Any computer program consists of
algorithms (the step-by-step execution of instructions that is called-for), applied to data
structures such as the ones alluded to in the preceding section of this book.
In early programming languages, these two concerns (algorithm and data) were
addressed separately, in different parts of the program. (The COBOL language was a
particular example, defining all (fixed) data structures in the so-called DATA
DIVISION, and all algorithms in the PROCEDURE DIVISION.) This is not so much
of a problem with regard to the local variables that might be associated with a particular
procedure or function, but it is a very vexing concern with regard to the global storage
that is used by multiple procedures and functions throughout a program.
In a word, the problem is that the two things are separated: the data structures which
are manipulated by algorithms, are separate from the algorithms which manipulate the
data structures. If decisions need to be made as to which algorithm should be applied to
which data, these decisions wind up being redundantly scattered throughout the entire
program.
To address this concern, the notion of objects was invented.
An object, for our purposes, is a self-describing piece of storage, allocated from
the heap. It contains, not only space for the individual values (properties) which might
need to be stored there, but also additional descriptive data (metadata) which serves to
directly associate the object with the procedural code (methods) that are designed to
operate in conjunction with it.
Significantly, given a particular object and a request to apply a particular function
against it (a so-called method call), the computer is able to determine which function is
the correct one to call based only on the metadata contained within the object itself.
The exact mechanisms by which this determination is made are concealed from the
programmer, but they are very efficient.
The paradigm that is usually quoted is: "Hey, you! Do this!" Whereas, in a
conventional programming language, a specified subroutine would be called and a
reference to the data would be supplied to it as a parameter, in an object-oriented
programming language the primary reference is to the object ("Hey, you!"), which is then
instructed to call one of its methods ("Do this!"). The actual sequence of events that
subsequently takes place may vary from object to object, and from one method-call to the
next, because the decision is made literally on-the-fly.9
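The "Hey, you! Do this!" paradigm can be sketched in Python (the classes here are illustrative). Each object's own type carries the metadata that selects the correct method:

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius
    def area(self):                 # the method associated with Circle objects
        return 3.14159 * self.radius ** 2

class Square:
    def __init__(self, side):
        self.side = side
    def area(self):                 # same request, entirely different code
        return self.side ** 2

# The correct area() is chosen at run time, on-the-fly, per object:
for shape in (Circle(1.0), Square(2.0)):
    print(shape.area())             # 3.14159, then 4.0
```

The loop never says which area() to call; each object, told "Do this!", answers with its own method.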
Since late-binding is the fundamental characteristic of any object-oriented
programming system, there are many approaches that existing languages use to obtain it.
Some languages are designed strictly from the ground up to use an object-oriented
runtime. But programs are subject to certain design difficulties once they have been
in service for a number of years, mostly due to the inheritance schemes aforementioned.
As long as the business requirements do not change in any way that is not perfectly
reflected in the object inheritance stratagem originally devised for the program, such
programs can have a very long service-life, indeed. However, if requirements do change
fundamentally, inheritance can become an intractable form of coupling between the
various subclasses which are derived from a common ancestor.
There are three types of joins that can be used: inner joins, which return only rows
which have identical values in both tables, and left or right outer joins, which
always return all of the rows from one of the tables or the other. (An inner join against
Orders and Customers would return Orders that are associated with Customers, while an
outer join might return [left outer-join] all Orders, with or without Customers, or
[right] all Customers, with or without Orders.)
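The Orders/Customers example can be sketched using Python's built-in sqlite3 module (the table contents are illustrative: Ada has an order, Bob has none):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO Customers VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO Orders VALUES (10, 1, 'widget');
""")

# Inner join: only rows with matching values in both tables.
inner = db.execute("""
    SELECT c.name, o.item FROM Customers c
    JOIN Orders o ON o.customer_id = c.id ORDER BY c.id
""").fetchall()
print(inner)   # [('Ada', 'widget')] -- Bob is absent: he has no orders

# Left outer join: always all Customers, with or without Orders.
outer = db.execute("""
    SELECT c.name, o.item FROM Customers c
    LEFT JOIN Orders o ON o.customer_id = c.id ORDER BY c.id
""").fetchall()
print(outer)   # [('Ada', 'widget'), ('Bob', None)]
```

Note how the outer join fills in NULL (Python's None) for the missing Order, rather than dropping the row.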
Since SQL queries do not specify how the database engine is to obtain the specified
results, it is very important to understand how your queries will be interpreted. It is
certainly possible to write two different queries that will produce the same results, but that
will do so in dramatically more- or less-efficient ways. Most database systems provide an
EXPLAIN command which will tell you (in rather arcane, system-specific terms) exactly
how the database engine would go about carrying out a particular query.
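SQLite's variant of this facility, for example, is EXPLAIN QUERY PLAN, reachable from Python's sqlite3 module (the table and index here are illustrative, and the exact wording of the report varies by version, as befits something arcane and system-specific):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
db.execute("CREATE INDEX idx_cust ON Orders (customer_id)")

# Ask the engine how it *would* carry out the query, without running it.
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Orders WHERE customer_id = ?", (42,)
).fetchall()
for row in plan:
    print(row)   # a line describing a SEARCH using the index idx_cust
```

Dropping the index and re-running the same command would show the plan degrade to a full-table SCAN, which is exactly the sort of difference the text is warning about.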
A very significant problem with SQL queries, in too-typical deployed applications, is
that the web server (or whoever is issuing a particular query) can do anything and
everything. Every SQL server has some kind of permissions-system which specifies
exactly what any user is and is not permitted to do. If some web-site hacker is, by
whatever means, able to persuade your web-server to issue the DROP TABLE (or even
the DROP DATABASE(!!)) command, and your web-server is authorized to issue such a
command, then (at least a very significant part of) your database just disappeared.
'Nuff said.
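Restricting what the web-server's account is permitted to do is one defense; another, on the query side, is to bind user input through a parameter placeholder so that hostile text is treated as plain data rather than as SQL. A sketch, again with sqlite3 (table and input are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Users (name TEXT)")
db.execute("INSERT INTO Users VALUES ('alice')")

hostile = "alice'; DROP TABLE Users; --"

# The '?' placeholder binds the whole string as a value, not as SQL text,
# so the embedded DROP TABLE never reaches the engine as a command.
rows = db.execute("SELECT name FROM Users WHERE name = ?", (hostile,)).fetchall()
print(rows)                                                 # [] -- no such user
print(db.execute("SELECT COUNT(*) FROM Users").fetchone())  # (1,) -- table intact
```

Had that hostile string been pasted directly into the query text, the result could have been very different.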
When you deal with SQL databases, you must also deal with the issue of
concurrency. On a typical database server, hundreds of queries might be executing at the
same time, and these queries may or may not be specifically concerned with what the
other queries are doing. (For instance, if a user is merely browsing a product catalog,
it's essentially a certainty that the catalog's contents won't be changing at the time.
Accounting data, however, is a different matter.) SQL database systems have a specific
strategy for dealing with this issue: transactions.10
A transaction is defined as a single "unit of work": it could be a set of
modifications, deletions, and/or updates, or it could simply be a set of queries, which is
considered to be atomic. That is to say, a single, indivisible group. In the case of
modifications, either the entire set of modifications happens, or none of them do. In
any case, a transaction has some specified degree of isolation from every other transaction
that is occurring at the same time. For example, an accounting report might need to secure
a "snapshot" view of a very busy database as it existed at a particular instant in time.
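Atomicity can be sketched with sqlite3 (the table and amounts are illustrative): either both halves of a transfer happen, or neither does.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO Accounts VALUES ('checking', 100), ('savings', 0)")
db.commit()

def fail_midway():
    raise RuntimeError("simulated failure between debit and credit")

try:
    with db:  # begins a transaction; commits on success, rolls back on error
        db.execute("UPDATE Accounts SET balance = balance - 70 "
                   "WHERE name = 'checking'")
        fail_midway()  # something goes wrong before the matching credit...
        db.execute("UPDATE Accounts SET balance = balance + 70 "
                   "WHERE name = 'savings'")
except RuntimeError:
    pass

# The lone debit was rolled back: the unit of work proved indivisible.
print(db.execute("SELECT balance FROM Accounts WHERE name = 'checking'")
        .fetchone())   # (100,)
```

Without the transaction, the failure would have left 70 units debited from checking and never credited to savings, which is precisely the half-done state that atomicity forbids.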
the not-so-visible supporting software infrastructure which must be built under the surface.
It is also very easy to fall in love with what you've done, only to discover that a
different approach or presentation might work better. Once again, these discoveries are
often made at inopportune moments, and require sometimes deep-seated and far-reaching
changes to the system, which quickly de-stabilize it.
Debugging, unabashedly, is detective work. But these two techniques that I have
now discussed will greatly improve the effectiveness of this process. By making the
program suspicious of its own behavior, you improve the odds that defects will be
discovered and corrected early. By building a chronology of what happened recently,
you make it easier to discover the internal-state of the system which enabled the defective
behavior to occur.
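Both techniques can be sketched together in Python (the function and event names are illustrative): assertions make the program suspicious of its own behavior, while a small ring buffer keeps a chronology of what happened recently.

```python
import logging
from collections import deque

logging.basicConfig(level=logging.INFO)

recent = deque(maxlen=100)   # the last 100 events; the oldest fall away first

def record(event):
    recent.append(event)     # the chronology of what happened recently
    logging.info(event)

def withdraw(balance, amount):
    record(f"withdraw {amount} requested; balance is {balance}")
    # The program is suspicious of its own behavior:
    assert amount >= 0, "negative withdrawal amount"
    assert amount <= balance, "overdraft attempted"
    new_balance = balance - amount
    record(f"withdraw ok; new balance is {new_balance}")
    return new_balance

print(withdraw(100, 30))   # 70, with two events now in the chronology
```

When an assertion fires, the buffer already holds the trail of recent events that led up to the defective internal state.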
All production systems should also be accompanied by a comprehensive test suite,
the purpose of which is to exercise and re-exercise components of the system at various
levels. The test-suite is run and re-run constantly, and both successful and unsuccessful
outcomes are logged. If a source-code change is introduced (or is about to be introduced)
which causes a test-case to fail, well: forewarned is forearmed.
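A minimal test suite can be sketched with Python's built-in unittest module (the function under test is illustrative):

```python
import unittest

def add_tax(price, rate=0.07):
    return round(price * (1 + rate), 2)

class TestAddTax(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(add_tax(100.0), 107.0)
    def test_zero(self):
        self.assertEqual(add_tax(0.0), 0.0)

# Run the suite and report both successful and unsuccessful outcomes.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAddTax)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("ran", result.testsRun, "tests;", len(result.failures), "failures")
```

In practice such suites are wired into the build, run and re-run on every change, so that a newly-failing case is noticed before the change reaches production.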
In Closing:
Well then, there you have it. My six. And, along the rambling way, my pragmatic
recommendations. Most if not all of the topics that I have quickly described in this little
book will call for further exploration on your part, and I hope that I have succeeded in
setting the stage of understanding for you to do so.
Computer programming has changed enormously over the past sixty years and
counting, but in many ways it has changed not-at-all. We're still writing instructions for
electronic machines to (unthinkingly) carry out, and the process requires a lot of human
thought. Capturing the big picture, in spite of the myriad details, is something that can
easily become lost in the shuffle. I hope that these words have helped you in some small
way, and welcome your comments, reviews and feedback.
Vince North
1 The observation, attributed to Dr. Gordon Moore, that semiconductors would double in speed and density every two
years.
2 A term most-likely originally coined by Charlotte Brontë in her book, Jane Eyre: "Gentle Reader, may you never feel
what I then felt! May your eyes never shed such stormy, scalding, heart-wrung tears as poured from mine." Oh yes,
Ms. Brontë had quite the way with words.
3 Wikipedia defines "craft" as: "a pastime or a profession that requires particular skills and knowledge of skilled work."
5 All right, all right. I have no pragmatic choice, at this point, but to impose an important technical term: process,
where previously I said, "executing program." On almost any computer today, you can run more than one copy of the
same program at the same time, just as easily as you can run different programs. Operating systems routinely call
each distinct instance of "<any program>, running here" a process.
6 The operating system is the foundational layer of software which governs the operation of the entire computer
system. Unix, Linux, OS/X, Z-OS, etc. are all examples of this. These create and manage the operating
environment under which all processes ultimately operate, and define and implement the entire world that is available to
them.
7 A thread is an independent thread of execution running within the auspices of a single process. For our purposes
8 ISBN: 978-0-13-022418-7.
10 Not every SQL database system supports transactions. Most do, but some have strings attached. For instance, the
ever-popular MySQL database only supports transactions if the InnoDB storage engine is used.
11 North, Vincent P., Managing the Mechanism: Why Software Projects Aren't Like Any Other Project You've Ever Tried