
It may seem obvious, but one of the things I need to cover at my technology-training
workshops is the basic what, why, and how of keyboarding. Without the basic ability to
type quickly and accurately, getting your ideas and data into a computer can take a lot of
time and can be frustrating. Who really wants to use the hunt-and-peck method of
inputting data for the rest of their lives?

Sure, someday we may have foolproof voice-recognition software, which will eliminate
the need for typing, but it's not readily available today. So, to use a computer with ease,
being able to type is still an important skill. Once students learn to keyboard and learn
basic word processing skills, the integration of the computer into all disciplines is much
easier.

Technology skills outlined in the No Child Left Behind Act require that students be
technology literate by the end of the eighth grade. Expectations are that students create
reports on a word processor, use a spreadsheet for calculations, and use a presentation
tool for demonstrating new knowledge. However, many students have never been taught
the basics and continue to use the computer as if it were a typewriter.

Keyboarding should be taught in the early grades -- before students acquire bad habits.
Free typing programs can be found on the Internet, and software packages can be
purchased. The tried-and-true teacher-taught method -- the method by which most of us
learned to keyboard -- is one way to ensure students learn to correctly input data.

While students are learning to keyboard, other basic skills can be taught, such as

• use of a mouse (click, double-click, left click, right/control click, click and drag).
• opening a new document.
• saving a document (proper naming and location for saving).
• standard fonts, such as Times New Roman, Arial, Georgia, Comic Sans.
• appropriate size of font for print and presentations.
• one space after all punctuation, including periods.
• alignment (left, center, right).
• printing.
• closing a document and an application.

As students become comfortable with these basics, other skills can be taught. Many skills
can be incrementally learned in the third and fourth grades. The left and right margins in
Microsoft Word by default are unusually wide; therefore, students should be taught to
change the margins (and even reset the default, if desired).

Another underused function of the computer is the setting of tabs. To get from one place
to another place on a page, many times students will consecutively press the space bar or
the preset tab. Because the typewriter had only one kind of tab, the different kinds of tabs
on a computer (left, right, center, decimal) are little known. Students must be given
examples of when each of these tabs is used, such as

• left tab: indentation of a paragraph.
• center tab: in headers/footers and certain kinds of poetry.
• right tab: in headers/footers and to place the name and date at the top of a paper.

The proper use of font styles is also important. For example, underlining on a computer
is discouraged because the underline stroke cuts through a font's descenders (for
example, in the word young). The bold style is more commonly used for headings. The
italic style, not underlining, is used to denote book titles and the like.

Once a student has learned to click and drag the mouse, the commands to copy, cut, and
paste, as well as the use of the delete (and backspace) keys, can be taught. Other useful
skills include, but are not limited to,

• undo and redo typing.
• bullets and numbering.
• headers and footers, including page numbering.
• tables.

Other word processing skills, such as columns, breaks, sections, borders, and word count,
can be taught in middle school.

Read another post of mine, which answers many of your questions and gives links to free
resources on keyboarding and word processing skills.

• Patsy Lanclos's Blog

A word processor (more formally known as a document preparation system) is a computer
application used for the production (including composition, editing, formatting, and
possibly printing) of any sort of printable material.

Word processor may also refer to a type of stand-alone office machine, popular in the
1970s and 1980s, combining the keyboard text-entry and printing functions of an electric
typewriter with a dedicated processor (like a computer processor) for the editing of text.
Although features and design varied between manufacturers and models, with new
features added as technology advanced, word processors for several years usually
featured a monochrome display and the ability to save documents on memory cards or
diskettes. Later models introduced innovations such as spell-checking programs,
increased formatting options, and dot-matrix printing. As the more versatile combination
of a personal computer and separate printer became commonplace, most business-
machine companies stopped manufacturing the word processor as a stand-alone office
machine. As of 2009 there were only two U.S. companies, Classic and AlphaSmart,
which still made stand-alone word processors.[1] Many older machines, however, remain
in use.

Word processors are descended from early text formatting tools (sometimes called text
justification tools, from their only real capability). Word processing was one of the
earliest applications for the personal computer in office productivity.

Although early word processors used tag-based markup for document formatting, most
modern word processors take advantage of a graphical user interface providing some
form of What You See Is What You Get editing. Most are powerful systems consisting of
one or more programs that can produce any arbitrary combination of images, graphics
and text, the latter handled with type-setting capability.

Microsoft Word is the most widely used word processing software. Microsoft estimates
that over 500,000,000 people use the Microsoft Office suite,[2] which includes Word.
Many other word processing applications exist, including WordPerfect (which dominated
the market from the mid-1980s to early-1990s on computers running Microsoft's MS-
DOS operating system) and open source applications OpenOffice.org Writer, AbiWord,
KWord, and LyX. Web-based word processors, such as Google Docs, are a relatively
new category.

Contents

• 1 Characteristics
• 2 Document statistics
• 3 Typical usage
o 3.1 Business
o 3.2 Education
o 3.3 Home
• 4 History
• 5 See also
• 6 References

• 7 External links

Characteristics
Word processing typically implies the presence of text manipulation functions that extend
beyond a basic ability to enter and change text, such as automatic generation of:
• batch mailings using a form letter template and an address database (also called
mail merging);
• indices of keywords and their page numbers;
• tables of contents with section titles and their page numbers;
• tables of figures with caption titles and their page numbers;
• cross-referencing with section or page numbers;
• footnote numbering;
• new versions of a document using variables (e.g. model numbers, product names,
etc.)
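
As a rough illustration of the mail-merge idea (a form letter filled in from an address
database), here is a minimal Python sketch using only the standard library; the template
text and the records are invented for the example.

    # Minimal mail-merge sketch: fill a form-letter template from an address list.
    from string import Template

    letter = Template(
        "Dear $name,\n"
        "Thank you for your order of $product. It will ship to $city shortly.\n"
    )

    # Stand-in for the "address database" a real mail merge would read.
    records = [
        {"name": "A. Reader", "product": "a word processor", "city": "Springfield"},
        {"name": "B. Writer", "product": "a spreadsheet", "city": "Riverton"},
    ]

    for record in records:          # one personalized letter per record
        print(letter.substitute(record))
        print("-" * 40)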

Other word processing functions include "spell checking" (actually checks against
wordlists), "grammar checking" (checks for what seem to be simple grammar errors), and
a "thesaurus" function (finds words with similar or opposite meanings). Other common
features include collaborative editing, comments and annotations, support for images and
diagrams and internal cross-referencing.
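
Since "spell checking" is really a comparison against word lists, a toy sketch of the idea
(with a deliberately tiny, made-up word list) might look like this in Python:

    # Toy spell checker: flag any word not present in the word list.
    import re

    WORDLIST = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

    def misspelled(text):
        words = re.findall(r"[a-z']+", text.lower())
        return [w for w in words if w not in WORDLIST]

    print(misspelled("The quikc brown fox jmups over the lazy dog"))
    # -> ['quikc', 'jmups']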

Word processors can be distinguished from several other, related forms of software:

Text editors (modern examples of which include Notepad, BBEdit, Kate, Gedit), were the
precursors of word processors. While offering facilities for composing and editing text,
they do not format documents. This can be done by batch document processing systems,
starting with TJ-2 and RUNOFF and still available in such systems as LaTeX (as well as
programs that implement the paged-media extensions to HTML and CSS). Text editors
are now used mainly by programmers, website designers, computer system
administrators, and, in the case of LaTeX by mathematicians and scientists (for complex
formulas and for citations in rare languages). They are also useful when fast startup
times, small file sizes, editing speed and simplicity of operation are preferred over
formatting.

Later desktop publishing programs were specifically designed to allow elaborate layout
for publication, but often offered only limited support for editing. Typically, desktop
publishing programs allowed users to import text that was written using a text editor or
word processor.

Almost all word processors enable users to employ styles, which are used to automate
consistent formatting of text body, titles, subtitles, highlighted text, and so on.

Styles greatly simplify managing the formatting of large documents, since changing a
style automatically changes all text that the style has been applied to. Even in shorter
documents styles can save a lot of time while formatting. However, most help files refer
to styles as an 'advanced feature' of the word processor, which often discourages users
from using styles regularly.
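
To make the value of styles concrete without tying the example to any particular word
processor, here is a small Python sketch in which each paragraph merely names a style;
changing the style definition once changes every paragraph that uses it. The style names
and attributes are invented for the illustration.

    # Styles as shared definitions: edit the style once, every paragraph follows.
    styles = {
        "Heading 1": {"font": "Arial", "size": 16, "bold": True},
        "Body":      {"font": "Times New Roman", "size": 12, "bold": False},
    }

    document = [
        ("Heading 1", "Word processors"),
        ("Body", "Word processing implies text manipulation functions."),
        ("Body", "Styles automate consistent formatting."),
    ]

    def render(document, styles):
        for style_name, text in document:
            s = styles[style_name]
            bold = " bold" if s["bold"] else ""
            print(f"[{s['font']} {s['size']}pt{bold}] {text}")

    render(document, styles)
    styles["Body"]["size"] = 11   # one change updates every "Body" paragraph
    render(document, styles)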

Document statistics


Most current word processors can calculate various statistics pertaining to a document.
These usually include:

• Character count, word count, sentence count, line count, paragraph count, page
count.
• Word, sentence and paragraph length.
• Editing time.

Errors are common; for instance, a dash surrounded by spaces — like either of these —
may be counted as a word.
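
A rough sketch of how such counts are computed, and of how a free-standing dash ends up
counted as a word when the count is based on whitespace-separated tokens (the sample
sentence is made up):

    # Naive document statistics; note how the dashes inflate the word count.
    import re

    text = "Most word processors report statistics - such as these - for a document."

    words = text.split()                         # whitespace-delimited tokens
    sentences = [s for s in re.split(r"[.!?]+\s*", text.strip()) if s]

    print("characters:", len(text))
    print("words:", len(words))                  # the two dashes each count as a "word"
    print("sentences:", len(sentences))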

Typical usage


Word processors have a variety of uses and applications within the business world, home,
and education.

Business

Within the business world, word processors are extremely useful tools. Typical uses
include:

• legal copies
• letters and letterhead
• memos
• reference documents

Businesses tend to have their own format and style for any of these. Thus, versatile word
processors with layout editing and similar capabilities find widespread use in most
businesses.

Education

Many schools have begun to teach typing and word processing to their students, starting
as early as elementary school. Typically these skills are developed throughout secondary
school in preparation for the business world. Undergraduate students typically spend
many hours writing essays. Graduate and doctoral students continue this trend, as well as
creating works for research and publication.

Home

While many homes have word processors on their computers, word processing in the
home tends to be educational, planning or business related, dealing with assignments or
work being completed at home, or occasionally recreational, e.g. writing short stories.
Some use word processors for letter writing, résumé creation, and card creation.
However, many of these home publishing processes have been taken over by desktop
publishing programs specifically oriented toward home use, which are better suited to
these types of documents.

History

[Image captions: Toshiba JW-10, the first word processor for the Japanese language
(1971-1978 IEEE milestones); examples of standalone word processor typefaces, c.
1980-1981; Brother WP-1400D editing electronic typewriter (1994)]


The term word processing was invented by IBM in the late 1960s. By 1971 it was
recognized by the New York Times as a "buzz word".[3] A 1974 Times article referred to
"the brave new world of Word Processing or W/P. That's International Business
Machines talk... I.B.M. introduced W/P about five years ago for its Magnetic Tape
Selectric Typewriter and other electronic razzle-dazzle."[4]

IBM defined the term in a broad and vague way as "the combination of people,
procedures, and equipment which transforms ideas into printed communications," and
originally used it to include dictating machines and ordinary, manually-operated Selectric
typewriters.[5] By the early seventies, however, the term was generally understood to
mean semiautomated typewriters affording at least some form of electronic editing and
correction, and the ability to produce perfect "originals." Thus, the Times headlined a
1974 Xerox product as a "speedier electronic typewriter", but went on to describe the
product, which had no screen[6], as "a word processor rather than strictly a typewriter, in
that it stores copy on magnetic tape or magnetic cards for retyping, corrections, and
subsequent printout."[7]

Electromechanical paper-tape-based equipment such as the Friden Flexowriter had long
been available; the Flexowriter allowed for operations such as repetitive typing of form
letters (with a pause for the operator to manually type in the variable information)[8], and
when equipped with an auxiliary reader, could perform an early version of "mail merge".
Circa 1970 it began to be feasible to apply electronic computers to office automation
tasks. IBM's Mag Tape Selectric Typewriter (MTST) and later Mag Card Selectric
(MCST) were early devices of this kind, which allowed editing, simple revision, and
repetitive typing, with a one-line display for editing single lines.[9]

The New York Times, reporting on a 1971 business equipment trade show, said

The "buzz word" for this year's show was "word processing," or the use of
electronic equipment, such as typewriters; procedures and trained personnel to
maximize office efficiency. At the IBM exhibition a girl [sic] typed on an
electronic typewriter. The copy was received on a magnetic tape cassette which
accepted corrections, deletions, and additions and then produced a perfect letter
for the boss's signature....[3]

In 1971, a third of all working women in the United States were secretaries, and they
could see that word processing would have an impact on their careers. Some
manufacturers, according to a Times article, urged that "the concept of 'word processing'
could be the answer to Women's Lib advocates' prayers. Word processing will replace the
'traditional' secretary and give women new administrative roles in business and
industry."[3]

The 1970s word processing concept did not refer merely to equipment, but, explicitly, to
the use of equipment for "breaking down secretarial labor into distinct components, with
some staff members handling typing exclusively while others supply administrative
support. A typical operation would leave most executives without private secretaries.
Instead one secretary would perform various administrative tasks for three or more
executives."[10] A 1971 article said that "Some [secretaries] see W/P as a career ladder
into management; others see it as a dead-end into the automated ghetto; others predict it
will lead straight to the picket line." The National Secretaries Association, which defined
secretaries as people who "can assume responsibility without direct supervision," feared
that W/P would transform secretaries into "space-age typing pools." The article
considered only the organizational changes resulting from secretaries operating word
processors rather than typewriters; the possibility that word processors might result in
managers creating documents without the intervention of secretaries was not considered
—not surprising in an era when few but secretaries possessed keyboarding skills.[4]

In the early 1970s, computer scientist Harold Koplow was hired by Wang Laboratories to
program calculators. One of his programs permitted a Wang calculator to interface with
an IBM Selectric typewriter, which was at the time used to calculate and print the
paperwork for auto sales.

In 1974, Koplow's interface program was developed into the Wang 1200 Word
Processor, an IBM Selectric-based text-storage device. The operator of this machine
typed text on a conventional IBM Selectric; when the Return key was pressed, the line of
text was stored on a cassette tape. One cassette held roughly 20 pages of text, and could
be "played back" (i.e., the text retrieved) by printing the contents on continuous-form
paper in the 1200 typewriter's "print" mode. The stored text could also be edited, using
keys on a simple, six-key array. Basic editing functions included Insert, Delete, Skip
(character, line), and so on.

The labor and cost savings of this device were immediate, and remarkable: pages of text
no longer had to be retyped to correct simple errors, and projects could be worked on,
stored, and then retrieved for use later on. The rudimentary Wang 1200 machine was the
precursor of the Wang Office Information System (OIS), introduced in 1976, whose
CRT-based system was a major breakthrough in word processing technology. It displayed
text on a CRT screen, and incorporated virtually every fundamental characteristic of
word processors as we know them today. It was a true office machine, affordable by
organizations such as medium-sized law firms, and easily learned and operated by
secretarial staff.

The Wang was not the first CRT-based machine nor were all of its innovations unique to
Wang. In the early 1970s Linolex, Lexitron and Vydec introduced pioneering word-
processing systems with CRT display editing. A Canadian electronics company,
Automatic Electronic Systems, had introduced a product with similarities to Wang's
product in 1973, but went into bankruptcy a year later. In 1976, refinanced by the Canada
Development Corporation, it returned to operation as AES Data, and went on to
successfully market its brand of word processors worldwide until its demise in the mid-
1980s. Its first office product, the AES-90[11], combined for the first time a CRT-screen, a
floppy-disk and a microprocessor,[citation needed] that is, the very same winning combination
that would be used by IBM for its PC seven years later.[citation needed] The AES-90 software
was able to handle French and English typing from the start, displaying and printing the
texts side-by-side, a Canadian government requirement. The first eight units were
delivered to the office of the then Prime Minister, Pierre Elliott Trudeau, in February
1974.[citation needed] Despite these predecessors, Wang's product was a standout, and by 1978
it had sold more of these systems than any other vendor.[12]

The phrase "word processor" rapidly came to refer to CRT-based machines similar to
Wang's. Numerous machines of this kind emerged, typically marketed by traditional
office-equipment companies such as IBM, Lanier (marketing AES Data machines, re-
badged), CPT, and NBI.[13] All were specialized, dedicated, proprietary systems, with
prices in the $10,000 ballpark. Cheap general-purpose computers were still the domain of
hobbyists.

Some of the earliest CRT-based machines used cassette tapes for removable-memory
storage until floppy diskettes became available for this purpose - first the 8-inch floppy,
then the 5-1/4-inch (drives by Shugart Associates and diskettes by Dysan).

Printing of documents was initially accomplished using IBM Selectric typewriters
modified for ASCII-character input. These were later replaced by application-specific
daisy wheel printers (Diablo, which became a Xerox company, and Qume -- both now
defunct.) For quicker "draft" printing, dot-matrix line printers were optional alternatives
with some word processors.

With the rise of personal computers, and in particular the IBM PC and PC compatibles,
software-based word processors running on general-purpose commodity hardware
gradually displaced dedicated word processors, and the term came to refer to software
rather than hardware. Some programs were modeled after particular dedicated WP
hardware. MultiMate, for example, was written for an insurance company that had
hundreds of typists using Wang systems, and spread from there to other Wang customers.
To adapt to the smaller PC keyboard, MultiMate used stick-on labels and a large plastic
clip-on template to remind users of its dozens of Wang-like functions, using the shift, alt
and ctrl keys with the 10 IBM function keys and many of the alphabet keys.

Other early word-processing software required users to memorize semi-mnemonic key
combinations rather than pressing keys labelled "copy" or "bold." (In fact, many early
PCs lacked cursor keys; WordStar famously used the E-S-D-X-centered "diamond" for
cursor navigation, and modern vi-like editors encourage use of hjkl for navigation.)
However, the price differences between dedicated word processors and general-purpose
PCs, and the value added to the latter by software such as VisiCalc, were so compelling
that personal computers and word processing software soon became serious competition
for the dedicated machines. Word Perfect, XyWrite, Microsoft Word and dozens of other
word processing software brands competed in the 1980s. Development of higher-
resolution monitors allowed them to provide limited WYSIWYG - What You See Is
What You Get, to the extent that typographical features like bold and italics, indentation,
justification and margins were approximated on screen.
The mid-to-late 1980s saw the spread of laser printers, a "typographic" approach to word
processing, and of true WYSIWYG bitmap displays with multiple fonts (pioneered by the
Xerox Alto computer and Bravo word processing program), PostScript, and graphical
user interfaces (another Xerox PARC innovation, with the Gypsy word processor which
was commercialised in the Xerox Star product range). Standalone word processors
adapted by getting smaller and replacing their CRTs with small character-oriented LCD
displays. Some models also had computer-like features such as floppy disk drives and the
ability to output to an external printer. They also got a name change, now being called
"electronic typewriters" and typically occupying a lower end of the market, selling for
under $200 USD.

MacWrite, Microsoft Word and other word processing programs for the bit-mapped
Apple Macintosh screen, introduced in 1984, were probably the first true WYSIWYG
word processors to become known to many people, at least until the introduction of
Microsoft Windows. Dedicated word processors eventually became museum pieces.

Electronic mail, commonly called email or e-mail, is a method of exchanging digital
messages across the Internet or other computer networks. Email systems are based on a
store-and-forward model in which email server computer systems accept, forward,
deliver and store messages on behalf of users, who only need to connect to the email
infrastructure, typically an e-mail server, with a network-enabled device for the duration
of message submission or retrieval. Originally, email was transmitted directly from one
user's device to another user's computer, which required both computers to be online at
the same time.

An electronic mail message consists of two components: the message header and the
message body, which is the email's content. The message header contains control
information, including, minimally, an originator's email address and one or more
recipient addresses. Usually additional information is added, such as a subject header
field.

Originally a text-only communications medium, email was extended to carry multimedia
content attachments, a practice standardized in RFC 2045 through RFC 2049,
collectively called Multipurpose Internet Mail Extensions (MIME).

The foundation for today's global Internet email services reaches back to the early
ARPANET and standards for encoding of messages were proposed as early as 1973
(RFC 561). An e-mail sent in the early 1970s looked very similar to one sent on the
Internet today. Conversion from the ARPANET to the Internet in the early 1980s
produced the core of the current services.

Network-based email was initially exchanged on the ARPANET in extensions to the File
Transfer Protocol (FTP), but is today carried by the Simple Mail Transfer Protocol
(SMTP), first published as Internet standard 10 (RFC 821) in 1982. In the process of
transporting email messages between systems, SMTP communicates delivery parameters
using a message envelope separately from the message (header and body) itself.

Contents

• 1 Spelling
• 2 Origin
o 2.1 Host-based mail systems
o 2.2 LAN-based mail systems
o 2.3 Attempts at interoperability
o 2.4 From SNDMSG to MSG
o 2.5 The rise of ARPANET mail
• 3 Operation overview
• 4 Message format
o 4.1 Message header
4.1.1 Header fields
o 4.2 Message body
4.2.1 Content encoding
4.2.2 Plain text and HTML
• 5 Servers and client applications
o 5.1 Filename extensions
o 5.2 URI scheme mailto:
• 6 Use
o 6.1 In society
 6.1.1 Flaming
 6.1.2 E-mail bankruptcy
o 6.2 In business
 6.2.1 Pros
 6.2.2 Cons
• 7 Problems
o 7.1 Attachment size limitation
o 7.2 Information overload
o 7.3 Spamming and computer viruses
o 7.4 E-mail spoofing
o 7.5 E-mail bombing
o 7.6 Privacy concerns
o 7.7 Tracking of sent mail
• 8 US Government
• 9 See also
o 9.1 Enhancements and related services
o 9.2 E-mail social issues
o 9.3 Clients and servers
o 9.4 Mailing list
o 9.5 Protocols
• 10 References
• 11 Further reading

• 12 External links

Spelling
There are several spelling variations that are occasionally the cause of vehement
disagreement.[2][3]

• email is the form required by IETF Request for Comments and working groups[4]
and is also recognized in most dictionaries.[5][6][7][8][9][10]
• e-mail is a form recommended by some prominent journalistic and technical style
guides.[11][12]
• mail was the form used in the original RFC. The service is referred to as mail and
a single piece of electronic mail is called a message.[13][14][15]
• eMail, capitalizing only the letter M, was common among ARPANET users and
early developers from Unix, CMS, AppleLink, eWorld, AOL, GEnie, and
Hotmail.[citation needed]
• EMail is a traditional form that has been used in RFCs for the "Author's Address",[14][15]
and is expressly required "...for historical reasons...".[16]

Origin
Electronic mail predates the inception of the Internet, and was in fact a crucial tool in
creating it.

MIT first demonstrated the Compatible Time-Sharing System (CTSS) in 1961.[17] It
allowed multiple users to log into the IBM 7094[18] from remote dial-up terminals, and to
store files online on disk. This new ability encouraged users to share information in new
ways. E-mail started in 1965 as a way for multiple users of a time-sharing mainframe
computer to communicate. Although the exact history is murky, among the first systems
to have such a facility were SDC's Q32 and MIT's CTSS.

Host-based mail systems

The original email systems allowed communication only between users who logged into
the same host or "mainframe", but this could be hundreds or thousands of users within a
company or university. By 1966 (or earlier, it is possible that the SAGE system had
something similar some time before), such systems allowed email between different
companies as long as they ran compatible operating systems, but not to other dissimilar
systems.

Examples include BITNET, IBM PROFS, Digital Equipment Corporation ALL-IN-1 and
the original Unix mail.
LAN-based mail systems

From the early 1980s networked personal computers on LANs became increasingly
important. Server based systems similar to the earlier mainframe systems developed, and
again initially allowed communication only between users logged into the same server
infrastructure, but these also could generally be linked between different companies as
long as they ran the same email system and (proprietary) protocol.

Examples include cc:Mail, Lantastic, WordPerfect Office, Microsoft Mail, Banyan
VINES and Lotus Notes - with various vendors supplying gateway software to link these
incompatible systems.

Attempts at interoperability

• Novell briefly championed the open MHS protocol but abandoned it after
purchasing the non-MHS WordPerfect Office (renamed Groupwise)
• uucp was used as an open "glue" between differing mail systems
• The Coloured Book protocols on UK academic networks until 1992
• X.400 in the early 1990s was mandated for government use under GOSIP but
almost immediately abandoned by all but a few — in favour of Internet SMTP

From SNDMSG to MSG

In the early 1970s, Ray Tomlinson updated an existing utility called SNDMSG so that it
could copy files over the network. Lawrence Roberts, the project manager for the
ARPANET development, updated READMAIL and called the program RD. Barry
Wessler then updated RD and called it NRD.

Marty Yonke combined SNDMSG and NRD to include reading, sending, and a help
system, and called the utility WRD. John Vittal then updated this version to include
message forwarding and an Answer command to create replies with the correct address,
and called it MSG. With inclusion of these features, MSG is considered to be the first
modern email program, from which many other applications have descended.[19]

The rise of ARPANET mail

The ARPANET computer network made a large contribution to the development of e-
mail. There is one report that indicates experimental inter-system e-mail transfers began
shortly after its creation in 1969.[20] Ray Tomlinson is credited by some as having sent the
first email, initiating the use of the "@" sign to separate the names of the user and the
user's machine in 1971, when he sent a message from one Digital Equipment Corporation
DEC-10 computer to another DEC-10. The two machines were placed next to each
other.[21][22] The ARPANET significantly increased the popularity of e-mail, and it became
the killer app of the ARPANET.

Most other networks had their own email protocols and address formats; as the influence
of the ARPANET and later the Internet grew, central sites often hosted email gateways
that passed mail between the Internet and these other networks. Internet email addressing
is still complicated by the need to handle mail destined for these older networks. Some
well-known examples of these were UUCP (mostly Unix computers), BITNET (mostly
IBM and VAX mainframes at universities), FidoNet (personal computers), DECNET
(various networks) and CSNET a forerunner of NSFNet.

An example of an Internet email address that routed mail to a user at a UUCP host:

hubhost!middlehost!edgehost!user@uucpgateway.somedomain.example.com

This was necessary because in early years UUCP computers did not maintain (or consult
servers for) information about the location of all hosts they exchanged mail with, but
rather only knew how to communicate with a few network neighbors; email messages
(and other data such as Usenet News) were passed along in a chain among hosts who had
explicitly agreed to share data with each other.

Operation overview


A typical sequence of events[23] takes place when Alice composes a message using her
mail user agent (MUA). She enters the e-mail address of her correspondent, and hits the
"send" button.

1. Her MUA formats the message in e-mail format and uses the Simple Mail
Transfer Protocol (SMTP) to send the message to the local mail transfer agent
(MTA), in this case smtp.a.org, run by Alice's internet service provider (ISP).
2. The MTA looks at the destination address provided in the SMTP protocol (not
from the message header), in this case bob@b.org. An Internet e-mail address is a
string of the form localpart@exampledomain. The part before the @ sign is the
local part of the address, often the username of the recipient, and the part after the
@ sign is a domain name or a fully qualified domain name. The MTA resolves a
domain name to determine the fully qualified domain name of the mail exchange
server in the Domain Name System (DNS).
3. The DNS server for the b.org domain, ns.b.org, responds with any MX records
listing the mail exchange servers for that domain, in this case mx.b.org, a server
run by Bob's ISP.
4. smtp.a.org sends the message to mx.b.org using SMTP, which delivers it to the
mailbox of the user bob.
5. Bob presses the "get mail" button in his MUA, which picks up the message using
the Post Office Protocol (POP3).
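
A minimal sketch of step 1 above, using Python's standard library to format a message
and hand it to a mail transfer agent over SMTP; the host name and addresses
(smtp.a.org, alice@a.org, bob@b.org) are the hypothetical ones from the example.

    # Step 1: the MUA formats the message and submits it to the MTA via SMTP.
    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "alice@a.org"
    msg["To"] = "bob@b.org"
    msg["Subject"] = "Hello"
    msg.set_content("Hi Bob, this is Alice.")

    # Hand the message to Alice's mail transfer agent (hypothetical host).
    with smtplib.SMTP("smtp.a.org") as mta:
        mta.send_message(msg)   # the MTA then looks up b.org's MX record and relays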

That sequence of events applies to the majority of e-mail users. However, there are many
alternative possibilities and complications to the e-mail system:

• Alice or Bob may use a client connected to a corporate e-mail system, such as
IBM Lotus Notes or Microsoft Exchange. These systems often have their own
internal e-mail format and their clients typically communicate with the e-mail
server using a vendor-specific, proprietary protocol. The server sends or receives
e-mail via the Internet through the product's Internet mail gateway which also
does any necessary reformatting. If Alice and Bob work for the same company,
the entire transaction may happen completely within a single corporate e-mail
system.
• Alice may not have a MUA on her computer but instead may connect to a
webmail service.
• Alice's computer may run its own MTA, so avoiding the transfer at step 1.
• Bob may pick up his e-mail in many ways, for example using the Internet
Message Access Protocol, by logging into mx.b.org and reading it directly, or by
using a webmail service.
• Domains usually have several mail exchange servers so that they can continue to
accept mail when the main mail exchange server is not available.
• E-mail messages are not secure if e-mail encryption is not used correctly.

Many MTAs used to accept messages for any recipient on the Internet and do their best to
deliver them. Such MTAs are called open mail relays. This was very important in the
early days of the Internet when network connections were unreliable. If an MTA couldn't
reach the destination, it could at least deliver it to a relay closer to the destination. The
relay stood a better chance of delivering the message at a later time. However, this
mechanism proved to be exploitable by people sending unsolicited bulk e-mail and as a
consequence very few modern MTAs are open mail relays, and many MTAs don't accept
messages from open mail relays because such messages are very likely to be spam.

Message format


The Internet e-mail message format is defined in RFC 5322 and a series of RFCs, RFC
2045 through RFC 2049, collectively called Multipurpose Internet Mail Extensions, or
MIME. Although as of July 13, 2005, RFC 2822 is technically a proposed IETF standard
and the MIME RFCs are draft IETF standards,[24] these documents are the standards for
the format of Internet e-mail. Prior to the introduction of RFC 2822 in 2001, the format
described by RFC 822 was the standard for Internet e-mail for nearly 20 years; it is still
the official IETF standard. The IETF reserved the numbers 5321 and 5322 for the
updated versions of RFC 2821 (SMTP) and RFC 2822, as it previously did with RFC 821
and RFC 822, honoring the extreme importance of these two RFCs. RFC 822 was
published in 1982 and based on the earlier RFC 733 (see [25]).

Internet e-mail messages consist of two major sections:

• Header — Structured into fields such as summary, sender, receiver, and other
information about the e-mail.
• Body — The message itself as unstructured text; sometimes containing a signature
block at the end. This is exactly the same as the body of a regular letter.

The header is separated from the body by a blank line.

Message header

Each message has exactly one header, which is structured into fields. Each field has a
name and a value. RFC 5322 specifies the precise syntax.

Informally, each line of text in the header that begins with a printable character begins a
separate field. The field name starts in the first character of the line and ends before the
separator character ":". The separator is then followed by the field value (the "body" of
the field). The value is continued onto subsequent lines if those lines have a space or tab
as their first character. Field names and values are restricted to 7-bit ASCII characters.
Non-ASCII values may be represented using MIME encoded words.
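
These rules can be seen by parsing a small raw header block with Python's email
package; the field values below are placeholders, and a folded value is read back as one
logical field.

    # Header fields are "Name: value" lines; folded lines continue with whitespace.
    from email import policy
    from email.parser import Parser

    raw = (
        "From: alice@example.org\n"
        "To: bob@example.org\n"
        "Subject: a value folded\n"
        " across two lines\n"
        "\n"
        "Body text starts after the blank line.\n"
    )

    msg = Parser(policy=policy.default).parsestr(raw)
    print(msg["Subject"])        # the folded value comes back as one logical line
    print(msg.get_payload())     # the body, separated from the header by the blank line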

Header fields

The message header should include at least the following fields:

• From: The e-mail address, and optionally the name of the author(s). In many e-
mail clients not changeable except through changing account settings.
• To: The e-mail address(es), and optionally name(s) of the message's recipient(s).
Indicates primary recipients (multiple allowed), for secondary recipients see Cc:
and Bcc: below.
• Subject: A brief summary of the topic of the message. Certain abbreviations are
commonly used in the subject, including "RE:" and "FW:".
• Date: The local time and date when the message was written. Like the From:
field, many email clients fill this in automatically when sending. The recipient's
client may then display the time in the format and time zone local to him/her.
• Message-ID: Also an automatically generated field; used to prevent multiple
delivery and for reference in In-Reply-To: (see below).

Note that the To: field is not necessarily related to the addresses to which the message is
delivered. The actual delivery list is supplied separately to the transport protocol, SMTP,
which may or may not originally have been extracted from the header content. The "To:"
field is similar to the addressing at the top of a conventional letter which is delivered
according to the address on the outer envelope. Also note that the "From:" field does not
have to be the real sender of the e-mail message. One reason is that it is very easy to fake
the "From:" field and let a message seem to be from any mail address. It is possible to
digitally sign e-mail, which is much harder to fake, but such signatures require extra
programming and often external programs to verify. Some ISPs do not relay e-mail
claiming to come from a domain not hosted by them, but very few (if any) check to make
sure that the person or even e-mail address named in the "From:" field is the one
associated with the connection. Some ISPs apply e-mail authentication systems to e-mail
being sent through their MTA to allow other MTAs to detect forged spam that might
appear to come from them.

RFC 3864 describes registration procedures for message header fields at the IANA; it
provides for permanent and provisional message header field names, including also fields
defined for MIME, netnews, and http, and referencing relevant RFCs. Common header
fields for email include:

• Bcc: Blind Carbon Copy; addresses added to the SMTP delivery list but not
(usually) listed in the message data, remaining invisible to other recipients.
• Cc: Carbon copy; Many e-mail clients will mark e-mail in your inbox differently
depending on whether you are in the To: or Cc: list.
• Content-Type: Information about how the message is to be displayed, usually a
MIME type.
• In-Reply-To: Message-ID of the message that this is a reply to. Used to link
related messages together.
• Precedence: commonly with values "bulk", "junk", or "list"; used to indicate that
automated "vacation" or "out of office" responses should not be returned for this
mail, e.g. to prevent vacation notices from being sent to all other subscribers of a
mailinglist.
• Received: Tracking information generated by mail servers that have previously
handled a message, in reverse order (last handler first).
• References: Message-ID of the message that this is a reply to, the Message-ID of
the message that message was a reply to, and so on.
• Reply-To: Address that should be used to reply to the message.
• Sender: Address of the actual sender acting on behalf of the author listed in the
From: field (secretary, list manager, etc.).

Message body

Content encoding

E-mail was originally designed for 7-bit ASCII.[26] Much e-mail software is 8-bit clean
but must assume it will communicate with 7-bit servers and mail readers. The MIME
standard introduced character set specifiers and two content transfer encodings to enable
transmission of non-ASCII data: quoted printable for mostly 7 bit content with a few
characters outside that range and base64 for arbitrary binary data. The 8BITMIME
extension was introduced to allow transmission of mail without the need for these
encodings, but many mail transport agents still do not support it fully. In some countries
several encoding schemes coexist; as a result, by default, a message in a non-Latin
alphabet appears in unreadable form unless, by coincidence, the sender and receiver use
the same encoding scheme. Therefore, for international character sets, Unicode is
growing in popularity.
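
A small sketch of the two content transfer encodings mentioned above, using Python's
standard library (the sample text is arbitrary): quoted-printable keeps mostly-ASCII text
readable, while base64 handles arbitrary binary data.

    # Quoted-printable vs. base64, the two MIME content transfer encodings.
    import base64
    import quopri

    data = "Café menu: crème brûlée".encode("utf-8")

    print(quopri.encodestring(data).decode("ascii"))   # readable; bytes become =C3=A9 etc.
    print(base64.b64encode(data).decode("ascii"))      # opaque but 7-bit safe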

Plain text and HTML

Most modern graphic e-mail clients allow the use of either plain text or HTML for the
message body at the option of the user. HTML e-mail messages often include an
automatically-generated plain text copy as well, for compatibility reasons.

Advantages of HTML include the ability to include in-line links and images, set apart
previous messages in block quotes, wrap naturally on any display, use emphasis such as
underlines and italics, and change font styles. Disadvantages include the increased size of
the email, privacy concerns about web bugs, abuse of HTML email as a vector for
phishing attacks and the spread of malicious software.[27]

Some web-based mailing lists recommend that all posts be made in plain text[28][29] for all
the above reasons, but also because they have a significant number of readers using text-
based e-mail clients such as Mutt.

Some Microsoft e-mail clients allow rich formatting using RTF, but unless the recipient
is guaranteed to have a compatible e-mail client this should be avoided.[30]

In order to ensure that HTML sent in an email is rendered properly by the recipient's
client software, an additional header must be specified when sending: "Content-type:
text/html". Most email programs send this header automatically.

Servers and client applications


[Image caption: The interface of an e-mail client, Thunderbird.]

Messages are exchanged between hosts using the Simple Mail Transfer Protocol with
software programs called mail transfer agents. Users can retrieve their messages from
servers using standard protocols such as POP or IMAP, or, as is more likely in a large
corporate environment, with a proprietary protocol specific to Lotus Notes or Microsoft
Exchange Servers. Webmail interfaces allow users to access their mail with any standard
web browser, from any computer, rather than relying on an e-mail client.
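
As a rough sketch of retrieval over one of these standard protocols, the following uses
Python's imaplib to list the subjects in an inbox; the server name and credentials are
placeholders.

    # List message subjects from an INBOX over IMAP (placeholder server/credentials).
    import imaplib
    from email import policy
    from email.parser import BytesParser

    with imaplib.IMAP4_SSL("imap.example.org") as conn:
        conn.login("bob", "app-password")
        conn.select("INBOX")
        _, data = conn.search(None, "ALL")
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            msg = BytesParser(policy=policy.default).parsebytes(parts[0][1])
            print(num.decode(), msg["Subject"])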

Mail can be stored on the client, on the server side, or in both places. Standard formats
for mailboxes include Maildir and mbox. Several prominent e-mail clients use their own
proprietary format and require conversion software to transfer e-mail between them.

Accepting a message obliges an MTA to deliver it, and when a message cannot be
delivered, that MTA must send a bounce message back to the sender, indicating the
problem.

Filename extensions

Upon reception of e-mail messages, e-mail client applications save messages in operating-
system files in the file system. Some clients save individual messages as separate files,
while others use various database formats, often proprietary, for collective storage. A
historical standard of storage is the mbox format. The specific format used is often
indicated by special filename extensions:

eml
Used by many e-mail clients including Microsoft Outlook Express, Windows
Mail and Mozilla Thunderbird.[31] The files are plain text in MIME format,
containing the e-mail header as well as the message contents and attachments in
one or more of several formats.
emlx
Used by Apple Mail.
msg
Used by Microsoft Office Outlook and OfficeLogic Groupware.
mbx
Used by Opera Mail, KMail, and Apple Mail based on the mbox format.

Some applications (like Apple Mail) leave attachments encoded in messages for
searching while also saving separate copies of the attachments. Others separate
attachments from messages and save them in a specific directory.
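
Because an .eml file is plain text in MIME format, it can be read back with any MIME
parser; a sketch in Python (the file name is hypothetical):

    # Open a saved .eml file and list its header fields and parts.
    from email import policy
    from email.parser import BytesParser

    with open("saved_message.eml", "rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)

    print(msg["From"], "->", msg["To"], ":", msg["Subject"])
    for part in msg.walk():                       # message body plus any attachments
        print(part.get_content_type(), part.get_filename())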

URI scheme mailto:

The URI scheme, as registered with the IANA, defines the mailto: scheme for SMTP
email addresses. Though its use is not strictly defined, URLs of this form are intended to
be used to open the new message window of the user's mail client when the URL is
activated, with the address as defined by the URL in the To: field.[32]

Use

In society

There are numerous ways in which people have changed the way they communicate in
the last 50 years; e-mail is certainly one of them. Traditionally, social interaction in the
local community was the basis for communication – face to face. Yet, today face-to-face
meetings are no longer the primary way to communicate as one can use a landline
telephone, mobile phones, fax services, or any number of the computer mediated
communications such as e-mail.

Research has shown that people actively use e-mail to maintain core social networks,
particularly when others live at a distance. However, contradictory to previous research,
the results suggest that increases in Internet usage are associated with decreases in other
modes of communication, with proficiency of Internet and e-mail use serving as a
mediating factor in this relationship.[33] With the introduction of chat messengers and
video conference, there are more ways to communicate.

Flaming

Flaming occurs when a person sends a message with angry or antagonistic content.
Flaming is assumed to be more common today because of the ease and impersonality of
e-mail communications: confrontations in person or via telephone require direct
interaction, where social norms encourage civility, whereas typing a message to another
person is an indirect interaction, so civility may be forgotten.[citation needed] Flaming is
generally looked down upon by Internet communities as it is considered rude and non-
productive.

E-mail bankruptcy


Also known as "e-mail fatigue", e-mail bankruptcy is when a user ignores a large number
of e-mail messages after falling behind in reading and answering them. Falling behind is
often due to information overload and a general sense that there is so much information
that it is not possible to read it all. As a solution, people occasionally send a
boilerplate message explaining that the e-mail inbox is being cleared out. Stanford
University law professor Lawrence Lessig is credited with coining this term, but he may
only have popularized it.[34]

In business

E-mail was widely accepted by the business community as the first broad electronic
communication medium and was the first ‘e-revolution’ in business communication. E-
mail is very simple to understand and like postal mail, e-mail solves two basic problems
of communication: logistics and synchronization (see below).

LAN-based e-mail is also an emerging form of business usage. It not only allows the
business user to download mail and work with it offline, it also lets a small business give
multiple users e-mail IDs over a single e-mail connection.

Pros

• The problem of logistics: Much of the business world relies upon communications
between people who are not physically in the same building, area or even country;
setting up and attending an in-person meeting, telephone call, or conference call
can be inconvenient, time-consuming, and costly. E-mail provides a way to
exchange information between two or more people with no set-up costs and that is
generally far less expensive than physical meetings or phone calls.
• The problem of synchronisation: With real time communication by meetings or
phone calls, participants have to work on the same schedule, and each participant
must spend the same amount of time in the meeting or call. E-mail allows
asynchrony: each participant may control their schedule independently.

Cons

Most business workers today spend from one to two hours of their working day on e-
mail: reading, ordering, sorting, ‘re-contextualizing’ fragmented information, and writing
e-mail.[35] The use of e-mail is increasing due to increasing levels of globalisation—
labour division and outsourcing amongst other things. E-mail can lead to some well-
known problems:

• Loss of context: the context in which a message was written is not carried with it.
Information in context (as in a newspaper) is much easier and faster to understand than
unedited and sometimes unrelated fragments of information. Communicating in context
can only be achieved when both parties have a full understanding of the context and issue
in question.
• Information overload: E-mail is a push technology—the sender controls who
receives the information. Convenient availability of mailing lists and use of "copy
all" can lead to people receiving unwanted or irrelevant information of no use to
them.
• Inconsistency: E-mail can duplicate information. This can be a problem when a
large team is working on documents and information while not in constant contact
with the other members of their team.

Despite these disadvantages, e-mail has become the most widely used medium of
communication within the business world.

Problems

Attachment size limitation

Main article: E-mail attachment

Email messages may have one or more attachments. Attachments serve the purpose of
delivering binary or text files of unspecified size. In principle there is no intrinsic
technical restriction in the SMTP protocol limiting the size or number of attachments. In
practice, however, email service providers implement various limitations on the
permissible size of files or the size of an entire message.

Furthermore, for technical reasons, a small attachment can often increase in size when
sent,[36] which can be confusing to senders trying to assess whether they can or
cannot send a file by e-mail, and this can result in their message being rejected.
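
One common technical reason for that growth is that binary attachments are base64-
encoded for transport, which expands them by roughly a third (before headers and
boundaries are added); a quick check in Python with an arbitrary 30 kB blob:

    # Base64 encodes every 3 bytes as 4 ASCII characters: roughly 33% larger.
    import base64
    import os

    attachment = os.urandom(30_000)            # a 30 kB binary "attachment"
    encoded = base64.encodebytes(attachment)   # transport encoding with line breaks

    print(len(attachment), "bytes raw")
    print(len(encoded), "bytes after base64")
    print(f"overhead: {len(encoded) / len(attachment) - 1:.0%}")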

As larger and larger file sizes are being created and traded, many users are either forced
to upload and download their files using an FTP server, or more popularly, use online file
sharing facilities or services, usually over web-friendly HTTP, in order to send and
receive them.

Information overload

A December 2007 New York Times blog post described information overload as "a $650
Billion Drag on the Economy",[37] and the New York Times reported in April 2008 that
"E-MAIL has become the bane of some people’s professional lives" due to information
overload, yet "none of the current wave of high-profile Internet start-ups focused on e-
mail really eliminates the problem of e-mail overload because none helps us prepare
replies".[38]

Technology investors reflect similar concerns.[39]

E-mail services try to provide generous inbox space so that large documents
(attachments) can be stored.

Spamming and computer viruses

The usefulness of e-mail is being threatened by four phenomena: e-mail bombardment,
spamming, phishing, and e-mail worms.

Spamming is unsolicited commercial (or bulk) e-mail. Because of the very low cost of
sending e-mail, spammers can send hundreds of millions of e-mail messages each day
over an inexpensive Internet connection. Hundreds of active spammers sending this
volume of mail results in information overload for many computer users who receive
voluminous unsolicited e-mail each day.[40][41]

E-mail worms use e-mail as a way of replicating themselves into vulnerable computers.
Although the first e-mail worm affected UNIX computers, the problem is most common
today on the more popular Microsoft Windows operating system.

The combination of spam and worm programs results in users receiving a constant drizzle
of junk e-mail, which reduces the usefulness of e-mail as a practical tool.

A number of anti-spam techniques mitigate the impact of spam. In the United States, U.S.
Congress has also passed a law, the Can Spam Act of 2003, attempting to regulate such e-
mail. Australia also has very strict spam laws restricting the sending of spam from an
Australian ISP,[42] but its impact has been minimal since most spam comes from regimes
that seem reluctant to regulate the sending of spam.[citation needed]

E-mail spoofing

Main article: E-mail spoofing

E-mail spoofing occurs when the header information of an email is altered to make the
message appear to come from a known or trusted source. It is often used as a ruse to
collect personal information.

E-mail bombing

E-mail bombing is the intentional sending of large volumes of messages to a target
address. The overloading of the target email address can render it unusable and can even
cause the mail server to crash.
Privacy concerns

Main article: e-mail privacy

E-mail privacy, without some security precautions, can be compromised because:

• e-mail messages are generally not encrypted.
• e-mail messages have to go through intermediate computers before reaching their
destination, meaning it is relatively easy for others to intercept and read messages.
• many Internet Service Providers (ISP) store copies of e-mail messages on their
mail servers before they are delivered. The backups of these can remain for up to
several months on their server, despite deletion from the mailbox.
• the "Received:"-fields and other information in the e-mail can often identify the
sender, preventing anonymous communication.

There are cryptography applications that can serve as a remedy to one or more of the
above. For example, Virtual Private Networks or the Tor anonymity network can be used
to encrypt traffic from the user machine to a safer network while GPG, PGP, SMEmail,[43]
or S/MIME can be used for end-to-end message encryption, and SMTP STARTTLS or
SMTP over Transport Layer Security/Secure Sockets Layer can be used to encrypt
communications for a single mail hop between the SMTP client and the SMTP server.
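
A minimal sketch of encrypting that single hop with SMTP STARTTLS, using Python's
standard library; the server name, port, and credentials are placeholders, and this
protects only the client-to-server hop, not the message end to end.

    # Encrypt the client-to-server SMTP hop with STARTTLS (not end-to-end encryption).
    import smtplib
    import ssl
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Hop-encrypted mail"
    msg.set_content("Carried over TLS between this client and its submission server only.")

    context = ssl.create_default_context()
    with smtplib.SMTP("smtp.example.org", 587) as server:
        server.starttls(context=context)       # upgrade the plain connection to TLS
        server.login("alice", "app-password")
        server.send_message(msg)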

Additionally, many mail user agents do not protect logins and passwords, making them
easy to intercept by an attacker. Encrypted authentication schemes such as SASL prevent
this.

Finally, attached files share many of the same hazards as those found in peer-to-peer
filesharing. Attached files may contain trojans or viruses.

Tracking of sent mail

The original SMTP mail service provides limited mechanisms for tracking a transmitted
message, and none for verifying that it has been delivered or read. It requires that each
mail server must either deliver it onward or return a failure notice (bounce message), but
both software bugs and system failures can cause messages to be lost. To remedy this, the
IETF introduced Delivery Status Notifications (delivery receipts) and Message
Disposition Notifications (return receipts); however, these are not universally deployed in
production.

There are a number of systems that allow the sender to see if messages have been
opened.[44][45][46]

US Government
The US Government has been involved in e-mail in several different ways.
Starting in 1977, the US Postal Service (USPS) recognized that electronic mail and
electronic transactions posed a significant threat to First Class mail volumes and revenue.
Therefore, the USPS initiated an experimental e-mail service known as E-COM.
Electronic messages were transmitted to a post office, printed out, and delivered as hard
copy. To take advantage of the service, an individual had to transmit at least 200
messages. The delivery time of the messages was the same as First Class mail and cost 26
cents. Both the Postal Regulatory Commission and the Federal Communications
Commission opposed E-COM. The FCC concluded that E-COM constituted common
carriage under its jurisdiction and the USPS would have to file a tariff.[47] Three years
after initiating the service, USPS canceled E-COM and attempted to sell it
off.[48][49][50][51][52][53][54]

The early ARPANET dealt with multiple e-mail clients that had various, and at times
incompatible, formats. For example, in the system Multics, the "@" sign meant "kill line"
and anything after the "@" sign was ignored.[55] The Department of Defense DARPA
desired to have uniformity and interoperability for e-mail and therefore funded efforts to
drive towards unified inter-operable standards. This led to David Crocker, John Vittal,
Kenneth Pogran, and Austin Henderson publishing RFC 733, "Standard for the Format of
ARPA Network Text Message" (November 21, 1977), which was apparently not
effective. In 1979, a meeting was held at BBN to resolve incompatibility issues. Jon
Postel recounted the meeting in RFC 808, "Summary of Computer Mail Services Meeting
Held at BBN on 10 January 1979" (March 1, 1982), which includes an appendix listing
the varying e-mail systems at the time. This, in turn, led to the release of David
Crocker's RFC 822, "Standard for the Format of ARPA Internet Text Messages" (August
13, 1982).[56]

The National Science Foundation took over operations of the ARPANET and Internet
from the Department of Defense, and initiated NSFNet, a new backbone for the network.
A part of the NSFNet AUP forbade commercial traffic.[57] In 1988, Vint Cerf arranged for
an interconnection of MCI Mail with NSFNET on an experimental basis. The following
year Compuserve e-mail interconnected with NSFNET. Within a few years the
commercial traffic restriction was removed from NSFNET's AUP, and NSFNET was privatized.

In the late 1990s, the Federal Trade Commission grew concerned with fraud transpiring
in e-mail, and initiated a series of procedures on spam, fraud, and phishing.[58] In 2004,
FTC jurisdiction over spam was codified into law in the form of the CAN SPAM Act.[59]
Several other US Federal Agencies have also exercised jurisdiction including the
Department of Justice and the Secret Service.

A spreadsheet is a computer application that simulates a paper accounting worksheet. It displays multiple cells that together make up a grid consisting of rows and columns, each cell containing alphanumeric text, numeric values or formulas. A formula defines how the content of that cell is to be calculated from the contents of any other cell (or combination of cells) each time any cell is updated. Spreadsheets are frequently used for financial information because of their ability to re-calculate the entire sheet automatically after a change to a single cell is made.

VisiCalc is usually considered the first electronic spreadsheet (although this has been challenged); it helped turn the Apple II computer into a success and greatly assisted the widespread adoption of spreadsheets. Lotus 1-2-3 was the leading spreadsheet when DOS was
the dominant operating system. Excel now has the largest market share on the Windows
and Macintosh platforms.[1][2][3]


Contents

• 1 History
o 1.1 Paper spreadsheets
o 1.2 Early implementations
 1.2.1 Batch spreadsheet report generators
 1.2.2 LANPAR spreadsheet compiler
 1.2.3 Autoplan/Autotab spreadsheet programming language
 1.2.4 APLDOT modeling language
o 1.3 VisiCalc
o 1.4 Lotus 1-2-3 and other MS-DOS spreadsheets
o 1.5 Microsoft Excel
o 1.6 Apple Numbers
o 1.7 OpenOffice.org Calc
o 1.8 Gnumeric
o 1.9 Web based spreadsheets
o 1.10 Other spreadsheets
o 1.11 Other products
• 2 Concepts
o 2.1 Cells
 2.1.1 Values
 2.1.2 Automatic recalculation
 2.1.3 Real-time update
 2.1.4 Locked cell
 2.1.5 Data format
 2.1.6 Cell formatting
 2.1.7 Named cells
 2.1.7.1 Cell reference
 2.1.7.2 Cell ranges
o 2.2 Sheets
o 2.3 Formulas
o 2.4 Functions
o 2.5 Subroutines
o 2.6 Remote spreadsheet
o 2.7 Charts
o 2.8 Multi-dimensional spreadsheets
o 2.9 Logical spreadsheets
• 3 Programming issues
• 4 Shortcomings
• 5 See also
• 6 References
• 7 External links
o 7.1 History of spreadsheets

o 7.2 General information

[edit] History
[edit] Paper spreadsheets

The word "spreadsheet" came from "spread" in its sense of a newspaper or magazine item
(text and/or graphics) that covers two facing pages, extending across the center fold and
treating the two pages as one large one. The compound word "spread-sheet" came to
mean the format used to present book-keeping ledgers—with columns for categories of
expenditures across the top, invoices listed down the left margin, and the amount of each
payment in the cell where its row and column intersect—which were, traditionally, a
"spread" across facing pages of a bound ledger (book for keeping accounting records) or
on oversized sheets of paper ruled into rows and columns in that format and
approximately twice as wide as ordinary paper.

[edit] Early implementations

[edit] Batch spreadsheet report generators


A batch 'spreadsheet' is indistinguishable from a batch compiler with added input data,
producing an output report (i.e. a 4GL or conventional, non-interactive, batch computer
program). However, this concept of an electronic spreadsheet was outlined in the 1961
paper "Budgeting Models and System Simulation" by Richard Mattessich.[4] The
subsequent work by Mattessich (1964a, Chpt. 9, Accounting and Analytical Methods) and
its companion volume, Mattessich (1964b, Simulation of the Firm through a Budget
Computer Program) applied computerized spreadsheets to accounting and budgeting
systems (on mainframe computers programmed in FORTRAN IV). These batch spreadsheets dealt primarily with the addition or subtraction of entire columns or rows (of input variables), rather than individual 'cells'.

In 1962 this 'concept' of the spreadsheet (called BCL for Business Computer Language)
was implemented on an IBM 1130 and in 1963 was ported to an IBM 7040 by R. Brian
Walsh at Marquette University, Wisconsin. This program was written in Fortran.
Primitive timesharing was available on those machines. In 1968 BCL was ported by
Walsh to the IBM 360/67 timesharing machine at Washington State University. It was
used to assist in the teaching of finance to business students. Students were able to take
information prepared by the professor, manipulate it, and show ratios, etc. In 1964, a book entitled Business Computer Language was written by Kimball, Stoffells and Walsh; both the book and the program were copyrighted in 1966, and the copyright was later renewed.[5]

In the late 1960s, Xerox used BCL to develop a more sophisticated version for their
timesharing system.

[edit] LANPAR spreadsheet compiler

A key invention in the development of electronic spreadsheets was made by Rene K. Pardo and Remy Landau, who in 1971 filed U.S. Patent 4,398,249 on an automatic natural-order recalculation algorithm for spreadsheets. While the patent was initially rejected by
the patent office as being a purely mathematical invention, following 12 years of appeals,
Pardo and Landau won a landmark court case at the CCPA (Predecessor Court of the
Federal Circuit) overturning the Patent Office in 1983 - establishing that "something does
not cease to become patentable merely because the point of novelty is in an algorithm."
However, in 1995 the United States Court of Appeals for the Federal Circuit ruled the
patent unenforceable [6].

The actual software was called LANPAR - LANguage for Programming Arrays at
Random. This was conceived and entirely developed in the summer of 1969 following
Pardo and Landau's recent graduation from Harvard University. Co-inventor Rene Pardo
recalls that he felt that one manager at Bell Canada should not have to depend on
programmers to program and modify budgeting forms, and he thought of letting users
type out forms in any order and having the computer calculate results in the right order.
The software was developed in 1969.[7]
LANPAR was used by Bell Canada, AT&T and the 18 operating telcos nationwide for
their local and national budgeting operations. LANPAR was also used by General
Motors. Its uniqueness was the incorporation of natural order recalculation,[8] as opposed
to left-to-right, top to bottom sequence for calculating the results in each cell that was
used by Visicalc, Supercalc and the first version of Multiplan. Without natural order
recalculation the users had to manually recalculate the spreadsheet as many times as
necessary until the values in all the cells had stopped changing.

The LANPAR system was implemented on GE400 and Honeywell 6000 online
timesharing systems enabling users to program remotely via computer terminals and
modems. Data could be entered dynamically either by paper tape, specific file access, on
line, or even external data bases. Sophisticated mathematical expressions including
logical comparisons and "if/then" statements could be used in any cell, and cells could be
presented in any order.

[edit] Autoplan/Autotab spreadsheet programming language

In 1968, three former employees from the General Electric computer company
headquartered in Phoenix, Arizona set out to start their own software development house.
A. Leroy Ellison, Harry N. Cantrell, and Russell E. Edwards found themselves doing a
large number of calculations when making tables for the business plans that they were
presenting to venture capitalists. They decided to save themselves a lot of effort and
wrote a computer program that produced their tables for them. This program, originally
conceived as a simple utility for their personal use, would turn out to be the first software
product offered by the company that would become known as Capex Corporation.
"AutoPlan" ran on GE’s Time-sharing service; afterward, a version that ran on IBM
mainframes was introduced under the name "AutoTab". (National CSS offered a similar
product, CSSTAB, which had a moderate timesharing user base by the early 70s. A major
application was opinion research tabulation.) AutoPlan/AutoTab was not a WYSIWYG interactive spreadsheet program; it was a simple scripting language for spreadsheets. The
user defined the names and labels for the rows and columns, then the formulas that
defined each row or column.

[edit] APLDOT modeling language

An example of an early "industrial weight" spreadsheet was APLDOT, developed in 1976 at the United States Railway Association on an IBM 360/91, running at The Johns
Hopkins University Applied Physics Laboratory in Laurel, MD.[9] The application was
used successfully for many years in developing such applications as financial and costing
models for the US Congress and for Conrail. APLDOT was dubbed a "spreadsheet"
because financial analysts and strategic planners used it to solve the same problems they
addressed with paper spreadsheet pads.

[edit] VisiCalc
The spreadsheet concept became widely known in the late 1970s and early 1980s because
of Dan Bricklin and Bob Frankston's implementation of VisiCalc. VisiCalc was the first
spreadsheet that combined all essential features of modern spreadsheet applications, such
as WYSIWYG interactive user interface, automatic recalculation, status and formula
lines, range copying with relative and absolute references, formula building by selecting
referenced cells. PC World magazine has called VisiCalc the first electronic spreadsheet.
[10]

Bricklin has spoken of watching his university professor create a table of calculation
results on a blackboard. When the professor found an error, he had to tediously erase and
rewrite a number of sequential entries in the table, triggering Bricklin to think that he
could replicate the process on a computer, using the blackboard as the model to view
results of underlying formulas. His idea became VisiCalc, the first application that turned
the personal computer from a hobby for computer enthusiasts into a business tool.

Screenshot of VisiCalc, the first PC spreadsheet.

VisiCalc went on to become the first "killer app", an application that was so compelling,
people would buy a particular computer just to own it. In this case the computer was the
Apple II, and VisiCalc was no small part in that machine's success. The program was
later ported to a number of other early computers, notably CP/M machines, the Atari 8-bit
family and various Commodore platforms. Nevertheless, VisiCalc remains best known as
"an Apple II program".

[edit] Lotus 1-2-3 and other MS-DOS spreadsheets

The acceptance of the IBM PC following its introduction in August, 1981, began slowly,
because most of the programs available for it were translations from other computer models. Things changed dramatically with the introduction of Lotus 1-2-3 in
November, 1982, and release for sale in January, 1983. Since it was written especially for
the IBM PC, it had good performance[citation needed] and became the killer app for this PC.
Lotus 1-2-3 drove sales of the PC due to the improvements in speed and graphics
compared to VisiCalc on the Apple II.

Lotus 1-2-3, along with its competitor Borland Quattro, soon displaced VisiCalc. Lotus
1-2-3 was released on January 26, 1983, began outselling the then most popular VisiCalc that same year, and for a number of years was the leading spreadsheet for DOS.
[edit] Microsoft Excel

Microsoft had been developing Excel on the Macintosh platform for several years at this
point, where it had developed into a fairly powerful system. A port of Excel to Windows
2.0 resulted in a fully functional Windows spreadsheet. The more robust Windows 3.x
platforms of the early 1990s made it possible for Excel to take market share from Lotus.
By the time Lotus responded with usable Windows products, Microsoft had started assembling its Office suite. From the mid-1990s through the present,
Microsoft Excel has dominated the commercial electronic spreadsheet market.

[edit] Apple Numbers

Numbers is Apple Inc.'s spreadsheet software, part of iWork. It focuses on usability and
the elegance of chart presentation. Numbers completed Apple's productivity suite,
making it a viable competitor to Microsoft Office. It lacks features such as pivot tables, offering Table Categories as a simpler alternative.

[edit] OpenOffice.org Calc

OpenOffice.org Calc is a freely available, open-source program modelled after Microsoft Excel. Calc can both open and save in the Excel (XLS) file format.[11] Calc can be
acquired as both an installation file and a portable program, capable of being run from a
device such as a USB memory drive. It can be downloaded from the OpenOffice.org
website.

[edit] Gnumeric

Gnumeric is a free spreadsheet program that is part of the GNOME Free Software
Desktop Project and has Windows installers available. It is intended to be a free
replacement for proprietary spreadsheet programs such as Microsoft Excel, which it
broadly and openly emulates. Gnumeric was created and developed by Miguel de Icaza,
and the current maintainer is Jody Goldberg.

Gnumeric has the ability to import and export data in several file formats, including CSV,
Microsoft Excel, HTML, LaTeX, Lotus 1-2-3, OpenDocument and Quattro Pro; its native
format is the Gnumeric file format (.gnm or .gnumeric), an XML file compressed with
gzip.[12] It includes all of the spreadsheet functions of the North American edition of
Microsoft Excel and many functions unique to Gnumeric. Pivot tables and conditional
formatting are not yet supported but are planned for future versions. Gnumeric's
accuracy[13][14] has helped it to establish a niche among people using it for statistical
analysis and other scientific tasks.[citation needed] For improving the accuracy of Gnumeric, the
developers are cooperating with the R Project.

[edit] Web based spreadsheets


With the advent of advanced web technologies such as Ajax circa 2005, a new generation
of online spreadsheets has emerged. Equipped with a rich Internet application user
experience, the best web based online spreadsheets have many of the features seen in
desktop spreadsheet applications. Some of them have strong multi-user collaboration
features. Some of them offer real time updates from remote sources such as stock prices
and currency exchange rates.

[edit] Other spreadsheets

• A list of current spreadsheet software


o IBM Lotus Symphony (2007)
o KSpread
o ZCubes-Calci
o Resolver One
• Discontinued spreadsheet software
o Advantage
o Lotus Improv[15]
o Javelin Software
o Lotus Jazz for Macintosh
o MultiPlan
o Borland's Quattro Pro
o SuperCalc
o Lotus Symphony (1984)
o Wingz for Macintosh
o Target Planner Calc for CP/M and TRS-DOS[16][17]

[edit] Other products

A number of companies have attempted to break into the spreadsheet market with
programs based on very different paradigms. Lotus introduced what is likely the most
successful example, Lotus Improv, which saw some commercial success, notably in the
financial world where its powerful data mining capabilities remain well respected to this
day. Spreadsheet 2000 attempted to dramatically simplify formula construction, but was
generally not successful.

[edit] Concepts
[edit] Cells

A "cell" can be thought of as a box for holding a datum. A single cell is usually
referenced by its column and row (A2 would represent the cell below containing the
value 10). Usually rows are referenced in decimal notation starting from 1, while
columns use 26-adic bijective numeration using the letters A-Z as numerals. Its physical
size can usually be tailored for its content by dragging its height or width at box
intersections (or for entire columns or rows by dragging the column or rows headers).
My Spreadsheet
        A        B        C        D
01      value1   value2   added    multiplied
02      10       20       30       200
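
The bijective base-26 column naming described above can be sketched in a few lines of Python; the helper below is hypothetical and not taken from any particular spreadsheet product.

    def column_letters(n):
        """Convert a 1-based column number to its letter name (1 -> A, 26 -> Z, 27 -> AA)."""
        name = ""
        while n > 0:
            n, r = divmod(n - 1, 26)           # bijective base-26: there is no zero digit
            name = chr(ord("A") + r) + name
        return name

    # column_letters(1) == "A", column_letters(26) == "Z", column_letters(27) == "AA"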

An array of cells is called a "sheet" or "worksheet". It is analogous to an array of variables in a conventional computer program (although certain unchanging values, once entered, could be considered, by the same analogy, constants). In most implementations,
many worksheets may be located within a single spreadsheet. A worksheet is simply a
subset of the spreadsheet divided for the sake of clarity. Functionally, the spreadsheet
operates as a whole and all cells operate as global variables within the spreadsheet ('read'
access only except its own containing cell).

A cell may contain a value or a formula, or it may simply be left empty. By convention,
formulas usually begin with an = sign.

[edit] Values

A value can be entered from the computer keyboard by directly typing into the cell itself.
Alternatively, a value can be based on a formula (see below), which might perform a
calculation, display the current date or time, or retrieve external data such as a stock
quote or a database value.

The Spreadsheet Value Rule Computer scientist Alan Kay used the term value rule to
summarize a spreadsheet's operation: a cell's value relies solely on the formula the user
has typed into the cell.[18] The formula may rely on the value of other cells, but those cells
are likewise restricted to user-entered data or formulas. There are no 'side effects' to
calculating a formula: the only output is to display the calculated result inside its
occupying cell. There is no natural mechanism for permanently modifying the contents of
a cell unless the user manually modifies the cell's contents. In the context of
programming languages, this yields a limited form of first-order functional programming.
[19]

[edit] Automatic recalculation

A standard of spreadsheets since the mid-1980s,[citation needed] this optional feature eliminates
the need to manually request the spreadsheet program to recalculate values (nowadays
typically the default option unless specifically 'switched off' for large spreadsheets,
usually to improve performance). Some earlier spreadsheets required a manual request to
recalculate, since recalculation of large or complex spreadsheets often reduced data entry
speed. Many modern spreadsheets still retain this option.

[edit] Real-time update


This feature refers to updating a cell's contents periodically when its value is derived
from an external source - such as a cell in another "remote" spreadsheet. For shared, web-
based spreadsheets, it applies to "immediately" updating cells that have been altered by
another user. All dependent cells have to be updated also.

[edit] Locked cell

Once entered, selected cells (or the entire spreadsheet) can optionally be "locked" to
prevent accidental overwriting. Typically this would apply to cells containing formulas
but might be applicable to cells containing "constants" such as a kilogram/pounds
conversion factor (2.20462262 to eight decimal places). Even though individual cells are
marked as locked, the spreadsheet data is not protected until the feature is activated in the
file preferences.

[edit] Data format

A cell or range can optionally be defined to specify how the value is displayed. The
default display format is usually set by its initial content if not specifically previously set,
so that for example "31/12/2007" or "31 Jan 2007" would default to the cell format of
"date". Similarly adding a % sign after a numeric value would tag the cell as a percentage
cell format. The cell contents are not changed by this format, only the displayed value.

Some cell formats such as "numeric" or "currency" can also specify the number of
decimal places.

This can allow invalid operations (such as doing multiplication on a cell containing a
date), resulting in illogical results without an appropriate warning.

[edit] Cell formatting

Depending on the capability of the spreadsheet application, each cell (like its counterpart
the "style" in a word processor) can be separately formatted using the attributes of either
the content (point size, color, bold or italic) or the cell (border thickness, background
shading, color). To aid the readability of a spreadsheet, cell formatting may be
conditionally applied to data - for example, a negative number may be displayed in red.

A cell's formatting does not typically affect its content and, depending on how cells are
referenced or copied to other worksheets or applications, the formatting may not be
carried with the content.

[edit] Named cells


Use of named column variables x & y in Microsoft Excel. The formula for y = x^2 resembles Fortran, and Name Manager shows the definitions of x & y.

In most implementations, a cell, or group of cells in a column or row, can be "named", enabling the user to refer to those cells by a name rather than by a grid reference. Names
must be unique within the spreadsheet, but when using multiple sheets in a spreadsheet
file, an identically named cell range on each sheet can be used if it is distinguished by
adding the sheet name. One reason for this usage is for creating or running macros that
repeat a command across many sheets. Another reason is that formulas with named
variables are readily checked against the algebra they are intended to implement (they
resemble Fortran expressions). Use of named variables and named functions also makes
the spreadsheet structure more transparent.

[edit] Cell reference

In place of a named cell, an alternative approach is to use a cell (or grid) reference. Most
cell references indicate another cell in the same spreadsheet, but a cell reference can also
refer to a cell in a different sheet within the same spreadsheet, or (depending on the
implementation) to a cell in another spreadsheet entirely, or to a value from a remote
application.

A typical cell reference in "A1" style consists of one or two case-insensitive letters to
identify the column (if there are up to 256 columns: A-Z and AA-IV) followed by a row
number (e.g. in the range 1-65536). Either part can be relative (it changes when the
formula it is in is moved or copied), or absolute (indicated with $ in front of the part
concerned of the cell reference). The alternative "R1C1" reference style consists of the
letter R, the row number, the letter C, and the column number; relative row or column
numbers are indicated by enclosing the number in square brackets. Most current
spreadsheets use the A1 style, some providing the R1C1 style as a compatibility option.
When the computer calculates a formula in one cell to update the displayed value of that
cell, cell reference(s) in that cell, naming some other cell(s), cause the computer to fetch
the value of the named cell(s).
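
As a small illustration of the A1 style, the hypothetical Python helper below parses a reference such as "B3" or "$B$3" into numeric (row, column) coordinates; the $ markers for absolute references are recognized but ignored here.

    import re

    def parse_a1(ref):
        """Parse an A1-style reference into (row, column) numbers, e.g. 'B3' -> (3, 2)."""
        m = re.fullmatch(r"(\$?)([A-Za-z]+)(\$?)(\d+)", ref)
        if m is None:
            raise ValueError("not an A1-style reference: " + ref)
        col = 0
        for ch in m.group(2).upper():
            col = col * 26 + (ord(ch) - ord("A") + 1)   # bijective base-26 column letters
        return int(m.group(4)), col

    # parse_a1("A1") == (1, 1); parse_a1("$IV$65536") == (65536, 256)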

A cell on the same "sheet" is usually addressed as:

=A1

A cell on a different sheet of the same spreadsheet is usually addressed as:

=SHEET2!A1 (that is, the first cell in sheet 2 of the same spreadsheet).

Some spreadsheet implementations allow cell references to another spreadsheet (not the
current open and active file) on the same computer or a local network. It may also refer to
a cell in another open and active spreadsheet on the same computer or network that is
defined as shareable. These references contain the complete filename, such as:-

='C:\Documents and Settings\Username\My spreadsheets\[main sheet]Sheet1!A1

In a spreadsheet, references to cells are automatically updated when new rows or columns
are inserted or deleted. Care must be taken, however, when adding a row immediately before a set of column totals to ensure that the totals reflect the additional row's values - which often they do not.

A circular reference occurs when the formula in one cell refers, directly or indirectly through a chain of references from one cell to the next, back to the original cell. Many common
kinds of errors cause such circular references. However, there are some valid techniques
that use such circular references. Such techniques, after many recalculations of the
spreadsheet, (usually) converge on the correct values for those cells.
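
A plain-Python analogue of such iterative recalculation is sketched below; the 5% figure and the convergence tolerance are arbitrary illustration values. A "cell" whose value depends on itself is recalculated until its value stops changing.

    # total = 100 plus 5% of total, i.e. a self-referencing formula.
    total = 0.0
    for _ in range(100):                        # bounded number of recalculation passes
        new_total = 100.0 + 0.05 * total
        if abs(new_total - total) < 1e-9:       # stop once the value no longer changes
            break
        total = new_total
    # total converges to about 105.26 (the fixed point 100 / 0.95)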

[edit] Cell ranges

Likewise, instead of using a named range of cells, a range reference can be used.
Reference to a range of cells is typically of the form (A1:A6) which specifies all the cells
in the range A1 through to A6. A formula such as "=SUM(A1:A6)" would add all the
cells specified and put the result in the cell containing the formula itself.

[edit] Sheets

In the earliest spreadsheets, cells were a simple two-dimensional grid. Over time, the
model has been expanded to include a third dimension, and in some cases a series of
named grids, called sheets. The most advanced examples allow inversion and rotation
operations which can slice and project the data set in various ways.
[edit] Formulas

Animation of a simple spreadsheet that multiplies values in the left column by 2, then
sums the calculated values from the right column to the bottom-most cell. In this
example, only the values in the A column are entered (10, 20, 30), and the remainder of
cells are formulas. Formulas in the B column multiply values from the A column using
relative references, and the formula in B4 uses the SUM() function to find the sum of
values in the B1:B3 range.

A formula identifies the calculation needed to place the result in the cell it is contained
within. A cell containing a formula therefore has two display components; the formula
itself and the resulting value. The formula is normally only shown when the cell is
selected by "clicking" the mouse over a particular cell; otherwise it contains the result of
the calculation.

A formula assigns values to a cell or range of cells, and typically has the format:

=expression

where the expression consists of:

• values, such as 2, 9.14 or 6.67E-11;
• references to other cells, such as A1 for a single cell or B1:B3 for a range;
• arithmetic operators, such as +, -, *, /, and others;
• relational operators, such as >=, <, and others; and
• functions, such as SUM(), TAN(), and many others.

When a cell contains a formula, it often contains references to other cells. Such a cell
reference is a type of variable. Its value is the value of the referenced cell or some
derivation of it. If that cell in turn references other cells, the value depends on the values
of those. References can be relative (e.g., A1, or B1:B3), absolute (e.g., $A$1, or
$B$1:$B$3) or mixed row-wise or column-wise absolute/relative (e.g., $A1 is column-
wise absolute and A$1 is row-wise absolute).
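
To make this structure concrete, the toy Python sketch below substitutes cell references with their values and evaluates simple arithmetic formulas. It is deliberately minimal and hypothetical: it supports neither functions nor relational operators, and it uses eval, so it is not suitable for untrusted input.

    import re

    cells = {"A1": 10, "B1": 20}                # toy sheet: cell name -> value

    def evaluate(formula, cells):
        """Evaluate an '=A1+B1*2' style formula against a dict of cell values."""
        expr = formula.lstrip("=")
        expr = re.sub(r"[A-Z]+\d+", lambda m: repr(cells[m.group(0)]), expr)
        return eval(expr, {"__builtins__": {}}, {})   # arithmetic only; toy code

    # evaluate("=A1+B1*2", cells) == 50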

The available options for valid formulas depend on the particular spreadsheet implementation but, in general, most arithmetic operations and quite complex nested conditional operations can be performed by today's commercial spreadsheets. Modern implementations also offer functions to access custom-built functions, remote data, and applications.

A formula may contain a condition (or nested conditions) - with or without an actual
calculation - and is sometimes used purely to identify and highlight errors. In the
example below, it is assumed the sum of a column of percentages (A1 through A6) is
tested for validity and an explicit message put into the adjacent right-hand cell.

=IF(SUM(A1:A6) > 100, "More than 100%", SUM(A1:A6))

A spreadsheet does not, in fact, have to contain any formulas at all, in which case it could
be considered merely a collection of data arranged in rows and columns (a database) like
a calendar, timetable or simple list. Because of their ease of use and their formatting and hyperlinking capabilities, many spreadsheets are used solely for this purpose.

[edit] Functions

Use of user-defined function sq(x) in Microsoft Excel.

Spreadsheets usually contain a number of supplied functions, such as arithmetic operations (for example, summations, averages and so forth), trigonometric functions,
statistical functions, and so forth. In addition there is often a provision for user-defined
functions. In Microsoft Excel these functions are defined using Visual Basic for
Applications in the supplied Visual Basic editor, and such functions are automatically
accessible on the worksheet. In addition, programs can be written that pull information
from the worksheet, perform some calculations, and report the results back to the
worksheet. In the figure, the name sq is user-assigned, and function sq is introduced using
the Visual Basic editor supplied with Excel. Name Manager displays the spreadsheet
definitions of named variables x & y.

[edit] Subroutines
Subroutine in Microsoft Excel writes values calculated using x into y.

Functions themselves cannot write into the worksheet, but simply return their evaluation.
However, in Microsoft Excel, subroutines can write values or text found within the
subroutine directly to the spreadsheet. The figure shows the Visual Basic code for a
subroutine that reads each member of the named column variable x, calculates its square,
and writes this value into the corresponding element of named column variable y. The y-
column contains no formula because its values are calculated in the subroutine, not on the
spreadsheet, and simply are written in.

[edit] Remote spreadsheet

Whenever a reference is made to a cell or group of cells that are not located within the current physical spreadsheet file, it is considered to be accessing a "remote" spreadsheet. The contents of the referenced cell may be accessed either on first reference with a manual update or, more recently in the case of web-based spreadsheets, as a near-real-time value with a specified automatic refresh interval.

[edit] Charts
Graph made using Microsoft Excel

Many spreadsheet applications permit charts, graphs or histograms to be generated from specified groups of cells which are dynamically re-built as cell contents change. The
generated graphic component can either be embedded within the current sheet or added
as a separate object.

[edit] Multi-dimensional spreadsheets

In the late 1980s and early 1990s, first Javelin Software and later Lotus Improv appeared
and unlike models in a conventional spreadsheet, they utilized models built on objects
called variables, not on data in cells of a report. These multi-dimensional spreadsheets
enabled viewing data and algorithms in various self-documenting ways, including
simultaneous multiple synchronized views. For example, users of Javelin could move
through the connections between variables on a diagram while seeing the logical roots
and branches of each variable. This illustrates what is perhaps the primary contribution of the earlier Javelin—the concept of traceability of a user's logic or model structure through its twelve views. A complex model can be dissected and understood by
others who had no role in its creation, and this remains unique even today. Javelin was
used primarily for financial modeling, but was also used to build instructional models in
college chemistry courses, to model the world's economies, and by the military in the
early Star Wars project. It is still in use by institutions for which model integrity is
mission critical.

In these programs, a time series, or any variable, was an object in itself, not a collection
of cells which happen to appear in a row or column. Variables could have many
attributes, including complete awareness of their connections to all other variables, data
references, and text and image notes. Calculations were performed on these objects, as
opposed to a range of cells, so adding two time series automatically aligns them in
calendar time, or in a user-defined time frame. Data were independent of worksheets—
variables, and therefore data, could not be destroyed by deleting a row, column or entire
worksheet. For instance, January's costs are subtracted from January's revenues,
regardless of where or whether either appears in a worksheet. This permits actions later
used in pivot tables, except that flexible manipulation of report tables was but one of
many capabilities supported by variables. Moreover, if costs were entered by week and
revenues by month, Javelin's program could allocate or interpolate as appropriate. This
object design enabled variables and whole models to reference each other with user-
defined variable names, and to perform multidimensional analysis and massive, but easily
editable consolidations.

[edit] Logical spreadsheets

Spreadsheets that have a formula language based upon logical expressions, rather than arithmetic expressions, are known as logical spreadsheets. Such spreadsheets can be used to reason deductively about their cell values.
[edit] Programming issues
Just as the early programming languages were designed to generate spreadsheet printouts,
programming techniques themselves have evolved to process tables (also known as
spreadsheets or matrices) of data more efficiently in the computer itself.

Spreadsheets have evolved to use powerful programming languages like VBA; specifically, they are functional, visual, and multiparadigm languages.

Many people find it easier to perform calculations in spreadsheets than by writing the
equivalent sequential program. This is due to two traits of spreadsheets.

• They use spatial relationships to define program relationships. Like all animals,
humans have highly developed intuitions about spaces, and of dependencies
between items. Sequential programming usually requires typing line after line of
text, which must be read slowly and carefully to be understood and changed.
• They are forgiving, allowing partial results and functions to work. One or more parts of a program can work correctly, even if other parts are unfinished or broken. This makes writing and debugging programs much easier and faster.[citation needed] Sequential programming usually needs every program line and character to be correct for a program to run. One error usually stops the whole program and prevents any result.

A 'spreadsheet program' is designed to perform general computation tasks using spatial relationships rather than time as the primary organizing principle.[citation needed]

It is often convenient to think of a spreadsheet as a mathematical graph, where the nodes are spreadsheet cells, and the edges are references to other cells specified in formulas.
This is often called the dependency graph of the spreadsheet. References between cells
can take advantage of spatial concepts such as relative position and absolute position, as
well as named locations, to make the spreadsheet formulas easier to understand and
manage.

Spreadsheets usually attempt to automatically update cells when the cells on which they
depend have been changed. The earliest spreadsheets used simple tactics like evaluating
cells in a particular order, but modern spreadsheets compute a minimal recomputation
order from the dependency graph. Later spreadsheets also include a limited ability to
propagate values in reverse, altering source values so that a particular answer is reached
in a certain cell. Since spreadsheet cell formulas are not generally invertible, though, this technique is of somewhat limited value.
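
A minimal sketch of dependency-ordered recalculation, assuming a toy model of cells and formulas rather than any real spreadsheet engine, might look like this:

    cells = {"A1": 10, "A2": 20, "A3": None, "A4": None}
    formulas = {                                # formula cell -> (dependencies, function)
        "A3": (("A1", "A2"), lambda a, b: a + b),
        "A4": (("A3",), lambda a: a * 2),
    }

    def recalculate(cells, formulas):
        """Evaluate formula cells in dependency order (a simple topological pass)."""
        resolved = {c for c in cells if c not in formulas}
        pending = dict(formulas)
        while pending:
            ready = [c for c, (deps, _) in pending.items()
                     if all(d in resolved for d in deps)]
            if not ready:
                raise ValueError("circular reference")
            for c in ready:
                deps, fn = pending.pop(c)
                cells[c] = fn(*(cells[d] for d in deps))
                resolved.add(c)

    recalculate(cells, formulas)                # cells["A3"] == 30, cells["A4"] == 60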

Many of the concepts common to sequential programming models have analogues in the
spreadsheet world. For example, the sequential model of the indexed loop is usually
represented as a table of cells, with similar formulas (normally differing only in which
cells they reference).
[edit] Shortcomings

While spreadsheets are a great step forward in quantitative modeling, they have
deficiencies. At the level of overall user benefits, spreadsheets have several main
shortcomings, especially concerning the unfriendliness of alpha-numeric cell addresses.

• Spreadsheets have significant reliability problems. Research studies estimate that roughly 94% of spreadsheets deployed in the field contain errors, and 5.2% of cells in unaudited spreadsheets contain errors.[20]

Despite the high error risks often associated with spreadsheet authorship and use,
specific steps can be taken to significantly enhance control and reliability by
structurally reducing the likelihood of error occurrence at their source.[21]

• The practical expressiveness of spreadsheets can be limited unless their modern features are used. Several factors contribute to this limitation. Implementing a complex model on a cell-at-a-time basis requires tedious attention to detail. Authors have difficulty remembering the meanings of hundreds or thousands of cell addresses that appear in formulas.[citation needed]

These drawbacks are mitigated by the use of named variables for cell
designations, and employing variables in formulas rather than cell locations and
cell-by-cell manipulations. Graphs can be used to show instantly how results are
changed by changes in parameter values. In fact, the spreadsheet can be made
invisible except for a transparent user interface that requests pertinent input from
the user, displays results requested by the user, creates reports, and has built-in
error traps to prompt correct input.[22]

• Similarly, formulas expressed in terms of cell addresses are hard to keep straight
and hard to audit. Research shows that spreadsheet auditors who check numerical
results and cell formulas find no more errors than auditors who only check
numerical results [20]. That is another reason to use named variables and formulas
employing named variables.

• Collaboration in authoring spreadsheet formulas can be difficult when such collaboration occurs at the level of cells and cell addresses.

However, like programming languages, spreadsheets are capable of using aggregate cells with similar meaning and indexed variables with names that indicate meaning. Some spreadsheets have good collaboration features, and authoring at the level of individual cells and cell formulas is best avoided when many people cooperate on data entry and use the same spreadsheet, since it creates obstacles to collaboration.

• Productivity of spreadsheet modelers is reduced by working at the level of individual cells. That approach means that even conceptually simple changes in spreadsheets (such as changing starting or ending time or time grain, adding new members or a level of hierarchy to a dimension, or changing one conceptual formula that is represented as hundreds of cell formulas) often require large numbers of manual cell-level operations (such as inserting or deleting cells/rows/columns, editing and copying formulas, re-laying out worksheets). Each of these manual corrections increases the risk of introducing further mistakes. For these reasons, the use of named variables and of formulas that use variable names is preferable.

Other problems associated with spreadsheets include:[23][24]

• Some sources advocate the use of specialized software instead of spreadsheets for
some applications (budgeting, statistics)[25][26][27]

• Many spreadsheet software products, such as Microsoft Excel[28] (versions prior to 2007) and OpenOffice.org Calc[29] (versions prior to 2008), have a capacity limit of 65,536 rows by 256 columns. This can present a problem for people using very large datasets, and may result in lost data.
• Lack of auditing and revision control. This makes it difficult to determine who
changed what and when. This can cause problems with regulatory compliance.
Lack of revision control greatly increases the risk of errors due to the inability to track, isolate and test changes made to a document.
• Lack of security. Generally, if one has permission to open a spreadsheet, one has
permission to modify any part of it. This, combined with the lack of auditing
above, can make it easy for someone to commit fraud.
• Because they are loosely structured, it is easy for someone to introduce an error,
either accidentally or intentionally, by entering information in the wrong place or
expressing dependencies among cells (such as in a formula) incorrectly.[30][31]
• The result of a formula (for example, "=A1*B1") applies only to a single cell (that is, the cell the formula is actually located in - in this case perhaps C1), even though it can "extract" data from many other cells, and even real-time dates and actual
times. This means that to cause a similar calculation on an array of cells, an
almost identical formula (but residing in its own "output" cell) must be repeated
for each row of the "input" array. This differs from a "formula" in a conventional
computer program which would typically have one calculation which would then
apply to all of the input in turn. With current spreadsheets, this forced repetition
of near identical formulas can have detrimental consequences from a quality
assurance standpoint and is often the cause of many spreadsheet errors. Some
spreadsheets have array formulas to address this issue.
• Managing the sheer volume of spreadsheets that sometimes exists within an organization can become overwhelming, given the lack of security and audit trails, the unintentional introduction of errors, and the other problems listed above.

While there are built-in and third-party tools for desktop spreadsheet applications that
address some of these shortcomings, awareness and use of these tools is generally low. For example, 55% of capital market professionals "don't know" how their spreadsheets are audited, and only 6% invest in a third-party solution.[32]

A database consists of an organized collection of data for one or more uses, typically in
digital form. One way of classifying databases involves the type of their contents, for
example: bibliographic, document-text, statistical. Digital databases are managed using
database management systems, which store database contents, allowing data creation and
maintenance, and search and other access.

Contents

• 1 Architecture
• 2 Database management systems
o 2.1 Components of DBMS
 2.1.1 RDBMS components
 2.1.2 ODBMS components
• 3 Types
o 3.1 Operational database
o 3.2 Data warehouse
o 3.3 Analytical database
o 3.4 Distributed database
o 3.5 End-user database
o 3.6 External database
o 3.7 Hypermedia databases
• 4 Models
o 4.1 Post-relational database models
o 4.2 Object database models
• 5 Storage structures
• 6 Indexing
• 7 Transactions
• 8 Replication
• 9 Security
o 9.1 Confidentiality
• 10 Locking
o 10.1 Granularity
o 10.2 Lock types
o 10.3 Isolation
o 10.4 Deadlocks
• 11 See also
• 12 References
• 13 Further reading

• 14 External links

[edit] Architecture
Database architecture consists of three levels: external, conceptual and internal. Clearly
separating the three levels was a major feature of the relational database model that
dominates 21st century databases.[1]

The external level defines how users understand the organization of the data. A single
database can have any number of views at the external level. The internal level defines
how the data is physically stored and processed by the computing system. Internal
architecture is concerned with cost, performance, scalability and other operational
matters. The conceptual level is a level of indirection between internal and external. It provides
a common view of the database that is uncomplicated by details of how the data is stored
or managed, and that can unify the various external views into a coherent whole.[1]

[edit] Database management systems


Main article: Database management system

A database management system (DBMS) consists of software that operates databases, providing storage, access, security, backup and other facilities. Database management
systems can be categorized according to the database model that they support (such as relational or XML), the type(s) of computer they support (such as a server cluster or a mobile phone), the query language(s) that access the database (such as SQL or XQuery), or performance trade-offs (such as maximum scale or maximum speed). Some
DBMS cover more than one entry in these categories, e.g., supporting multiple query
languages.

[edit] Components of DBMS

Most DBMS as of 2009 implement a relational model.[2] Other DBMS systems, such as
Object DBMS, offer specific features for more specialized requirements. Their
components are similar, but not identical.

[edit] RDBMS components

• Sublanguages—Relational DBMS (RDBMS) include Data Definition Language (DDL) for defining the structure of the database, Data Control Language (DCL) for defining security/access controls, and Data Manipulation Language (DML) for querying and updating data (a small example follows this list).
• Interface drivers—These drivers are code libraries that provide methods to
prepare statements, execute statements, fetch results, etc. Examples include
ODBC, JDBC, MySQL/PHP, FireBird/Python.
• SQL engine—This component interprets and executes the DDL, DCL, and DML
statements. It includes three major components (compiler, optimizer, and
executor).
• Transaction engine—Ensures that multiple SQL statements either succeed or fail
as a group, according to application dictates.
• Relational engine—Relational objects such as Table, Index, and Referential
integrity constraints are implemented in this component.
• Storage engine—This component stores and retrieves data from secondary
storage, as well as managing transaction commit and rollback, backup and
recovery, etc.
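
As a small, hedged illustration of these sublanguages, the snippet below uses Python's built-in sqlite3 module; the table and values are invented, and SQLite has no DCL (GRANT/REVOKE), so only DDL and DML are shown.

    import sqlite3

    conn = sqlite3.connect(":memory:")          # throwaway in-memory database
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")  # DDL
    conn.execute("INSERT INTO employee (name, salary) VALUES (?, ?)", ("Ada", 52000.0))     # DML
    conn.execute("UPDATE employee SET salary = salary * 1.05 WHERE name = ?", ("Ada",))     # DML
    for row in conn.execute("SELECT name, salary FROM employee"):                           # DML query
        print(row)
    conn.close()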

[edit] ODBMS components

Object DBMS (ODBMS) has transaction and storage components that are analogous to
those in an RDBMS. Some ODBMS handle DDL, DCL and update tasks differently.
Instead of using sublanguages, they provide APIs for these purposes. They typically
include a sublanguage and accompanying engine for processing queries with interpretive
statements analogous to but not the same as SQL. Example object query languages are
OQL, LINQ, JDOQL, JPAQL and others. The query engine returns collections of objects
instead of relational rows.

[edit] Types
[edit] Operational database

These databases store detailed data about the operations of an organization. They are
typically organized by subject matter and process relatively high volumes of updates using
transactions. Essentially every major organization on earth uses such databases.
Examples include customer databases that record contact, credit, and demographic
information about a business' customers, personnel databases that hold information such
as salary, benefits, skills data about employees, manufacturing databases that record
details about product components, parts inventory, and financial databases that keep track
of the organization's money, accounting and financial dealings.

[edit] Data warehouse

Data warehouses archive historical data from operational databases and often from
external sources such as market research firms. Often operational data undergoes
transformation on its way into the warehouse, getting summarized, anonymized,
reclassified, etc. The warehouse becomes the central source of data for use by managers
and other end-users who may not have access to operational data. For example, sales data
might be aggregated to weekly totals and converted from internal product codes to use
UPC codes so that it can be compared with ACNielsen data.

[edit] Analytical database

Analysts may do their work directly against a data warehouse, or create a separate
analytic database for Online Analytical Processing. For example, a company might
extract sales records for analyzing the effectiveness of advertising and other sales
promotions at an aggregate level.

[edit] Distributed database

These are databases of local work-groups and departments at regional offices, branch
offices, manufacturing plants and other work sites. These databases can include segments
of both common operational and common user databases, as well as data generated and
used only at a user’s own site.

[edit] End-user database

These databases consist of data developed by individual end-users. Examples include collections of documents in spreadsheets, word-processing files, and downloaded files, or even a database for managing a personal baseball card collection.

[edit] External database

These databases contain data collected for use across multiple organizations, either freely or
via subscription. The Internet Movie Database is one example.

[edit] Hypermedia databases

The World Wide Web can be thought of as a database, albeit one spread across millions of
independent computing systems. Web browsers "process" this data one page at a time,
while web crawlers and other software provide the equivalent of database indexes to
support search and other activities.

[edit] Models
Main article: Database model

[edit] Post-relational database models

Products offering a more general data model than the relational model are sometimes
classified as post-relational.[3] Alternate terms include "hybrid database", "Object-
enhanced RDBMS" and others. The data model in such products incorporates relations
but is not constrained by E.F. Codd's Information Principle, which requires that
all information in the database must be cast explicitly in terms of values in relations and in no other way.[4]

Some of these extensions to the relational model integrate concepts from technologies
that pre-date the relational model. For example, they allow representation of a directed
graph with trees on the nodes.

Some post-relational products extend relational systems with non-relational features. Others arrived in much the same place by adding relational features to pre-relational
systems. Paradoxically, this allows products that are historically pre-relational, such as
PICK and MUMPS, to make a plausible claim to be post-relational.

[edit] Object database models

Main article: Object database

In recent years, the object-oriented paradigm has been applied in areas such as
engineering and spatial databases, telecommunications and in various scientific domains.
The conglomeration of object oriented programming and database technology led to this
new kind of database. These databases attempt to bring the database world and the
application-programming world closer together, in particular by ensuring that the
database uses the same type system as the application program. This aims to avoid the
overhead (sometimes referred to as the impedance mismatch) of converting information
between its representation in the database (for example as rows in tables) and its
representation in the application program (typically as objects). At the same time, object
databases attempt to introduce key ideas of object programming, such as encapsulation
and polymorphism, into the world of databases.

A variety of approaches have been tried for storing objects in a database. Some
products have approached the problem from the application-programming side, by
making the objects manipulated by the program persistent. This also typically requires
the addition of some kind of query language, since conventional programming languages
do not provide language-level functionality for finding objects based on their information
content. Others[which?] have attacked the problem from the database end, by defining an
object-oriented data model for the database, and defining a database programming
language that allows full programming capabilities as well as traditional query facilities.

[edit] Storage structures


Main article: Database storage structures

Databases may store relational tables/indexes in memory or on hard disk in one of many
forms:
• ordered/unordered flat files
• ISAM
• heaps
• hash buckets
• logically-blocked files
• B+ trees

The most commonly used[citation needed] are B+ trees and ISAM.

Object databases use a range of storage mechanisms. Some use virtual memory-mapped
files to make the native language (C++, Java etc.) objects persistent. This can be highly
efficient but it can make multi-language access more difficult. Others disassemble objects
into fixed- and varying-length components that are then clustered in fixed sized blocks on
disk and reassembled into the appropriate format on either the client or server address
space. Another popular technique involves storing the objects in tuples (much like a
relational database) which the database server then reassembles into objects for the client.
[citation needed]

Other techniques include clustering by category (such as grouping data by month, or location), storing pre-computed query results (known as materialized views), and partitioning data by range (e.g., a date range) or by hash.

Memory management and storage topology can be important design choices for database
designers as well. Just as normalization is used to reduce storage requirements and
improve database designs, conversely denormalization is often used to reduce join
complexity and reduce query execution time.[5]

[edit] Indexing
Main article: Index (database)

Indexing is a technique for improving database performance. The many types of index
share the common property that they eliminate the need to examine every entry when
running a query. In large databases, this can reduce query time/cost by orders of
magnitude. The simplest form of index is a sorted list of values that can be searched
using a binary search with an adjacent reference to the location of the entry, analogous to
the index in the back of a book. The same data can have multiple indexes (an employee database could be indexed by last name and hire date).
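
The "sorted list plus binary search" index described above can be sketched in Python as follows; the table contents and the lookup helper are invented for illustration only.

    import bisect

    rows = [(1, "Chang", "2004-07-12"), (2, "Abbas", "2001-03-02"), (3, "Baker", "2010-11-30")]
    index = sorted((last_name, row_id) for row_id, last_name, _ in rows)   # (key, row) pairs

    def lookup(last_name):
        """Binary-search the index and return the row numbers whose key matches."""
        keys = [k for k, _ in index]
        i = bisect.bisect_left(keys, last_name)
        return [row_id for k, row_id in index[i:] if k == last_name]

    # lookup("Baker") == [3]; without the index, every row would have to be examined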

Indexes affect performance, but not results. Database designers can add or remove
indexes without changing application logic, reducing maintenance costs as the database
grows and database usage evolves.

Given a particular query, the DBMS' query optimizer is responsible for devising the most
efficient strategy for finding matching data. The optimizer decides which index or
indexes to use, how to combine data from different parts of the database, how to provide
data in the order requested, etc.

Indexes can speed up data access, but they consume space in the database, and must be
updated each time the data are altered. Indexes therefore can speed data access but slow
data maintenance. These two properties determine whether a given index is worth the
cost.

[edit] Transactions
Main articles: Database transaction and Concurrency control

Most DBMS provide some form of support for transactions, which allow multiple data
items to be updated in a consistent fashion, such that updates that are part of a transaction
succeed or fail in unison. The so-called ACID rules, summarized here, characterize this
behavior:

• Atomicity: Either all the data changes in a transaction must happen, or none of
them. The transaction must be completed, or else it must be undone (rolled back).
• Consistency: Every transaction must preserve the declared consistency rules for
the database.
• Isolation: Two concurrent transactions cannot interfere with one another.
Intermediate results within one transaction must remain invisible to other
transactions. The most extreme form of isolation is serializability, meaning that
transactions that take place concurrently could instead be performed in some
series, without affecting the ultimate result.
• Durability: Completed transactions cannot be aborted later or their results
discarded. They must persist through (for instance) DBMS restarts.

In practice, many DBMSs allow the selective relaxation of these rules to balance perfect
behavior with optimum performance.
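
The atomicity and rollback behavior described above can be demonstrated with Python's built-in sqlite3 module; the account table and the simulated failure are invented for illustration.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100.0), ("bob", 0.0)])
    conn.commit()

    try:
        with conn:   # one transaction: committed on success, rolled back on error
            conn.execute("UPDATE account SET balance = balance - 40 WHERE name = 'alice'")
            conn.execute("UPDATE account SET balance = balance + 40 WHERE name = 'bob'")
            raise RuntimeError("simulated failure before commit")
    except RuntimeError:
        pass

    # Both updates were rolled back together: alice still has 100.0 and bob still has 0.0.
    print(list(conn.execute("SELECT name, balance FROM account")))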

[edit] Replication
Main article: Database replication

Database replication involves maintaining multiple copies of a database on different computers, to allow more users to access it, or to allow a secondary site to immediately
take over if the primary site stops working. Some DBMS piggyback replication on top of
their transaction logging facility, applying the primary's log to the secondary in near real-
time. Database clustering is a related concept for handling larger databases and user
communities by employing a cluster of multiple computers to host a single database that
can use replication as part of its approach.[6][7]

[edit] Security
Main article: Database security

Database security denotes the system, processes, and procedures that protect a database
from unauthorized activity.

DBMSs usually enforce security through access control, auditing, and encryption:

• Access control manages who can connect to the database via authentication and
what they can do via authorization.
• Auditing records information about database activity: who, what, when, and
possibly where.
• Encryption protects data at the lowest possible level by storing and possibly
transmitting data in an unreadable form. The DBMS encrypts data when it is
added to the database and decrypts it when returning query results. This process
can occur on the client side of a network connection to prevent unauthorized
access at the point of use.
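
As a hedged sketch of client-side encryption (it uses the third-party cryptography
package, which would need to be installed, and the value being protected is invented for
the example), data can be encrypted before it ever reaches the database and decrypted when
results are returned:

    from cryptography.fernet import Fernet   # third-party "cryptography" package

    key = Fernet.generate_key()   # in practice the key would come from a key-management store
    cipher = Fernet(key)

    # Encrypt on the client before the value ever reaches the database...
    stored_value = cipher.encrypt(b"555-0173")      # e.g. a sensitive phone-number column

    # ...and decrypt when query results are returned to an authorized user.
    plaintext = cipher.decrypt(stored_value)
    assert plaintext == b"555-0173"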

[edit] Confidentiality

Laws and regulations govern the release of information from some databases, protecting
medical histories, driving records, telephone logs, and the like.

In the United Kingdom, database privacy regulation falls under the Office of the
Information Commissioner. Organizations based in the United Kingdom and holding
personal data in digital format such as databases must register with the Office.[8]

[edit] Locking
When a transaction modifies a resource, the DBMS stops other transactions from also
modifying it, typically by locking it. Locks also provide one way of ensuring that data
does not change while a transaction is reading it, or even that it does not change until a
transaction that has read it completes.

[edit] Granularity

Locks can be coarse, covering an entire database; fine-grained, covering a single data
item; or intermediate, covering a collection of data such as all the rows in an RDBMS table.

[edit] Lock types

Locks can be shared[9] or exclusive, and can lock out readers and/or writers. Locks can be
created implicitly by the DBMS when a transaction performs an operation, or explicitly at
the transaction's request.
Shared locks allow multiple transactions to lock the same resource. The lock persists until
all such transactions complete. Exclusive locks are held by a single transaction and
prevent other transactions from locking the same resource.

Read locks are usually shared, and prevent other transactions from modifying the
resource. Write locks are exclusive, and prevent other transactions from modifying the
resource. On some systems, write locks also prevent other transactions from reading the
resource.
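
A toy illustration of the shared/exclusive distinction using Python threading primitives;
this is a simplified reader-writer lock for exposition (writers can starve in this
sketch), not how any particular DBMS implements locking:

    import threading

    class SharedExclusiveLock:
        # Many shared (read) holders at once, or exactly one exclusive (write) holder.

        def __init__(self):
            self._mutex = threading.Lock()
            self._no_readers = threading.Condition(self._mutex)
            self._readers = 0

        def acquire_shared(self):            # read lock: many transactions may hold it
            with self._mutex:
                self._readers += 1

        def release_shared(self):
            with self._mutex:
                self._readers -= 1
                if self._readers == 0:
                    self._no_readers.notify_all()

        def acquire_exclusive(self):         # write lock: wait until no readers remain
            self._mutex.acquire()
            while self._readers:
                self._no_readers.wait()

        def release_exclusive(self):
            self._mutex.release()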

The DBMS implicitly locks data when it is updated, and may also do so when it is read.
Transactions explicitly lock data to ensure that they can complete without a deadlock or
other complication. Explicit locks may be useful for some administrative tasks.[10][11]

Locking can significantly affect database performance, especially with large and complex
transactions in highly concurrent environments.

[edit] Isolation

Isolation refers to the degree to which one transaction can see the results of other,
concurrent transactions. Greater isolation typically reduces performance and/or
concurrency, leading DBMSs to provide administrative options to reduce isolation. For
example, in a database that analyzes trends rather than looking at low-level detail,
increased performance might justify allowing readers to see uncommitted changes ("dirty
reads").

[edit] Deadlocks

Deadlocks occur when two transactions each require data that the other has already
locked exclusively. Deadlock detection is performed by the DBMS, which then aborts
one of the transactions and allows the other to complete.
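
A minimal sketch of how such a deadlock arises, and of the common application-side remedy
of acquiring locks in a consistent order, using plain Python threads rather than DBMS
internals:

    import threading

    lock_a, lock_b = threading.Lock(), threading.Lock()

    def transaction_1():
        with lock_a:                 # locks A, then wants B
            with lock_b:
                pass                 # ... work on both resources ...

    def transaction_2_deadlock_prone():
        with lock_b:                 # locks B, then wants A: can deadlock against transaction_1
            with lock_a:
                pass

    def transaction_2_safe():
        with lock_a:                 # same order as transaction_1, so no deadlock is possible
            with lock_b:
                pass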

The Internet is a global system of interconnected computer networks that use the
standard Internet Protocol Suite (TCP/IP) to serve billions of users worldwide. It is a
network of networks that consists of millions of private, public, academic, business, and
government networks of local to global scope that are linked by a broad array of
electronic and optical networking technologies. The Internet carries a vast array of
information resources and services, most notably the inter-linked hypertext documents of
the World Wide Web (WWW) and the infrastructure to support electronic mail.

Most traditional communications media, such as telephone and television services, are
being reshaped or redefined using the technologies of the Internet, giving rise to services such
as Voice over Internet Protocol (VoIP) and IPTV. Newspaper publishing has been
reshaped into Web sites, blogging, and web feeds. The Internet has enabled or accelerated
the creation of new forms of human interactions through instant messaging, Internet
forums, and social networking sites.
The origins of the Internet reach back to the 1960s when the United States funded
research projects of its military agencies to build robust, fault-tolerant, and distributed
computer networks. This research and a period of civilian funding of a new U.S.
backbone by the National Science Foundation spawned worldwide participation in the
development of new networking technologies, led to the commercialization of an
international network in the mid-1990s, and resulted in the subsequent popularization of
countless applications in virtually every aspect of modern human life. As of 2009, an
estimated quarter of Earth's population uses the services of the Internet.

The Internet has no centralized governance in either technological implementation or
policies for access and usage; each constituent network sets its own standards. Only the
overarching definitions of the two principal name spaces in the Internet, the Internet
Protocol address space and the Domain Name System, are directed by a maintainer
organization, the Internet Corporation for Assigned Names and Numbers (ICANN). The
technical underpinning and standardization of the core protocols (IPv4 and IPv6) is an
activity of the Internet Engineering Task Force (IETF), a non-profit organization of
loosely affiliated international participants that anyone may associate with by
contributing technical expertise.

Contents

• 1 Terminology
• 2 History
• 3 Technology
o 3.1 Protocols
o 3.2 Structure
• 4 Governance
• 5 Modern uses
• 6 Services
o 6.1 Information
o 6.2 Communication
o 6.3 Data transfer
• 7 Access
• 8 Social impact
• 9 See also
• 10 Notes
• 11 References

• 12 External links

Terminology
See also: Internet capitalization conventions
The terms Internet and World Wide Web are often used in everyday speech without much
distinction. However, the Internet and the World Wide Web are not one and the same.
The Internet is a global data communications system. It is a hardware and software
infrastructure that provides connectivity between computers. In contrast, the Web is one
of the services communicated via the Internet. It is a collection of interconnected
documents and other resources, linked by hyperlinks and URLs.[1]

The term the Internet, when referring to the entire global system of IP networks, has
traditionally been treated as a proper noun and written with an initial capital letter. In the
media and popular culture a trend has developed to regard it as a generic term or common
noun and thus write it as "the internet", without capitalization.

The Internet is also often simply referred to as the net. In many technical illustrations
when the precise location or interrelation of Internet resources is not important, the
Internet is often referred to as the cloud, and literally depicted as such.

History
Main article: History of the Internet

The USSR's launch of Sputnik spurred the United States to create the Advanced Research
Projects Agency (ARPA or DARPA) in February 1958 to regain a technological lead.[2][3]
ARPA created the Information Processing Technology Office (IPTO) to further the
research of the Semi Automatic Ground Environment (SAGE) program, which had
networked country-wide radar systems together for the first time. The IPTO's purpose
was to find ways to address the US Military's concern about survivability of their
communications networks, and as a first step interconnect their computers at the
Pentagon, Cheyenne Mountain, and SAC HQ. J. C. R. Licklider, a promoter of universal
networking, was selected to head the IPTO. Licklider moved from the Psycho-Acoustic
Laboratory at Harvard University to MIT in 1950, after becoming interested in
information technology. At MIT, he served on a committee that established Lincoln
Laboratory and worked on the SAGE project. In 1957 he became a Vice President at
BBN, where he bought the first production PDP-1 computer and conducted the first
public demonstration of time-sharing.
Professor Leonard Kleinrock with one of the first ARPANET Interface Message
Processors at UCLA

At the IPTO, Licklider's successor Ivan Sutherland in 1965 got Lawrence Roberts to start
a project to make a network, and Roberts based the technology on the work of Paul
Baran,[4] who had written an exhaustive study for the United States Air Force that
recommended packet switching (as opposed to circuit switching) to achieve better network
robustness and disaster survivability. Roberts had worked at the MIT Lincoln Laboratory
originally established to work on the design of the SAGE system. UCLA professor
Leonard Kleinrock had provided the theoretical foundations for packet networks in 1962,
and later, in the 1970s, for hierarchical routing, concepts which have been the
underpinning of the development towards today's Internet.

Sutherland's successor Robert Taylor convinced Roberts to build on his early packet
switching successes and come and be the IPTO Chief Scientist. Once there, Roberts
prepared a report called Resource Sharing Computer Networks which was approved by
Taylor in June 1968 and laid the foundation for the launch of the working ARPANET the
following year.

After much work, the first two nodes of what would become the ARPANET were
interconnected between Kleinrock's Network Measurement Center at the UCLA's School
of Engineering and Applied Science and Douglas Engelbart's NLS system at SRI
International (SRI) in Menlo Park, California, on 29 October 1969. The third site on the
ARPANET was the Culler-Fried Interactive Mathematics centre at the University of
California at Santa Barbara, and the fourth was the University of Utah Graphics
Department. In an early sign of future growth, there were already fifteen sites connected
to the young ARPANET by the end of 1971.

The ARPANET was one of the "eve" networks of today's Internet. In an independent
development, Donald Davies at the UK National Physical Laboratory also arrived at the
concept of packet switching in the early 1960s, first giving a talk on the subject in 1965,
after which the teams working in the new field on the two sides of the Atlantic first
became acquainted. It was Davies' coinage of the terms "packet" and "packet switching"
that was adopted as the standard terminology. Davies also built a packet-switched network
in the UK, called the Mark I, in 1970.[5]

Following the demonstration that packet switching worked on the ARPANET, the British
Post Office, Telenet, DATAPAC and TRANSPAC collaborated to create the first
international packet-switched network service. In the UK, this was referred to as the
International Packet Switched Service (IPSS), in 1978. The collection of X.25-based
networks grew from Europe and the US to cover Canada, Hong Kong and Australia by
1981. The X.25 packet switching standard was developed in the CCITT (now called ITU-
T) around 1976.

A plaque commemorating the birth of the Internet at Stanford University

X.25 was independent of the TCP/IP protocols that arose from the experimental work of
DARPA on the ARPANET, Packet Radio Net and Packet Satellite Net during the same
time period.

The early ARPANET ran on the Network Control Program (NCP), a standard designed
and first implemented in December 1970 by a team called the Network Working Group
(NWG) led by Steve Crocker. To respond to the network's rapid growth as more and
more locations connected, Vinton Cerf and Robert Kahn developed the first description
of the now widely used TCP protocols during 1973 and published a paper on the subject
in May 1974. Use of the term "Internet" to describe a single global TCP/IP network
originated in December 1974 with the publication of RFC 675, the first full specification
of TCP that was written by Vinton Cerf, Yogen Dalal and Carl Sunshine, then at Stanford
University. During the next nine years, work proceeded to refine the protocols and to
implement them on a wide range of operating systems. The first TCP/IP-based wide-area
network was operational by 1 January 1983 when all hosts on the ARPANET were
switched over from the older NCP protocols. In 1985, the United States' National Science
Foundation (NSF) commissioned the construction of the NSFNET, a university 56
kilobit/second network backbone using computers called "fuzzballs" by their inventor,
David L. Mills. The following year, NSF sponsored the conversion to a higher-speed
1.5 megabit/second network. A key decision to use the DARPA TCP/IP protocols was
made by Dennis Jennings, then in charge of the Supercomputer program at NSF.

The opening of the network to commercial interests began in 1988. The US Federal
Networking Council approved the interconnection of the NSFNET to the commercial
MCI Mail system in that year and the link was made in the summer of 1989. Other
commercial e-mail services were soon connected, including OnTyme, Telemail
and Compuserve. In that same year, three commercial Internet service providers (ISPs)
were created: UUNET, PSINet and CERFNET. Important, separate networks that offered
gateways into, then later merged with, the Internet include Usenet and BITNET. Various
other commercial and educational networks, such as Telenet, Tymnet, Compuserve and
JANET were interconnected with the growing Internet. Telenet (later called Sprintnet)
was a large privately funded national computer network with free dial-up access in cities
throughout the U.S. that had been in operation since the 1970s. This network was
eventually interconnected with the others in the 1980s as the TCP/IP protocol became
increasingly popular. The ability of TCP/IP to work over virtually any pre-existing
communication network allowed for great ease of growth, although the rapid growth of
the Internet was due primarily to the availability of an array of standardized commercial
routers from many companies, the availability of commercial Ethernet equipment for
local-area networking, and the widespread implementation and rigorous standardization
of TCP/IP on UNIX and virtually every other common operating system.

This NeXT Computer was used by Sir Tim Berners-Lee at CERN and became the world's
first Web server.

Although the basic applications and guidelines that make the Internet possible had existed
for almost two decades, the network did not gain a public face until the 1990s. On 6
August 1991, CERN, a pan-European organization for particle research, publicized the
new World Wide Web project. The Web was invented by British scientist Tim Berners-
Lee in 1989. An early popular web browser was ViolaWWW, patterned after HyperCard
and built using the X Window System. It was eventually replaced in popularity by the
Mosaic web browser. In 1993, the National Center for Supercomputing Applications at
the University of Illinois released version 1.0 of Mosaic, and by late 1994 there was
growing public interest in the previously academic, technical Internet. By 1996 usage of
the word Internet had become commonplace, and consequently, so had its use as a
synecdoche in reference to the World Wide Web.
Meanwhile, over the course of the decade, the Internet successfully accommodated the
majority of previously existing public computer networks (although some networks, such
as FidoNet, have remained separate). During the 1990s, it was estimated that the Internet
grew by 100 percent per year, with a brief period of explosive growth in 1996 and 1997.
[6]
This growth is often attributed to the lack of central administration, which allows
organic growth of the network, as well as the non-proprietary open nature of the Internet
protocols, which encourages vendor interoperability and prevents any one company from
exerting too much control over the network.[7] The estimated population of Internet users
is 1.97 billion as of 30 June 2010.[8]

Technology
Protocols

Main article: Internet Protocol Suite

The complex communications infrastructure of the Internet consists of its hardware
components and a system of software layers that control various aspects of the
architecture. While the hardware can often be used to support other software systems, it is
the design and the rigorous standardization process of the software architecture that
characterizes the Internet and provides the foundation for its scalability and success. The
responsibility for the architectural design of the Internet software systems has been
delegated to the Internet Engineering Task Force (IETF).[9] The IETF conducts standard-
setting work groups, open to any individual, about the various aspects of Internet
architecture. Resulting discussions and final standards are published in a series of
publications, each called a Request for Comments (RFC), freely available on the IETF
web site. The principal methods of networking that enable the Internet are contained in
specially designated RFCs that constitute the Internet Standards. Other less rigorous
documents are simply informative, experimental, or historical, or document the best
current practices (BCP) when implementing Internet technologies.

The Internet Standards describe a framework known as the Internet Protocol Suite. This
is a model architecture that divides methods into a layered system of protocols (RFC
1122, RFC 1123). The layers correspond to the environment or scope in which their
services operate. At the top is the Application Layer, the space for the application-
specific networking methods used in software applications, e.g., a web browser program.
Below this top layer, the Transport Layer connects applications on different hosts via the
network (e.g., client–server model) with appropriate data exchange methods. Underlying
these layers are the core networking technologies, consisting of two layers. The Internet
Layer enables computers to identify and locate each other via Internet Protocol (IP)
addresses, and allows them to connect to one another via intermediate (transit) networks.
Lastly, at the bottom of the architecture, is a software layer, the Link Layer, that provides
connectivity between hosts on the same local network link, such as a local area network
(LAN) or a dial-up connection. The model, also known as TCP/IP, is designed to be
independent of the underlying hardware, with which it therefore does not concern itself
in any detail. Other models have been developed, such as the Open Systems
Interconnection (OSI) model; they are not compatible in the details of description or
implementation, but many similarities exist, and the TCP/IP protocols are usually
included in discussions of OSI networking.
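
A minimal sketch of this layering from an application's point of view, in Python: the
program speaks an Application Layer protocol (a bare HTTP request), the socket API hands
the bytes to the Transport Layer (TCP), and the Internet and Link Layers below remain
invisible to the code. The host name is a placeholder.

    import socket

    # Transport Layer: open a TCP connection to port 80 on the remote host.
    with socket.create_connection(("example.com", 80)) as sock:
        # Application Layer: speak HTTP over that connection.
        sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        reply = b""
        while True:
            chunk = sock.recv(4096)       # lower layers handle addressing and delivery
            if not chunk:                 # the server closed the connection
                break
            reply += chunk

    print(reply.split(b"\r\n")[0])        # status line, e.g. b'HTTP/1.1 200 OK'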

The most prominent component of the Internet model is the Internet Protocol (IP) which
provides addressing systems (IP addresses) for computers on the Internet. IP enables
internetworking and essentially establishes the Internet itself. IP Version 4 (IPv4) is the
initial version used on the first generation of today's Internet and is still in dominant
use. It was designed to address up to approximately 4.3 billion (4.3×10^9) Internet hosts.
However, the explosive growth of the Internet has led to IPv4 address exhaustion, which is
estimated to enter its final stage in approximately 2011.[10] A new protocol version, IPv6,
was developed in the mid-1990s; it provides vastly larger addressing capabilities and more
efficient routing of Internet traffic. IPv6 is currently in the commercial deployment phase
around the world, and Internet address registries (RIRs) have begun to urge all resource
managers to plan rapid adoption and conversion.[11]
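
A small illustration of the two address families using Python's standard ipaddress
module; the addresses shown are reserved documentation examples:

    import ipaddress

    v4 = ipaddress.ip_address("192.0.2.1")      # IPv4: 32-bit addresses (~4.3 billion total)
    v6 = ipaddress.ip_address("2001:db8::1")    # IPv6: 128-bit addresses

    print(v4.version, v6.version)                               # 4 6
    print(ipaddress.ip_network("192.0.2.0/24").num_addresses)   # 256 addresses in this block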

IPv6 is not interoperable with IPv4. It essentially establishes a "parallel" version of the
Internet not directly accessible with IPv4 software. This means software upgrades or
translator facilities are necessary for every networking device that needs to communicate
on the IPv6 Internet. Most modern computer operating systems already support both versions
of the Internet Protocol. Network infrastructures, however, are
still lagging in this development. Aside from the complex physical connections that make
up its infrastructure, the Internet is facilitated by bi- or multi-lateral commercial contracts
(e.g., peering agreements), and by technical specifications or protocols that describe how
to exchange data over the network. Indeed, the Internet is defined by its interconnections
and routing policies.

Structure

The Internet structure and its usage characteristics have been studied extensively. It has
been determined that both the Internet IP routing structure and hypertext links of the
World Wide Web are examples of scale-free networks. Similar to the way the
commercial Internet providers connect via Internet exchange points, research networks
tend to interconnect into large subnetworks such as GEANT, GLORIAD, Internet2
(successor of the Abilene Network), and the UK's national research and education
network JANET. These in turn are built around smaller networks (see also the list of
academic computer network organizations).

Many computer scientists describe the Internet as a "prime example of a large-scale,
highly engineered, yet highly complex system".[12] The Internet is extremely
heterogeneous; for instance, data transfer rates and physical characteristics of connections
vary widely. The Internet exhibits "emergent phenomena" that depend on its large-scale
organization. For example, data transfer rates exhibit temporal self-similarity. The
principles of the routing and addressing methods for traffic in the Internet reach back to
their origins in the 1960s, when the eventual scale and popularity of the network could not
be anticipated. Thus, the possibility of developing alternative structures is being
investigated.[13]
Governance
Main article: Internet governance

ICANN headquarters in Marina Del Rey, California, United States

The Internet is a globally distributed network comprising many voluntarily
interconnected autonomous networks. It operates without a central governing body.
However, to maintain interoperability, all technical and policy aspects of the underlying
core infrastructure and the principal name spaces are administered by the Internet
Corporation for Assigned Names and Numbers (ICANN), headquartered in Marina del
Rey, California. ICANN is the authority that coordinates the assignment of unique
identifiers for use on the Internet, including domain names, Internet Protocol (IP)
addresses, application port numbers in the transport protocols, and many other
parameters. Globally unified name spaces, in which names and numbers are uniquely
assigned, are essential for the global reach of the Internet. ICANN is governed by an
international board of directors drawn from across the Internet technical, business,
academic, and other non-commercial communities. The US government continues to
have the primary role in approving changes to the DNS root zone that lies at the heart of
the domain name system. ICANN's role in coordinating the assignment of unique
identifiers distinguishes it as perhaps the only central coordinating body on the global
Internet. On 16 November 2005, the World Summit on the Information Society, held in
Tunis, established the Internet Governance Forum (IGF) to discuss Internet-related
issues.

Modern uses
The Internet is allowing greater flexibility in working hours and location, especially with
the spread of unmetered high-speed connections and web applications.

The Internet can now be accessed almost anywhere by numerous means, especially
through mobile Internet devices. Mobile phones, datacards, handheld game consoles and
cellular routers allow users to connect to the Internet from anywhere there is a wireless
network supporting that device's technology. Within the limitations imposed by small
screens and other limited facilities of such pocket-sized devices, services of the Internet,
including email and the web, may be available. Service providers may restrict the
services offered and wireless data transmission charges may be significantly higher than
other access methods.

The Internet has also become a large market for companies; some of the biggest
companies today have grown by taking advantage of the efficient nature of low-cost
advertising and commerce through the Internet, also known as e-commerce. It is the
fastest way to spread information to a vast number of people simultaneously. The Internet
has also revolutionized shopping: a person can order a CD online and receive it in the
mail within a couple of days, or in some cases download it directly. The Internet has
also greatly facilitated personalized marketing, which allows a
company to market a product to a specific person or a specific group of people more so
than any other advertising medium. Examples of personalized marketing include online
communities such as MySpace, Friendster, Facebook, Twitter, Orkut and others which
thousands of Internet users join to advertise themselves and make friends online. Many of
these users are young teens and adolescents ranging from 13 to 25 years old. In turn,
when they advertise themselves they advertise interests and hobbies, which online
marketing companies can use as information as to what those users will purchase online,
and advertise their own companies' products to those users.

The low cost and nearly instantaneous sharing of ideas, knowledge, and skills has made
collaborative work dramatically easier, with the help of collaborative software. Not only
can a group cheaply communicate and share ideas, but the wide reach of the Internet
allows such groups to easily form in the first place. An example of this is the free
software movement, which has produced, among other programs, Linux, Mozilla Firefox,
and OpenOffice.org. Internet "chat", whether in the form of IRC chat rooms or channels,
or via instant messaging systems, allows colleagues to stay in touch in a very convenient
way when working at their computers during the day. Messages can be exchanged even
more quickly and conveniently than via e-mail. Extensions to these systems may allow
files to be exchanged, "whiteboard" drawings to be shared or voice and video contact
between team members.

Version control systems allow collaborating teams to work on shared sets of documents
without either accidentally overwriting each other's work or having members wait until
they get "sent" documents to be able to make their contributions. Business and project
teams can share calendars as well as documents and other information. Such
collaboration occurs in a wide variety of areas including scientific research, software
development, conference planning, political activism and creative writing. Social and
political collaboration is also becoming more widespread as both Internet access and
computer literacy grow. From the flash mob 'events' of the early 2000s to the use of
social networking in the 2009 Iranian election protests, the Internet allows people to work
together more effectively and in many more ways than was possible without it.

The Internet allows computer users to remotely access other computers and information
stores easily, wherever they may be across the world. They may do this with or without
the use of security, authentication and encryption technologies, depending on the
requirements. This is encouraging new ways of working from home, collaboration and
information sharing in many industries. An accountant sitting at home can audit the
books of a company based in another country, on a server situated in a third country that
is remotely maintained by IT specialists in a fourth. These accounts could have been
created by home-working bookkeepers, in other remote locations, based on information
e-mailed to them from offices all over the world. Some of these things were possible
before the widespread use of the Internet, but the cost of private leased lines would have
made many of them infeasible in practice. An office worker away from their desk,
perhaps on the other side of the world on a business trip or a holiday, can open a remote
desktop session into their normal office PC using a secure Virtual Private Network (VPN)
connection via the Internet. This gives the worker complete access to all of their
normal files and data, including e-mail and other applications, while away from the
office. This concept has been referred to among system administrators as the Virtual
Private Nightmare,[14] because it extends the secure perimeter of a corporate network into
its employees' homes.

Services
Information

Many people use the terms Internet and World Wide Web, or just the Web,
interchangeably, but the two terms are not synonymous. The World Wide Web is a global
set of documents, images and other resources, logically interrelated by hyperlinks and
referenced with Uniform Resource Identifiers (URIs). URIs symbolically identify services,
web servers, file servers, and other databases that store documents and provide resources;
clients use them to locate and address these servers and to access the resources using
the Hypertext Transfer Protocol (HTTP), the primary carrier protocol of the Web. HTTP
is only one of the hundreds of communication protocols used on the Internet. Web
services may also use HTTP to allow software systems to communicate in order to share
and exchange business logic and data.
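
A minimal sketch of retrieving a resource named by a URI over HTTP with Python's standard
library; the URL is a placeholder:

    from urllib.request import urlopen

    # The URI names the resource; HTTP is the protocol used to retrieve it.
    with urlopen("http://example.com/") as response:
        html = response.read().decode("utf-8")
        print(response.status, len(html))   # e.g. 200 and the size of the returned document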

World Wide Web browser software, such as Microsoft's Internet Explorer, Mozilla
Firefox, Opera, Apple's Safari, and Google Chrome, let users navigate from one web
page to another via hyperlinks embedded in the documents. These documents may also
contain any combination of computer data, including graphics, sounds, text, video,
multimedia and interactive content including games, office applications and scientific
demonstrations. Through keyword-driven Internet research using search engines like
Yahoo! and Google, users worldwide have easy, instant access to a vast and diverse
amount of online information. Compared to printed encyclopedias and traditional
libraries, the World Wide Web has enabled the decentralization of information.

The Web has also enabled individuals and organizations to publish ideas and information
to a potentially large audience online at greatly reduced expense and time delay.
Publishing a web page, a blog, or building a website involves little initial cost and many
cost-free services are available. Publishing and maintaining large, professional web sites
with attractive, diverse and up-to-date information is still a difficult and expensive
proposition, however. Many individuals and some companies and groups use web logs or
blogs, which are largely used as easily updatable online diaries. Some commercial
organizations encourage staff to communicate advice in their areas of specialization in
the hope that visitors will be impressed by the expert knowledge and free information,
and be attracted to the corporation as a result. One example of this practice is Microsoft,
whose product developers publish their personal blogs in order to pique the public's
interest in their work. Collections of personal web pages published by large service
providers remain popular, and have become increasingly sophisticated. Whereas
operations such as Angelfire and GeoCities have existed since the early days of the Web,
newer offerings from, for example, Facebook and MySpace currently have large
followings. These operations often brand themselves as social network services rather
than simply as web page hosts.

Advertising on popular web pages can be lucrative, and e-commerce or the sale of
products and services directly via the Web continues to grow. In the early days, web
pages were usually created as sets of complete and isolated HTML text files stored on a
web server. More recently, websites are more often created using content management or
wiki software with, initially, very little content. Contributors to these systems, who may
be paid staff, members of a club or other organization or members of the public, fill
underlying databases with content using editing pages designed for that purpose, while
casual visitors view and read this content in its final HTML form. There may or may not
be editorial, approval and security systems built into the process of taking newly entered
content and making it available to the target visitors.

Communication

E-mail is an important communications service available on the Internet. The concept of
sending electronic text messages between parties, in a way analogous to mailing letters or
memos, predates the creation of the Internet. Today it can be important to distinguish
between Internet and internal e-mail systems. Internet e-mail may travel and be stored
unencrypted on many other networks and machines out of both the sender's and the
recipient's control. During this time it is quite possible for the content to be read and even
tampered with by third parties, if anyone considers it important enough. Purely internal or
intranet mail systems, where the information never leaves the corporate or organization's
network, are much more secure, although in any organization there will be IT and other
personnel whose job may involve monitoring, and occasionally accessing, the e-mail of
other employees not addressed to them. Pictures, documents and other files can be sent as
e-mail attachments. E-mails can be cc-ed to multiple e-mail addresses.
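
A hedged sketch of composing a message with an attachment and several Cc recipients using
Python's standard email and smtplib modules; the addresses, file name, and SMTP host are
placeholders, and a real server would typically also require authentication:

    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Cc"] = "colleague@example.com, manager@example.com"   # multiple cc addresses
    msg["Subject"] = "Report attached"
    msg.set_content("Please find the report attached.")

    with open("report.pdf", "rb") as f:                        # attach a file
        msg.add_attachment(f.read(), maintype="application",
                           subtype="pdf", filename="report.pdf")

    with smtplib.SMTP("mail.example.com") as smtp:             # hand the message to an SMTP server
        smtp.send_message(msg)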

Internet telephony is another common communications service made possible by the
creation of the Internet. VoIP stands for Voice over Internet Protocol, referring to the
protocol that underlies all Internet communication. The idea began in the early 1990s
with walkie-talkie-like voice applications for personal computers. In recent years many
VoIP systems have become as easy to use and as convenient as a normal telephone. The
benefit is that, as the Internet carries the voice traffic, VoIP can be free or cost much less
than a traditional telephone call, especially over long distances and especially for those
with always-on Internet connections such as cable or ADSL. VoIP is maturing into a
competitive alternative to traditional telephone service. Interoperability between different
providers has improved and the ability to call or receive a call from a traditional
telephone is available. Simple, inexpensive VoIP network adapters are available that
eliminate the need for a personal computer.

Voice quality can still vary from call to call but is often equal to and can even exceed that
of traditional calls. Remaining problems for VoIP include emergency telephone number
dialling and reliability. Currently, a few VoIP providers offer an emergency service,
but it is not universally available. Traditional phones are line-powered and operate during
a power failure; VoIP does not do so without a backup power source for the phone
equipment and the Internet access devices. VoIP has also become increasingly popular
for gaming applications, as a form of communication between players. Popular VoIP
clients for gaming include Ventrilo and Teamspeak. Wii, PlayStation 3, and Xbox 360
also offer VoIP chat features.

Data transfer

File sharing is an example of transferring large amounts of data across the Internet. A
computer file can be e-mailed to customers, colleagues and friends as an attachment. It
can be uploaded to a website or FTP server for easy download by others. It can be put
into a "shared location" or onto a file server for instant use by colleagues. The load of
bulk downloads to many users can be eased by the use of "mirror" servers or peer-to-peer
networks. In any of these cases, access to the file may be controlled by user
authentication, the transit of the file over the Internet may be obscured by encryption, and
money may change hands for access to the file. The price can be paid by the remote
charging of funds from, for example, a credit card whose details are also passed—usually
fully encrypted—across the Internet. The origin and authenticity of the file received may
be checked by digital signatures or by MD5 or other message digests. These simple
features of the Internet, available on a worldwide basis, are changing the production, sale, and
distribution of anything that can be reduced to a computer file for transmission. This
includes all manner of print publications, software products, news, music, film, video,
photography, graphics and the other arts. This in turn has caused seismic shifts in each of
the existing industries that previously controlled the production and distribution of these
products.
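
A small sketch of verifying a downloaded file against a published message digest with
Python's hashlib (SHA-256 is used here in place of MD5; the file name and the expected
digest are placeholders):

    import hashlib

    EXPECTED_SHA256 = "0123...abcd"   # placeholder: the digest published alongside the file

    def file_digest(path, algorithm="sha256", chunk_size=65536):
        # Hash the file in chunks so arbitrarily large downloads can be checked.
        h = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    if file_digest("download.iso") == EXPECTED_SHA256:
        print("Digest matches: the file is intact.")
    else:
        print("Digest mismatch: the file was corrupted or tampered with.")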

Streaming media refers to the continuous delivery of audio and video over the Internet;
many existing radio and television broadcasters promote Internet "feeds" of their live
audio and video streams (for example, the BBC).
They may also allow time-shift viewing or listening such as Preview, Classic Clips and
Listen Again features. These providers have been joined by a range of pure Internet
"broadcasters" who never had on-air licenses. This means that an Internet-connected
device, such as a computer or something more specific, can be used to access on-line
media in much the same way as was previously possible only with a television or radio
receiver. The range of available types of content is much wider, from specialized
technical webcasts to on-demand popular multimedia services. Podcasting is a variation
on this theme, where—usually audio—material is downloaded and played back on a
computer or shifted to a portable media player to be listened to on the move. These
techniques using simple equipment allow anybody, with little censorship or licensing
control, to broadcast audio-visual material worldwide.

Webcams can be seen as an even lower-budget extension of this phenomenon. While
some webcams can give full-frame-rate video, the picture is usually either small or
updates slowly. Internet users can watch animals around an African waterhole, ships in
the Panama Canal, traffic at a local roundabout or monitor their own premises, live and in
real time. Video chat rooms and video conferencing are also popular with many uses
being found for personal webcams, with and without two-way sound. YouTube was
founded on 15 February 2005 and is now the leading website for free streaming video
with a vast number of users. It uses a flash-based web player to stream and show video
files. Registered users may upload an unlimited amount of video and build their own
personal profile. YouTube claims that its users watch hundreds of millions, and upload
hundreds of thousands of videos daily.[15]

Access
See also: Internet access worldwide, List of countries by number of Internet users,
English on the Internet, Global Internet usage, and Unicode

Graph of Internet users per 100 inhabitants between 1997 and 2007 by International
Telecommunication Union

The prevalent language for communication on the Internet is English. This may be a
result of the origin of the Internet, as well as English's role as a lingua franca. It may also
be related to the poor capability of early computers, largely originating in the United
States, to handle characters other than those in the English variant of the Latin alphabet.
After English (28% of Web visitors) the most requested languages on the World Wide
Web are Chinese (23%), Spanish (8%), Japanese (5%), Portuguese and German (4%
each), Arabic, French and Russian (3% each), and Korean (2%).[16] By region, 42% of the
world's Internet users are based in Asia, 24% in Europe, 14% in North America, 10% in
Latin America and the Caribbean taken together, 5% in Africa, 3% in the Middle East
and 1% in Australia/Oceania.[17] The Internet's technologies have developed enough in
recent years, especially in the use of Unicode, that good facilities are available for
development and communication in the world's widely used languages. However, some
glitches such as mojibake (incorrect display of some languages' characters) still remain.

Common methods of Internet access in homes include dial-up, landline broadband (over
coaxial cable, fibre optic or copper wires), Wi-Fi, satellite and 3G technology cell
phones. Public places to use the Internet include libraries and Internet cafes, where
computers with Internet connections are available. There are also Internet access points in
many public places such as airport halls and coffee shops, in some cases just for brief use
while standing. Various terms are used, such as "public Internet kiosk", "public access
terminal", and "Web payphone". Many hotels now also have public terminals, though
these are usually fee-based. These terminals are widely used for purposes such as
ticket booking, banking, and online payment. Wi-Fi provides wireless access to
computer networks, and therefore can do so to the Internet itself. Hotspots providing such
access include Wi-Fi cafes, where would-be users need to bring their own wireless-
enabled devices such as a laptop or PDA. These services may be free to all, free to
customers only, or fee-based. A hotspot need not be limited to a confined location. A
whole campus or park, or even an entire city can be enabled. Grassroots efforts have led
to wireless community networks. Commercial Wi-Fi services covering large city areas
are in place in London, Vienna, Toronto, San Francisco, Philadelphia, Chicago and
Pittsburgh. The Internet can then be accessed from such places as a park bench.[18] Apart
from Wi-Fi, there have been experiments with proprietary mobile wireless networks like
Ricochet, various high-speed data services over cellular phone networks, and fixed
wireless services. High-end mobile phones such as smartphones generally come with
Internet access through the phone network. Web browsers such as Opera are available on
these advanced handsets, which can also run a wide variety of other Internet software.
More mobile phones have Internet access than PCs, though this is not as widely used.
[citation needed]
An Internet access provider and protocol matrix differentiates the methods
used to get online.

Social impact
Main article: Sociology of the Internet

The Internet has enabled entirely new forms of social interaction, activities, and
organizing, thanks to its basic features such as widespread usability and access. Social
networking websites such as Facebook, Twitter and MySpace have created new ways to
socialize and interact. Users of these sites are able to add a wide variety of information to
pages, to pursue common interests, and to connect with others. It is also possible to find
existing acquaintances, to allow communication among existing groups of people. Sites
like LinkedIn foster commercial and business connections. YouTube and Flickr
specialize in users' videos and photographs.

In the first decade of the 21st century, the first generation was raised with widespread
availability of Internet connectivity, bringing consequences and concerns in areas such as
personal privacy and identity, and the distribution of copyrighted materials. These "digital
natives" face a variety of challenges that were not present for prior generations.
The Internet has achieved new relevance as a political tool, leading to Internet censorship
by some states. The presidential campaign of Howard Dean in 2004 in the United States
was notable for its success in soliciting donations via the Internet. Many political groups
use the Internet to organize and carry out their missions, giving rise to Internet
activism. Some governments, such as those of Iran, North
Korea, Myanmar, the People's Republic of China, and Saudi Arabia, restrict what people
in their countries can access on the Internet, especially political and religious content.
[citation needed]
This is accomplished through software that filters domains and content so that
they may not be easily accessed or obtained without elaborate circumvention.[original research?]

In Norway, Denmark, Finland[19] and Sweden, major Internet service providers have
voluntarily, possibly to avoid such an arrangement being turned into law, agreed to
restrict access to sites listed by authorities. While this list of forbidden URLs is only
supposed to contain addresses of known child pornography sites, the content of the list is
secret.[citation needed] Many countries, including the United States, have enacted laws against
the possession or distribution of certain material, such as child pornography, via the
Internet, but do not mandate filtering software. There are many free and commercially
available software programs, called content-control software, with which a user can
choose to block offensive websites on individual computers or networks, in order to limit
a child's access to pornographic materials or depiction of violence.

The Internet has been a major outlet for leisure activity since its inception, with
entertaining social experiments such as MUDs and MOOs being conducted on university
servers, and humor-related Usenet groups receiving much traffic. Today, many Internet
forums have sections devoted to games and funny videos; short cartoons in the form of
Flash movies are also popular. Over 6 million people use blogs or message boards as a
means of communication and for the sharing of ideas. The pornography and gambling
industries have taken advantage of the World Wide Web, and often provide a significant
source of advertising revenue for other websites.[citation needed] Although many governments
have attempted to restrict both industries' use of the Internet, this has generally failed to
stop their widespread popularity.[citation needed]

One main area of leisure activity on the Internet is multiplayer gaming. This form of
recreation creates communities, where people of all ages and origins enjoy the fast-paced
world of multiplayer games. These range from MMORPG to first-person shooters, from
role-playing games to online gambling. This has revolutionized the way many people
interact[citation needed] while spending their free time on the Internet. While online gaming has
been around since the 1970s,[citation needed] modern modes of online gaming began with
subscription services such as GameSpy and MPlayer. Non-subscribers were limited to
certain types of game play or certain games. Many people use the Internet to access and
download music, movies and other works for their enjoyment and relaxation. Free and
fee-based services exist for all of these activities, using centralized servers and distributed
peer-to-peer technologies. Some of these sources exercise more care with respect to the
original artists' copyrights than others.
Many people use the World Wide Web to access news, weather and sports reports, to
plan and book vacations and to find out more about their interests. People use chat,
messaging and e-mail to make and stay in touch with friends worldwide, sometimes in
the same way as some previously had pen pals. The Internet has seen a growing number
of Web desktops, where users can access their files and settings via the Internet.

Cyberslacking can become a serious drain on corporate resources; the average UK
employee spent 57 minutes a day surfing the Web while at work, according to a 2003
study by Peninsula Business Services.[20] Internet addiction disorder is excessive
computer use that interferes with daily life. Some psychologists believe that ordinary
Internet use has other effects on individuals, for instance interfering with the deep
thinking that leads to true creativity.

Computer programming (often shortened to programming or coding) is the process of
designing, writing, testing, debugging / troubleshooting, and maintaining the source code
of computer programs. This source code is written in a programming language. The code
may be a modification of an existing source or something completely new. The purpose
of programming is to create a program that exhibits a certain desired behaviour
(customization). The process of writing source code often requires expertise in many
different subjects, including knowledge of the application domain, specialized algorithms
and formal logic.

Contents

• 1 Overview
• 2 History of programming
• 3 Modern programming
o 3.1 Quality requirements
o 3.2 Algorithmic complexity
o 3.3 Methodologies
o 3.4 Measuring language usage
o 3.5 Debugging
• 4 Programming languages
• 5 Programmers
• 6 See also
• 7 References
• 8 Further reading

• 9 External links

[edit] Overview

Within software engineering, programming (the implementation) is regarded as one phase
in a software development process.

There is an ongoing debate on the extent to which the writing of programs is an art, a
craft or an engineering discipline.[1] In general, good programming is considered to be the
measured application of all three, with the goal of producing an efficient and evolvable
software solution (the criteria for "efficient" and "evolvable" vary considerably). The
discipline differs from many other technical professions in that programmers, in general,
do not need to be licensed or pass any standardized (or governmentally regulated)
certification tests in order to call themselves "programmers" or even "software
engineers." However, representing oneself as a "Professional Software Engineer" without
a license from an accredited institution is illegal in many parts of the world.[citation needed]
However, because the discipline covers many areas, which may or may not include
critical applications, it is debatable whether licensing is required for the profession as a
whole. In most cases, the discipline is self-governed by the entities which require the
programming, and sometimes very strict environments are defined (e.g. United States Air
Force use of AdaCore and security clearance).

Another ongoing debate is the extent to which the programming language used in writing
computer programs affects the form that the final program takes. This debate is analogous
to that surrounding the Sapir-Whorf hypothesis [2] in linguistics, that postulates that a
particular language's nature influences the habitual thought of its speakers. Different
language patterns yield different patterns of thought. This idea challenges the possibility
of representing the world perfectly with language, because it acknowledges that the
mechanisms of any language condition the thoughts of its speaker community.

Said another way, programming is the craft of transforming requirements into something
that a computer can execute.

[edit] History of programming


See also: History of programming languages

Wired plug board for an IBM 402 Accounting Machine.


The concept of devices that operate following a pre-defined set of instructions traces back
to Greek Mythology, notably Hephaestus, the Greek Blacksmith God, and his mechanical
slaves.[3] The Antikythera mechanism from ancient Greece was a calculator utilizing
gears of various sizes and configuration to determine its operation.[4] Al-Jazari built
programmable Automata in 1206. One system employed in these devices was the use of
pegs and cams placed into a wooden drum at specific locations, which would sequentially
trigger levers that in turn operated percussion instruments. The output of this device was
a small drummer playing various rhythms and drum patterns.[5][6] The Jacquard Loom,
which Joseph Marie Jacquard developed in 1801, uses a series of pasteboard cards with
holes punched in them. The hole pattern represented the pattern that the loom had to
follow in weaving cloth. The loom could produce entirely different weaves using
different sets of cards. Charles Babbage adopted the use of punched cards around 1830 to
control his Analytical Engine. The synthesis of numerical calculation, predetermined
operation and output, along with a way to organize and input instructions in a manner
relatively easy for humans to conceive and produce, led to the modern development of
computer programming. Development of computer programming accelerated through the
Industrial Revolution.

In the late 1880s, Herman Hollerith invented the recording of data on a medium that
could then be read by a machine. Prior uses of machine readable media, above, had been
for control, not data. "After some initial trials with paper tape, he settled on punched
cards..."[7] To process these punched cards, first known as "Hollerith cards" he invented
the tabulator, and the keypunch machines. These three inventions were the foundation of
the modern information processing industry. In 1896 he founded the Tabulating Machine
Company (which later became the core of IBM). The addition of a control panel
(plugboard) to his 1906 Type I Tabulator allowed it to do different jobs without having to
be physically rebuilt. By the late 1940s, there were a variety of plug-board programmable
machines, called unit record equipment, to perform data-processing tasks (card reading).
Early computer programmers used plug-boards for the variety of complex calculations
requested of the newly invented machines.

Data and instructions could be stored on external punched cards, which were kept in
order and arranged in program decks.
The invention of the von Neumann architecture allowed computer programs to be stored
in computer memory. Early programs had to be painstakingly crafted using the
instructions (elementary operations) of the particular machine, often in binary notation.
Every model of computer would likely use different instructions (machine language) to
do the same task. Later, assembly languages were developed that let the programmer
specify each instruction in a text format, entering abbreviations for each operation code
instead of a number and specifying addresses in symbolic form (e.g., ADD X, TOTAL).
Entering a program in assembly language is usually more convenient, faster, and less
prone to human error than using machine language, but because an assembly language is
little more than a different notation for a machine language, any two machines with
different instruction sets also have different assembly languages.

In 1954, FORTRAN was invented; it was the first high level programming language to
have a functional implementation, as opposed to just a design on paper.[8][9] (A high-level
language is, in very general terms, any programming language that allows the
programmer to write programs in terms that are more abstract than assembly language
instructions, i.e. at a level of abstraction "higher" than that of an assembly language.) It
allowed programmers to specify calculations by entering a formula directly (e.g. Y = X*2
+ 5*X + 9). The program text, or source, is converted into machine instructions using a
special program called a compiler, which translates the FORTRAN program into machine
language. In fact, the name FORTRAN stands for "Formula Translation". Many other
languages were developed, including some for commercial programming, such as
COBOL. Programs were mostly still entered using punched cards or paper tape. (See
computer programming in the punch card era). By the late 1960s, data storage devices
and computer terminals became inexpensive enough that programs could be created by
typing directly into the computers. Text editors were developed that allowed changes and
corrections to be made much more easily than with punched cards. (Usually, an error in
punching a card meant that the card had to be discarded and a new one punched to
replace it.)

As time has progressed, computers have made giant leaps in the area of processing
power. This has brought about newer programming languages that are more abstracted
from the underlying hardware. Although these high-level languages usually incur greater
overhead, the increase in speed of modern computers has made the use of these languages
much more practical than in the past. These increasingly abstracted languages typically
are easier to learn and allow the programmer to develop applications much more
efficiently and with less source code. However, high-level languages are still impractical
for a few programs, such as those where low-level hardware control is necessary or
where maximum processing speed is vital.

Throughout the second half of the twentieth century, programming was an attractive
career in most developed countries. Some forms of programming have been increasingly
subject to offshore outsourcing (importing software and services from other countries,
usually at a lower wage), making programming career decisions in developed countries
more complicated, while increasing economic opportunities in less developed areas. It is
unclear how far this trend will continue and how deeply it will impact programmer wages
and opportunities.

Modern programming


Quality requirements

Whatever the approach to software development may be, the final program must satisfy
some fundamental properties. The following properties are among the most relevant:

• Efficiency/performance: the amount of system resources a program consumes
(processor time, memory space, slow devices such as disks, network bandwidth, and
to some extent even user interaction): the less, the better. This also includes the
correct disposal of resources, such as cleaning up temporary files and avoiding
memory leaks.
• Reliability: how often the results of a program are correct. This depends on the
conceptual correctness of algorithms and the minimization of programming mistakes,
such as mistakes in resource management (e.g., buffer overflows and race
conditions) and logic errors (such as division by zero or off-by-one errors); a
small sketch following this list illustrates an off-by-one error and a memory leak.
• Robustness: how well a program anticipates problems not due to programmer
error. This includes situations such as incorrect, inappropriate or corrupt data,
unavailability of needed resources such as memory, operating system services and
network connections, and user error.
• Usability: the ergonomics of a program: the ease with which a person can use the
program for its intended purpose, or in some cases even unanticipated purposes.
Such issues can make or break a program's success, sometimes regardless of its
other qualities. Usability involves a wide range of textual, graphical, and
sometimes hardware elements that improve the clarity, intuitiveness, cohesiveness,
and completeness of a program's user interface.
• Portability: the range of computer hardware and operating system platforms on
which the source code of a program can be compiled/interpreted and run. This
depends on differences in the programming facilities provided by the different
platforms, including hardware and operating system resources, expected
behaviour of the hardware and operating system, and availability of platform-specific
compilers (and sometimes libraries) for the language of the source code.
• Maintainability: the ease with which a program can be modified by its present or
future developers in order to make improvements or customizations, fix bugs and
security holes, or adapt it to new environments. Good practices during initial
development make the difference in this regard. This quality may not be directly
apparent to the end user but it can significantly affect the fate of a program over
the long term.
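
As an illustrative sketch (not part of the original list), the short C program below shows
two of the mistakes mentioned above: a potential off-by-one error in a loop bound, and the
resource handling needed to avoid a memory leak.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int n = 5;
        int *a = malloc(n * sizeof *a);
        if (a == NULL)              /* robustness: handle a failed allocation */
            return EXIT_FAILURE;

        /* A common off-by-one mistake would be to write "i <= n", which
           writes one element past the end of the array; the correct
           bound is "i < n". */
        for (int i = 0; i < n; i++)
            a[i] = i * i;

        for (int i = 0; i < n; i++)
            printf("%d ", a[i]);
        printf("\n");

        free(a);                    /* correct disposal of resources: no memory leak */
        return EXIT_SUCCESS;
    }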

Algorithmic complexity

The academic field and the engineering practice of computer programming are both
largely concerned with discovering and implementing the most efficient algorithms for a
given class of problems. For this purpose, algorithms are classified into orders using
so-called Big O notation, which expresses resource use, such as execution time or memory
consumption, in terms of the size of the input; O(n), for example, denotes cost that grows
linearly with the input size n. Expert programmers are familiar
with a variety of well-established algorithms and their respective complexities and use
this knowledge to choose algorithms that are best suited to the circumstances.
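
As a brief illustration (the function names below are invented for this sketch), the two C
functions solve the same lookup problem with different asymptotic costs: a linear scan is
O(n), while binary search on sorted data is O(log n), so the latter scales far better as
the input grows.

    #include <stddef.h>

    /* Linear search: examines up to n elements, so it runs in O(n) time. */
    int linear_search(const int *a, size_t n, int key)
    {
        for (size_t i = 0; i < n; i++)
            if (a[i] == key)
                return (int)i;
        return -1;
    }

    /* Binary search: requires a sorted array and halves the remaining
       range at each step, so it runs in O(log n) time. */
    int binary_search(const int *a, size_t n, int key)
    {
        size_t lo = 0, hi = n;          /* search the half-open range [lo, hi) */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (a[mid] == key)
                return (int)mid;
            if (a[mid] < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return -1;                      /* key not present */
    }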

Methodologies

The first step in most formal software development projects is requirements analysis,
followed by modeling, implementation, testing, and failure elimination (debugging). Many
different approaches exist for each of these tasks. One approach popular for requirements
analysis is Use Case analysis.

Popular modeling techniques include Object-Oriented Analysis and Design (OOAD) and
Model-Driven Architecture (MDA). The Unified Modeling Language (UML) is a
notation used for both the OOAD and MDA.

A similar technique used for database design is Entity-Relationship Modeling (ER
Modeling).

Implementation techniques include imperative languages (object-oriented or procedural),
functional languages, and logic languages.

Measuring language usage

It is very difficult to determine which modern programming languages are most popular.
Some languages are very popular for particular kinds of applications (e.g.,
COBOL is still strong in the corporate data center, often on large mainframes,
FORTRAN in engineering applications, scripting languages in web development, and C
in embedded applications), while some languages are regularly used to write many
different kinds of applications.

Methods of measuring programming language popularity include: counting the number of
job advertisements that mention the language,[10] the number of books teaching the
language that are sold (this overestimates the importance of newer languages), and
estimates of the number of existing lines of code written in the language (this
underestimates the number of users of business languages such as COBOL).

Debugging

[Figure: A bug, which was debugged in 1947.]

Debugging is a very important task in the software development process, because an
incorrect program can have significant consequences for its users. Some languages are
more prone to some kinds of faults because their specification does not require compilers
to perform as much checking as other languages. Use of a static analysis tool can help
detect some possible problems.

Debugging is often done with IDEs like Visual Studio, NetBeans, and Eclipse.
Standalone debuggers like gdb are also used, and these often provide less of a visual
environment, usually using a command line.
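
As a rough sketch of such a command-line session (the file name, variable names, and the
deliberate bug below are invented for illustration), a small C program can be compiled with
debugging symbols and stepped through in gdb:

    /* average.c -- a tiny program with a logic error, used to sketch a
       typical gdb session.  Compile with debugging symbols and run it
       under the debugger:
           gcc -g -o average average.c
           gdb ./average
           (gdb) break main      (stop at the start of main)
           (gdb) run
           (gdb) next            (execute one line at a time)
           (gdb) print sum       (inspect a variable's current value)
    */
    #include <stdio.h>

    int main(void)
    {
        int values[] = {2, 4, 6, 8};
        int count = 4;
        int sum = 0;

        /* Bug: the loop starts at 1, so values[0] is never added.
           Stepping with "next" and checking "print sum" reveals it. */
        for (int i = 1; i < count; i++)
            sum += values[i];

        printf("average = %d\n", sum / count);
        return 0;
    }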

Programming languages


Main articles: Programming language and List of programming languages

Different programming languages support different styles of programming (called
programming paradigms). The choice of language used is subject to many
considerations, such as company policy, suitability to task, availability of third-party
packages, or individual preference. Ideally, the programming language best suited for the
task at hand will be selected. Trade-offs from this ideal involve finding enough
programmers who know the language to build a team, the availability of compilers for
that language, and the efficiency with which programs written in a given language
execute.

Allen Downey, in his book How to Think Like a Computer Scientist, writes:

The details look different in different languages, but a few basic instructions
appear in just about every language:

• input: Get data from the keyboard, a file, or some other device.
• output: Display data on the screen or send data to a file or other
device.
• arithmetic: Perform basic arithmetical operations like addition and
multiplication.
• conditional execution: Check for certain conditions and execute
the appropriate sequence of statements.
• repetition: Perform some action repeatedly, usually with some
variation.
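
The sketch below (not taken from Downey's book; C is used only for illustration) shows all
five of these basic instructions in a single small program.

    #include <stdio.h>

    int main(void)
    {
        int n;

        /* input: get a number from the keyboard */
        if (scanf("%d", &n) != 1)
            return 1;

        /* repetition: perform an action for each value 1..n */
        for (int i = 1; i <= n; i++) {
            /* arithmetic: compute the square of i */
            int square = i * i;

            /* conditional execution: only report the even squares */
            if (square % 2 == 0)
                /* output: display the result on the screen */
                printf("%d squared is %d (even)\n", i, square);
        }
        return 0;
    }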

Many computer languages provide a mechanism to call functions provided by libraries.
Provided that the functions in a library follow the appropriate run-time conventions
(e.g., the method of passing arguments), these functions may be written in any other
language.
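
As a minimal example, the C program below calls sqrt from the standard math library
(linked with -lm on most Unix-like systems); the caller needs only the function's
declaration and the platform's calling convention, so the library routine itself could be
written in C, assembly, or any other language that follows those conventions.

    #include <stdio.h>
    #include <math.h>   /* declares sqrt(); the math library is linked with -lm */

    int main(void)
    {
        double x = 2.0;
        /* Only sqrt's declared signature and the platform's argument-passing
           convention matter here; the implementation inside the library may
           be written in any language that honours those conventions. */
        printf("sqrt(%.1f) = %f\n", x, sqrt(x));
        return 0;
    }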
