
WEB TECHNOLOGIES A COMPUTER SCIENCE PERSPECTIVE

JEFFREY C. JACKSON

Chapter 1 Web Essentials: Clients, Servers, and Communication

Jackson, Web Technologies: A Computer Science Perspective, 2007 Prentice-Hall, Inc. All rights reserved. 0-13-185603-0

INTRODUCTION

Server:
The software that distributes the information, together with the machine where the information and software reside, is called the server.
Provides the requested service to the client
e.g., a Web server sends the requested Web page

Client:
The software that resides on the remote machine, communicates with the server, fetches the information, processes it, and then displays it on the remote machine is called the client.
Initiates contact with the server (speaks first)
Typically requests service from the server
Web: the client is implemented in the browser

Web server: Software that delivers Web pages and other documents to browsers using the HTTP protocol

Web Page: A web page is a document or resource of information that is suitable for the World Wide Web and can be accessed through a web browser.


Website: A collection of pages on the World Wide Web that are accessible from the same URL and typically reside on the same server

1.1 The Internet
Technical origin: ARPANET (late 1960s)
Launched in 1969
Project of the U.S. Department of Defense (DoD)
One of the earliest efforts to network heterogeneous (different manufacturers and different operating systems), geographically dispersed computers
Email first available on ARPANET in 1972 (and quickly very popular!)

ARPANET access was limited to select DoD-funded organizations

The Advanced Research Projects Agency Network (ARPANET) was one of the world's first operational packet-switching networks and the first network to implement TCP/IP.

The network was initially funded by the Advanced Research Projects Agency (ARPA, later DARPA) within the U.S. Department of Defense for use by its projects at universities and research laboratories in the US.

The Internet
Open-access networks
Regional university networks (e.g., SURAnet)
CSNET for CS departments with no ARPANET access
Later, the ARPA Internet was allowed to access outside networks such as CSNET
The connection between CSNET and ARPANET was made using the Phonenet (modem) approach
This connection is asynchronous and involves long-distance calls

Open-access networks
A full-service network provider offers Internet solutions for businesses small and large, residential users, and non-profit groups.
Regional Universities Network (RUN) is a network of six universities primarily from regional Australia, with campuses in the Australian capital cities and some international campuses.
The Southeastern Universities Research Association network (SURAnet) provided networking services for universities and industry. SURAnet was one of the first and one of the largest Internet providers in the United States.

The Computer Science Network (CSNET) was a computer network that began operation in 1981 in the United States. Its purpose was to extend networking benefits to computer science departments at academic and research institutions that could not be directly connected to ARPANET because of funding or authorization limitations. CSNET was funded by the National Science Foundation for an initial three-year period from 1981 to 1984.

NSFNET (National Science Foundation Network) (1985-1995)

Primary purpose: connect supercomputer centers
Secondary purpose: provide a backbone to connect regional networks
Uses TCP/IP
Synchronous communication

Synchronous communication is said to occur when two parties communicate in real-time. Examples of synchronous communication include telephone calls and two-way radio communication.

In contrast, asynchronous communication does not take place in real time. Examples include email, blog and message board postings, and text messaging.

[Figure: Geographic distribution of the six supercomputer centers connected by the NSFNET backbone]

Backbone operated at only 56 kbit/s at first
Number of machines connected increased
Upgraded to 1.5 Mbit/s in 1988
Upgraded to 45 Mbit/s in 1991

The Internet
Internet: the network of networks connected via the public backbone and communicating using the TCP/IP communication protocol
Global communication network
Commercial Internet dial-up access offered
Economic effect: increased network usage, reduced unit cost

Backbone initially supplied by NSFNET; privately funded (ISP fees) beginning in 1995
Private telecommunication firms

1.2 Basic Internet Protocols

1.2.1 TCP/IP
1.2.2 UDP, DNS, and Domain Names
1.2.3 Higher-Level Protocols

TCP/IP

Often referred to as a single protocol, but TCP/IP is actually two different protocols:
TCP: transport layer
IP: network layer

They are treated as one because the bulk of Internet services are built on top of both the TCP and IP protocols:
e-mail, Web browsing, file downloads, accessing remote databases

IP is the fundamental protocol defining the Internet (as the name implies!)

Internet protocols were developed as part of ARPANET research

ARPANET began using TCP/IP in 1982

Designed for use both within local area networks (LANs) and between networks

IP address:
32-bit number (in IPv4)
Each device on the Internet has one or more IP addresses
Written as four dot-separated bytes, e.g., 192.0.34.166
Each decimal number represents one byte of the IP address
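
As a minimal sketch of the dotted notation described above, the following Python converts between the four-byte written form and the underlying 32-bit number. The function names are illustrative and do not come from the source.

```python
def dotted_quad_to_int(address: str) -> int:
    """Pack a dotted-decimal IPv4 address into its 32-bit integer value."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("an IPv4 address has exactly four dot-separated parts")
    value = 0
    for part in parts:
        byte = int(part)
        if not 0 <= byte <= 255:
            raise ValueError(f"{part!r} does not fit in one byte")
        value = (value << 8) | byte  # shift earlier bytes left, append this one
    return value


def int_to_dotted_quad(value: int) -> str:
    """Unpack a 32-bit integer into dotted-decimal form, most significant byte first."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("value does not fit in 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    number = dotted_quad_to_int("192.0.34.166")
    print(hex(number), int_to_dotted_quad(number))
```

Each of the four decimal numbers occupies exactly eight bits, which is why no part may exceed 255.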

IP function: transfer data from source device to destination device
IP source software creates a packet representing the data:
  Header: source and destination IP addresses, length of data, etc.
  Data itself
If the destination is on another LAN, the packet is sent to a gateway
A gateway is a device that is connected to the source computer's network as well as to at least one other network
The sequence of computers that a packet travels through from source to destination is known as its route

How does a computer choose the next computer in the route for a packet?
A separate protocol, BGP-4, is used to pass network connectivity information between gateways so that each computer can choose a good next hop for each packet it receives.

IP software adds error-detection information (a checksum) to each packet

Limitations of IP:
No guarantee of packet delivery (packets can be dropped), so it is unreliable
Communication is one-way (source to destination)

Checksum Calculation

Sender side:
1. Treat the segment contents as a sequence of 16-bit integers.
2. Add all of the 16-bit words, wrapping any carry out of the top bit back into the sum. Call the result sum.
3. Checksum: the 1's complement of sum (all 0s are converted to 1s and all 1s are converted to 0s).
4. The sender puts this checksum value in the checksum field (for UDP, the UDP checksum field).

Receiver side:
1. Add all of the received 16-bit words in the same way.
2. Add the sender's checksum to this sum.
3. Check whether any bit of the result is 0. If the result contains any 0 bit, an error is detected and the receiver discards the packet.

Example

Sender:
  1011101110111011   (data)
  0000111100001111   (data)
  1100101011001010   (sum of all data)
  0011010100110101   (1's complement)
  Header checksum: 0011010100110101

Receiver:
  1011101110111011   (data)
  0000111100001111   (data)
  1100101011001010   (sum of all data)
  0011010100110101   (checksum)
  1111111111111111   (sum plus checksum; if any bit were 0, an error occurred)
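
The sender and receiver steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a production implementation: the function names are invented here, the data is assumed to be already split into 16-bit words, and carries are wrapped back into the sum, as is usual for 1's complement addition.

```python
def ones_complement_sum(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def make_checksum(words):
    """Sender side: 1's complement of the sum of all words."""
    return ~ones_complement_sum(words) & 0xFFFF


def looks_intact(words, checksum):
    """Receiver side: sum of data plus checksum must be all 1 bits."""
    return ones_complement_sum(list(words) + [checksum]) == 0xFFFF


if __name__ == "__main__":
    data = [0b1011101110111011, 0b0000111100001111]
    checksum = make_checksum(data)
    print(format(checksum, "016b"))
    print(looks_intact(data, checksum))
```

A checksum of this kind detects many errors but not all of them; for example, swapping two words leaves the sum unchanged.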

[Figure: IP. Labels: Source, Network 1, Gateway, Network 2, Gateway, Network 3, Destination]

[Figure: IP. Labels: Source, LAN 1, Gateway, Internet Backbone, Gateway, LAN 2, Destination]

Transmission Control Protocol (TCP)

A higher-level protocol that extends IP to provide additional functionality
Reliable communication based on the concept of a connection
TCP adds the concept of a connection on top of IP
  Provides a guarantee that packets are delivered
  Provides two-way (full-duplex) communication: A and B can both send messages to one another at the same time

Achieves reliable data transmission by demanding an acknowledgment (ACK) for each packet it sends via IP
Splits longer messages into shorter ones and reassembles them on the receiver side

TCP

Establish connection:
  Source: Can I talk to you?
  Destination: OK. Can I talk to you?
  Source: OK.

Send packet with acknowledgment:
  Source: Here's a packet.
  Destination: Got it.

Resend packet if no (or delayed) acknowledgment:
  Source: Here's a packet.
  Source: Here's a resent packet.
  Destination: Got it.
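
The acknowledge-and-resend idea can be illustrated with a small simulation. This is only a sketch: no real network is involved, the outcome labels ("ok", "lost_packet", "lost_ack") and the function name are invented for the example, and real TCP uses timers, sequence numbers over bytes, and windows that are omitted here.

```python
def send_reliably(messages, outcomes, max_attempts=5):
    """Simulate sending each message until its acknowledgment arrives.

    outcomes is a sequence of per-transmission results:
      "ok"          packet and acknowledgment both arrive
      "lost_packet" packet never arrives, so no acknowledgment is sent
      "lost_ack"    packet arrives but its acknowledgment is lost
    Transmissions beyond the end of outcomes are treated as "ok".
    """
    plan = iter(outcomes)
    delivered = []      # what the receiving application sees, in order
    transmissions = 0
    for seq, message in enumerate(messages):
        for _ in range(max_attempts):
            transmissions += 1
            outcome = next(plan, "ok")
            if outcome == "lost_packet":
                continue  # sender hears nothing and resends
            # The packet arrived. The receiver uses the sequence number
            # to accept it only once, even if it is a resent duplicate.
            if seq == len(delivered):
                delivered.append(message)
            if outcome == "lost_ack":
                continue  # receiver has the data, but the sender resends
            break  # acknowledgment received, move to the next message
        else:
            raise RuntimeError(f"gave up on packet {seq}")
    return delivered, transmissions


if __name__ == "__main__":
    print(send_reliably(["a", "b", "c"], ["ok", "lost_packet", "ok", "lost_ack", "ok"]))
```

The duplicate check matters: when only the acknowledgment is lost, the resent packet must not be handed to the application a second time.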

TCP also adds the concept of a port
The port concept allows TCP to communicate with many different applications on a machine
The TCP header contains a port number representing an application program on the destination computer
Some port numbers have standard meanings, assigned by IANA (Internet Assigned Numbers Authority)
  Example: port 25 is normally used for email transmitted using the Simple Mail Transfer Protocol (SMTP)
Other port numbers are available first-come, first-served to any application
  0-1023: can be requested only by applications that are run by the system at boot-up
  1024-65535: go to the first application on a system that requests them
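
A short loopback experiment can make the connection and port concepts concrete. The sketch below uses Python's standard socket module; the function names are illustrative, the server runs on the local machine only, and asking for port 0 (so that the operating system picks a free port) is a convention of the socket API rather than something described in the source.

```python
import socket
import threading


def reply_in_uppercase(server_sock):
    """Accept one connection and answer on the same connection."""
    conn, _client_address = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())  # same connection carries the reply back


def tcp_round_trip(message: bytes):
    """Start a one-shot local server, connect to its port, exchange one message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=reply_in_uppercase, args=(server,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return port, reply


if __name__ == "__main__":
    print(tcp_round_trip(b"hello"))
```

The client must name both the address and the port; the port is what selects this particular server program among all the applications on the machine.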

[Figure: TCP header]

1.2.2 User Datagram Protocol (UDP)

Like TCP in that it:
  Builds on IP
  Provides the port concept

Unlike TCP in that it has:
  No connection concept
  No transmission guarantee
  No two-way connection

Advantage of UDP vs. TCP:
  Lightweight, so faster for one-time messages
  Less complexity, which reduces overhead
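
For comparison with TCP, here is a minimal sketch of a one-time UDP message sent over the loopback interface with Python's standard socket module. The function name is illustrative. Note that there is no connection setup and no acknowledgment; on a real network the datagram could simply be lost, which is why the receiver is given a timeout.

```python
import socket


def udp_one_shot(message: bytes) -> bytes:
    """Send a single datagram to a local receiver and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # UDP also uses ports
        receiver.settimeout(2.0)          # nothing guarantees delivery
        sender.sendto(message, receiver.getsockname())  # no connection needed
        data, _source = receiver.recvfrom(2048)
    return data


if __name__ == "__main__":
    print(udp_one_shot(b"ping"))
```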

Domain Name Service (DNS)

Easier to refer to machines by name
DNS is the "phone book" for the Internet
  Maps back and forth between host names and IP addresses
DNS often uses UDP for communication
  When a computer on the Internet needs the DNS service to convert a host name to an IP address, it uses UDP software to send a UDP message to one of the DNS servers
Host names
  Labels separated by dots, e.g., www.example.org
  Final label is the top-level domain
    Generic: .com, .org, .edu, .biz, etc.
    Country-code: .us, .il (Israel), .mx, .de (Germany), etc.
Top-level domain names are assigned by ICANN (Internet Corporation for Assigned Names and Numbers), funded by the U.S. government

Top-level domains are divided into second-level domains, which can be further divided into subdomains, etc.
  E.g., in www.example.com, example is a second-level domain

Second-level domains are assigned by a registry operator
A host name plus domain name information is called the fully qualified domain name (FQDN) of the computer
  Above, www is the host name and www.example.com is the FQDN
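
The label structure of a fully qualified domain name can be shown with a few lines of Python. This is a sketch under a simplifying assumption that matches the example above: the first label is taken to be the host name and at least three labels are present. The function name and dictionary keys are illustrative.

```python
def split_fqdn(fqdn: str) -> dict:
    """Split a fully qualified domain name into its dot-separated roles."""
    labels = fqdn.rstrip(".").lower().split(".")
    if len(labels) < 3 or any(not label for label in labels):
        raise ValueError("expected host.second-level.top-level at minimum")
    return {
        "host": labels[0],                   # e.g. www
        "second_level_domain": labels[-2],   # e.g. example
        "top_level_domain": labels[-1],      # e.g. com
        "domain": ".".join(labels[1:]),      # everything after the host name
    }


if __name__ == "__main__":
    print(split_fqdn("www.example.com"))
```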

Service names and port numbers are used to distinguish between different services that run over transport protocols such as TCP and UDP. When a service (server program) is first started, it is said to bind to its designated port number. A client program that wants to use that service must then connect to the same designated port number. Port numbers range from 0 to 65535. Ports 0 to 1023 are reserved for use by certain privileged services. For the HTTP service, port 80 is defined as the default, so it does not have to be specified in the Uniform Resource Locator (URL).

A registry operator (also called a Network Information Center (NIC)) is an entity that maintains the database of domain names for a given top-level domain and generates the zone files that convert domain names to IP addresses.

Zone file: a file on a root server that contains domain name registration information. A master file contains all information related to one domain.

The nslookup program, available on most systems, provides command-line access to DNS

Looking up a host name given an IP address is known as a reverse lookup
Recall that a single host may have multiple IP addresses
When a host has several names, only one of them is returned by a reverse lookup
The address returned is the canonical IP address specified in the DNS system

1.2.3 Higher-level Protocols

IP ~ the telephone network
TCP ~ calling someone who answers, having a conversation, and hanging up
UDP ~ calling someone and leaving a message
DNS ~ directory assistance (names with numbers)

Many protocols build on TCP
  Telephone analogy: TCP specifies how we initiate and terminate the phone call, but some other protocol specifies how we carry on the actual conversation
Some examples:
  SMTP (email)
  FTP (file transfer)
  HTTP (transfer of Web documents)
The primary TCP-based protocol used for communication between Web servers and browsers is HTTP
IP is the key component in the definition of the Internet; HTTP plays the same role for the World Wide Web

1.3 World Wide Web


Public sharing of information on the Internet
Usenet newsgroup service (1979): a worldwide distributed Internet discussion system for posting information that could be read by users on other systems
Internet Relay Chat: the first Internet chat software
Various technologies were developed to support information management and search on the Internet:
  Gopher: hierarchical view of documents
  WAIS (Wide Area Information Servers): used indexing
  Archie: search of online information archives via FTP

Unique feature of the Web: support for hypertext (text containing links)
  Communication via the Hypertext Transport Protocol (HTTP)
  Document representation using the Hypertext Markup Language (HTML)

The Web is the collection of machines (Web servers) on the Internet that provide information, particularly HTML documents, via HTTP. Machines that access information on the Web are known as Web clients. A Web browser is software used by an end user to access the Web.

1.3.1 Hypertext Transport Protocol (HTTP)

Communication protocol
HTTP is based on the request-response communication model:
  Client sends a request
  Server sends a response
The format of the messages is dictated by HTTP
HTTP sends the messages using TCP
HTTP is a stateless protocol:
  The protocol does not require the server to remember anything about the client between requests
  Each request is executed independently, without any knowledge of the requests that came before it

Browsing the Web

Normally implemented over a TCP connection (80 is the standard port number for HTTP)
Typical browser-server interaction:
  User enters a Web address in the browser
  Browser uses DNS to locate the IP address
  Browser opens a TCP connection to the server
  Browser sends an HTTP request over the connection
  Server sends an HTTP response to the browser over the connection
  Browser displays the body of the response in the client area of the browser window
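
The browser-server interaction can be traced end to end with a small experiment. The sketch below is an assumption-laden illustration, not how any real browser is written: a tiny server from Python's standard http.server module stands in for a Web server on the local machine, and the names HelloHandler, fetch, and demo are invented for the example. The client side performs the same steps as the list above: name lookup, TCP connection, request, response.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello</body></html>"


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in Web server: answers every GET with the same small page."""
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host, port, path="/"):
    """Do by hand what a browser does for one request."""
    address = socket.gethostbyname(host)                      # name lookup
    with socket.create_connection((address, port), timeout=5) as conn:  # TCP
        request = (f"GET {path} HTTP/1.1\r\n"
                   f"Host: {host}\r\n"
                   "Connection: close\r\n\r\n")
        conn.sendall(request.encode("ascii"))                 # HTTP request
        chunks = []
        while True:                                           # HTTP response
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0].decode("ascii")
    return status_line, body


def demo():
    """Run the stand-in server on a free local port and fetch one page."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("localhost", server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

A browser would go one step further and render the returned body in its client area; here the body is simply returned as bytes.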

HTTP Request Message

The information transmitted using HTTP is often entirely text (human-readable)
A message is a start line followed by message header fields and an optional message body
Start line example: GET / HTTP/1.1

We can connect to a Web server using telnet:

Connect:
  $ telnet www.example.org 80
  Trying 192.0.34.166
  Connected to www.example.com (192.0.34.166).
  Escape character is '^]'.

Send request:
  GET / HTTP/1.1
  Host: www.example.org

Receive response:
  HTTP/1.1 200 OK
  Date: Thu, 09 Oct 2003 20:30:49 GMT

1.4 HTTP Request Message

1.4.1 Overall Structure

Structure of the request:
  start line
  header field(s) (one or more)
  blank line
  optional message body
Start line example: GET / HTTP/1.1
Every start line consists of three space-separated parts:
  HTTP request method
  Request-URI portion of the Web address
  HTTP version
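
The request structure above can be sketched as a pair of small Python helpers, one that assembles a request message and one that splits a start line into its three parts. The function names are illustrative. One detail not stated in the source is assumed here: lines in an HTTP message end with a carriage return and line feed.

```python
def build_request(method: str, request_uri: str, host: str, body: bytes = b"") -> bytes:
    """Assemble start line, header fields, blank line, and optional body."""
    lines = [f"{method} {request_uri} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # the empty line ends the header
    return head.encode("ascii") + body


def parse_start_line(line: str):
    """Split a request start line into method, Request-URI, and HTTP version."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError("a start line has exactly three space-separated parts")
    method, request_uri, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError("the third part must be the HTTP version")
    return method, request_uri, version


if __name__ == "__main__":
    print(build_request("GET", "/", "www.example.org"))
    print(parse_start_line("GET / HTTP/1.1"))
```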

1.4.2 HTTP Version
HTTP/1.1 was formally defined in 1997
The version string for HTTP/1.1 must appear in the start line exactly as shown, with all capital letters and no embedded white space

1.4.3 Request-URI
Second part of the start line
The concatenation of
  the string http://
  the value of the Host header field (e.g., www.example.org)
  the Request-URI
forms a string known as a URI

A URI is an identifier that is intended to be associated with a particular resource on the World Wide Web.

Every URI has two parts:
  The scheme, which appears before the colon (:)
  A second part whose form depends on the scheme
Web addresses for the most part use the http scheme

The scheme name is not case sensitive but is generally written in lowercase
A URI representing the location of a resource on the Web is called a URL
Another type, the URN, is designed to be a unique name for a resource
Syntax: scheme : scheme-dependent-part
Ex: In http://www.example.com/ the scheme is http

URIs are of two types:

Uniform Resource Name (URN)
  Can be used to identify resources with unique names, such as books (which have unique ISBNs)
  Scheme is urn
  Three colon-separated parts:
    scheme name
    namespace identifier
    namespace-specific string
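
As a rough illustration of the scheme and URN syntax, the following Python splits a URI at its first colon and a URN into its three parts. The function names are invented for the example, and the sample URN is hypothetical: it simply reuses the ISBN printed in this deck's copyright line.

```python
def split_scheme(uri: str):
    """Return (scheme, scheme-dependent part), splitting at the first colon."""
    scheme, separator, rest = uri.partition(":")
    if not separator or not scheme:
        raise ValueError("a URI needs a scheme followed by a colon")
    return scheme.lower(), rest  # scheme names are compared without regard to case


def parse_urn(urn: str):
    """Return (scheme, namespace identifier, namespace-specific string)."""
    parts = urn.split(":", 2)  # at most three parts; later colons stay in the last
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError("expected urn:<namespace identifier>:<namespace-specific string>")
    return parts[0].lower(), parts[1], parts[2]


if __name__ == "__main__":
    print(split_scheme("http://www.example.com/"))
    print(parse_urn("urn:isbn:0-13-185603-0"))
```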

Uniform Resource Locator (URL)
  Specifies the location at which a resource can be found
  In addition to http, some other URL schemes are https, ftp, mailto, and file

1.4.4 HTTP Request Method

The method is the first part of the start line of an HTTP request
GET is the most common HTTP method; it says "give me this resource"
Other methods include POST and HEAD
Method names are always written in uppercase
POST is used to send information collected from a form displayed within a browser
The path is the part of the URL after the host name, also called the Request-URI
The HTTP version always takes the form "HTTP/x.x", in uppercase

Method    Description
OPTIONS   Returns a list of HTTP methods that can be used to access the resource
GET       Retrieves the requested URI, including the headers and body (that is, the content)
HEAD      Retrieves only the headers for the requested URI and not the body
POST      Sends information to the server from HTML forms
PUT       Uploads the file indicated in the URI to a server
DELETE    Deletes the URI from a server
TRACE     Returns a copy of the complete HTTP request message for test purposes

1.4.5 Header Fields and MIME Types

The Host header field is required in every HTTP/1.1 request message
Each header field begins with a field name, such as Host, followed by a colon and then a field value
Header field structure:
  field name : field value
Syntax:
  Field name is not case sensitive
  Field value may continue on multiple lines by starting continuation lines with white space
  Field values may contain MIME types, quality values, and wildcard characters (*s)
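
The header field syntax can be sketched as a small parser. This is an illustration only: the function name is invented, field names are normalized to lowercase to reflect their case insensitivity, and a continuation line is joined to the previous value with a single space, which is one reasonable choice among several.

```python
def parse_header_fields(lines):
    """Turn header lines into a dict keyed by lowercase field name."""
    fields = {}
    current = None
    for line in lines:
        if not line.strip():
            break  # the blank line marks the end of the header
        if line[0] in " \t" and current is not None:
            # Continuation line: belongs to the previous field value.
            fields[current] += " " + line.strip()
            continue
        name, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"missing colon in header line {line!r}")
        current = name.strip().lower()  # field names are not case sensitive
        fields[current] = value.strip()
    return fields


if __name__ == "__main__":
    print(parse_header_fields([
        "Host: www.example.org",
        "ACCEPT: text/html,",
        "  text/plain;q=0.8",
    ]))
```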

[Figure: HTTP request with a start line, 10 header fields, and a short message body]

Header field features:
  Header field names are not case sensitive
  A header field value may wrap onto several lines
  Many header field values use MIME types
  Many header field values use quality values to indicate preferences
    A quality value is specified by a string of the form q=num
    num is a decimal number between 0 and 1

Multipurpose Internet Mail Extensions (MIME)
A standard used to pass a variety of information, including graphics and applications, through e-mail as well as through other Internet message protocols
A MIME content type has two parts:
  Top-level type: the general content type of the message (a case-insensitive string)
  Subtype: a private (nonstandard) type is indicated by a leading x- or X-
MIME content type syntax: top-level type / subtype
Examples: text/html, image/jpeg

HTTP Quality Values and Wildcards

Example header field with quality values:
  accept: text/xml,text/html;q=0.9, text/plain;q=0.8, image/jpeg, image/gif;q=0.2,*/*;q=0.1
A quality value applies only to the item it is attached to; an item with no quality value has the default quality 1
The higher the value, the higher the preference
Note the use of wildcards to specify quality 0.1 for any MIME type not specified earlier
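
A minimal sketch of how a server might read such a header field is shown below. It assumes the rule from the HTTP specification that each q parameter belongs only to the item it follows and that items without one default to 1. The function name is illustrative.

```python
def parse_accept(value: str):
    """Return (media range, quality) pairs, most preferred first."""
    preferences = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        media_range, *params = [piece.strip() for piece in item.split(";")]
        quality = 1.0  # default when no q parameter is attached
        for param in params:
            name, _, number = param.partition("=")
            if name.strip().lower() == "q":
                quality = float(number)
        preferences.append((media_range.lower(), quality))
    # sorted() is stable, so items of equal quality keep their written order
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    header = ("text/xml,text/html;q=0.9, text/plain;q=0.8, "
              "image/jpeg, image/gif;q=0.2,*/*;q=0.1")
    for media_range, quality in parse_accept(header):
        print(media_range, quality)
```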

Common header fields:
  Host: host name from the URL (required)
  User-Agent: type of browser sending the request
  Accept: MIME types of acceptable documents
  Connection: the value close tells the server to close the connection after a single request/response
  Content-Type: MIME type of the (POST) body, normally application/x-www-form-urlencoded
  Content-Length: number of bytes in the body
  Referer: URL of the document containing the link that supplied the URI for this HTTP request

1.5 HTTP Response Message

Structure of the response:
  status line
  header field(s) (one or more)
  blank line
  optional message body

1.5.1 Response Status Line
Example: HTTP/1.1 200 OK

Three space-separated parts:
  HTTP version used by the server software
  status code (numeric)
  reason phrase (intended for human use)

Status code
Three-digit number
First digit is the class of the status code:
  1 = Informational: provides information to the client
  2 = Success
  3 = Redirection (an alternate URL is supplied)
  4 = Client Error: the request is not valid
  5 = Server Error: an error occurred during server processing
The other two digits provide additional information

Code   Reason phrase
200    OK
301    Moved Permanently
307    Temporary Redirect
401    Unauthorized
403    Forbidden
404    Not Found
500    Internal Server Error
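
The status line and the meaning of the first digit can be sketched in Python as follows. The function and table names are illustrative. The split is limited to two spaces because the reason phrase may itself contain spaces.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str):
    """Return (HTTP version, status code, reason phrase, status class)."""
    version, code_text, reason = line.rstrip("\r\n").split(" ", 2)
    code = int(code_text)
    if not 100 <= code <= 599:
        raise ValueError("status codes are three digits starting with 1 to 5")
    return version, code, reason, STATUS_CLASSES[code // 100]


if __name__ == "__main__":
    print(parse_status_line("HTTP/1.1 200 OK"))
    print(parse_status_line("HTTP/1.1 404 Not Found"))
```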

1.5.2 Response Header Fields

Common header fields:
  Connection, Content-Type, Content-Length
  Date: date and time at which the response was generated (required); supplied by the server
  Server: information identifying the server software
  Location: alternate URI if the status is a redirection
  Last-Modified: date and time the requested resource was last modified on the server
  Expires: date and time after which the client's copy of the resource will be out of date
  ETag: a unique identifier for this version of the requested resource (changes if the resource changes), such as a hash code of the resource returned

1.5.3 Cache Control

A cache is a local copy of information obtained from some other source
A copy of the information is placed in a cache to improve system performance
  Ex: an icon appearing multiple times in a Web page
Advantages:
  Most Web browsers use a cache to store requested resources so that subsequent requests for the same resource will not necessarily require an HTTP request/response
  HTTP caching, when successful, leads to:
    Quicker display by the browser
    Reduced network communication
    Reduced load on the Web server
Drawback:
  Information in a cache can become invalid

Validating a cached resource:
  Send an HTTP HEAD request and check the Last-Modified or ETag header field in the response
  Compare the current date/time with the Expires header field sent in the response containing the resource
  Compare the ETag returned by the HEAD request with the ETag stored with the cached resource
    If the ETag values match, then the cached copy is valid
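
The validation checks above can be combined into one decision function, sketched here in Python. The function name and the dictionary layout (lowercase header names as keys) are assumptions made for the example; no network request is made, and the HEAD response is passed in as an already-parsed dictionary.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def can_use_cached_copy(cached: dict, now: datetime, head_response: dict = None) -> bool:
    """Decide whether a cached resource may be displayed without refetching it.

    cached        header fields stored with the cached copy
    now           current time as a timezone-aware datetime
    head_response header fields from a HEAD request, if one was made
    """
    expires = cached.get("expires")
    if expires and now < parsedate_to_datetime(expires):
        return True  # not yet out of date, no request needed
    if head_response is not None and cached.get("etag"):
        # Same ETag means the resource has not changed on the server.
        return cached["etag"] == head_response.get("etag")
    return False


if __name__ == "__main__":
    cached = {"etag": '"v1"', "expires": "Thu, 09 Oct 2003 20:30:49 GMT"}
    later = datetime(2003, 10, 10, tzinfo=timezone.utc)
    print(can_use_cached_copy(cached, later, {"etag": '"v1"'}))
```

Checking Expires first is the cheaper path, since it avoids any communication with the server.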

1.5.4 Character Sets

Characters represented in Web documents
Every document is represented by a string of integer values (code points)
The mapping from code points to characters is defined by a character set
  Ex: US-ASCII (7-bit integers) is the character set used to represent the characters in HTTP header field names
The character set used internally by Java and by browsers is defined by Unicode
A character encoding is a bit string that must be decoded into a code-point integer, which is then mapped to a character according to the definition provided by some character set

An encoding represents code points using variable-length byte strings
  The most common examples are the Unicode-based encodings UTF-8 and UTF-16
IANA maintains the complete list of Internet-recognized character sets/encodings
Some header fields have character set values:
  Accept-Charset: request header field listing the character sets that the client can recognize
    Ex: accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
  Content-Type: can include the character set used to represent the body of the HTTP message
    Ex: Content-Type: text/html; charset=UTF-8

A typical US PC produces ASCII documents
The US-ASCII character set can be used for such documents, but is not recommended
UTF-8 and ISO-8859-1 are supersets of US-ASCII and provide international compatibility
  UTF-8 can represent all ASCII characters using a single byte each and arbitrary Unicode characters using up to 4 bytes each
  ISO-8859-1 is a 1-byte code that has many characters common in Western European languages
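
The byte counts mentioned above are easy to check with Python's built-in codecs. The sketch below reports how many bytes one character occupies in several encodings, or None when the character set cannot represent it. The function name and the sample characters are choices made for this example.

```python
def encoded_lengths(text: str) -> dict:
    """Return the encoded size of text in bytes for several encodings."""
    lengths = {}
    for encoding in ("ascii", "iso-8859-1", "utf-8", "utf-16-be"):
        try:
            lengths[encoding] = len(text.encode(encoding))
        except UnicodeEncodeError:
            lengths[encoding] = None  # not representable in this character set
    return lengths


if __name__ == "__main__":
    samples = {
        "LATIN CAPITAL LETTER A": "A",
        "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
        "EURO SIGN": "\u20ac",
        "GRINNING FACE": "\U0001F600",
    }
    for name, character in samples.items():
        print(name, encoded_lengths(character))
```

The output shows UTF-8 growing from one byte for an ASCII character to four bytes for a character outside the first 65536 code points, while ISO-8859-1 either uses exactly one byte or cannot represent the character at all.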

1.6 Web Clients

A Web client is software that accesses a Web server by sending an HTTP request message and processing the resulting HTTP response
Most common form of Web client software: Web browsers running on desktop or laptop computers

Many other Web clients are possible:
  Text-only browsers (e.g., lynx)
  Mobile phones
  Robots (software-only clients, e.g., search engine crawlers), which are not designed to be used directly by humans at all
  etc.

User agent: any Web client that is designed to directly support user access to Web servers

Early browsers were text-based or ran on specialized platforms

First graphical browser running on general-purpose platforms: Mosaic (1993), from NCSA (National Center for Supercomputing Applications)
Then came Netscape Navigator and Microsoft Internet Explorer
  "Browser war" between Netscape Navigator and Microsoft Internet Explorer
  Microsoft was victorious

Netscape was acquired by America Online
Mozilla Firefox was later launched
All the major modern browsers support a common set of basic user features and provide similar support for HTTP communication

1.6.1 Basic Browser Function

The browser window is split into several rectangular regions known as bars.
Mozilla 1.4 has 5 standard regions:
Client area (the primary region): displays the document
Title bar: the title assigned by the document author to the document currently displayed within the client area
Menu bar: drop-down menus


Navigation toolbar: push-button controls (Back, Forward, Stop, Print, and Reload). It also contains a text box known as the Location bar, where the user can enter a URL to request that the browser display the document located at that URL.
Status bar: displays messages and icons related to the status of the browser.

The browser makes HTTP requests on behalf of the user.
Browser primary tasks:
Reformat the URL entered as a valid HTTP request message
If the server is specified by host name, use DNS to obtain its IP address
Establish a TCP connection using the IP address of the specified server

Some Mozilla Status Messages


Browser primary tasks (continued):
Send the HTTP request over the TCP connection and wait for the response
Display the document contained in the response
Render (appropriately display) documents returned by a server
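The browser's primary tasks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real browser is implemented: the function names build_request and fetch are hypothetical, only GET is handled, and rendering is left out.

```python
import socket


def build_request(host, request_uri="/"):
    """Format a minimal HTTP/1.1 GET request message as bytes."""
    lines = [
        f"GET {request_uri} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close after one response
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, port=80, request_uri="/"):
    """Resolve host, open a TCP connection, send the request, return raw response bytes."""
    # getaddrinfo performs the DNS lookup when host is a name rather than an IP address.
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    with socket.socket(family, socktype, proto) as conn:
        conn.connect(address)                       # establish the TCP connection
        conn.sendall(build_request(host, request_uri))
        chunks = []
        while True:                                 # wait for and read the whole response
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.example.org") would return the status line, header fields, and body as raw bytes; a real browser would then parse and render the body.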

1.6.2 URLs
An HTTP-scheme URL consists of a number of pieces.
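The pieces of an http URL can be inspected with Python's standard urllib.parse module. The sketch below is illustrative: the helper name http_url_pieces and the sample URLs are assumptions, and port 80 is filled in as the http default when the URL names no port.

```python
from urllib.parse import urlsplit


def http_url_pieces(url):
    """Split an http URL into the pieces a browser cares about."""
    parts = urlsplit(url)
    path = parts.path or "/"                      # "/" is used if no path is supplied
    request_uri = path + ("?" + parts.query if parts.query else "")
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,                # host name plus optional port
        "host": parts.hostname,
        "port": parts.port or 80,                 # assumed default for the http scheme
        "request_uri": request_uri,               # what goes in the request start line
        "fragment": parts.fragment,               # kept by the browser, never sent
    }


if __name__ == "__main__":
    print(http_url_pieces("http://www.example.org:8080/docs/index.html?lang=en#intro"))
```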


The browser uses the authority to connect via TCP.
The Request-URI is included in the start line (/ is used for the path if none is supplied).
The fragment identifier is not sent to the server (it is used to scroll the browser client area).

1.6.3 User Controllable Features

Graphical browser features:
Save: Most documents can be saved by the user to the client machine's file system.
File | Save Page As
Find in Page: Standard documents (text and HTML) can be searched with a function similar to that of word processors.
Edit | Find in This Page

Automatic Form Filling: The browser can remember information entered on certain forms (billing address, phone numbers).
Edit | Save Form Info
Edit | Fill in Form
Tools | Form Manager
Preferences: The user can customize browser functionality in a wide variety of ways.
Edit | Preferences
Some preference settings are:
Accept-Language: the language preferred for web pages (Navigator | Languages)
Default character set/encoding: the character set assumed for web documents (Navigator | Languages, Character Coding)
Cache properties: the amount of local storage allocated to the cache (Advanced | Cache, Set Cache Options)
HTTP settings: the version of HTTP used and whether or not the client will keep connections alive (Advanced | HTTP Networking, Direct Connection Options)

Style definition
View | Text Zoom
View | Use Style
Document meta-information
View | Page Source: raw HTML
View | Page Info: meta-information
Themes: the look of one or more browser bars (skin)
View | Apply Theme | Get New Themes
History: automatically maintains a list of all pages visited within the last several days
Go | History
Bookmarks: save the URL of a page for an indefinite length of time

1.6.4 Additional Functionality

Automatic URL completion
Script execution: browsers run programs to perform a variety of tasks, such as validation
Event handling: clicking on a link or button is the occurrence of an event; events include button clicks and mouse movement
Management of form GUI: when a web page contains a form with fill-in fields, the browser allows the user to perform standard text-editing functions (button images, text cursor)

Secure communication: when the user sends sensitive information (such as a credit card number) to a server, the browser encrypts this information to protect it from any other machines.
Plug-in execution: browsers support a plug-in protocol, which allows display of non-HTML documents (e.g., PDF) via plug-ins.
Help | About Plug-ins

1.7 WEB SERVERS

Tomcat 5.0

1.7.1 Server Features

A web server accepts HTTP requests from web clients and returns an appropriate resource in the HTTP response.

Basic functionality:
The server calls on TCP software and waits for connection requests to one or more ports.
When a connection request is received, the server dedicates a subtask (a single copy of the server software handling a single client connection) to it.
The subtask establishes the connection and receives the request.
The subtask examines the Host header field to determine the host and invokes the software for this host.
The virtual host software maps the Request-URI to a specific resource associated with the virtual host:
File: return the file in the HTTP response (with its MIME type)
Program: run the program and return its output in the HTTP response

The server logs information about the request and response, such as the IP address and the status code, in a plain-text file.
If the TCP connection is kept alive, the server subtask continues to monitor the connection until the client sends another request or initiates a connection close.

A few definitions:
Concurrency: all modern servers process multiple requests concurrently, as if multiple copies of the server were running simultaneously.
Subtask: a single copy of the server software handling a single client connection.
Virtual host: HTTP requests include a Host header field. Multiple host names can be mapped by DNS to a single IP address, and the web server determines which virtual host is being requested by examining the Host header field.
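The server behavior described above can be sketched with Python's standard http.server module. This is a toy illustration, not how Tomcat or Apache is built: the host names, the VIRTUAL_HOSTS table, and the helper find_resource are hypothetical, and resources are in-memory strings rather than files or programs.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical virtual hosts, each with its own Request-URI -> resource mapping.
VIRTUAL_HOSTS = {
    "www.example.org": {"/": "Welcome to example.org\n"},
    "www.example.net": {"/": "Welcome to example.net\n"},
}


def find_resource(host_header, request_uri):
    """Pick the virtual host from the Host header, then map the Request-URI."""
    host = (host_header or "").split(":")[0].lower()   # drop any ":port" suffix
    return VIRTUAL_HOSTS.get(host, {}).get(request_uri)


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # lets the connection be kept alive

    def do_GET(self):
        body = find_resource(self.headers.get("Host"), self.path)
        status = 200 if body is not None else 404
        payload = (body if body is not None else "Not found\n").encode("utf-8")
        # send_response also logs the client address, request line, and status to stderr.
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Listen on the port; each client connection is handled in its own thread."""
    with ThreadingHTTPServer(("", port), Handler) as server:
        server.serve_forever()
```

Here each thread started by ThreadingHTTPServer plays the role of a subtask, and the two dictionary entries play the role of two virtual hosts sharing one IP address.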

1.7.2 Server History


NCSA httpd web server
NCSA discontinued development of the server in the mid-1990s.
Several individuals began developing their own updates, called patches.
The resulting "a patchy server" became known as the Apache server.
It was released as a free open-source server in April 1995.
Microsoft began development of IIS (Internet Information Server).
IIS includes the features found in Apache.
Drawbacks of IIS:
Runs only on Windows systems
Runs programs written in VBScript
Apache runs on Windows, Linux, and Macintosh, and runs programs written in Perl and PHP.

A number of IIS and Apache servers run Java programs.
To run a Java program, either server is configured to use separate software called a servlet container.
The servlet container provides the JVM that runs the Java programs (known as servlets).
It also provides communication between the servlet and the Apache or IIS server.
Tomcat is a popular free, open-source servlet container from the Apache Software Foundation.
Tomcat can also run as a standalone web server that communicates directly with web clients.
Tomcat 5.0 Web Server


1.7.3 Server Configuration and Tuning


Modern servers have a large number of configuration parameters.
Server configuration is broken into two areas:
External communication
Internal processing

In Tomcat these correspond to two separate Java packages:
Coyote: provides HTTP 1.1 communication
Catalina: the actual servlet container

Coyote parameters affecting external communication:
IP addresses and TCP ports
Number of subtasks (threads) created when the server is initialized
Maximum number of threads allowed to exist simultaneously

Maximum number of TCP connection requests that will be queued if the server is running its maximum number of threads. If the queue is full, a received connection request is refused.
Keep-alive time for inactive TCP connections

The settings of these parameters affect the performance of the server.
Tuning the server: changing the values of these and similar parameters in order to optimize performance.
Tuning is done by trial and error.
Load generation (stress test) tools simulate requests to a web server and are helpful for experimenting with tuning parameters.
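How the thread limit and the connection queue interact can be modeled in a few lines. This is a simplified model for reasoning about the parameters, not Tomcat's code: the names admit and simulate_burst are hypothetical, and the burst simulation assumes no request finishes while the burst arrives.

```python
def admit(active_threads, queued, max_threads, accept_count):
    """Decide what happens to one new connection request."""
    if active_threads < max_threads:
        return "handle"        # a thread is available
    if queued < accept_count:
        return "queue"         # all threads busy, but the wait queue has room
    return "refuse"            # threads busy and queue full


def simulate_burst(requests, max_threads, accept_count):
    """Count handled, queued, and refused requests for a simultaneous burst."""
    active = queued = refused = 0
    for _ in range(requests):
        decision = admit(active, queued, max_threads, accept_count)
        if decision == "handle":
            active += 1
        elif decision == "queue":
            queued += 1
        else:
            refused += 1
    return active, queued, refused


if __name__ == "__main__":
    print(simulate_burst(requests=100, max_threads=75, accept_count=10))
```

Raising the thread limit or the queue length reduces refusals at the cost of memory and scheduling overhead, which is why tuning is usually done experimentally with a load generator.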

Internal Catalina parameters affect functionality:
Which client machines may send HTTP requests to the server
Which virtual hosts are listening for TCP connections
What logging will be performed
How the Request-URI is mapped to server resources
Password protection of resources
Use of server-side caching

Install Tomcat 5.0 at the default port 8080, open a browser, and browse to the URL
http://localhost:8080

Clicking the Server Administration link causes a login page to be displayed.



Tomcat is included in the JWSDP (Java Web Services Developer Pack).
The JWSDP Service entry appears in the list on the left side.
Click on the icon next to it to reveal the associated server components.
The Service has five components: Connector, Host, Logger, Realm, and Valve.

Connector is a Coyote component that handles HTTP communication.
Clicking on the Connector produces a window containing a drop-down menu of the actions that can be performed on this component.

Connector Attributes
When you create or modify any type of Connector, the attributes shown in the following table may be set, as needed.

Common Connector Attributes
Attribute: Description
Accept Count: Length of the TCP connection wait queue.
Connection Timeout: The number of milliseconds this Connector will wait, after accepting a connection, for the request URI line to be presented. The default value is 60000 (i.e. 60 seconds).
IP Address: Specifies which address will be used for listening on the specified port, for servers with more than one IP address.
Port Number: The port number on which this Connector will listen for TCP connection requests.
Minimum: The number of request processing threads that will be created when this Connector is first started. The default value is 5.
Maximum: The maximum number of request processing threads to be created by this Connector, which therefore determines the maximum number of simultaneous requests that can be handled. If not specified, this attribute is set to 75.


1.7.4 Defining Virtual Hosts

Configuring Host Elements
The Host element represents a virtual host, which is an association of a network name for a server (such as www.mycompany.com) with the particular server on which Tomcat is running.
Host Attributes
The attributes shown in the following table may be viewed, set, or modified for a Host.


Host Attributes
Attribute: Description
Name: FQDN (fully qualified domain name) that clients will use to access the virtual host.
Application Base: Directory containing web applications. This is the path name of a directory that may contain web applications to be deployed on this virtual host. You may specify an absolute path name for this directory, or a path name that is relative to the directory under which Tomcat is installed.
Deploy on Startup: Boolean value indicating whether or not web applications should be automatically initialized when the server starts.


Web application: a collection of files and programs that work together to provide particular functions to web users.
Absolute path name: traces the path from the / (root) directory. Absolute path names always begin with the slash (/) symbol.
Relative path name: traces the path from the current directory through its parent or its subdirectories and files.
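The difference between the two kinds of path name matters when an application base directory is given relative to the Tomcat installation directory. The following sketch shows the resolution rule using Python's pathlib; the function name resolve_app_base and the directory names are hypothetical examples.

```python
from pathlib import PurePosixPath


def resolve_app_base(install_dir, app_base):
    """Return the directory that an application base setting refers to."""
    base = PurePosixPath(app_base)
    if base.is_absolute():
        # An absolute path name starts at the root (/) and is used unchanged.
        return str(base)
    # A relative path name is interpreted relative to the installation directory.
    return str(PurePosixPath(install_dir) / base)


if __name__ == "__main__":
    print(resolve_app_base("/usr/local/tomcat", "webapps"))
    print(resolve_app_base("/usr/local/tomcat", "/srv/apps"))
```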

1.7.5 Logging
Web server logs record information about server activity.
Access log: a file that records information about every HTTP request processed by the server.
Message logs: a variety of debugging and other information generated by the web server.
Access logging is performed by adding a Valve component.
The primary fields are given in the table:

Logger Attributes
Attribute: Description
Directory: Where the log file will be written.
Pattern: Information to be written to the log.
Prefix: The prefix added to the start of each log file's name.
Suffix: The suffix added to the end of each log file's name.
Timestamp: Whether or not all logged messages are to be date and time stamped (true or false).
Resolve Hosts: Whether the IP address or the host name is to be written in the log file.

Tomcat writes the log information to a log file in plain-text format. In general, a log entry in the common format has the following pattern:
%h %l %u %t "%r" %s %b
%h: Remote host name
%l: Remote logical user name
%u: Remote user that was authenticated
%t: Date and time, in Common Log Format
%r: First line of the request (method and Request-URI)
%s: HTTP status code of the response
%b: Bytes sent in the body of the response, excluding HTTP headers
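A log entry in this format can be taken apart with a regular expression. The sketch below assumes the common pattern with an authenticated-user field (%u) between the logical user name and the timestamp and with the request line in double quotes; the sample line is made up for illustration, and parse_access_log_line is a hypothetical helper name.

```python
import re

# One named group per field of the common pattern: %h %l %u %t "%r" %s %b
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_log_line(line):
    """Return the fields of one access log entry as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the byte field means no body bytes were sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# Made-up sample entry, not taken from a real server.
SAMPLE = '127.0.0.1 - - [10/Oct/2005:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326'

if __name__ == "__main__":
    print(parse_access_log_line(SAMPLE))
```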

Access log in common format:


1.7.6 Access Control


Access control provides automatic password protection for resources.
Password protection (e.g., admin pages): users and roles are defined in conf/tomcat-users.xml
Deny access to machines: useful for denying access to certain users by denying access from the machines they use
The list of denied machines is maintained in RemoteHostValve (deny by host name) or RemoteAddrValve (deny by IP address)
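The idea of denying access by client address can be sketched as a check against a list of patterns. This is only a model of the concept, not Tomcat's valve implementation: the function name is_denied and the sample patterns are hypothetical, and regular expressions are assumed as the pattern syntax.

```python
import re


def is_denied(remote_addr, deny_patterns):
    """Return True if the client address fully matches any deny pattern."""
    return any(re.fullmatch(pattern, remote_addr) for pattern in deny_patterns)


# Hypothetical deny list: one whole subnet and one single machine.
DENY = [r"192\.168\.1\.\d+", r"10\.0\.0\.99"]

if __name__ == "__main__":
    for addr in ("192.168.1.7", "10.0.0.99", "10.0.0.1"):
        print(addr, "denied" if is_denied(addr, DENY) else "allowed")
```

The same check works for host names if the server first resolves the client's address to a name, which is the difference between denying by host name and denying by IP address.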


1.7.7 Secure Servers

Since HTTP messages typically travel over a public network, private information (such as credit card numbers) should be encrypted to prevent eavesdropping.
The https URL scheme tells the browser to use encryption.
Common encryption standards:
Secure Sockets Layer (SSL)
Transport Layer Security (TLS)


Secure Servers
Figure: the browser and the web server each pass plain HTTP requests and responses through a TLS/SSL layer, and only encrypted messages travel between the two machines.
Browser: "I'd like to talk securely to you (over port 443)"
Web server: "Here's my certificate and encryption data"
Browser: "Here's an encrypted HTTP request"
Web server: "Here's an encrypted HTTP response"


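Secure Servers: Man-in-the-Middle Attack
Figure: the browser asks a fake DNS server "What's the IP address for www.example.org?" and receives 100.1.1.1, the address of a fake www.example.org.
The browser then sends "My credit card number is ..." to the fake www.example.org instead of the real www.example.org.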


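Secure Servers: Preventing Man-in-the-Middle
Figure: the browser again asks a fake DNS server "What's the IP address for www.example.org?" and receives 100.1.1.1, the address of a fake www.example.org.
Before sending any private data, the browser asks the server at that address: "Send me a certificate of identity".
The fake server cannot present a valid certificate for the real www.example.org.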
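The certificate check that defeats this attack can be illustrated with Python's standard ssl module. This is a sketch of a client-side check under stated assumptions, not a description of any particular browser: the function names are hypothetical, and the default context is assumed to trust the operating system's certificate authorities.

```python
import socket
import ssl


def make_verifying_context():
    """Create a TLS context that requires a valid certificate matching the host name."""
    # The default client context verifies the certificate chain and the host name.
    return ssl.create_default_context()


def open_secure_connection(host, port=443):
    """Connect to host and complete the TLS handshake, or raise if the certificate is bad."""
    context = make_verifying_context()
    raw = socket.create_connection((host, port))
    try:
        # server_hostname is the name the certificate must have been issued for.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

If a fake server answers at the address returned by a fake DNS server, the handshake fails with a certificate verification error before any HTTP request, and therefore any credit card number, is sent.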
