
INTERNET SOCKET

In computer networking, an Internet socket or network socket is an endpoint of a bidirectional interprocess communication flow across an Internet Protocol-based computer network, such as the Internet. The term Internet sockets is also used as a name for an application programming interface (API) for the TCP/IP protocol stack, usually provided by the operating system. Internet sockets constitute a mechanism for delivering incoming data packets to the appropriate application process or thread, based on a combination of local and remote IP addresses and port numbers. Each socket is mapped by the operating system to a communicating application process or thread. A socket address is the combination of an IP address (the location of the computer) and a port (which is mapped to the application program process) into a single identity, much like one end of a telephone connection is the combination of a phone number and a particular extension.

INTRODUCTION

An Internet socket is characterized by a unique combination of the following:

1. Local socket address: Local IP address and port number.
2. Remote socket address: Only for established TCP sockets. As discussed in the client-server section below, this is necessary since a TCP server may serve several clients concurrently. The server creates one socket for each client, and these sockets share the same local socket address.
3. Protocol: A transport protocol (e.g., TCP, UDP), raw IP, or others. TCP port 53 and UDP port 53 are consequently different, distinct sockets.

Within the operating system and the application that created a socket, the socket is referred to by a unique integer number called socket identifier or socket number. The operating system forwards the payload of incoming IP packets to the corresponding application by extracting the socket address information from the IP and transport protocol headers and stripping the headers from the application data.

In IETF Request for Comments, Internet Standards and in many textbooks, the term socket refers to an entity that is uniquely identified by the socket number. In other textbooks, the term socket refers to a local socket address, i.e. a "combination of an IP address and a port number". In the original definition of socket given in RFC 147, as it was related to the ARPA network in 1971, "the socket is specified as a 32 bit number with even sockets identifying receiving sockets and odd sockets identifying sending sockets." Today, however, socket communications are bidirectional.

On Unix-like and Microsoft Windows based operating systems the netstat command line tool may be used to list all currently established sockets and related information.
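
The point that the protocol is part of a socket's identity can be checked with a small experiment. The following is a minimal sketch in Python, assuming the loopback address 127.0.0.1 and an operating-system-chosen port; the helper name bind_same_port_tcp_and_udp is hypothetical. It binds a TCP socket and a UDP socket to the same port number and shows that each receives its own integer socket identifier.

```python
import socket

def bind_same_port_tcp_and_udp(host="127.0.0.1", attempts=20):
    """Bind a TCP and a UDP socket to the same port number on `host`.

    Retries with a fresh port if another process already holds the UDP
    side of the port that the OS picked for TCP.
    """
    for _ in range(attempts):
        tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        tcp.bind((host, 0))              # port 0: let the OS pick a free TCP port
        port = tcp.getsockname()[1]
        udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            udp.bind((host, port))       # same port number, different protocol
            return tcp, udp
        except OSError:
            tcp.close()
            udp.close()
    raise RuntimeError("no port was free for both TCP and UDP")

tcp, udp = bind_same_port_tcp_and_udp()
same_port = tcp.getsockname()[1] == udp.getsockname()[1]
distinct_ids = tcp.fileno() != udp.fileno()   # the integer socket identifiers
print("same port number:", same_port)
print("distinct socket identifiers:", distinct_ids)
tcp.close()
udp.close()
```

Both binds succeed because the operating system keeps separate port spaces for TCP and UDP, so the two sockets never compete for the same address.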

SOCKET TYPES

There are several Internet socket types available:

* Datagram sockets, also known as connectionless sockets, which use User Datagram Protocol (UDP).
* Stream sockets, also known as connection-oriented sockets, which use Transmission Control Protocol (TCP) or Stream Control Transmission Protocol (SCTP).
* Raw sockets (or Raw IP sockets), typically available in routers and other network equipment. Here the transport layer is bypassed, and the packet headers are not stripped off, but are accessible to the application. Application examples are Internet Control Message Protocol (ICMP, best known for the Ping suboperation), Internet Group Management Protocol (IGMP), and Open Shortest Path First (OSPF).[2]

There are also non-Internet sockets, implemented over other transport protocols, such as Systems Network Architecture (SNA).

SOCKET STATE AND CLIENT SERVER MODEL

Computer processes that provide application services are called servers, and create sockets on start up that are in listening state. These sockets wait for connection attempts from client programs. For a listening TCP socket, the remote address presented by netstat may be denoted 0.0.0.0 and the remote port number 0.

A TCP server may serve several clients concurrently by creating a child process for each client and establishing a TCP connection between the child process and the client. A unique dedicated socket is created for each connection. Such a socket is in established state once a socket-to-socket virtual connection or virtual circuit (VC), also known as a TCP session, is established with the remote socket, providing a duplex byte stream. Other possible TCP socket states presented by the netstat command are Syn-sent, Syn-Recv, Fin-wait1, Fin-wait2, Time-wait, Close-wait and Closed, which relate to various start-up and shutdown steps.

A server may create several concurrently established TCP sockets with the same local port number and local IP address, each mapped to its own server-child process, serving its own client process. They are treated as different sockets by the operating system, since the remote socket address (the client IP address and/or port number) is different; i.e. since they have different socket pair tuples (see below).

A UDP socket cannot be in an established state, since UDP is connectionless. Therefore, netstat does not show the state of a UDP socket. A UDP server does not create new child processes for every concurrently served client; instead, the same process handles incoming data packets from all remote clients sequentially through the same socket. This implies that UDP sockets are not identified by the remote address, but only by the local address, although each message has an associated remote address.
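
The contrast between TCP and UDP servers described above can be observed directly. The following is a minimal sketch, assuming the loopback address and two clients per protocol, all running in one process for brevity (a real server would typically hand each accepted TCP socket to a child process or thread). All variable names are illustrative.

```python
import socket

HOST = "127.0.0.1"   # loopback address, assumed for this demonstration

# TCP: one listening socket plus one dedicated socket per accepted client.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind((HOST, 0))                 # port 0: let the OS choose a free port
listener.listen(5)
tcp_addr = listener.getsockname()

tcp_clients = [socket.create_connection(tcp_addr) for _ in range(2)]
accepted = [listener.accept()[0] for _ in tcp_clients]

tcp_local = {conn.getsockname() for conn in accepted}    # shared local address
tcp_remote = {conn.getpeername() for conn in accepted}   # one entry per client

# UDP: a single socket receives the datagrams of every client.
udp_server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_server.bind((HOST, 0))
udp_server.settimeout(2.0)               # do not block forever if a datagram is lost
udp_addr = udp_server.getsockname()

udp_clients = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]
for number, client in enumerate(udp_clients):
    client.sendto(f"hello from client {number}".encode(), udp_addr)

# recvfrom() returns (data, sender address): the remote address travels with
# each message instead of being a property of the socket.
udp_senders = {udp_server.recvfrom(1024)[1] for _ in udp_clients}

print("TCP sockets on the server:", 1 + len(accepted))
print("distinct TCP local addresses:", len(tcp_local))
print("distinct TCP remote addresses:", len(tcp_remote))
print("UDP sockets on the server: 1, distinct senders:", len(udp_senders))

for sock in tcp_clients + accepted + udp_clients + [listener, udp_server]:
    sock.close()
```

The accepted TCP sockets all report the listener's local address and differ only in their remote address, while the single UDP socket sees a different sender address on each datagram.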

SOCKET PAIRS

Communicating local and remote sockets are called socket pairs. Each socket pair is described by a unique 4-tuple consisting of source and destination IP addresses and port numbers, i.e. of local and remote socket addresses. As seen in the discussion above, in the TCP case, each unique socket pair 4-tuple is assigned a socket number, while in the UDP case, each unique local socket address is assigned a socket number.

IMPLEMENTATION ISSUES

[Figure: TCP socket flow diagram]

Sockets are usually implemented by an API library such as Berkeley sockets, first introduced in 1983. Most implementations are based on Berkeley sockets, for example Winsock introduced in 1991. Other socket API implementations exist, such as the STREAMS-based Transport Layer Interface (TLI). Development of application programs that utilize this API is called socket programming or network programming.

These are examples of functions or methods typically provided by the API library:

* socket() creates a new socket of a certain socket type, identified by an integer number, and allocates system resources to it.
* bind() is typically used on the server side, and associates a socket with a socket address structure, i.e. a specified local port number and IP address.
* listen() is used on the server side, and causes a bound TCP socket to enter listening state.
* connect() is used on the client side, and assigns a free local port number to a socket. In case of a TCP socket, it causes an attempt to establish a new TCP connection.
* accept() is used on the server side. It accepts a received incoming attempt to create a new TCP connection from the remote client, and creates a new socket associated with the socket address pair of this connection.
* send() and recv(), or write() and read(), or sendto() and recvfrom(), are used for sending and receiving data to/from a remote socket.
* close() causes the system to release resources allocated to a socket. In case of TCP, the connection is terminated.
* gethostbyname() and gethostbyaddr() are used to resolve host names and addresses.
* select() is used to prune a provided list of sockets for those that are ready to read, ready to write or have errors.
* poll() is used to check on the state of a socket. The socket can be tested to see if it can be written to, read from or has errors.
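
The calls listed above fit together as follows. This is a minimal sketch using Python's socket module, which wraps the Berkeley sockets API; the loopback address, the single-connection echo behaviour, the 5-second timeout and the function name echo_once are assumptions made for the example.

```python
import select
import socket
import threading

def echo_once(listener):
    """Accept one connection, echo back what arrives, then close it."""
    conn, _peer = listener.accept()       # accept(): new socket for this connection
    with conn:                            # close() runs when the block exits
        data = conn.recv(1024)            # recv()
        conn.sendall(data)                # send(), repeated until all bytes are written

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket()
listener.bind(("127.0.0.1", 0))           # bind(): port 0 asks the OS for a free port
listener.listen(1)                        # listen(): enter listening state
server = threading.Thread(target=echo_once, args=(listener,))
server.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())    # connect(): local port assigned implicitly
client.sendall(b"ping")
readable, _, _ = select.select([client], [], [], 5.0)   # select(): wait until readable
reply = client.recv(1024) if readable else b""
client.close()                            # close(): release the socket's resources
server.join()
listener.close()
print(reply)
```

A stream socket may return fewer bytes than were sent in a single recv() call, so production code normally loops until a complete message has arrived; the four-byte message here keeps the sketch short.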

SOCKET IN NETWORK EQUIPMENT

The socket is primarily a concept used in the Transport Layer of the Internet model. Networking equipment such as routers and switches does not require implementations of the Transport Layer, as it operates on the Link Layer level (switches) or at the Internet Layer (routers). However, stateful network firewalls, network address translators, and proxy servers keep track of active socket pairs. Also in fair queuing, layer 3 switching and quality of service (QoS) support in routers, packet flows may be identified by extracting information about the socket pairs. Raw sockets are typically available in network equipment, and are used for routing protocols such as IGMP and OSPF, and for the Internet Control Message Protocol (ICMP).

EARLY IMPLEMENTATIONS

1983: Berkeley sockets (also known as the BSD socket API) originated with the 4.2BSD Unix operating system (released in 1983) as an API. Only in 1989, however, could UC Berkeley release versions of its operating system and networking library free from the licensing constraints of AT&T's copyright-protected Unix.

1987: Transport Layer Interface (TLI) was the networking API provided by AT&T UNIX System V Release 3 (SVR3) in 1987[9] and continued into Release 4 (SVR4).

Other early implementations were written for TOPS-20[12], MVS[12], VM[12], and IBM-DOS (PCIP).

TCP

The Transmission Control Protocol (TCP), sometimes called the Transfer Control Protocol, is one of the core protocols of the Internet Protocol Suite. TCP is one of the two original components of the suite, complementing the Internet Protocol (IP), and therefore the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered delivery of a stream of bytes from a program on one computer to another program on another computer. TCP is the protocol that major Internet applications such as the World Wide Web, email, remote administration and file transfer rely on. Other applications, which do not require reliable data stream service, may use the User Datagram Protocol (UDP), which provides a datagram service that emphasizes reduced latency over reliability.

INTERNET PROTOCOL SUITE

Application Layer: DHCP, DNS, FTP, HTTP, IMAP, IRC, LDAP, MGCP, NNTP, NTP, POP, RIP, RPC, RTP, SIP, SMTP, SNMP, SOCKS, SSH, Telnet, XMPP
Transport Layer: TCP, TLS/SSL, UDP, DCCP, SCTP, RSVP, ECN
Internet Layer: IP (IPv4, IPv6), ICMP, ICMPv6, IGMP, BGP, OSPF, IPsec
Link Layer: ARP/InARP, NDP, Tunnels (L2TP), PPP, Media Access Control (Ethernet, DSL, ISDN, FDDI)

NETWORK FUNCTION

TCP provides a communication service at an intermediate level between an application program and the Internet Protocol (IP). That is, when an application program desires to send a large chunk of data across the Internet using IP, instead of breaking the data into IP-sized pieces and issuing a series of IP requests, the software can issue a single request to TCP and let TCP handle the IP details.

IP works by exchanging pieces of information called packets. A packet is a sequence of octets and consists of a header followed by a body. The header describes the packet's destination and, optionally, the routers to use for forwarding until it arrives at its destination. The body contains the data IP is transmitting.

Due to network congestion, traffic load balancing, or other unpredictable network behavior, IP packets can be lost, duplicated, or delivered out of order. TCP detects these problems, requests retransmission of lost data, rearranges out-of-order data, and even helps minimize network congestion to reduce the occurrence of the other problems. Once the TCP receiver has reassembled the sequence of octets originally transmitted, it passes them to the application program. Thus, TCP abstracts the application's communication from the underlying networking details.

TCP is utilized extensively by many of the Internet's most popular applications, including the World Wide Web (WWW), E-mail, File Transfer Protocol, Secure Shell, peer-to-peer file sharing, and some streaming media applications.

TCP is optimized for accurate delivery rather than timely delivery, and therefore, TCP sometimes incurs relatively long delays (on the order of seconds) while waiting for out-of-order messages or retransmissions of lost messages. It is not particularly suitable for real-time applications such as Voice over IP. For such applications, protocols like the Real-time Transport Protocol (RTP) running over the User Datagram Protocol (UDP) are usually recommended instead.
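
The reassembly step can be illustrated with a short simulation. The following is a simplified sketch, not TCP's actual algorithm: it assumes each segment is tagged with the byte offset of its first octet, and it ignores details of real TCP such as 32-bit sequence numbers that wrap around and partially overlapping segments. The function name reassemble is hypothetical.

```python
def reassemble(segments):
    """Rebuild the byte stream from (offset, payload) pairs.

    Segments may arrive in any order and may be duplicated. Only the
    contiguous prefix starting at offset 0 is returned, because bytes
    after a gap cannot yet be handed to the application.
    """
    buffered = {}
    for offset, payload in segments:
        buffered.setdefault(offset, payload)    # a duplicate is simply ignored

    stream = bytearray()
    next_offset = 0
    while next_offset in buffered:              # stop at the first missing segment
        payload = buffered[next_offset]
        stream += payload
        next_offset += len(payload)
    return bytes(stream)

# Out of order and duplicated: the original stream is still recovered.
arrived = [(6, b"world"), (0, b"hello "), (6, b"world")]
print(reassemble(arrived))

# A lost middle segment: only the bytes before the gap are delivered.
with_gap = [(0, b"hello "), (11, b"!")]
print(reassemble(with_gap))
```

Holding back the bytes after a gap is what produces the delays mentioned above: the receiver cannot deliver later data until the missing segment has been retransmitted.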
TCP is a reliable stream delivery service that guarantees delivery of a data stream sent from one host to another without duplication or loss of data. Since packet transfer is not reliable, a technique known as positive acknowledgment with retransmission is used to guarantee reliability of packet transfers. This fundamental technique requires the receiver to respond with an acknowledgment message as it receives the data. In the simplest form of the technique, the sender keeps a record of each packet it sends, and waits for acknowledgment before sending the next packet. The sender also keeps a timer from when the packet was sent, and retransmits a packet if the timer expires. The timer is needed in case a packet gets lost or corrupted.

TCP is a set of rules (a protocol) used along with the Internet Protocol (IP) to send data "in a form of message units" between computers over the Internet. While IP takes care of handling the actual delivery of the data, TCP takes care of keeping track of the individual units of data transmission, called segments, that a message is divided into for efficient routing through the network.

For example, when an HTML file is sent from a Web server, the TCP software layer of that server divides the sequence of octets of the file into segments and forwards them individually to the IP software layer (Internet Layer). The Internet Layer encapsulates each TCP segment into an IP packet by adding a header that includes (among other data) the destination IP address. Even though every packet has the same destination address, they can be routed on different paths through the network. When the client program on the destination computer receives them, the TCP layer (Transport Layer) reassembles the individual segments and ensures they are correctly ordered and error free as it streams them to an application.
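
Positive acknowledgment with retransmission can be sketched as a small simulation. The following assumes the simplest stop-and-wait variant, a simulated network whose losses are listed up front, and a retry limit of five; the name send_reliably and its parameters are hypothetical. It does not model lost acknowledgments, and real TCP keeps many segments in flight at once rather than waiting after each one.

```python
def send_reliably(packets, lost, max_tries=5):
    """Stop-and-wait delivery over a simulated lossy network.

    `lost` is a set of (sequence number, attempt number) pairs that the
    network drops. A dropped packet produces no acknowledgment, so the
    sender's timer expires and the same packet is sent again.
    Returns the delivered payloads and the total number of transmissions.
    """
    delivered = []
    transmissions = 0
    for seq, payload in enumerate(packets):
        for attempt in range(max_tries):
            transmissions += 1
            if (seq, attempt) in lost:
                continue                  # no ACK before the timer expired: retransmit
            delivered.append(payload)     # receiver accepts the packet and sends an ACK
            break                         # ACK received: move on to the next packet
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return delivered, transmissions

packets = [b"a", b"b", b"c"]
# The network drops the first two transmissions of packet 1.
delivered, transmissions = send_reliably(packets, lost={(1, 0), (1, 1)})
print(delivered, transmissions)
```

All three payloads arrive in order, at the cost of five transmissions instead of three.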

APACHE TOMCAT

Apache Tomcat (or Jakarta Tomcat or simply Tomcat) is an open source servlet container developed by the Apache Software Foundation (ASF). Tomcat implements the Java Servlet and the JavaServer Pages (JSP) specifications from Sun Microsystems, and provides a "pure Java" HTTP web server environment for Java code to run in. Tomcat should not be confused with the Apache web server, which is a C implementation of an HTTP web server; these two web servers are not bundled together. Apache Tomcat includes tools for configuration and management, but can also be configured by editing XML configuration files.

COMPONENTS

Tomcat version 4.x was released with Catalina (a servlet container), Coyote (an HTTP connector) and Jasper (a JSP engine).

CATALINA

Catalina is Tomcat's servlet container. Catalina implements Sun Microsystems' specifications for servlet and JavaServer Pages (JSP). In Tomcat, a Realm element represents a "database" of usernames, passwords, and roles (similar to Unix groups) assigned to those users. Different implementations of Realm allow Catalina to be integrated into environments where such authentication information is already being created and maintained, and then utilize that information to implement Container Managed Security as described in the Servlet Specification.

COYOTE

Coyote is Tomcat's HTTP Connector component that supports the HTTP 1.1 protocol for the web server or application container. Coyote listens for incoming connections on a specific TCP port on the server and forwards the request to the Tomcat Engine to process the request and send back a response to the requesting client.

JASPER

Jasper is Tomcat's JSP Engine. Tomcat 5.x uses Jasper 2, which is an implementation of Sun Microsystems' JavaServer Pages 2.0 specification. Jasper parses JSP files to compile them into Java code as servlets (that can be handled by Catalina). At runtime, Jasper detects changes to JSP files and recompiles them.

JASPER 2

From Jasper to Jasper 2, important features were added:

1. JSP tag library pooling - Each tag markup in a JSP file is handled by a tag handler class. Tag handler class objects can be pooled and reused in the whole JSP servlet.
2. Background JSP compilation - While recompiling modified JSP Java code, the older version is still available for server requests. The older JSP servlet is deleted once the new JSP servlet has finished being recompiled.
3. Recompile JSP when included page changes - Pages can be inserted and included into a JSP at runtime. The JSP will not only be recompiled with JSP file changes but also with included page changes.
4. JDT Java compiler - Jasper 2 can use the Eclipse JDT (Java Development Tools) Java compiler instead of Ant and javac.

FEATURES

TOMCAT 5.X

1. Implements the Servlet 2.4 and JSP 2.0 specifications
2. Reduced garbage collection, improved performance and scalability
3. Native Windows and Unix wrappers for platform integration
4. Faster JSP parsing

DEPLOYMENT

Experienced users can build and install Tomcat manually from source code after installing such dependencies as the Java Development Kit and the Apache Ant build tool. Depending on the usage requirements, Tomcat may either be deployed as a standalone pure-Java web server or as a component in a more complex configuration in which it serves as a back-end which handles requests passed to it from a general purpose web server such as Apache, using a connector such as mod_jk, supplied by the Apache Tomcat team, or mod_proxy, an optional module for the Apache HTTP Server supplied by the Apache HTTP Server team.

COMMUNITIES

Apache software is built in a community process, with both user and developer mailing lists. The developer list is where discussion on building and testing the next release takes place, while the user list is where users can discuss their problems with the developers and other users. A number of free Apache Tomcat resources and communities developed in 2010, including Tomcatexpert.com, a SpringSource sponsored community for developers and operators who are running Apache Tomcat in large-scale production environments, and MuleSoft's Apache Tomcat Resource Center, where you can find instructional guides on installing, updating, configuring, monitoring, troubleshooting and securing various versions of Tomcat.

TCP PROTOCOL OPERATION

[Figure: TCP diagram]

TCP protocol operations may be divided into three phases. Connections must be properly established in a multi-step handshake process (connection establishment) before entering the data transfer phase. After data transmission is completed, the connection termination closes established virtual circuits and releases all allocated resources.

A TCP connection is managed by an operating system through a programming interface that represents the local end-point for communications, the Internet socket. During the lifetime of a TCP connection it undergoes a series of state changes:

1. LISTENING: in case of a server, waiting for a connection request from any remote client.
2. SYN-SENT: waiting for the remote peer to send back a TCP segment with the SYN and ACK flags set (usually set by TCP clients).
3. SYN-RECEIVED: waiting for the remote peer to send back an acknowledgment after having sent back a connection acknowledgment to the remote peer (usually set by TCP servers).
4. ESTABLISHED: the port is ready to receive/send data from/to the remote peer.
5. FIN-WAIT-1: the local side has initiated an active close by sending a FIN segment and is waiting for its acknowledgment.
6. FIN-WAIT-2: indicates that the client is waiting for the server's FIN segment (which indicates that the server's application process is ready to close and the server is ready to initiate its side of the connection termination).
7. CLOSE-WAIT: the local side has received a FIN segment from the remote peer and is waiting for the local application to close the connection.
8. LAST-ACK: indicates that the server is in the process of sending its own FIN segment (which indicates that the server's application process is ready to close and the server is ready to initiate its side of the connection termination).
9. TIME-WAIT: represents waiting for enough time to pass to be sure the remote peer received the acknowledgment of its connection termination request. According to RFC 793 a connection can stay in TIME-WAIT for a maximum of four minutes, which is twice the MSL (maximum segment lifetime) of two minutes.
10. CLOSED: the connection is closed.

The following information describes TCP connection states and how to read Netstat (NETSTAT.EXE) output. Before data transfer takes place in TCP, a connection must be established. TCP employs a three-way handshake.

TCP Connection States

The following is an explanation of this handshake. In this context the "client" is the peer requesting a connection and the "server" is the peer accepting a connection. Note that this notation does not reflect client/server relationships as an architectural principle.

1. Connection Establishment
* The client sends a SYN message which contains the server's port and the client's Initial Sequence Number (ISN) to the server (active open).
* The server sends back its own SYN and ACK (which consists of the client's ISN + 1).
* The client sends an ACK (which consists of the server's ISN + 1).

2. Connection Tear-down (modified three-way handshake)
* The client sends a FIN (active close). This is now a half-closed connection. The client no longer sends data, but is still able to receive data from the server. Upon receiving this FIN, the server enters a passive close state.
* The server sends an ACK (which is the client's FIN sequence + 1).
* The server sends its own FIN.
* The client sends an ACK (which is the server's FIN sequence + 1). Upon receiving this ACK, the server closes the connection.

A half-closed connection can be used to terminate sending data while still receiving data. Socket applications can call shutdown with the second argument set to 1 to enter this state.
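
The half-close can be tried out on the loopback interface. The following is a minimal sketch in Python, where the shutdown argument is spelled socket.SHUT_WR (the constant corresponding to the value 1 mentioned above on common platforms). The helper names read_until_eof and serve_one, and the "got " reply prefix, are assumptions made for the example.

```python
import socket
import threading

def read_until_eof(sock):
    """Collect bytes until recv() returns b"", which signals the peer's FIN."""
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:
            return b"".join(chunks)
        chunks.append(data)

def serve_one(listener):
    conn, _peer = listener.accept()
    with conn:
        request = read_until_eof(conn)     # ends once the client has half-closed
        conn.sendall(b"got " + request)    # the server-to-client direction is still open

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server = threading.Thread(target=serve_one, args=(listener,))
server.start()

client = socket.create_connection(listener.getsockname())
client.sendall(b"request")
client.shutdown(socket.SHUT_WR)            # send FIN: no more sends, receives still allowed
reply = read_until_eof(client)             # the reply arrives after the half-close
client.close()
server.join()
listener.close()
print(reply)
```

The server learns that the request is complete from the end-of-stream condition alone, which is why protocols without explicit message lengths sometimes rely on a half-close.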

Netstat Output

The above TCP connection states can be monitored in a network trace under the TCP flags. It is also possible to determine the status of the connection by running the Netstat utility and looking at the State column. Netstat is shipped with Windows NT, Windows 95, and TCP/IP-32 for Windows for Workgroups. State explanations as shown in Netstat:

State          Explanation
------------   --------------------------------------------------------
SYN_SEND       Indicates active open.
SYN_RECEIVED   Server just received SYN from the client.
ESTABLISHED    Client received server's SYN and session is established.
LISTEN         Server is ready to accept connection.
FIN_WAIT_1     Indicates active close.
TIME_WAIT      Client enters this state after active close.
CLOSE_WAIT     Indicates passive close. Server just received first FIN from a client.
FIN_WAIT_2     Client just received acknowledgment of its first FIN from the server.
LAST_ACK       Server is in this state when it sends its own FIN.
CLOSED         Server received ACK from client and connection is closed.

As an example, consider the following scenario.

State          Description
------------   --------------------------------------------------------
CLOSED         Indicates that the server has received an ACK signal from the client and the connection is closed.
CLOSE_WAIT     Indicates that the server has received the first FIN signal from the client and the connection is in the process of being closed.
ESTABLISHED    Indicates that the server received the SYN signal from the client and the session is established.
FIN_WAIT_1     Indicates an active close: this side has sent its FIN signal and is waiting for the acknowledgment.
FIN_WAIT_2     Indicates that the client just received acknowledgment of the first FIN signal from the server.
LAST_ACK       Indicates that the server is in the process of sending its own FIN signal.
LISTENING      Indicates that the server is ready to accept a connection.
SYN_RECEIVED   Indicates that the server just received a SYN signal from the client.
SYN_SEND       Indicates an active open: this side has sent a SYN signal and is waiting for the reply.
TIME_WAIT      Indicates that the client has completed an active close and is waiting for a timed interval before the connection is finally released.

In other words, CLOSE_WAIT is a state in which the socket is waiting for the application to execute close(). A socket can remain in CLOSE_WAIT indefinitely until the application closes it. Faulty scenarios include a file descriptor leak, where the server never executes close() on the socket, leading to a pile-up of CLOSE_WAIT sockets.

The explanation for a CLOSE_WAIT situation is as follows. CLOSE is an operation meaning "I have no more data to send"; that is, the client or server has chosen to treat CLOSE in a simplex fashion. The user who CLOSEs may continue to RECEIVE until told that the other side has CLOSED also. Thus, a program or application could initiate several SENDs followed by a CLOSE, and then continue to RECEIVE until signalled that a RECEIVE failed because the other side has CLOSED. We assume that the TCP will signal a user, even if no RECEIVEs are outstanding, that the other side has closed, so the user can terminate his side gracefully. A TCP will reliably deliver all buffers SENT before the connection was CLOSED, so a user who expects no data in return need only wait to hear that the connection was CLOSED successfully to know that all his data was received at the destination TCP. Users must keep reading connections they close for sending until the TCP says there is no more data.

Difference between close_wait and time_wait

If we take a snapshot of netstat (netstat -nP tcp), the common states we see are ESTABLISHED, TIME_WAIT and CLOSE_WAIT. Before we go into TIME_WAIT and CLOSE_WAIT, let's take a close look at the sequence of steps for closing a socket. A socket connection is essentially between two peers (browser to webserver, a Java client to webserver, webserver to DB server, a webserver to another webserver, etc.).

Say there is a socket connection established between webserver1 and webserver2. This would be the closing sequence once the data transfer is done (from the TCP sequence diagram), assuming webserver1 initiates the close of the connection:

1) The socket on webserver1 sends a TCP segment with the FIN bit set (in the TCP header) and the socket goes into FIN_WAIT_1 state.
2) The socket on webserver2 receives the FIN and responds with an ACK to acknowledge the FIN, and the socket goes to CLOSE_WAIT state. Until the application calls close() on this socket, it stays in CLOSE_WAIT state.
3) The socket on webserver1 receives the ACK and changes to FIN_WAIT_2.
4) The socket on webserver2 closes the connection (once the application calls close()), sends a FIN to its peer to close the connection, and changes its state to LAST_ACK.
5) The socket on webserver1 receives the FIN and sends back an ACK. At this point the socket implementation on webserver1 starts a timer (TIME_WAIT) to handle the scenario where the last ACK has been lost and the server resends its FIN. The socket now waits for 2 * MSL (maximum segment lifetime); the wait defaults to 4 minutes on Solaris and Windows.
6) The socket on webserver2 receives the ACK and moves the connection to closed state.
7) After the TIME_WAIT interval has elapsed, the socket/connection is closed on webserver1.
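
The seven steps above can be traced with a small table-driven state machine. This is an illustrative sketch only: the event names (app_close, recv_fin, recv_ack, timer_expired) are hypothetical labels, and cases outside the sequence above, such as both peers closing simultaneously, are left out.

```python
# (current state, event) -> next state, following the seven steps above.
TRANSITIONS = {
    ("ESTABLISHED", "app_close"): "FIN_WAIT_1",   # step 1: send FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",    # step 2: ACK the peer's FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",     # step 3
    ("CLOSE_WAIT", "app_close"): "LAST_ACK",      # step 4: send own FIN
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",      # step 5: send ACK, start 2 * MSL timer
    ("LAST_ACK", "recv_ack"): "CLOSED",           # step 6
    ("TIME_WAIT", "timer_expired"): "CLOSED",     # step 7
}

def run(events, state="ESTABLISHED"):
    """Apply the events in order and return every state visited."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited

# webserver1 closes first (active close); webserver2 closes in response.
webserver1 = run(["app_close", "recv_ack", "recv_fin", "timer_expired"])
webserver2 = run(["recv_fin", "app_close", "recv_ack"])
print(" -> ".join(webserver1))
print(" -> ".join(webserver2))
```

In the table, CLOSE_WAIT has only one way out, the application's own close, which is why that state can persist indefinitely while TIME_WAIT always ends when its timer expires.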
These multiple levels of acknowledgments and retransmits are needed since TCP is a reliable protocol, unlike basic UDP. Here is what the three states mean:

ESTABLISHED: This is self-explanatory: the two ends are in a state where data transfer can occur, or is occurring, in both directions. (A TCP socket is full duplex, i.e. data can be received and responded to on the same channel.)

CLOSE_WAIT: This is a state where the socket is waiting for the application to execute close(). CLOSE_WAIT is not something that can be configured, whereas TIME_WAIT can be set through tcp_time_wait_interval. (The attribute tcp_close_wait_interval has nothing to do with the CLOSE_WAIT state; it was renamed to tcp_time_wait_interval starting from Solaris 7.) A socket can be in CLOSE_WAIT state indefinitely until the application closes it. Faulty scenarios include a file descriptor leak, where the server never executes close() on the socket, leading to a pile-up of CLOSE_WAIT sockets. (At the Java level, this manifests as a "Too many open files" error.)

TIME_WAIT: This is just a time-based wait on the socket before closing down the connection permanently. Under most circumstances, sockets in TIME_WAIT are nothing to worry about.
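
One common way to avoid accumulating CLOSE_WAIT sockets is to make sure close() runs on every code path. The following is a minimal sketch in Python (chosen for brevity; the same idea applies to try/finally blocks in Java), assuming a loopback connection in which the client closes first. The 5-second timeout is an arbitrary safety value.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
conn, _peer = listener.accept()
conn.settimeout(5.0)              # avoid hanging forever if the FIN never arrives

client.close()                    # the peer sends FIN; conn's side moves to CLOSE_WAIT

with conn:                        # guarantees close(), even if the handling code raises
    eof = conn.recv(1024)         # b"" reports the peer's FIN to the application
released = conn.fileno() == -1    # the descriptor has been handed back to the OS
listener.close()
print(eof, released)
```

If the application ignored the empty read and kept the socket open, the connection would stay in CLOSE_WAIT and the descriptor would count against the process's open-file limit.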
