
IIT Bombay Network Measurements: Monitor the performance of the wired IITB campus network including the backhaul

Manveer Singh Chawla (04D05003)
Guided By: Prof. Purushottam Kulkarni and Prof. Bhaskaran Raman
Indian Institute of Technology, Bombay

Abstract
Communication networks have become an indispensable part of our lives. A failure in a large and complex network, such as IIT Bombay's campus network, is difficult to detect and diagnose, and it is this problem of network failure that we try to solve here. Network measurements in such scenarios provide valuable information about the performance, utilization and health of the network. However, the size and complexity of the network make these measurements a daunting task. Lack of administrative access to the backhaul network, multiple points of failure and the need to measure non-intrusively make the problem even more challenging. In this report we present the node, link and topology properties of a network that can be measured directly, and some network properties that are inferred from them. We also present a case study of IIT Bombay's campus network: its logical structure, the composition and direction of traffic flow within it, and the services the Computer Center provides to users within the network. We observe that network traffic from the hostel area of the campus degrades the performance experienced by users in the academic area. Finally, we categorize the problems faced by different users of the network and try to identify their frequency through some basic experiments.

1 Introduction
Over the years we have seen tremendous growth in the complexity of communication networks. This complexity is also reflected in campus networks such as IIT Bombay's. IIT Bombay's campus network is a hierarchical network with an array of routers, switches and hubs, over 5000 users, and an even larger number of services running at different levels. At the bottom-most level is a user accessing the network from his machine. A group of users in a building forms a subnet, which is connected to the backbone routers through switches and hubs. Finally, a NAT box cum proxy server provides WAN access to users. Users run multiple applications on their machines, routers and switches perform routing and traffic measurement, and the firewall tries to shape the network traffic. This growth in complexity, together with the network's distributed nature, has made network measurement and monitoring a daunting task not only for administrators but for researchers as well.

In such large and complex networks there are multiple points of failure. For example, a person browsing the Internet may intermittently be unable to access websites. This can happen for several different reasons: the input buffer at an intermediate router might be full and packets might be getting dropped; the proxy server might be overloaded and dropping requests; some WAN link might be down; a domain name might not be resolved because the DNS is under-performing; or there might be a misconfiguration on the user's machine. Last but not least, traffic from other users also affects the performance a user experiences: a single virus-infected machine that starts bombarding the proxy disrupts the network performance of all other users. The larger the network, the larger the set of possible points of failure. The question that comes to virtually every user's mind is, "What is wrong with the network?" In this thesis we try to answer this question by detecting and diagnosing network failures in a campus-network-like scenario.

Continuous measurement and monitoring of the network state, through properties such as bandwidth, link utilization, packet drop rate and link delay, helps in detecting and diagnosing these failures. But the size and complexity of the network make it challenging to decide where and when to perform measurements. The challenge also lies in measuring network properties using the existing infrastructure. Our motivation is to measure the performance experienced by an end user, so we want to make these measurements from his perspective: that of a person who does not have administrative access to the backbone network, which makes the measurements all the more difficult. It is much more challenging to infer a network's internal characteristics from end-to-end measurements, and to make these measurements non-intrusively, so that the results are not influenced (or are least influenced) by the method of measurement. Challenging as it is, measuring network performance is very useful. It provides valuable information for improving performance, assessing utilization, engineering traffic and validating design choices.

1.1 Problem Statement
We aim to study the campus network of IIT Bombay to achieve the following goals:

- Monitor the performance of the campus network, such that in case of a network failure we are able to detect and diagnose its exact cause. By a failure in the network we mean that some part of it, be it a link, a server, a router or a software component, is malfunctioning or not functioning at all. These failures include misconfiguration at the end user's machine, failure of a link or of an intermediate node such as a switch or a router, failure of the proxy or the DNS server, and failure of the WAN link or of the server itself.
- Study the traffic behavior patterns of the hostel and academic areas. We aim to characterize the traffic in the hostel area and the academic area: the type of traffic, its composition based on the applications used, and the effect of the traffic in the hostel area on the traffic in the academic area.

By the end of this thesis, we will develop a tool that keeps track of the network status and helps network users diagnose problems effectively and efficiently, by which we mean correctly and quickly detecting and diagnosing the cause of a failure. We will also characterize the network traffic within the campus, help identify applications that are not using the network effectively, and set end users' expectations accurately, so that they know what to expect from the network. To the best of our knowledge, there has not been any such measurement study at the campus network level.

1.2 Structure of this Report
In Section 2 we list the network properties and a tool or method to measure each of them. In Section 3 we present a case study of the IIT Bombay campus network, describing some of the problems faced by network users within the campus. We present the results of some preliminary experiments in Section 4, and finally we conclude with our plan for the next two stages of this project in Section 5.

2 Literature Survey
In this section we summarize work done by others in the field of network measurements. It is by no means a comprehensive summary; it covers the literature we surveyed to study the network properties that others have studied and measured. Finally, in Table 1 we present the tools (or methods) we plan to use to measure these properties in our thesis.

2.1 What to measure: Network Properties
Because network measurement is complex, none of the studies so far has been able to present a complete picture of the network; they have largely focused on a subset of network properties. We define network parameters as the properties that are measured directly, either actively by conducting experiments or passively by reading logs, while the properties inferred from network parameters are defined as metrics. In [1], the authors classify network properties on the basis of the entity they are associated with, viz. node properties, link properties and topology properties. In the following sections we look at each of them in detail.

2.1.1 Measured Node Properties
IP aliases: A router or switch has multiple interfaces for multiple domains, and each interface has an address associated with it. These addresses are nothing but aliases for the node; however, they can differ widely from each other. Failing to identify these aliases might lead to incorrect or inconclusive results. In [2], Govindan et al. have developed a network measurement tool called Mercator to identify these aliases. They send a packet addressed to a non-existent port, and the router or switch replies with an ICMP port unreachable message. If the reply to a probe sent to interface X carries source address Y, then X and Y are aliases for the same router.

Owner: The Internet is one of the fastest-growing means of information flow today, and it serves a wide range of people, from the average web user to system administrators. Ownership of a node is thus important, because different owners may have different policies on their machines, which, along with user behavior, define how that node behaves. Not much work has been done in this direction due to privacy concerns. In a planned campus-network-like scenario, ownership can be estimated by looking at the IP address of the node, while it becomes much harder at the level of the Internet because of its distributed and unplanned nature. However, DNS entries and whois database searches are some of the possible ways to determine ownership.

Geography: With the onset of location-aware applications, mapping a node to its geographical location has gained a lot of attention from the Internet community. This information is useful in determining the Quality of Service experienced by a node and can also help explain some of the performance metrics. It is done primarily by probing a list of addresses using traceroute and mapping the intermediate routers to geographical locations using their domain name information.

Router role: This property refers to the role of a router in the topology: a router can be either a backbone router or an edge router located at the edge of the network. Rocketfuel [3], proposed by Spring et al., helps in identifying the role of a router. It uses DNS information to distinguish edge routers, as directly connected edge routers follow different naming conventions. DNS naming conventions were obtained from publicly available ISP data and, where these were not available, from the information of neighboring routers.
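The alias-resolution rule described for Mercator (a probe to interface X whose ICMP port-unreachable reply carries source address Y means X and Y belong to the same router) can be turned into per-router alias sets with a union-find structure. The following is a minimal Python sketch under stated assumptions: it starts from an already collected mapping of probed address to reply source address (sending the probes is out of scope), and the addresses are hypothetical. It is not Mercator's implementation.

```python
"""Group interface addresses into per-router alias sets (simplified sketch).

Input is assumed to be already collected: for each probed address X, the
source address Y of the ICMP port-unreachable reply. X != Y implies that
X and Y are aliases of the same router.
"""
from typing import Dict, List


class UnionFind:
    """Disjoint-set forest over string keys."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to flatten the tree.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def resolve_aliases(replies: Dict[str, str]) -> List[List[str]]:
    """Return sorted alias sets from a {probed address: reply source} map."""
    uf = UnionFind()
    for probed, source in replies.items():
        uf.union(probed, source)  # X == Y simply records a singleton set
    groups: Dict[str, List[str]] = {}
    for addr in list(uf.parent):
        groups.setdefault(uf.find(addr), []).append(addr)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical probe results, for illustration only.
replies = {
    "10.0.0.1": "10.0.1.1",
    "10.0.1.1": "10.0.1.1",
    "10.0.2.1": "10.0.0.1",
    "10.0.9.9": "10.0.9.9",
}
print(resolve_aliases(replies))  # [['10.0.0.1', '10.0.1.1', '10.0.2.1'], ['10.0.9.9']]
```

Union-find is used because alias evidence is transitive: 10.0.2.1 is never seen together with 10.0.1.1, yet both end up in the same set through 10.0.0.1.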

Implementation features: This refers to identifying the features and services running on a node. This information is useful for determining the traffic composition, as well as for determining how quickly various features are deployed in today's networks. nmap can be used to scan a node for services running on various ports. However, work is needed to deploy such a technique on the scale of the Internet.

2.1.2 Measured Link Properties
Loss: Packet loss over a link is one of the most studied link properties. It can occur due to network congestion, where a server or intermediate router starts dropping packets because its input buffer overflows; it can also occur because packets get corrupted and are dropped at an intermediate node or server. Pathchar [4] is a tool developed by Van Jacobson to measure packet loss on a link. However, Pathchar only measures end-to-end packet loss. For measuring the packet loss of individual links, Usgaonkar et al. [5] have proposed an adaptive approach: they progressively increase the number of probes sent to nodes based on the loss rate at intermediate routers, thus compensating for the intermediate loss.

Delay: Delay is the time taken by a packet to travel from one machine to another in a network. It is the sum of propagation delay, queuing delay and transmission delay. Given the bandwidth and length of the link (and the packet size), the propagation delay and transmission delay are deterministic; it is the non-deterministic nature of queuing delay that adds randomness to packet delay and makes it harder to measure. The lack of time synchronization across hosts in the network makes the experimental determination of delay even harder. Round Trip Time (RTT) is measured directly at the sender as the difference between the time a packet was sent and the time its acknowledgment was received; several tools, such as Ping, report the RTT. It is the one-way delay that is hard to measure: time synchronization protocols are used to synchronize the clocks of the two machines, which are then used to measure the one-way delay.

Bandwidth: Bandwidth, or throughput, is the amount of data per unit time delivered over a link. In network measurements it is sometimes used interchangeably with link capacity, which is defined as the maximum amount of traffic that can be transmitted over the link. Bandwidth, in contrast, is the data rate we actually obtain in our experiments. Pathchar [4] measures link bandwidth by sending packets of variable size and performing a statistical analysis of the results to compute the link capacity. Pathchar is unique in that it can measure the bandwidth of every link on a path.

Reordering: Due to multiple routes between two hosts, packets sometimes get delivered out of order. It is important to identify these out-of-order packets because they indicate multiple paths in the topology. S. Savage has proposed Sting [6], a TCP-based tool to measure packet reordering. It exploits TCP's error control mechanism: if the host receives an out-of-sequence packet, it replies with the sequence number of the next packet it was expecting. The same mechanism is also used to measure the loss rate of a link. Initially a series of in-sequence packets is sent, and then further packets are sent with the sequence number of the last packet of the initial sequence. If a packet was lost during the initial phase, the receiver replies with the sequence number of the lost packet.

Delay Variation: As mentioned above, it is the queuing delay that introduces randomness into network delay. This variation in delay is important for real-time traffic, which is one of the dominant components of today's network traffic. Cing [7], proposed by Anagnostakis et al., measures delay variation using ICMP timestamp requests. To calculate the delay variation on the link between two nodes X and Y, they send ICMP timestamp requests from a node S to X and to Y, such that the path from S to X is a subset of the path from S to Y. The queuing delay is then calculated as follows:

$$\Delta = t_Y - t_X = t_{\text{queuing}} + t_{\text{propagation}} + \text{ClockOffset}_{X,Y}$$

where $t_Y$ and $t_X$ are the replies to the ICMP timestamp requests sent to Y and X respectively. The minimum $\Delta_{\min}$ is then computed over n such probes; for large n, this minimum is assumed to be free of queuing delay. Thus the difference $\Delta_i - \Delta_{\min}$ gives the internal queuing delay of the link between X and Y.

2.1.3 Measured Topology Properties
Topology: Efforts have been made to study the network graph of the Internet at different levels. Routeviews [8] is used to study the graph at the Autonomous System (AS) level. It collects data, namely BGP route updates, from twelve routers placed around the world and from the peering ASs that share their data with these routers. Using this data it creates an AS-level graph of the Internet, in which each node is an AS and two nodes are connected if there is a link between the corresponding ASs. It also archives the list of IP prefixes owned by each AS. Mercator [2], on the other hand, generates a router-level topology of the Internet by sending hop-limited probes. These are IP packets whose reach is limited to a number of hops by controlling the TTL field in the IP header. The ICMP response on expiration of the TTL

leads to the detection of routers. The novelty of their approach lies in how they determine the targets of their probes: they try to guess IP prefixes that can be probed for some addressable node within that prefix. Two methods are used to guess addressable prefixes. First, if a node with IP address A responds to its probes, Mercator assumes that some prefix P of A might contain other addressable nodes. Second, for a prefix P, it assumes that the neighboring prefixes of P are also addressable. As mentioned earlier, they have also developed an alias resolution mechanism to identify the different aliases of the same node. The Cooperative Association for Internet Data Analysis (CAIDA) designed the Skitter tool [9] for discovering the IP-level topology of the Internet. It probes IP addresses from an already generated list every two minutes to record the RTT and the intermediate nodes. This list is built from the logs of NLANR's squid proxy server, IP addresses randomly chosen from /24 segments of the IP address space, and the DNS servers and routers encountered during the initial phase of probing.

Routing: In networks where routing policies are not made public to users, such as the policies used by ISPs, discovering routing policies is important for determining network performance. A sudden route change can be caused by congestion or by a change in routing policy, and in such scenarios determining the routing policies becomes important. Rocketfuel [3] can be used to discover routing policies at the router level; its authors used traceroute to discover router-level maps and policies of the network.
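The Cing computation described under Delay Variation reduces to a few lines once the timestamp replies are in hand. The Python sketch below assumes the replies from X and Y have already been collected as paired values in milliseconds (ICMP timestamps are expressed in milliseconds); the function name and sample numbers are illustrative and not taken from Cing.

```python
from typing import List, Sequence


def queuing_delays(t_x: Sequence[float], t_y: Sequence[float]) -> List[float]:
    """Per-probe queuing delay on link X-Y from paired ICMP timestamp replies.

    For probe i, delta_i = t_y[i] - t_x[i] = queuing + propagation + clock offset.
    Propagation and clock offset are treated as constant, so over many probes
    the smallest delta is taken as the queuing-free baseline.
    """
    if len(t_x) != len(t_y) or not t_x:
        raise ValueError("need equal-length, non-empty timestamp sequences")
    deltas = [ty - tx for tx, ty in zip(t_x, t_y)]
    baseline = min(deltas)
    return [d - baseline for d in deltas]


# Hypothetical timestamp replies in milliseconds, for illustration only.
t_x = [100, 200, 300, 400]
t_y = [112, 215, 312, 420]
print(queuing_delays(t_x, t_y))  # [0, 3, 0, 8]
```

The clock offset between X and Y cancels out only because it is assumed constant during the measurement; clock drift over a long run would bias the baseline.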
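For the per-link loss problem described under Loss, one common simplification (an assumption here, not necessarily the exact model of Usgaonkar et al. [5]) is that losses on successive links are independent: when the path to Y extends the path to X by the link from X to Y, the delivery ratio to Y equals the delivery ratio to X times that link's delivery ratio. The sketch below isolates the link's loss under that assumption and, in the spirit of the adaptive approach, computes how many probes to send to a node so that a target number of replies comes back in expectation. All names and numbers are hypothetical.

```python
import math


def delivery_ratio(sent: int, received: int) -> float:
    """Fraction of probes to a node that were answered."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("need sent > 0 and 0 <= received <= sent")
    return received / sent


def link_loss(ratio_to_x: float, ratio_to_y: float) -> float:
    """Loss rate of link X->Y, assuming independent losses and that probes
    (and replies) to Y cross exactly the links to X plus link X->Y:
    ratio_to_y = ratio_to_x * (1 - loss)."""
    if ratio_to_x <= 0:
        raise ValueError("X never answered; link X->Y cannot be isolated")
    # Clamp at 0: sampling noise can make Y look slightly better than X.
    return max(0.0, 1.0 - ratio_to_y / ratio_to_x)


def probes_needed(target_replies: int, ratio: float) -> int:
    """Probes to send so that target_replies are expected back at this ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.ceil(target_replies / ratio)


# Hypothetical counts: 1000 probes each to X and to Y.
r_x = delivery_ratio(1000, 900)
r_y = delivery_ratio(1000, 810)
print(round(link_loss(r_x, r_y), 3))  # 0.1
print(probes_needed(100, r_y))        # 124
```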

2.1.4 Inferred Network Properties
The properties we have discussed so far can be measured directly in the network; however, some properties are determined from these measured properties. These are called metrics, and they help us analyze network performance better. Some of these include:

Failure: Most networks today are so large that detecting and isolating failures is an enormous job. However, an interesting result [10] shows that most network failures last only a few minutes at most. This is probably because of the distributed nature of most networks. Some long-lasting failures are detected by measuring changes in network parameters. For example, an unexpected change in the path traversed by packets might indicate a failure of a node or of link(s) along the path. NetProfiler [11], developed by Padmanabhan et al., compares the end-to-end experience of different users within a network to identify the reason for a failure.

Utilization: Link utilization is an important metric for measuring performance, especially for ISPs because of their SLAs. For network administrators, it also reveals why the network is underperforming. Utilization is the fraction of a link's total capacity that is in use, i.e. one minus the ratio of available bandwidth to total capacity. Pathchar is used to measure the available bandwidth and thus can be used to measure link utilization. However, none of the tools that exist today can accurately measure available bandwidth, because of cross traffic in the network. Instead, as proposed by the authors of [1], delay variation (measured using Cing [7]) and loss at routers can be used to estimate utilization.

2.2 How to measure: Architecture of Measurement Tool
In this section we describe some design issues for a network measurement tool. A few important decisions have to be made: the type of data collection mechanism; the point of control, i.e. whether the tool should be distributed or centralized; and, last but not least, the location of the point of observation.

2.2.1 Type of Data Collection Mechanism
To collect network performance data, a tool can either actively inject data into the network to make active measurements, or use the logs collected from different entities to make passive measurements.

Monitoring using Active Probes: This method has several advantages: it treats the system as a black box and is useful for end-to-end measurements. And since measurements are made at the probing machines, little coordination is required between them. However, it has one major disadvantage: it injects synthetic traffic into the network, which might influence the results obtained. Also, in case of a failure or network congestion, this method is not useful for pinpointing the cause. Iperf, Ping, Traceroute and Pathchar are some of the active measurement tools.

Monitoring using Passive Probes: Unlike active monitoring, the biggest advantage of this method is that it does not inject any traffic into the network and so does not influence the data obtained. It can also be used to diagnose the root cause of a failure. However, the data obtained at a single probing station may not be sufficient to measure all the network parameters; data collected from several probes needs to be consolidated at one probing station (or more) to make inferences about network parameters. Also, it cannot be used for end-to-end measurements and can only be used for the points between which probes are installed. Nagios and MRTG are some of the commonly used passive measurement tools.

2.2.2 Point of Control
There has been a perennial debate between the supporters of centralized and distributed architectures ever since distributed architectures were proposed. Both have their own advantages and disadvantages, and the choice should be made with respect to the situation at hand.

Centralized Architecture: The biggest advantage of a tool with a centralized architecture is that it needs minimal infrastructure. All measurements can be performed at a single host without any cooperation from other hosts in the network, and there is complete control over the type and timing of the measurements performed. The biggest disadvantage of such an architecture is that it does not give a complete picture of the network: it gives only the central system's view of the network, which might be biased by the system's location. It is also more vulnerable to failure because of its centralized nature: failure of a link around the central system can cause the tool to fail.

Distributed Architecture: The biggest advantage of a distributed architecture for network measurements is the multiple viewpoints obtained from different observers in the network. Networks today are so vast and complex that it is hard for a single node to monitor the whole network. As shown by Padmanabhan et al. [11], performance information from multiple clients can be compared to diagnose the root cause of a failure, with promising results. However, a major disadvantage of such an architecture is bootstrapping it into the existing infrastructure. Also, the location and number of hosts from which to collect such information is a major design challenge.

2.2.3 Point of Observation
In today's complex networks, be it the Internet or a campus network, the placement of the network measurement tool is an important design decision. Placing it near the server [12] can reveal some interesting characteristics of different types of clients, such as the type of services they use. Similarly, monitoring routers and intermediate links gives important information about the health of the network. However, the ultimate goal of any network is to provide access to clients, and neither of these two placements reveals the end-to-end health of the network as experienced by end users. Placing the tool near the client reveals the network performance experienced by end users. In the end, the location of the point of observation depends on the goal of the tool.

2.3 Properties of interest to us
In Table 1 we summarize the properties we plan to measure and the method or tool we plan to use to measure each. Ours is a planned campus network, hence some network properties, such as the network topology and the IP aliases of the routers, are already known to us. Other properties, such as the ownership of a node and its geographic location, can be inferred directly from its IP address, since most machines in the network have static IP addresses. To infer properties such as failures, we plan to study packet traces as described in [10] and to compare the experience of multiple users across the campus to diagnose the cause of a failure, as proposed in [11].

Table 1. Properties we wish to measure and the tool (or method) we plan to use to measure them

Property | Tool (or method)
Bandwidth | Pathchar [4]
Loss | Pathchar [4] and the method proposed by Usgaonkar et al. [5]
Delay | Ping to a machine, or the time taken to access a website as reported in the output of wget
Delay variation | Method proposed by Anagnostakis et al. [7]
Owner and geographic location | IP address (ours is a planned network)

3 IIT Bombay Network: Background
The IIT Bombay network is a complex and vast entity, providing network access to a large group of faculty, students and staff. Round-the-clock monitoring and management of such a network is a daunting task. To understand the challenges involved in measuring such a network, it is important to first study its structure and understand the kind of traffic that flows through it and the services that run over it. This section outlines the structure of the network at IIT Bombay and describes its various components, emphasizing the problems faced by the different groups of people using the network.

3.1 Network Structure
The campus network is arranged in a hierarchy, with five routers located in the Computer Center, the Computer Science Department, the Aerospace Department, Hostel-8 and Hostel-3 forming the backbone of the network, as shown in Figure 1. The routers interconnect several building subnets. Within the buildings, hubs and switches fan out to the individual Ethernet outlets, connecting more than 5,000 users across the campus. For the purpose of management, the network is divided into the following sections according to location:

- Resnet: access to the residential area (faculty, staff and their families)
- Hostels: access to student and staff hostels
- Acad: access to the academic area
- Admin: access to the administrative block
- Wireless: wireless access at selected locations in the institute
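Utilization (Section 2.1.4) can also be obtained passively from the interface octet counters that SNMP exposes and that MRTG polls every 5 minutes (Section 3.4.1). The Python sketch below converts two successive readings of a 32-bit counter, such as IF-MIB's ifInOctets, into the fraction of link capacity in use, allowing for one counter wrap between polls. Fetching the counters over SNMP is left out, and the figures in the example are made up; this is not MRTG's own code.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit SNMP Counter32 wraps back to 0 here


def link_utilization(prev_octets: int, curr_octets: int,
                     interval_s: float, capacity_bps: float) -> float:
    """Fraction of link capacity used between two polls of an octet counter.

    Assumes at most one counter wrap between polls, so the poll interval must
    be short relative to the time the link needs to carry 2**32 octets.
    """
    if interval_s <= 0 or capacity_bps <= 0:
        raise ValueError("interval and capacity must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        delta += COUNTER32_MODULUS  # the counter wrapped since the last poll
    bits_per_second = delta * 8 / interval_s
    return bits_per_second / capacity_bps


# Hypothetical 5-minute poll of an 8 Mbps WAN link: 150,000,000 octets in 300 s.
u = link_utilization(1_000_000, 151_000_000, 300, 8_000_000)
print(u)      # 0.5
print(1 - u)  # 0.5, the fraction of capacity left unused
```

At 8 Mbps a 32-bit octet counter wraps roughly every 4300 seconds, so a 5-minute poll sees at most one wrap; on much faster links 64-bit counters or shorter intervals would be needed.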

Physically, the wire from an office, lab or user machine goes to one of several communication equipment rooms in the building and connects to a hub or switch. These hubs are interconnected either via a shared Ethernet segment or via a switch in the building's main communication equipment room. This hub or switch is in turn connected via fiber-optic cable to a port on one of the five campus routers. Because switches are used in the buildings, network traffic in different buildings is kept separate: computers in one building cannot see the traffic of any other building.

The IIT Bombay local network is connected to the external network via three leased broadband lines (8 Mbps, 8 Mbps and 16 Mbps). Traffic destined to nodes outside the campus is filtered at the central CC router in the KReSIT building and flows to the outbound links after passing through a firewall and the DMZ. The DMZ contains servers hosting services that are accessible from outside the institute, such as HTTP, SMTP and DNS, besides a group of machines implementing proxy servers and firewalls. It is designed as follows:

- CC-580: the CC router in KReSIT, where the internal traffic arrives
- Netmon: the proxy, which provides HTTP connections from inside to the Internet
- iGarbo: the machine that provides the firewall for traffic from inside to outside; it performs packet checks, NATing and forwarding of packets to their destination
- Garbo1, Garbo2, Garbo3, Garbo4: machines that provide the firewall for packets coming from outside
- wum1, wum2, wum3: machines that balance the requests from outside among the proxy servers

3.2 Traffic Flow
Depending on their destination, packets within the campus network are classified into two categories. Inbound traffic consists of packets destined to nodes within the campus network or to the servers hosted in the DMZ, while Internet-bound traffic comprises packets destined to nodes outside the campus network. Packets of each class flow through the network differently, as follows:

Internal Traffic flow: A packet generated at a host and destined to another host within its own subnet is delivered to the destination host via the switch. If the packet is destined to a node outside the subnet, it is forwarded to the router interfacing the subnet. Inbound packets are then forwarded to the router that interfaces with the destination node's subnet. On the other hand, an Internet-bound packet, or one meant for servers in the DMZ, is routed to the central CC router in KReSIT and then forwarded to the DMZ after passing through a firewall.

In the DMZ, Internet-bound packets are forwarded to the squid proxy server, Netmon, which authenticates the request based on an LDAP username and password. After this the packet is passed to a firewall cum NAT server, which performs NATing for the flow the packet belongs to and also checks the packet for irregularities. High-packet-generating applications such as the Yahoo and MSN messengers are allowed to bypass the Netmon proxy server and are sent directly to the firewall, which performs NATing on the packet.

External Traffic flow: Incoming packets enter the DMZ after passing through the firewall. Here again, packets are serviced according to their destination: those destined to servers in the DMZ are serviced directly, while those destined to hosts within the network are sent to the respective router after a reverse NAT lookup.

3.3 Network Services
The Computer Center [13] provides, manages and monitors various network services for all users in the institute. These services include electronic mail, FTP, the World Wide Web, DNS and many others. In this section we describe the services that are of interest to us.

3.3.1 Web Proxy
A Squid proxy server, internally referred to as netmon, serves the campus network as its web proxy. It performs web caching, content filtering, user authentication, ad-blocking and bandwidth shaping. As mentioned in Section 3.1, the campus network is connected to the Internet via three WAN links, and the proxy server also performs load balancing over these links using Ultra Monkey [14]. The proxy server is essentially a cluster of three machines that appears as a single server to end users because of Ultra Monkey; this is done by providing a virtual server as the front end and using real servers as the back end. Depending on the destination, packets are sent on the corresponding WAN link: packets destined to journal sites, SciFinder and other educational sites are sent over the faster links.

3.3.2 Domain name service
There is a local hierarchy of DNS servers within the IIT Bombay network. Each building has a local DNS server at the lower level, with the IIT Bombay DNS at the upper level. Queries for addresses with the .iitb.ac.in suffix are resolved by the DNS server of the local subnet or by the campus DNS, dns.iitb.ac.in, while all other queries are forwarded to the DNS server provided by the ISP. Each of these servers maintains a small cache of queries. dnscache, an open-source recursive name resolver, is used to implement the DNS.
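The forwarding decisions of Section 3.2 can be summarized as a small classification function. The Python sketch below assumes hypothetical prefixes for the campus and the DMZ (the real ones are not given in this report) and only reports which path a packet takes; it does not model the switches, routers or firewalls themselves.

```python
import ipaddress

# Hypothetical prefixes, for illustration only.
CAMPUS = ipaddress.ip_network("10.0.0.0/8")
DMZ = ipaddress.ip_network("10.250.0.0/24")


def path_for(src_subnet: str, dst: str) -> str:
    """Classify the path a packet takes, following the description of 3.2.

    "switch":         destination is in the sender's own subnet
    "campus-routers": another campus subnet, via the subnet's router to the
                      router interfacing the destination subnet
    "cc-router-dmz":  DMZ servers or the Internet, via the central CC router
                      and the firewall into the DMZ
    """
    subnet = ipaddress.ip_network(src_subnet)
    addr = ipaddress.ip_address(dst)
    if addr in subnet:
        return "switch"
    if addr in DMZ or addr not in CAMPUS:  # DMZ is checked before the campus block
        return "cc-router-dmz"
    return "campus-routers"


print(path_for("10.107.1.0/24", "10.107.1.20"))  # switch
print(path_for("10.107.1.0/24", "10.100.3.4"))   # campus-routers
print(path_for("10.107.1.0/24", "203.0.113.9"))  # cc-router-dmz
```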
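The query-routing policy of Section 3.3.2 (names under .iitb.ac.in are answered inside the campus, everything else goes to the ISP's server, and answers are cached) is sketched below in Python. The two lookup functions are hypothetical stand-ins supplied by the caller, and the cache never expires entries; dnscache itself honours record TTLs and is far more involved, so this illustrates only the decision logic.

```python
from typing import Callable, Dict, List

Lookup = Callable[[str], str]


def make_resolver(campus_lookup: Lookup, isp_lookup: Lookup) -> Lookup:
    """Return a resolver that routes queries by suffix and caches answers."""
    cache: Dict[str, str] = {}

    def resolve(name: str) -> str:
        name = name.lower().rstrip(".")  # normalise case and a trailing dot
        if name in cache:
            return cache[name]
        if name == "iitb.ac.in" or name.endswith(".iitb.ac.in"):
            answer = campus_lookup(name)  # local subnet DNS or dns.iitb.ac.in
        else:
            answer = isp_lookup(name)     # forwarded to the ISP's DNS server
        cache[name] = answer              # no expiry in this sketch
        return answer

    return resolve


# Hypothetical stand-ins that record which server was consulted.
consulted: List[str] = []


def fake_campus(name: str) -> str:
    consulted.append("campus")
    return "10.0.0.10"


def fake_isp(name: str) -> str:
    consulted.append("isp")
    return "192.0.2.10"


resolve = make_resolver(fake_campus, fake_isp)
resolve("www.iitb.ac.in")
resolve("example.org")
resolve("WWW.IITB.AC.IN.")  # answered from the cache
print(consulted)  # ['campus', 'isp']
```

Matching on ".iitb.ac.in" with the leading dot keeps a name such as notiitb.ac.in from being treated as a campus name.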

[Figure: backbone routers at CC, CSE, AERO, H3 and H8; a firewall on each side of the DMZ; WAN links of 8 + 8 + 16 Mbps; branches towards Hostels 1 to 5, Hostels 6, 7, 8, 9, 11, 12 and 13, the departments, and the residential area, MB and wireless]

Figure 1. Overview of IIT Bombay Network Map

3.4 Firewall

A packet coming into or going out of the campus network has to pass through two different firewalls, one on either side of the DMZ. The purpose of having two firewalls is to protect the internal network in case the DMZ servers are compromised. These firewalls are implemented using iptables [15], the open-source firewall tool. iptables consists of rules for how to deal with packets. These rules are grouped into chains, each an ordered list of rules, and chains are in turn grouped into tables. There are three basic tables, each containing a predefined set of chains: the Filter table, the NAT table and the Mangle table.

The Filter table is used for packet filtering. It restricts the services available to network users within the campus by blocking traffic on specific ports; for example, outgoing ssh is blocked everywhere except on the department machines (which can be identified by their IP addresses).

The NAT table is used for rewriting packet addresses and ports, and it is through the NAT table that the firewall also acts as the NATing agent. Connection tracking is done to keep track of connection states and expectations. SNAT is used for changing the source address, while DNAT is used for changing the destination address.

The Mangle table is used for modifying packet options and hence enables traffic shaping. The TOS and TTL fields of the IP header can be modified, as can the packet's MARK value, a mark kept by the kernel rather than a header field. By setting the MARK and using iproute2, specific routing is achieved.

3.4.1 Network Monitoring Services
Network administrators at CC publish some statistics about network performance using commonly available tools. These include:

MRTG (Multi Router Traffic Grapher) uses SNMP (Simple Network Management Protocol) for monitoring and measuring the traffic load on network links. Data is collected every 5 minutes, and the results are plotted against time in daily, weekly, monthly and yearly graphs. MRTG is used to monitor the WAN links, the hostel switches, the staff hostel switches and the department switches.

Nagios is used to monitor hosts and the services running on them, such as SMTP, POP3, HTTP, NNTP, ICMP, SNMP, FTP and SSH. An online web interface is available where users can check the status of hosts and services. Currently it is used to monitor only the routers and switches of the different subnets. It is installed at the central CC router and continuously checks these machines for the services (SNMP) running on them.

Mail logs are also provided, giving the number of incoming and outgoing mails and their sizes, along with the number of mails queued up in the IMAP server.

Proxy server usage statistics are also provided using MRTG.

3.5 User Behavior and Problems

To facilitate network management, the campus network has been divided into different sections (Section 3.1). A study of IIT Bombay's web traffic [16] reveals that traffic from the hostel area (65%) significantly outnumbers traffic from the academic area (30%). Interestingly, the users in each of these sections have vastly different web usage. On the basis of their type of network usage, they can be classified into academic and hostel users. There is another class of users, administrative users; however, they are largely responsible for network management, and their web usage behavior is captured by the other two classes.

Each of these user classes has certain common usage characteristics; for example, all of them use email services. However, each group also has characteristics peculiar to it. Academic users have higher web usage, accessing journal sites and HTTP content (text and images), while hostel users largely access multimedia content and heavy traffic-generating applications such as instant messengers. It is also known that a large amount of multimedia content is downloaded using HTTP tunneling; since the content type is masqueraded here to appear as application content, this also contributes to the application traffic of users in the hostel area.

There is another peculiar characteristic of traffic in the hostel area. A large amount of P2P traffic is seen there, arising from file sharing between users or from network games played over the local area network. Work by Sen et al. [17] shows that P2P traffic can have an effect on the underlying network. P2P traffic was found to be more stable than web traffic, so it can upset the web experience of other users in the hostel area.

Based on the above hypothesis, campus network users face the following issues, and these issues raise some very interesting questions.

• Extremely low WAN bandwidth. An aggregate of 32 Mbps from all the WAN links theoretically looks sufficient for 5000 users on campus. But then why is the network underperforming? Is heavy load on the proxy and/or WAN links the only reason? Could some internal link be underperforming? What does the utilization map of internal links look like?
• People not able to access the Internet at all. This can happen for many reasons: the proxy server is not running, the DNS is not running, or the WAN links are down. What is the exact cause of, and solution to, this problem?
• People in some subnets are able to access the network while those in other subnets are not. Is this because of some software service or hardware failure in a subnet? Is using a static routing table in routers a wise decision?
• Certain useful services are intentionally blocked, for example video streaming, voice conferencing (in Yahoo and Google Talk), certain domains, etc. People resort to tunneling and anonymous proxies to access these services, which in the end increases the load on the network. Is that an efficient solution to the problem? Could content-based blocking/filtering of these flows be used instead?
• Network connections getting reset every now and then. Is this because of a temporary failure of the proxy server? Or is the problem at the server's end or somewhere outside the campus network?

Some of the common problems/failures/faults in the network that cause some of the issues mentioned above are:

• Bombardment of netmon by unauthenticated requests. These requests can be generated by a virus-infected host, or by certain applications that have no mechanism to handle the HTTP 407 (Proxy Authentication Required) response sent by netmon asking them to authenticate themselves. Such applications include the live-update components of many software packages, such as Adobe's.
• Problems on the ISP side, or DNS or link failures.
• Power breakdown of a node (switch, router or server), and links physically getting cut.
• Duplex mismatch in the network. This occurs when one of the two communicating devices on a link is in full-duplex mode and the other is in half-duplex mode.
• Software upgrades, maintenance work or migration of servers.

3.6 Types of Failures

We look at the client's experience in an end-to-end transaction and categorize any abnormal user experience as a failure. Padmanabhan et al. [18] classify transaction failures into the following three categories:

1. DNS Lookup Failure: These refer to failures in resolving the IP address of the target machine from its domain name. They can occur due to one or more of the following reasons:
(a) Failure to connect to the DNS server, either because the server is down, because some intermediate link has failed, or because a misconfigured client is using an incorrect DNS server.
(b) Failure to resolve the name at some DNS server higher in the DNS hierarchy.
(c) Not being able to resolve the name at all.
2. TCP Failure: These failures occur when the client is able to resolve the domain name but there is some failure in the TCP connection, either before or after it is established. They can occur because of the following reasons:
(a) Failure to establish the connection. This can occur because some intermediate link has failed, the server is down, or the user is unable to authenticate itself to the proxy server.
(b) The client was able to establish the connection, but due to failure of the server application or overload, the server does not respond to the client's requests.
(c) Due to network congestion or failure of a network link in the path, the client receives only part of the server's response and the connection is terminated prematurely.

3. HTTP Connection Failure: Here the client was able to resolve the target's IP address as well as establish a TCP connection to it. However, it does not obtain a successful HTTP response because of "Page not found" or some other error.

The above failures can also be categorized on the basis of the location where they occur.

Client Side: If the failure has occurred at a particular client while other clients in the same and other domains are able to connect to the server, then the failure is probably caused by some misconfiguration or a software or hardware failure at the client side.

Figure 2. DNS and Proxy Up/Down time

Server Side: If none of the clients are able to connect to the server, then the failure is probably due to a failure of the service at the server, or of the server itself.

Network Side: If some clients within a domain are not able to connect to the server while others in a different domain are able to, then the failure is probably in the network of the first domain. Such a network failure can be anything from a switch failure to a physical link being down.

4 Experiments and Results

In Section 3.6, we classified some of the failures in an end-to-end transaction. In this section we present the results of simple experiments to determine the frequencies of these failures.

4.1 DNS Lookup and Proxy Failure

This experiment was motivated by the fact that users had been experiencing intermittent WAN access. In it, we tried to determine the frequency of uptime/downtime of the proxy and the DNS. In a 33-hour experiment, from 13:00 hrs on 15 June to 22:00 hrs on 16 June, we used dig and nmap to continuously probe the DNS (dns.iitb.ac.in) and the proxy (netmon.iitb.ac.in) respectively, at a rate of 1 probe every 3 minutes. Figure 2 shows our results. For the DNS, a value of 1 indicates that the DNS was up and able to process the query; it may or may not have been able to resolve the address, since we were probing for random addresses and do not know whether those domains exist. A value of 0 indicates that either the DNS was down or we were not able to connect to it. Using nmap, we checked whether port 80 on the proxy was open: a value of 1 indicates that port 80 was open, and 0 indicates that it was closed.

It is interesting to note that 8.67% of the time either the DNS or the network was down. Similarly, 7.51% of the time the proxy server was found to be down, and 10.1% of the time at least one of them was not working, resulting in web access failure 10.1% of the time. Another interesting observation is that both are prone to failure, and these failures occur at random intervals. The cumulative graph of down-time burst length, Figure 3, shows a very interesting result: the DNS and proxy server are down for a burst length of three minutes (a single probe interval) roughly 70% and 80% of the time respectively. This is probably due to overload at these servers, which makes them temporarily unavailable, either because of input buffer overflow or because the server goes down under overload.
Figure 3. Cumulative graph for down-time burst length of DNS and Proxy
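The statistics reported in Section 4.1 (the percentage of probes with each service down, the percentage with at least one down, and the distribution of down-time burst lengths behind Figure 3) can be computed from the 0/1 probe series as sketched below. This is an illustrative sketch, not the script used in the experiment: all function names are hypothetical, `port_open` is a plain TCP connect check standing in for the nmap scan of port 80, the DNS probe (done with dig in the experiment) is not reimplemented, and `classify` is only a coarse mapping of each probe round onto the Section 3.6 categories. The series in the example are made up.

```python
import socket
from collections import Counter

PROBE_INTERVAL_MIN = 3  # the experiment probed once every 3 minutes


def port_open(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """TCP connect check; a stand-in for the nmap port-80 scan, not the study's tool."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def classify(dns_up, proxy_up) -> str:
    """Coarse mapping of one probe round onto the Section 3.6 taxonomy."""
    if not dns_up:
        return "dns_lookup_failure"
    if not proxy_up:
        return "tcp_failure"  # cannot connect to the proxy: type 2(a)
    return "ok"


def down_percent(series) -> float:
    """Percentage of probes in which the service was down (value 0)."""
    return 100.0 * sum(1 for up in series if not up) / len(series)


def either_down_percent(dns, proxy) -> float:
    """Percentage of probes in which at least one service was down."""
    return 100.0 * sum(1 for d, p in zip(dns, proxy) if not (d and p)) / len(dns)


def burst_lengths(series):
    """Length in minutes of each run of consecutive down probes."""
    bursts, run = [], 0
    for up in series:
        if up:
            if run:
                bursts.append(run * PROBE_INTERVAL_MIN)
            run = 0
        else:
            run += 1
    if run:  # series ended while the service was still down
        bursts.append(run * PROBE_INTERVAL_MIN)
    return bursts


def cumulative_share(bursts):
    """(burst length, fraction of bursts no longer than it), as in a cumulative plot."""
    counts = Counter(bursts)
    total, seen, out = len(bursts), 0, []
    for length in sorted(counts):
        seen += counts[length]
        out.append((length, seen / total))
    return out


if __name__ == "__main__":
    dns = [1, 1, 0, 1, 1, 0, 0, 1, 1, 1]    # made-up probe results
    proxy = [1, 0, 1, 1, 1, 1, 0, 1, 1, 1]
    print(down_percent(dns))                # 30.0
    print(either_down_percent(dns, proxy))  # 40.0
    print(burst_lengths(dns))               # [3, 6]
    print(cumulative_share(burst_lengths(dns)))  # [(3, 0.5), (6, 1.0)]
```

Counting a burst as a run of consecutive 0 samples means burst lengths can only be multiples of the probe interval, so a three-minute burst is a single failed probe.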

4.2 Web access delay

We wanted to capture the effect of traffic in the hostel area on traffic in the academic area, and this experiment was designed to do so. Three URLs were repeatedly fetched using wget, with no-cache enabled: berkley¹, wash² and niehs³. The experiment was run from two machines, one in the New Software Lab (NSL) in the Mathematics Department (10.105.11.11) and the other in the hostel area (10.13.22.19). The script in NSL was run from 23:30 hrs on 24 June to 11:10 hrs on 26 June, and the script in Hostel-13 from 12:10 hrs on 25 June to 1:30 hrs on 26 June, each at a rate of 1 request every 10 minutes. Two values are measured for each request: the output of the time command and the download time reported by wget. The time command gives the total time taken, including the queuing delay for the request at the proxy, DNS-resolution time, etc., while the time reported by wget is the time to download the page after the HTTP session has been established.

¹ http://www.cs.berkley.edu
² http://www.nps.gov/history/NR/travel/wash/text.htm#intro
³ http://kids.niehs.nih.gov/text.htm

Figure 4. Combined output of "time" for three URLs

Figure 5. Combined output of "wget" for three URLs

Figures 4 and 5 plot the combined time and wget output, respectively, for the three URLs. Figures 6 and 7 compare the performance of the hostel and NSL machines for the period when both were running simultaneously. The secondary y-axis in Figure 4 is for the output of wash, and the secondary y-axis in Figure 5 is for the berkley URL. From Figures 4 and 5, we can see that the performance of the NSL machine suffers significantly when the hostel-area network is running. It performs much better from 4:00 hrs to 7:00 hrs, when there is a LAN curtailment in the hostel area: outside this window, the average output of time increases from 90.8 sec to 211 sec, and the wget output increases from 81.9 sec to 188.8 sec. The corresponding averages for the hostel machine are 146.7 sec and 144.6 sec respectively.

Figure 6. Comparing the "time" output of hostel and NSL for wash

Figure 7. Comparing the "wget" output of hostel and NSL for wash

berkley received noticeably better bandwidth (shorter delay) than wash and niehs, probably because of differentiated service at the proxy, where certain academic websites are sent over faster links. At the initial peak for berkley in Figure 4, the script was not able to fetch berkley because of a Gateway Time-out error, while it was able to fetch the other URLs. This probably occurred because of some error in the upstream path to berkley, or because the server itself did not respond.

5 Plan for Next Stage

By the end of the next stage we plan to:

• Study the NAT box cum firewall and how it affects the performance of the network. We plan to finish this within the next 10 days.
• Perform a statistical analysis of the frequency of occurrence of the different types of failures mentioned in Section 3.6 that occur in an end-to-end web access. We have seen that people in the academic area experience quite different performance from people in the hostel area, and the causes of failure can vary a lot. We will study packet traces to identify the exact causes of failures, and we will perform this analysis for different types of users. We plan to finish this analysis by the end of August.
• Design the architecture of our measurement tool and start implementing it.

Acknowledgments

I would like to thank Prof. Purushottam Kulkarni and Prof. Bhaskaran Raman for their continuous guidance and support in making this project an informative experience for me. I would also like to thank Nithin Dara for his suggestions and discussion during the initial part of the project, and Dhananjay Sahasrabduhe for providing me with information regarding the IIT Bombay campus network.


References

[1] Neil Spring, David Wetherall, and Thomas Anderson. Reverse Engineering the Internet. In Second Workshop on Hot Topics in Networks, 2003.
[2] Ramesh Govindan and Hongsuda Tangmunarunkit. Heuristics for Internet map discovery. In Proceedings of the Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, 2000.
[3] Neil Spring, Ratul Mahajan, and David Wetherall. Measuring ISP Topologies with Rocketfuel. In Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications, August 2002.
[4] Van Jacobson. pathchar - a tool to infer characteristics of Internet paths, 1997.
[5] Ameya P. Usgaonkar. Network Performance Analysis by Mining Multi-Variate Time Series Data, January 2001.
[6] Stefan Savage. Sting: a TCP-based Network Measurement Tool. In Proceedings of the Second Conference on USENIX Symposium on Internet Technologies and Systems, 1999.
[7] Kostas G. Anagnostakis, Michael Greenwald, and Raphael Ryger. cing: Measuring network-internal delays using only existing infrastructure. In Proceedings of IEEE Infocom, April 2003.
[8] David Meyer. Routeviews. http://www.routeviews.org.
[9] Bradley Huffaker, Marina Fomenkov, David Moore, and Ke Claffey. Macroscopic analyses of the infrastructure: measurement and visualization of Internet connectivity and performance. In Proceedings of Passive and Active Measurements, 2001.
[10] Gianluca Iannaccone et al. Analysis of link failures in an IP backbone. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, 2002.
[11] Venkat N. Padmanabhan, Sriram Ramabhadran, and Jitendra Padhye. NetProfiler: Profiling Wide-Area Networks Using Peer Cooperation. In Fourth International Workshop on Peer-To-Peer Systems, 2005.
[12] Mark Allman. A Web Server's View of the Transport Layer. ACM CCR, July 2004.
[13] Computer Center, IIT Bombay. http://www.cc.iitb.ac.in.
[14] Ultra monkey. http://www.ultramonkey.org/.
[15] iptables. http://www.netfilter.org/projects/iptables/index.html.
[16] Nirav S. Uchat. IIT Bombay web traffic characterization.
[17] Subhabrata Sen and Jia Wang. Analyzing peer-to-peer traffic across large networks. In Proceedings of the 2006 ACM CoNEXT conference, 2006.
[18] Venkat N. Padmanabhan, Sriram Ramabhadran, Sharad Agarwal, and Jitendra Padhye. A Study of End-to-End Web Access Failures. IEEE/ACM Transactions on Networking, 12, April 2004.
