
WHITE PAPER

FOUR ESSENTIALS FOR A FLAWLESS IT ENVIRONMENT


Have you ever considered the fact that your company's IT environment, along with your company culture, constitutes the prime catalyst for getting the maximum mileage out of your business? This means that the most crucial task of your IT department is to safeguard the good functioning of your IT environment. In this white paper we provide you with helpful, hands-on advice on how to enhance the quality of your IT infrastructure through quick and efficient troubleshooting of any problems that may occur.

Want to know more about Netrounds? Please contact:
Marcus Jonsson, Product Expert
E-mail: Marcus.jonsson@netrounds.com
Mobile: +46 70 666 27 33

Netrounds Solutions AB
Storgatan 9
SE-972 38 LULEÅ
SWEDEN

www.netrounds.com info@netrounds.com +46 920 42 00 15

1. Full Productivity Requires a Flawless IT Environment


An interesting survey of employees' perception of their IT environment was conducted by the Danish network operator TDC jointly with the Swedish market research company TNS SIFO. It returned the following findings:

- In companies where employees reported that they are using poor IT and telephony solutions, the respondents estimated that they spent an average of 2.5 hours every week battling with malfunctioning technology.
- In companies with well-functioning IT and telecom solutions, technology-related time waste averaged only 1 hour per week.

Thus, companies with poorly performing IT environments stand to gain 1-2 productive man-hours per employee every week by improving their IT.

[Figure: up to 2 hours/week]

In addition to this tangible potential for improved efficiency, a couple of further important observations could be made:

- Companies with sound IT and telecom solutions have 150% more satisfied employees than companies with poor IT and telecom solutions.
- 92% of employees think that good IT solutions are important for their satisfaction in the workplace. This is the exact same percentage of employees who find a high salary important.

Now, suppose that 100 employees spend one extra hour per week caught up in IT trouble caused by underperforming technology. Assuming a total per-head cost of $60 per hour, these problems will cost this company about $6,000 each week, which equates to a loss of production capacity of about $24,000 per month (counting four weeks to the month). To this we must add loss of revenue due to reduced productivity and competitive power. Considering also that your employees will be less comfortable in the office, it is easy to see the importance of a smoothly working IT infrastructure.

The combination of a good IT environment and a winning company culture creates an unbeatable formula for maximizing the performance of your business.
[Figure: Winning company culture + Good IT environment = Maximized performance]
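The cost estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative, and the factor of four weeks per month is an assumption inferred from the paper's $6,000 and $24,000 figures.

```python
def weekly_cost(employees: int, lost_hours_per_week: float, cost_per_hour: float) -> float:
    """Cost of working time lost to IT trouble, per week."""
    return employees * lost_hours_per_week * cost_per_hour


def monthly_cost(weekly: float, weeks_per_month: float = 4.0) -> float:
    """Monthly cost; four weeks per month is an assumption, not a stated figure."""
    return weekly * weeks_per_month


if __name__ == "__main__":
    # Figures from the example: 100 employees, 1 lost hour each, $60 per hour.
    week = weekly_cost(employees=100, lost_hours_per_week=1.0, cost_per_hour=60.0)
    print(f"Weekly cost:  ${week:,.0f}")
    print(f"Monthly cost: ${monthly_cost(week):,.0f}")
```

Substituting your own headcount, hourly cost, and estimated time loss gives a first rough figure for what underperforming IT costs your organization.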

Conclusion: The most important task of your IT department is to safeguard good functioning of your IT environment.

2. The Necessity of Efficient Fault-tracing


You should constantly ask yourself how you can improve. In many organizations, there is potential for improvement in ensuring good functioning of applications and stable operation of the IT environment, as well as in speeding up the fault-tracing process when problems nonetheless do arise. It is vital that you quickly arrive at an understanding of what remedies are needed in order to minimize the impact on your business when a problem arises. As your organization matures, it is natural that your focus should shift gradually from fixing urgent issues (although these will of course still appear every now and then) to addressing less critical but still quality-impairing problems in your IT environment. This idea pervades the discussion that follows.

3. Four Essentials for a Flawless IT Environment


In this white paper, we divide IT fault-tracing into four categories and describe how you can excel in each category.

1. Fix critical interruptions. When: resources in the IT environment stop working.
2. Prevent interruptions. When: resources in the IT environment are severely strained.
3. Fix critical disturbances. When: users complain about unresponsive applications and quality issues.
4. Prevent disturbances. When: response times are lengthened and packet loss occurs.

Essential #1: Fix critical interruptions: Emergency measures

Most IT departments today use a monitoring system to quickly find out if a router, switch, or server suddenly stops working. Normally, the monitoring system works by pinging your nodes at regular intervals. If a node stops responding to pings, an alarm is raised and communicated to the relevant persons in your organization so that they can act on it immediately. For example, there might be a need to restart a node or replace a malfunctioning power supply unit. There are a vast number of ping-based monitoring systems on the market, both commercial and open-source. Examples include op5 Monitor, PingPlotter, and WhatsUp Gold.

As is evident from the designation "critical", these are situations in which an interruption has already occurred and emergency measures are called for to restore functioning. To maintain smooth operation and reduce the risk of critical conditions arising, preventive measures are also required.

Essential #2: Prevent critical interruptions: Supervision

As a complement to their emergency measures, many IT departments also engage in preventive monitoring of network elements and resources. This may include monitoring the load on a particular switch interface or router CPU. It may also extend to supervising server resources, such as free hard disk space, free RAM, or CPU temperature. The data is usually retrieved via the standard protocol SNMP (Simple Network Management Protocol), but certain scheduled command line tools also find some use. This type of supervision is important because it allows you to take action before the problems affect your users, for instance by increasing the link capacity in a router or adding memory to a server in good time before there is a negative impact on user experience. In many cases, the system relied on for emergency measures can also be used for preventive purposes. Examples of such systems are op5 Monitor, SolarWinds Orion, SNMPc, and Cacti.

This kind of supervision gives no insight, however, into how the data traffic affects the applications using the network elements. This means that even if every individual network element seems to be working fine, it is unfortunately still possible that the user experience is not acceptable. This can be very frustrating to your users as well as to your IT department: the applications have problems, even though all network elements seem to be in perfect order. Problems like these tend to take a long time to sort out. It is therefore vital to be able to continuously monitor the quality of the traffic passing through your network.

Essential #3: Fix critical disturbances: Emergency measures

Quality issues hit the critical level when multiple users suddenly complain that "my application keeps freezing", or "now the network is so infuriatingly slow again". This type of problem will often take a long time to fix, often several weeks or even months. Not infrequently you will need to consult staff from both support and IS/IT, as well as consultants with the necessary tools. All the while, your users are suffering. Lingering quality issues like these will cost your company big money.

The reason such problems often take a long time to fix is that their causes are hard to pin down. The tools and methods described in Essentials #1 and #2 simply don't take you all the way. When facing such problems, you also need to be able to measure the quality of your computer network from the users' perspective, all the way from their equipment to your server room, in order to track down the location of the fault. What matters most is that you can quickly determine whether the problems stem from your network or from somewhere else, so that you can focus your fault-tracing efforts on the right spot. One system that can help you with this is Netrounds.

Now, rather than having to troubleshoot and fix urgent issues with disturbances, it is of course preferable if you can detect and remedy problems before they afflict your users at all. You need to work proactively to assure the quality of the IT services you provide.

Essential #4: Prevent critical disturbances: End-to-end supervision

Disturbance prevention can be seen as taking over where the scope of interruption prevention ends. Unfortunately, just because your equipment responds to ping and its workload looks normal, that does not guarantee that your users are able to work efficiently. The sad fact is that long delays and packet loss between the user and the application servers in your server hall are rarely detected by the tools mentioned in Essentials #1 and #2. To enable measurement of delay and packet loss, the customary strategy is to use a system that sends real, but non-disturbing, traffic through your computer network: for example, from points near the users to the server hall and back. In this manner you can measure and verify your network quality as it is experienced by your users. Netrounds is a tool that can assist you in your preventive work as well.

Read on for a practical guide to efficiently locating faults in your IT infrastructure.
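As a rough illustration of the delay and packet-loss measurement described in this section, the sketch below summarizes a series of test probes sent from a point near the users to the server hall. It is not how Netrounds or any particular product works. The function name and the sample values are invented for the example, and it assumes that each probe yields either a round-trip time in milliseconds or None when no reply came back.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize round-trip times; None marks a probe that got no reply."""
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    answered = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(answered)
    summary = {
        "sent": len(rtts_ms),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(rtts_ms),
        "min_ms": None,
        "avg_ms": None,
        "max_ms": None,
    }
    if answered:  # delay statistics only exist if something came back
        summary.update(min_ms=min(answered), avg_ms=mean(answered), max_ms=max(answered))
    return summary


if __name__ == "__main__":
    # Made-up sample: one lost probe and one very slow reply out of five.
    print(summarize_probes([12.0, 11.0, None, 250.0, 13.0]))
```

Keeping such summaries over time is what makes it possible to spot a gradual increase in delay or loss before users start to complain.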

4. Practical Guide to Fault-tracing in Your Network


This guide describes a number of tools and methods you can use to determine whether problems with slow applications are due to your computer network or other components in your IT infrastructure, or whether this can be ruled out. A good strategy is to start with a number of quick and simple methods as described below. If you cannot isolate the problem area with these simple methods, you should proceed to the more sophisticated tools and methods that are also covered in what follows.

The scenario assumed in this guide is this: A staff member of the finance department complains that your company's salary administration application is running slowly.

Quick and simple methods
Tip: Spot patterns in trouble reports filed by users
Description: Are there any other users who have had the same network problems? If so, is there a common factor? Are these persons using the same applications? Are they communicating with the same server? Are they using the same network connection? Getting a quick overview of who else has experienced problems may give you a preliminary indication of the problem cause. However, use critical thinking. There could be a general network problem even though only the finance department is complaining. The reason might be simply that the salary application is more sensitive to communication issues than are other applications. It could of course also be that certain individuals are more sensitive than others to disruptions.

Tip: Test response times with ping
Description: The ping command is usually run by a technician rather than an end-user. Ping is extensively used for both fault-tracing and continuous monitoring. In simple terms, the ping command sends a data packet to an IP address, which automatically returns the packet to the sender. A successful ping confirms that the packet has traveled to its destination and back, and gives the time taken for this round trip (round-trip time). It is possible to send a number of ping packets sequentially and thereby detect more far-reaching communication problems. Slow ping response times or lost ping packets always indicate a problem. The converse, however, is not true: good ping stats are not always a guarantee that things are in good working order.

Tip: Check the load on various nodes in your network
Description: If feasible, it is a good idea to check the workload on the various nodes in your IT infrastructure: for example, on a router interface or on a server processor. A high load increases the risk of individual operations taking unnecessarily long to perform, or of information being lost in the communication between user and server (packet loss). These metrics are often rather crude; network elements usually report average load over 5-minute periods. An example: On a 2 Mbit/s connection, the load (data rate) is reported as 1.5 Mbit/s for the past 5 minutes. This indicates a high average load, but it is often difficult to find out whether the maximum capacity has been exceeded at some point and whether prioritized traffic has been affected. These averages alone are rarely enough to give a full picture of the situation, although they may provide some useful hints.
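To see why a 5-minute average can hide trouble, consider the sketch below. It compares two invented traffic traces on the 2 Mbit/s link from the example, each with one sample per second over 5 minutes. Both average exactly 1.5 Mbit/s, but only one of them ever fills the link. The per-second samples and the function name are assumptions made for illustration; real network elements typically expose only the average.

```python
from typing import Sequence


def load_report(samples_mbps: Sequence[float], capacity_mbps: float) -> dict:
    """Average, peak, and number of samples at or above link capacity."""
    average = sum(samples_mbps) / len(samples_mbps)
    return {
        "average_mbps": average,
        "peak_mbps": max(samples_mbps),
        "saturated_samples": sum(1 for s in samples_mbps if s >= capacity_mbps),
        "utilization_pct": 100.0 * average / capacity_mbps,
    }


if __name__ == "__main__":
    capacity = 2.0  # Mbit/s, as in the example above
    steady = [1.5] * 300               # constant load for 300 seconds
    bursty = [2.0] * 150 + [1.0] * 150  # link full for half of the period
    for name, trace in (("steady", steady), ("bursty", bursty)):
        print(name, load_report(trace, capacity))
```

Both traces would be reported as 1.5 Mbit/s, yet in the bursty trace the link is saturated for 150 of the 300 seconds, which is exactly when packet loss and delays are likely to occur.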

More advanced and in-depth methods

When the network is perceived as slow but no obvious reason can be found, the above-mentioned methods generally will not suffice. What you need to do then is to route real test traffic, mimicking that of your users, through the network. By using the methods described below, you can track down and correct the problems more easily and quickly.

First, to create test traffic, you need devices that generate, send, and receive data. As a very first step, you can test performance with a regular browser-based speed test such as Netrounds Speed Test. It is important to interpret the results correctly and to keep these points in mind:

- With a publicly available browser-based speed test, a low bandwidth reading can be caused by problems in your own network, but just as well by problems residing in external networks through which the traffic is routed on its way to the nearest measurement point of the speed test service.
- In view of the above, it is appropriate to begin by placing a measurement point in the heart of your network, preferably in the server hall itself. Position this measurement point in such a way that your users can perform tests from their web browsers against your central measurement device. Netrounds Speed Test is a tool which can easily be downloaded and installed for this purpose, and which also gives you the ability to collect statistics on your network performance based on the tests made by your users.

If you have several traffic classes set up in your network (e.g. web surfing, IP telephony, video conferencing), browser-based speed tests would normally go in the web surfing class. Note that a high bandwidth value in the web surfing class does not guarantee that things will run hassle-free in the IP telephony class, where a faulty configuration may cause quality issues. Therefore, as a complement, you should place one or several measurement points closer to the users. Such distributed measurement points can be used for performance tests during fault-tracing, such as load tests for various traffic classes. By deploying stand-alone, high-accuracy measurement points in combination with applications installed on end users' computers, you will get an end-to-end perspective of how your services are perceived by your users. At the same time you will be able to quickly understand whether your network causes the problems or whether they are application-specific, which is a key question for solving the problem quickly.

Again, it is wise to choose measurement points that can also be used for recurrent measurements against the central point, so that you can accumulate a history and monitor SLA levels for your connection over extended time periods for preventive purposes. Netrounds is unique in that it supports both long-term monitoring and fault-tracing tests in one single tool, making it possible to delimit and pinpoint most of the disturbances that may occur in your IT infrastructure.

Read on for some useful tips, a checklist, and a summary.

Tip: Let users measure performance from their web browsers
Description: When users experience problems, let them use their web browsers to measure response times and bandwidth when connecting to a central measurement point in your network, or to a browser-based speed test server. This gives a first indication of whether there is a problem at hand or not.

Tip: Check whether the problem is related to link load
Description: If you are renting, say, a 10 Mbit/s connection, you should run tests to find out if you are actually getting this capacity or if there are limitations, for example caused by malfunctioning equipment or inappropriate configuration. Do this by means of a performance test where the measurement devices send traffic to each other, utilizing the link capacities to the fullest. As such tests interfere with other traffic in your network, be sure to inform your users beforehand, or do the testing at times when the network is normally not in use.

Tip: Verify prioritizing of traffic classes
Description: If you prioritize some services over others, you should check that the prioritizing works as intended. To do this, overload the links with traffic assigned to different priority classes, and verify that only the non-prioritized traffic is affected by packet loss and delays.

Tip: Verify that connections are transparent
Description: It is important that the priority labeling is retained throughout your network without being overwritten or blocked. For example, various types of control traffic such as CDP must be let through.

Tip: Verify stability of connections
Description: It is important that you keep measuring for an extended period of time, at least a couple of days. This is because many problems are hard to detect during brief tests. Problems with slow networks are often difficult to track down precisely because they come and go.

Tip: Investigate what traffic goes through your LAN/WAN
Description: As a last resort it may be necessary to check what sort of traffic is going through your network and your various offices. Be aware, however, that while the measurements described in the preceding are fairly easy to interpret, packet analysis is a task that requires more detailed knowledge of how the applications work. The most widespread packet analysis tool is Wireshark, which is free to download and use.
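The tip on verifying the prioritizing of traffic classes can be sketched as a simple check on the results of an overload test. The code below is an illustration only: the class names, the measured values, and the acceptance limits (0.5% loss, 50 ms delay) are all invented, and you would substitute the limits that apply to your own services.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ClassResult:
    """Result for one traffic class measured while the link was overloaded."""
    name: str
    prioritized: bool
    loss_pct: float
    delay_ms: float


def prioritization_violations(
    results: Iterable[ClassResult],
    max_loss_pct: float = 0.5,   # hypothetical limit for prioritized traffic
    max_delay_ms: float = 50.0,  # hypothetical limit for prioritized traffic
) -> List[str]:
    """Return the prioritized classes that suffered loss or delay under overload."""
    return [
        r.name
        for r in results
        if r.prioritized and (r.loss_pct > max_loss_pct or r.delay_ms > max_delay_ms)
    ]


if __name__ == "__main__":
    # Made-up results from an overload test with three traffic classes.
    results = [
        ClassResult("ip-telephony", prioritized=True, loss_pct=0.0, delay_ms=12.0),
        ClassResult("video-conferencing", prioritized=True, loss_pct=2.5, delay_ms=40.0),
        ClassResult("web-surfing", prioritized=False, loss_pct=8.0, delay_ms=180.0),
    ]
    print(prioritization_violations(results))
```

In this made-up run, loss in the web surfing class is expected because that class is not prioritized, while the loss in the video conferencing class suggests that its priority marking or queue configuration deserves a closer look.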


5. Checklist and Summary


Along with your company culture, the number one catalyst for a smoothly operating business is a well-functioning IT environment. Therefore the most important task of your IT department is to ensure that your IT environment is stable and performs at a high standard, round the clock, every day of the year. If you feel that your current IT operation leaves room for improvement, you should go through the following checklist:

- If you don't have it already, deploy a customized monitoring solution that alerts you if network equipment stops responding to ping. This meets your needs at the critical level.
- Complement this with preventive monitoring by using SNMP to retrieve fundamental data from the nodes of your infrastructure. This data should include status and utilization of network links as well as resource usage in vital network components: for example, processor load and free storage capacity. This meets your needs at the preventive level.
- Obtain an effective tool for monitoring quality from an end-user perspective in order to verify SLA levels and ensure top quality in your network. This allows you to follow up connection quality over time, create quality reports, and configure your network to alert you immediately when problems arise, so that you can hopefully remedy the problems before your users notice and your business suffers. This meets your needs in the realm of quality assurance.
- Strengthen your quality monitoring further with a tool for testing and fault-tracing of more elusive network performance issues. With one measurement point in your server room and another in the office that has problems, you can quickly and efficiently detect:
  o Stability issues: Problems that come and go and are difficult to capture in any other way.
  o Performance issues: Problems that arise when you aren't getting the capacity you're paying for, or in case of interface or duplexing problems.
  o Prioritization issues: For example, when IP telephony or video conferencing fails because that traffic is not given the priority it is due.

With the above elements integrated into your business, you safeguard a smoothly functioning IT environment that supports your daily work, is cheap to operate and maintain, and helps you become a more competitive player in your field.


6. Where to Go from Here?


Do you have performance and quality problems in your IT infrastructure? Feel free to contact us for more information on how Netrounds can help you.

About Netrounds
Until now, fault-tracing tools have been expensive and complicated to use. Now there is Netrounds, a cloud-based and easy-to-use option offered at a small fraction of the cost of traditional test tools. Because Netrounds is cloud-based and is offered as a subscription service, you avoid costly and risky investments while benefiting immediately from using it to quickly locate and fix problems. Read more about Netrounds at www.netrounds.com. Netrounds recently received a top rating from its customers in a customer satisfaction survey.

The originator of Netrounds is the Swedish company Netrounds Solutions, which both develops and markets the solution.

About Netrounds Solutions


At Netrounds, we have many years of experience in testing and troubleshooting computer networks, and we have transferred much of this experience to Netrounds as it has been developed. Also included in our service is expert support at no additional cost, going the extra mile to help you achieve a more effective, efficient and high-quality IT operation. You are very welcome to contact us.

Marcus Jonsson, marcus.jonsson@netrounds.com, +46 70 666 27 33

