Xem mẫu

  1. Developing a Troubleshooting Checklist There is an old saying that when you practice what you need to do in the time of a crisis, when the crisis occurs the reaction tends to be automatic. When the firewall is down is not the time to try to figure out what you should be looking at to resolve the problem. Instead, develop a troubleshooting checklist in advance. The reason is simple: There will already be enough stress and confusion as a result of the failure; there is no need to increase either by not having a plan. Your troubleshooting checklist is that plan. Obviously, you cannot plan for every failure that will occur, but you can put together a strategy that, if executed properly, increases the likelihood of being able to isolate the problem more rapidly. The primary objective of the troubleshooting checklist is to provide a methodical and logical approach to troubleshoot the problem. After all, computer systems (including firewalls) are binary devices, they are on or off. The logic is simple, and the devices always do exactly what they are supposed toeven when they fail. A troubleshooting checklist should guide you through that logical troubleshooting process. I often use an analogy of eating an elephant when I talk about troubleshooting. Trying to eat an elephant introduces a big, big problem. If you try to sit down and eat the elephant all at once, you are going to quickly find yourself overwhelmed with the task at hand. Troubleshooting is no different. If you try to troubleshoot the entire problem all at once, you are going to quickly find yourself overwhelmed with the task at hand. However, instead of trying to deal with the whole elephant, if you chop it into smaller, easier-to-manage steak-sized pieces, you will find the task of eating the elephant more manageable. Troubleshooting is no different, and after you have developed a checklist of methodical and logical approaches to troubleshooting a problem, a secondary objective of a troubleshooting checklist is to use the results obtained by following the checklist to narrow down the potential causes of whatever failure is occurring. Keeping in mind that every firewall, environment, and problem is unique, the following represent a good baseline troubleshooting checklist: Step 1. Verify the problem reported. Step 2. Test connectivity. Step 3. Physically check the firewall. Step 4. Check for recent changes. Step 5. Check the firewall logs for errors.
  2. Step 6. Verify the firewall configuration. Step 7. Verify the firewall ruleset. Step 8. Verify that any dependent, non-firewall-specific systems are not the culprit. Step 9. Monitor the network traffic. Step1: Verify the Problem Reported One of the most overlooked steps in troubleshooting is to actually verify that the problem that was reported is occurring as it was reported. Far too often, people report what they suspect the problem is without being for certain that the problem is indeed related to the firewall. I have lost count of how many times I have heard "I cannot access this server, the firewall must be down" only to discover that the server itself was down. So before you begin the actual troubleshooting process, ensure that the problem has been reported accurately and that you understand and if possible can reproduce or see the problem as it is occurring. The old saying "To know where you are going, you need to know where you are at" holds true in troubleshooting. Before you can troubleshoot a problem, you need to make sure you know what the problem is. Another aspect of verifying the problem is to make sure you treat the problem, not the symptoms. This is similar to treating a medical patient. If a person comes in with a fever and all you do is treat the fever (the symptom), you have done nothing to fix the problem (the illness causing the fever). Accordingly, when the problem is reported, try to distinguish between the symptoms of the problem (which are normally what is reported) and the problem itself. The reason for this is simple. If all you do is treat the symptoms, you may eliminate the cause for the problem being reported, but you have not fixed the problem itself, and a good chance exists that it will reappear at some point in the future. This is particularly true when it comes to dealing with performance-related issues. It is easy to lose sight of the problem, treat the symptom, and move on without ever addressing the root cause of the performance problem. Step2: Test Connectivity In the realm of networking and firewalls, one of the most important and first questions to ask is this: Is the device up? This is where testing connectivity comes into play. Although this step is not applicable to every situation, it is usually a good idea to try to connect to the firewall or system protected by the firewall just to make sure it is up. There are a number of ways to do this. Using Ping to Test Connectivity
  3. The de facto standard method of testing connectivity is to send a ping to the target host. There are a couple of ways that this can be done that will provide additional information based on the response. To help with understanding the process and the interaction of each step, see the connectivity testing flowchart in Figure 13-1. Figure 13-1. Connectivity Testing Flowchart
  4. The first step is to attempt to ping the target host by its host name. If this succeeds, it validates that everything from name resolution to physical delivery of the data is functioning properly. If this is not successful, the next step is to attempt to ping the target host by its IP address. This eliminates name resolution as part of the problem. If this succeeds, the problem is likely going to be related to name resolution (either Domain Name System [DNS] resolution or NetBIOS name resolution). Perhaps the DNS server is down or the target host name is not known. If this does not succeed, it is possible that the target host is inaccessible for some reason (regardless, however, the problem warrants more attention). If the target host is remote, the next step is to attempt to ping the default gateway of the source machine as well as another host on the same network as the remote machine (it is a good idea to use the remote machines default gateway as the destination). If you can physically access the target machine, repeat this process in the other direction. These steps validate that both hosts are able to communicate with their local routers as well as validating that you can reach something on the remote network. If they cannot, a good chance exists that the problem exists between the host and its router. It is possible that there is an invalid Address Resolution Protocol (ARP) entry on either the host or the router. If the hosts can successfully ping their respective routers, and you are unable to ping another host on the remote network, the next step is to perform a traceroute from the source to the target host. This will enable you to determine the approximate location where the problem is occurring. In a complex routed environment, it is generally a good idea to have some baseline results of a functional traceroute because the traceroute is typically unable to provide the IP address of the failing hop. Only by knowing what the next hop from the last successful hop is can you have an idea of what specific router might be the cause. Testing Connectivity Without Using Ping One thing to keep in mind when testing connectivity is that many firewalls, by design, do not allow ICMP traffic to traverse the firewall, and thus render the use of ICMP to test connectivity worthless. You have a couple of options in this event. For one, you can permit the ICMP traffic for the purposes of troubleshooting the problem and then disable it again when you are finished. Another option is to use another protocol to determine whether the remote system is responding at all. For example, just telnetting to many TCP ports will either confirm or deny whether a remote host is accessible, as shown in Example 13-1. Example 13-1. Telnetting to TCP port 80 to Test Connectivity
  5. C:\Documents and Settings\wnoonan>telnet web server 80 GET /HTTP/1.0 HTTP/1.1 200 OK Content-Length: 2795 Content-Type: text/html Content-Location: Last-Modified: Tue, 23 Nov 2004 05:23:47 GMT Accept-Ranges: bytes ETag: "f9fcf19b1cd1c41:336" Server: Microsoft-IIS/6.0 X-Powered-By: ASP.NET Date: Thu, 02 Mar 2006 05:29:53 GMT Connection: close Connection to host lost.