By just telnetting to TCP port 80, typing GET / HTTP/1.0, and then pressing Enter a
few times, I can retrieve the default web page for the server. At a minimum, this verifies
that the target host is properly connected to and communicating with the network; at best,
it tells me exactly what web server software is being run, as shown on the fifth line from
the end, "Server: Microsoft-IIS/6.0."
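The manual telnet check can also be scripted. The following is a minimal Python sketch, not a definitive tool: the function names are hypothetical, and it assumes the server answers a bare HTTP/1.0 request and closes the connection when it is done.

```python
import socket


def extract_server_header(raw_response: bytes):
    """Return the value of the Server header from a raw HTTP response, or None."""
    head = raw_response.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"server":
            return value.strip().decode("latin-1")
    return None


def fetch_raw_response(host: str, port: int = 80, timeout: float = 5.0) -> bytes:
    """Send the same request typed into telnet and return whatever comes back."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"GET / HTTP/1.0\r\n\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling extract_server_header(fetch_raw_response(host)) against a reachable web server returns the advertised server software, if the server chooses to send that header. Any response at all confirms that the host is communicating on the network.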
Step 3: Physically Check the Firewall
One of the most common failures on a network is a physical failure. From bumped power
cords to incorrectly seated network cables, you can often quickly identify and remedy the
problem by paying a visit to the physical machine. In addition, many network devices,
including firewalls, provide visual indicators regarding the status of the system. For
example, if you do not see a link light on an interface, that is usually a good indicator that
the network cable is not plugged in.
In some cases (for example, remote firewalls), it is just not feasible to check the firewall
yourself. If you have a trusted person at that remote site, however, you can ask that
person to check on the firewall on your behalf. As firewall administrators, many of us
have gotten used to being able to do most of our work remotely from our desk. As much
of an annoyance as it may be to have to walk over to where the firewall is to check on it,
that pales in comparison to spending time trying to troubleshoot a connectivity problem
only to find that someone bumped the power cord on the firewall.
Step 4: Check for Recent Changes
Recent changes are not always responsible for problems that occur, but they should
always be examined as a potential cause of the problems. The reason for this is simple:
Today's networks are so complex that it is difficult to ensure that a change does not cause
a problem for a dependent system. Consequently, it is critical that you have a means of
tracking and monitoring the changes that are made in your environment so that you have
something that you can refer back to.
Good change control is more than just "busy work." It provides a methodical means of
answering the questions of who, what, and when:
• Who made recent changes? At its simplest, this gives you the name of the person
to check with regarding the changes to determine whether they can provide insight
into the problem.
• What changes were made? This is the most important information that
your change-control process contains. It enables you to look at
what was changed and decide whether the changes could plausibly
be responsible for the problems. For example, if someone updated the SNMP
settings but the problem appears to be with traffic being blocked, a good chance
exists that the changes are irrelevant to the problem that is occurring.
• When were the changes made? Changes that were made days or weeks ago
probably are not responsible for the problems of today. Conversely, if the
changes were made an hour ago, and the problem showed up an hour ago, it is
probably worth investigating the changes in more detail.
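The who, what, and when questions map naturally onto a simple change record. The following Python sketch is illustrative only: the ChangeRecord fields, the function name, and the four-hour default window are assumptions, not part of any particular change-control product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeRecord:
    who: str        # person to follow up with
    what: str       # description of the change
    when: datetime  # time the change was applied


def changes_near_onset(records, problem_onset, window=timedelta(hours=4)):
    """Return changes made within `window` before the problem appeared, newest first."""
    candidates = [
        r for r in records
        if timedelta(0) <= problem_onset - r.when <= window
    ]
    return sorted(candidates, key=lambda r: r.when, reverse=True)
```

Filtering by time only narrows the list of candidates. Whether a given change is actually relevant to the symptoms still requires reading the "what" field with a skeptical eye.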
It is important to view recent changes as a culprit for problems with a skeptical eye,
however. Before spending time undoing the changes, examine the change in the context
of the problem and make sure that it makes sense for the changes that were made to be a
cause of the problem. For example, I once watched a company roll back a series of
antivirus definition (DAT) files because they were the last change made on the
network before authentication errors started occurring. Now, anyone who knows anything
about DAT updates knows that they have pretty much nothing to do with authentication,
and this case was no different. When all was said and done, the DAT updates were
rolled back and the problem still existed, but the company had lost hours that could
have been spent fixing the problem. It was subsequently discovered that a faulty domain
controller was causing the problems. The point is, make sure that the changes
appear to be relevant before devoting full attention to them. Just because there were
recent changes does not mean that they are responsible for the problem. This is
particularly true with firewalls, where it seems like if a change has been made to a
firewall within six months of a problem occurring, someone will immediately question
whether the firewall is the problem, even if the problem traffic in question never goes
through the firewall.
Step 5: Check the Firewall Logs for Errors
As you saw in Chapter 12, "What Is My Firewall Telling Me?," a wealth of information
is available in most firewall logs and logging systems. Therefore, always review your
firewall logs as a routine troubleshooting step. To assist in using the logs as a
troubleshooting tool, you can increase the level of logging detail, perhaps changing to
informational or even debugging level, or choose to log specific error messages to help
isolate the issue. When examining the logs, pay particular attention to the following types
of events:
• Look for state errors. State errors can be indicators of problems with the firewall
translation tables (for example, if the Cisco Secure PIX Firewall has an incorrectly
configured static translation value).
• Look for denied traffic. Denied traffic is the classic indicator of an incorrectly
configured ruleset. Although virtually all firewalls include an implicit deny
statement at the end of the firewall ruleset, to assist in troubleshooting it can be
helpful to include an explicit deny and log statement to ensure that the denied
traffic is logged accordingly.
• Look for configuration errors. Often configuration errors will be reported in the
firewall logs as error events, allowing you to rapidly identify a configuration error
without needing to review the configuration line by line. A good example of this
might be speed and duplex mismatch errors, which can cause the firewall to not be
able to make a reliable network connection.
• Look for hardware errors. Event logs are one of the best sources for discovering
hardware-related errors because most firewall vendors log hardware error events
in the firewall logs.
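When a log is too large to read line by line, a rough first pass is to count how many entries fall into each of these four categories. The Python sketch below is an illustration under loud assumptions: the keyword patterns are hypothetical and do not reflect any vendor's real message format, so they would need to be replaced with the message text your firewall actually produces.

```python
import re

# Hypothetical keyword patterns; real message text varies by vendor and version.
CATEGORY_PATTERNS = {
    "state": re.compile(r"translation|xlate|\bstate\b", re.IGNORECASE),
    "denied": re.compile(r"\bden(y|ied)\b|\bdrop(ped)?\b", re.IGNORECASE),
    "configuration": re.compile(r"duplex|mismatch|invalid config", re.IGNORECASE),
    "hardware": re.compile(r"\bfan\b|power supply|temperature|memory error", re.IGNORECASE),
}


def classify(line):
    """Return the first category whose pattern matches the line, or None."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(line):
            return category
    return None


def summarize(lines):
    """Count log lines per category so the noisiest class of event stands out."""
    counts = {category: 0 for category in CATEGORY_PATTERNS}
    for line in lines:
        category = classify(line)
        if category is not None:
            counts[category] += 1
    return counts
```

A summary like this only points you at where to look. The individual log entries still have to be read to understand what the firewall is reporting.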
Step 6: Verify the Firewall Configuration
There are two elements to verifying the firewall configuration. The first is to compare the
current configuration to a known good configuration. The second is to verify that the
firewall configuration is accurate with no typos or other errors.
Every time that the firewall configuration is changed (in addition to the first time the
firewall is configured), a copy of the new configuration should be saved for archival
purposes. This archive represents the last known working configuration. In the event that
the firewall is changed, having this archive allows you to compare the current
configuration to the archive in an attempt to identify whether any changes have been
made to the configuration. If there have been, you can further investigate the changes to
determine whether the changes are responsible for the problems that are occurring.
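Comparing the current configuration to the archived copy is easy to automate when both are available as text. The following is a minimal sketch using Python's standard difflib module; the function name and the "known-good" and "current" labels are assumptions made for illustration.

```python
import difflib


def config_diff(known_good: str, current: str):
    """Return unified-diff lines between the archived and running configurations."""
    return list(
        difflib.unified_diff(
            known_good.splitlines(),
            current.splitlines(),
            fromfile="known-good",
            tofile="current",
            lineterm="",
        )
    )
```

An empty result means the two configurations are identical. Otherwise, lines prefixed with "-" exist only in the archive and lines prefixed with "+" exist only in the current configuration, which gives you a short list of changes to investigate.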
Perhaps the most common source of problems with firewalls, however, comes from
simple misconfigurations of the firewall. It is too easy to mistype a line, click the wrong
element in a graphical user interface (GUI), or just apply the wrong command to the
firewall, thus causing a problem on the network that must be troubleshot. This is
particularly true when it comes to troubleshooting the firewall ruleset. It is easy to enter
the wrong transport protocol (TCP when you meant UDP), IP address, or port number
and thus cause the problem. A great example of this occurred when Cisco released the
security advisory "Cisco IOS Interface Blocked by IPv4 Packets." As a workaround, it
was recommended that, among other things, IP protocol 53 be blocked. Unfortunately,
many network administrators see "53" and automatically assume DNS (TCP and UDP
port 53), which resulted in folks implementing rulesets to block TCP and UDP port 53
(thus causing DNS traffic to stop being passed) instead of IP protocol 53 (SWIPE), the
protocol number that the advisory actually named.
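The difference between an IP protocol number and a TCP or UDP port number is easy to see in a small model. The Python sketch below is purely illustrative: the Packet class and both matching functions are hypothetical and do not represent any firewall's rule syntax.

```python
import socket
from dataclasses import dataclass
from typing import Optional

SWIPE = 53     # IP protocol number (the protocol field of the IP header)
DNS_PORT = 53  # TCP/UDP port number used by DNS


@dataclass(frozen=True)
class Packet:
    ip_protocol: int                # value of the protocol field in the IP header
    dst_port: Optional[int] = None  # only meaningful for TCP and UDP


def blocks_ip_protocol(packet, protocol):
    """Intended workaround: match on the IP protocol number."""
    return packet.ip_protocol == protocol


def blocks_port(packet, port):
    """Mistaken rule: match TCP or UDP traffic to a destination port."""
    return (
        packet.ip_protocol in (socket.IPPROTO_TCP, socket.IPPROTO_UDP)
        and packet.dst_port == port
    )
```

A DNS query is a UDP packet (IP protocol 17) sent to port 53, so a rule on IP protocol 53 leaves it alone, whereas a rule on port 53 drops it and never touches protocol 53 traffic at all.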
Step 7: Verify the Firewall Ruleset
As mentioned in the previous section, the firewall ruleset deserves the most scrutiny of
anything regarding a firewall during the troubleshooting process. After all, in most cases
the firewall exists solely to filter traffic in accordance with the ruleset, which means that
if there is a mistake in the ruleset it will almost certainly manifest itself as a problem on
the network.
The most common ruleset error is a simple typo. For this reason, I like having someone
validate the ruleset other than the person making the changes. The reason for this is
simple: The person making the changes generally knows what the changes should be and
is more apt to read what he or she thinks the ruleset is supposed to contain, not what the
ruleset actually contains. Putting a fresh set of eyes on the ruleset increases the odds that
someone will notice that a rule was inadvertently configured for TCP rather than
UDP and so on.
Another common error with rulesets is the processing order of the ruleset. You need to
understand in what order your firewall processes the ruleset and then verify that you do
not have a rule out of order which is causing the problem. For example, if the rules are
processed top down until a match is made and you have a rule that denies traffic before a
rule that permits traffic, the firewall is going to process the deny and then exit the ruleset
because it found a match, never making it to the line that permits the traffic in question.
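The effect of rule order can be demonstrated with a small first-match evaluator. This Python sketch assumes top-down processing with an implicit deny at the end; the Rule fields and function names are hypothetical, and your firewall may process its ruleset differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str    # "permit" or "deny"
    protocol: str  # "tcp", "udp", or "any"
    port: int      # destination port; 0 means any port


def matches(rule, protocol, port):
    """Return True if the rule applies to this protocol and destination port."""
    return rule.protocol in ("any", protocol) and rule.port in (0, port)


def evaluate(rules, protocol, port):
    """Top-down, first-match evaluation with an implicit deny at the end."""
    for rule in rules:
        if matches(rule, protocol, port):
            return rule.action
    return "deny"
```

With a broad deny listed ahead of a specific permit, evaluate() returns "deny" for the traffic the permit was meant to allow. Swapping the two rules returns "permit" for the same traffic.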
Step 8: Verify That Any Dependent, Non-Firewall-Specific Systems Are Not the Culprit
Something else to consider when troubleshooting is the set of dependent services and systems that
are not firewall specific or for which the firewall administrator might not be responsible.
This includes the systems that are being protected by the firewall.
Common services to examine are name resolution processes such as DNS and WINS.
Many times, someone will attempt to access a resource by name through the firewall and
when the request fails assume that the firewall is the problem. However, if name
resolution is not working properly, the user may not be able to resolve the name of the
resource requested to an IP address, which is the cause of the connection failure.
Other common sources of dependent problems are the systems that provide services to
users through the firewall, such as web servers. These servers are frequently managed by
a completely separate team that may or may not communicate the status of the servers
with the firewall administrators. Therefore, the server administrators may take systems
down for maintenance and so on without informing the firewall team. When a user
attempts to access the resource, the request naturally fails, not because of the firewall but
because the server behind the firewall providing the actual service is not online.
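One way to separate these dependent failures is to test name resolution and the TCP connection as two distinct steps. The following Python sketch is a rough illustration: the function name and the three result labels are assumptions, and it uses only the standard socket module.

```python
import socket


def diagnose(name, port, timeout=3.0):
    """Return 'name-resolution', 'unreachable', or 'ok' for a named TCP service."""
    try:
        addresses = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved, so no connection was ever attempted.
        return "name-resolution"
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            return "ok"
        except OSError:
            continue  # try the next resolved address, if any
    return "unreachable"
```

A "name-resolution" result points at DNS or WINS rather than at the firewall ruleset for the service itself. An "unreachable" result does not say whether the firewall or the server is at fault, but it does tell you that name resolution can be ruled out.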
External authentication servers such as RADIUS, TACACS+, and Microsoft Windows
Domain Controllers can also be a source of problems. For example, if the access to a
protected resource behind the firewall requires external authentication and the firewall
cannot communicate with the authentication server, it may appear that the firewall is
blocking traffic (and in a manner of speaking, it is), but the real problem is not the
firewall but a failure of the authentication server.
Step 9: Monitor the Network Traffic
When all else has failed and you are left scratching your head regarding what the problem
may be, it is a good time to monitor the actual network traffic and examine precisely how
the systems are attempting to communicate to and through the firewall. Doing so can help
to identify communications problems that may or may not have shown up in the firewall
event logs or may have shown up in the firewall event logs but not have provided enough
information to determine a course of action to correct the problem.
As mentioned previously in this book, monitoring the network traffic with something like
Ethereal, allowing you to view the raw packets and communications between hosts, is
much like having a Rosetta stone to help decipher the network languages and
communications processes that hosts are using to talk to each other. For example, a
common ruleset error is to open inbound TCP port 20 to FTP servers
because it has been commonly reported that FTP servers use both TCP ports 20 and 21 for
communications. Although this is true, in active mode port 20 is only the source port of
the data connection that the server opens to the client, and in passive mode the client
connects to a port that the server announces over the control connection on TCP port 21.
Clients therefore never connect inbound to port 20, which can be validated by monitoring
the traffic between the client and server. Having access to this kind of information will
assist you in identifying
and troubleshooting problems that do not exhibit symptoms anywhere else, be it in the
firewall logs, configuration, or firewall ruleset.
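As one example of what captured traffic can reveal, a passive-mode FTP server announces its data port in a 227 reply on the control connection, encoded as six comma-separated numbers (four for the address and two for the port, per RFC 959). The Python sketch below decodes such a line once you have copied it out of a capture; the function name is hypothetical and the sample address is a documentation address.

```python
import re

# Six comma-separated decimal fields: h1,h2,h3,h4,p1,p2
PASV_REPLY = re.compile(r"^227 .*?(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(line):
    """Return (host, port) announced in a 227 reply, or None if it is not one."""
    match = PASV_REPLY.match(line)
    if match is None:
        return None
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # The port is sent as two bytes, high byte first.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2
```

For the reply "227 Entering Passive Mode (192,0,2,10,195,149)" this yields the address 192.0.2.10 and port 50069, which is the port the firewall would actually need to pass for that data connection.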