- Networks and Telecommunications: Design and Operation, Second Edition.
Martin P. Clark
Copyright © 1991, 1997 John Wiley & Sons Ltd
ISBNs: 0-471-97346-7 (Hardback); 0-470-84158-3 (Electronic)
36 Maintaining the Network
No matter how much careful planning goes into the design of a network, and no matter how
reliable the individual components are, corrective action will always be required in some form or
another, to prevent or make good network and component failures, and maintain overall service
standards. However, attitudes towards maintenance and the organization behind it vary widely,
ranging from the ‘let it fail then fix it’ school of thought right through to ‘prevent faults at any
cost’. This chapter describes a typical maintenance regime in its philosophical, organizational and
procedural aspects.
36.1 THE OBJECTIVES OF GENERAL MAINTENANCE
As succinctly stated by ITU-T, the objective of a general maintenance organization is to
minimize the occurrence of failures, and to ensure that in case of failure
• the right personnel can be sent to
• the right place with
• the right equipment at
• the right time to perform
• the right corrective actions.
36.2 MAINTENANCE PHILOSOPHY
In pursuing these objectives, the wise network operator establishes a maintenance
philosophy closely linked with overall targets for network quality and for the
proportion of time that the network is intended to be fault-free (available). In this task
he will take due account of network economics and the most likely causes of failure.
Networks fail for all sorts of reasons; common examples are
• cable or connector damage or disturbance
• equipment overheating
• electronic component failure
• mechanical equipment jamming, or other failure
• mechanical wear
• dirty (high resistance) relay or switch contacts (a diminishing problem as
  electromechanical exchanges are withdrawn)
• power supply loss
• vandalism (e.g. to public payphones)
• software errors
• erroneous exchange data
• poor connections between cables or other components (e.g. dry soldered joints)
• interference (e.g. due to electromagnetic disturbances; recent standards on EMC,
  electromagnetic compatibility, are designed to ensure that equipment does not cause
  electromagnetic disturbance and is itself not unduly sensitive to such interference,
  i.e. is electromagnetically protected)
Each cause of failure has its own cure, but broadly speaking, there are three main
approaches
• corrective maintenance
• preventive maintenance
• controlled maintenance
Corrective maintenance is carried out after the failure has been diagnosed, and it
consists of the repair or replacement of faulty components.
Preventive maintenance aims to eliminate the accumulation of faults. Preventive
maintenance usually consists of routine testing and correction of working equipment
(as opposed to failed equipment) to prevent degradation in performance before any
failure actually occurs.
Controlled maintenance is a more systematic approach, combining both the corrective
and preventive methods. The underlying philosophy of controlled maintenance is to
prevent network failure. This is done by using special analysis techniques to monitor
day-to-day network performance and degradation, thereby avoiding maintenance work.
The advantage of controlled maintenance is that it concentrates on areas where the
customer is likely to benefit most, and it reduces the extent of preventive maintenance
and the complications of corrective maintenance. When new networks are being
designed or extended, or when new capabilities are being added to existing networks,
consideration needs to be given to the maintenance philosophy and to the organization,
the maintenance facilities, and the test equipment that will support it. The best controlled
mix of both corrective and preventive philosophies depends on the number and nature of
problems. These, in turn, depend on the overall network structure and the component
equipment types. So, in the days of widespread electromechanical switches and relays
when a frequent cause of failure was mechanical wear, a good deal of time was spent on
preventive type maintenance, oiling the moving parts as it were. Nowadays, when
hardware faults in modern electronic equipment are relatively rare and software faults
often take some time to present themselves (the exchange operating fault-free for extended
periods between occurrences), a corrective philosophy is adopted. Faulty component
boards are completely replaced without even attempting a diagnosis, and software faults
are debugged as they arise.
36.3 MAINTENANCE ORGANIZATION
Real networks are in a constant state of change throughout their lives. To match traffic
demand, new circuits are continually established between exchanges. Established circuits
may need to be re-arranged as transmission systems are upgraded or taken down,
and faulty equipment needs to be repaired, replaced or avoided by a diversion. For
optimum efficiency the organizations set up to carry out these maintenance tasks, and
the tools with which they are provided, should be planned in such a way that lifetime
costs will be minimized. There is a choice, for instance, between paying more at the start
for expensive but reliable equipment, or using cheaper equipment and incurring higher
ongoing running costs. Lifetime cost analysis must include
• initial cost of equipment
• cost of spares and test equipment
• ongoing running and maintenance costs
• costs associated with periods of lost service
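The four cost elements listed above can be combined in a simple comparison. The following is only a minimal sketch: the class name, the field names and all the money figures are invented for illustration, and no discounting of future costs is applied.

```python
from dataclasses import dataclass


@dataclass
class EquipmentOption:
    name: str
    initial_cost: float          # purchase and installation
    spares_and_test_cost: float  # spares holding and test equipment
    annual_running_cost: float   # ongoing running and maintenance, per year
    annual_outage_cost: float    # cost of periods of lost service, per year


def lifetime_cost(option: EquipmentOption, years: int) -> float:
    """Sum the four cost elements over the service life (no discounting)."""
    recurring = (option.annual_running_cost + option.annual_outage_cost) * years
    return option.initial_cost + option.spares_and_test_cost + recurring


# Hypothetical figures: a reliable but expensive option and a cheaper one.
reliable = EquipmentOption("reliable", 500_000, 40_000, 20_000, 5_000)
cheap = EquipmentOption("cheap", 300_000, 60_000, 45_000, 15_000)

for years in (5, 15):
    best = min((reliable, cheap), key=lambda o: lifetime_cost(o, years))
    print(years, "years:", best.name, "is cheaper")
```

With these assumed figures the cheaper equipment wins over a short life, but the reliable equipment wins once the running and lost-service costs have accumulated over a longer one.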
High wages and skills shortages in recent years have weighed the scales in favour of
using more reliable equipment and a smaller maintenance workforce. Indeed, in some
instances the field workforce has been pared to the minimum of two people, one worker
plus a stand-in to cover annual leave and periods of sickness. Some observers question
the sense of this, pointing to the fact that so complex are the devices, so computerized
the routine activities and so rare the faults, that the field maintenance staff often do not
have the experience to cope.
For this reason a comprehensive headquarters maintenance support, technical
support or back-up organization (sometimes called second line support or third line
support) is needed in addition to the direct maintenance staff, to perform the following
functions.
• To provide detailed equipment and maintenance documentation.
• To provide maintenance training on new equipment and document ‘fail-safe’
  maintenance procedures (explaining the general methods to be adopted, and making
  sure that unintended disturbance to other customers is not caused by maintenance
  action).
• To develop and put in place a fault-reporting procedure.
• To repair complicated items of equipment or resolve complex software problems.
• To develop and procure the necessary test equipment.
• To maintain an appropriate store of spare parts and to call for re-design of poor
  equipment, taking duly into account the failure rate of each item, the number of
  items in operation, and the actual repair turn around time (e.g. repair time, or
  delivery time for a part not held in stock), and calculating the service and revenue
  risk if no spare part is available.
• To develop and maintain an equipment identification and inventory scheme for
  tracking equipment in use and spare equipment either in stock or on order (in addition,
  maintenance staff in different exchanges need to be able to indicate faulty lines
  or circuits to one another).
• To prepare a list of contact points and telephone numbers through which the
  maintenance staffs in different maintenance centres may communicate.
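The spare parts calculation mentioned in the list above (failure rate, number of items in operation, and turn around time) can be sketched as follows. This is one possible approach, not a prescribed method: it assumes that failures arrive independently (a Poisson model), and the function names and example figures are invented for illustration.

```python
import math


def stockout_probability(mean_demand: float, spares: int) -> float:
    """Probability that more than `spares` failures occur in one turn around period."""
    p_covered = sum(
        math.exp(-mean_demand) * mean_demand**k / math.factorial(k)
        for k in range(spares + 1)
    )
    return 1.0 - p_covered


def spares_required(failures_per_item_year: float, items_in_service: int,
                    turnaround_days: float, max_risk: float) -> int:
    """Smallest spares holding that keeps the stock-out risk at or below max_risk."""
    # Expected number of failures while one faulty part is away for repair.
    mean_demand = failures_per_item_year * items_in_service * turnaround_days / 365.0
    spares = 0
    while stockout_probability(mean_demand, spares) > max_risk:
        spares += 1
    return spares


# Hypothetical case: 200 boards, 0.05 failures per board per year,
# 30 day repair turn around, 2% acceptable risk of having no spare.
print(spares_required(0.05, 200, 30, 0.02))
```

The risk figure returned by `stockout_probability` could then be weighed against the service and revenue lost when no spare is available.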
The direct maintenance workforce is usually collocated with the exchange, some staff
being switching experts while others have transport so that they can go out and deal
with exterior plant problems. The staff located within the exchange provide a ‘control
point’ for new circuit lineups, and for the initial reporting and diagnosis of faults.
The number of staff located at any exchange depends on the size, complexity and
reliability of the exchange. Not every exchange can justify its own on-site maintenance
staff, and out-stationed staff may be posted to the exchange either on a regular
preventive maintenance schedule, or simply when there is a fault.
36.4 CENTRALIZED OPERATION AND MAINTENANCE
A modern practice, aimed at reducing the maintenance workforce, is to leave
exchanges unmanned. Computer technology and extended alarms allow staff in a single
centralized operation and maintenance (CO&M) centre to monitor and control a number
of different exchanges in real time (giving instantaneous and live control of each
exchange). Figure 36.1 illustrates a typical centralized operation and maintenance
scheme.
Figure 36.1 Centralized operation and maintenance (diagram labels: remote exchanges; data
links to monitor and control the exchanges; main computer and maintenance staff at the
centralized operation and maintenance centre)
Centralized operation and maintenance (CO&M) can be introduced only when the
remote exchanges are computer-controlled, and are designed to be capable of self-diagnosis
of faults. A datalink back to a computer at the centralized operation and
maintenance centre allows the maintenance staff to monitor the exchange performance,
noting any problems and applying any necessary controls. Under a CO&M scheme, the
exchanges are designed with duplicated items of equipment, which remain idle until
they are activated electronically to take over the function of a failed item. The exchange
can thus continue to work at full load, while a member of the maintenance team is sent
out to the exchange site, to repair the faulty equipment, or to replace it completely by a
circuit-board change.
36.5 LINING UP ANALOGUE AND MIXED ANALOGUE/DIGITAL CIRCUITS
The transmission links between exchanges are commissioned (or lined up) using a two-stage
method as follows. First, the lineplant itself is established in sections which are
tested and calibrated in turn, and then connected together. A number of reference
measurements are made along the entire length to check and calibrate the overall end-to-end
performance. When the line system as a whole has been established, multiplexing
and other terminal equipment is applied to its ends to obtain the individual circuits or
groups which may be tested individually. The individual circuit testing is necessary
because, as Figure 36.3 shows, a real circuit (or group) is likely to traverse a number of
line systems, which may interact adversely. So although each section in isolation may be
within limits, the combination may not be so. The calibration of each group and circuit
is thus carried out on an end-to-end basis.
Figure 36.2 Maintenance workstation. The AT&T 5ESS telephone exchange now includes a
video monitor that displays colour-coded diagrams, each of which indicates the status of a certain
part of the system. A central office technician is checking the status of digital trunks. (Courtesy of
AT&T)
Figure 36.3 shows four exchanges A, B, C and D, located in transmission centres a, b,
c and d. From the inset diagram of Figure 36.3 we see that the exchanges are configured
topologically in a fully interconnected manner. Each exchange has a circuit to every
other exchange but the total of six circuits has been achieved with the use of three line
systems only: a-b, b-c and b-d. Where the transmission centres are not directly
interconnected by a line system, a circuit has been provided by the concatenation of two
line systems, with a jumper wire completing the connection across the intermediate
transmission centre. Thus for example the circuit from exchange A to exchange C uses
line systems a-b and b-c, and a jumper wire across transmission centre b.
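The way six circuits are carried over only three line systems can be worked through in a few lines. The sketch below is illustrative only: it represents the three line systems named above as a small graph and searches it for the chain of transmission centres that each circuit must traverse. The function and variable names are assumptions, not taken from any planning tool.

```python
from collections import deque
from itertools import combinations

# The three line systems described in the text.
LINE_SYSTEMS = [("a", "b"), ("b", "c"), ("b", "d")]


def route(src, dst):
    """Breadth-first search for the chain of transmission centres from src to dst."""
    neighbours = {}
    for x, y in LINE_SYSTEMS:
        neighbours.setdefault(x, []).append(y)
        neighbours.setdefault(y, []).append(x)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


circuits = {}
for src, dst in combinations("abcd", 2):
    path = route(src, dst)
    circuits[(src, dst)] = {
        "line_systems": list(zip(path, path[1:])),  # sections used in tandem
        "jumpered_at": path[1:-1],                  # intermediate centres
    }

for ends, detail in circuits.items():
    print(ends, detail)
```

Every circuit that is jumpered at centre b uses two line systems in tandem, which is why it has to be lined up end-to-end rather than section by section.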
The number and diversity of calibration measurements and adjustments necessary
during circuit line-up, and the amount of deviation allowed in the ongoing values depend
on the type of circuit (e.g. analogue or digital), and on the use to which the circuit is
being put (e.g. voice or data). High grade data circuits, for example, have more stringent
line conditioning requirements than simple voice grade circuits. The measurements
and adjustments ensure that the circuit conforms with the transmission plan (see
Chapter 33). Thus during line-up of analogue and mixed analogue/digital line systems
and circuits, measurements will be made of
• overall loss in signal strength (in dB)
• amplitude loss/frequency attenuation distortion
• group delay (particularly if the circuit is to be used for data)
• noise, crosstalk, echo, etc.
• inter-exchange signalling tests (if appropriate)
Various equipments, including amplifiers, equalizers, filters, and echo controllers, are
then adjusted to bring the line conditions within the set limits, using measuring
equipment as follows
• signal tone generators (calibrated for precise frequencies and signal strengths)
• calibrated frequency and signal strength detectors
• noise meters
• equipment for inter-exchange signalling or data protocol testing
First, the end-to-end circuit loss is determined by sending a calibrated 1020 Hz (or in
purely analogue networks, 800 Hz) signal of a known strength, and measuring the
received strength; this allows the circuit amplification to be adjusted accordingly.
Next, a range of different calibrated signal frequencies across the whole circuit bandwidth
is sent, and the received signal strengths are again measured. This allows the
frequency distortion equalizers to be adjusted. Then the group delay distortion is
corrected using group delay equalizers. This equalization is particularly important
for high speed modem data circuits, and it is achieved by measuring the phase
of different frequencies relative to the 1020 Hz (or 800 Hz) signal. Following this,
psophometric noise checks are conducted, together with tests of inter-exchange
signalling systems or of data protocols (e.g. X.25, frame relay or IBM’s SNA), and
finally a test call is established.
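The arithmetic behind the first two line-up steps can be illustrated as follows. This is a sketch under stated assumptions: levels are taken to be measured in dBm, the planned loss is whatever the transmission plan specifies for the circuit, and the function names and example readings are hypothetical.

```python
import math


def loss_db(sent_mw: float, received_mw: float) -> float:
    """Circuit loss in dB from sent and received power (both in milliwatts)."""
    return 10.0 * math.log10(sent_mw / received_mw)


def gain_adjustment_db(sent_dbm: float, received_dbm: float,
                       planned_loss_db: float) -> float:
    """Extra amplification (positive) or attenuation (negative) needed
    to bring the measured loss to the planned value."""
    measured_loss = sent_dbm - received_dbm
    return measured_loss - planned_loss_db


def attenuation_distortion(received_dbm_by_freq: dict, reference_hz: int = 1020) -> dict:
    """Loss at each test frequency relative to the loss at the reference tone."""
    ref = received_dbm_by_freq[reference_hz]
    return {f: ref - level for f, level in received_dbm_by_freq.items()}


# Hypothetical readings: 0 dBm sent, -7.5 dBm received, 3 dB planned loss.
print(gain_adjustment_db(0.0, -7.5, 3.0))
print(attenuation_distortion({300: -6.0, 1020: -3.0, 3400: -7.0}))
```

The per-frequency figures from `attenuation_distortion` are the values an engineer would use when setting the frequency distortion equalizers.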
Where a pure tone signal of calibrated strength needs to be injected into the digital
part of a mixed analogue/digital connection or network, this can be done either using
an analogue tone sender and an analogue/digital converter, or by the use of a digital
reference sequence (DRS). A digital reference sequence is a digital bit pattern corresponding
to a particular analogue signal frequency and strength. Such a pattern is
easy to store in computer-like memory and is a very reliable means of reproducing an
accurately calibrated signal.
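The idea of storing a tone as a repeating table of samples can be sketched as follows. The example is a simplification and makes several assumptions: an 8000 Hz sampling rate, plain linear sample values with an arbitrary peak of 4096, and no A-law or Mu-law companding, which a real PCM system would apply before transmission.

```python
import math
from math import gcd


def reference_sequence(freq_hz: int = 1020, sample_rate_hz: int = 8000,
                       peak: int = 4096) -> list:
    """Return the shortest table of linear samples that repeats exactly."""
    # The tone completes a whole number of cycles after this many samples.
    period = sample_rate_hz // gcd(freq_hz, sample_rate_hz)
    return [
        round(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        for n in range(period)
    ]


def replay(table: list, count: int) -> list:
    """Read the stored table cyclically to produce `count` samples."""
    return [table[n % len(table)] for n in range(count)]


table = reference_sequence()
print(len(table))        # samples stored for the 1020 Hz tone
print(replay(table, 5))  # first few samples sent to line
```

Because the table is fixed, every replay reproduces exactly the same level and frequency, which is what makes a stored sequence attractive as a calibrated source.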
36.6 HIGH GRADE DATA CIRCUIT LINE-UP
In high grade data circuits which use modems over analogue or mixed analogue/digital
plant, a number of extra line-up measurements may be necessary, as follows
• weighted noise
• notched noise
• impulse noise
• phase hits and gain hits
• harmonic disturbance (or inter-modulation noise)
• frequency shift distortion
• jitter
Weighted noise is a measure of the noise in the middle of the channel bandwidth. Noise
frequencies in this range are most likely to cause modem errors. Psophometric
(European) and C-message filters (United States) are used to measure this type of noise.
Notched noise is measured by applying a pure frequency tone at one end and
removing it with a notch filter at the other; the remaining noise is then measured. The
notched noise itself arises from the way in which signals have been digitized or otherwise
processed over the course of the link. It is thus similar to quantization distortion,
which was discussed in Chapter 5.
Impulse noise is characterized by large ‘spikey’ waveforms and arises from unsup-
pressed power surges or mechanical switching noise. It is most common on electro-
mechanical switched networks.
Phase hits and gain hits are intermittent but only moderate and relatively short (less
than 200 ms) disturbances in the phase or amplitude of a signal. Typically, less than 10
should be recorded in a 15 minute test period. Special test equipment is required. More
serious gain hits are called dropouts. Gain hits are most troublesome in voice use; phase
hits manifest themselves as bit errors in data signals.
Harmonic disturbance may result from the intermodulation of two different signal
frequencies F1 and F2 when passed through nonlinear processing devices. New stray
signals of frequencies F1 + F2, F1 - F2, F1 + 2F2, etc., are produced. This type of
disturbance is measured by a spectrum analyzer.
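The stray frequencies produced by intermodulation can be listed mechanically. The sketch below enumerates the products m·F1 + n·F2 up to third order for two test tones; the function name, the choice of tones and the 300 to 3400 Hz band used for filtering are illustrative assumptions.

```python
def intermodulation_products(f1: int, f2: int, max_order: int = 3) -> list:
    """Frequencies |m*f1 + n*f2| with m and n non-zero and |m| + |n| <= max_order."""
    products = set()
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            if m == 0 or n == 0:
                continue  # a single tone on its own is not a mixing product
            if abs(m) + abs(n) > max_order:
                continue
            products.add(abs(m * f1 + n * f2))
    products.discard(0)
    return sorted(products)


# Two hypothetical test tones within a voice channel.
all_products = intermodulation_products(1000, 1200)
in_band = [f for f in all_products if 300 <= f <= 3400]
print(all_products)
print(in_band)
```

A spectrum analyzer reading can then be compared with the predicted in-band products to confirm that the stray signals really are intermodulation products.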
Frequency shift is also measured by a spectrum analyzer. Frequency shift is obviously
a problem for a data modem if the frequency received differs to such an extent from that
sent as to be mis-interpreted. It is most likely to occur when a carrier system or other
frequency modulating signal processing has been used.
Jitter or phase jitter arises when the timing of the pulses on an incoming data signal
varies slightly, so that the pulse pattern is not quite regular. The effects of jitter can
accumulate over a number of regenerated links, and they result in received bit errors.
It can be reduced by reading the incoming data into a store and then reading it back out
at an accurate rate, using a highly stable clock controlled by a phase-locked loop (PLL)
circuit.
Figure 36.4 Detecting analogue line disturbances using constellation diagrams: (a) V.22 bis
modem, perfect undistorted signal, appears as clean ‘dots’; (b) noisy signal, cloudy pattern of
dots; (c) signal affected by phase hits, appears as circular streaks; (d) signal affected by gain
hits, appears as radial streaks
All of the above parameters should be checked when the line system or circuit is first
established. Problems are likely to reflect poorly designed or poorly installed equipment,
and the best remedy is prevention: check the quality of the work carried out,
because correction circuits are expensive and not entirely effective.
In Chapter 9 we introduced the idea of constellation diagrams for modems. We are
now in a position to illustrate their practical use for detecting noise, phase hits and gain
(or amplitude) hits on analogue lines employing modems. Special test equipment may
be connected at the receiving end of the line in place of the receiving modem. The test
equipment displays on an oscilloscope-like screen the constellation pattern of the
received data signal. Disturbances appear as shown in Figure 36.4.
As is apparent from the patterns of Figure 36.4, the problem with noise, phase and
gain hits is that if they become too great, they result in the incorrect interpretation of
the signal; the received bit pattern then includes errors.
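The distinction between circular and radial streaks can be expressed numerically by comparing each received constellation point with its ideal position. The following is a minimal sketch: points are represented as complex numbers, and the tolerance values and function names are arbitrary assumptions rather than figures from any test equipment.

```python
import cmath
import math


def disturbance(ideal: complex, received: complex):
    """Return (gain error as a ratio, phase error in degrees) for one symbol."""
    gain_error = abs(received) / abs(ideal) - 1.0
    phase_error = math.degrees(cmath.phase(received / ideal))
    return gain_error, phase_error


def classify(ideal: complex, received: complex,
             gain_tol: float = 0.1, phase_tol_deg: float = 10.0) -> str:
    """Label a displaced point by the direction of its displacement."""
    gain_error, phase_error = disturbance(ideal, received)
    gain_bad = abs(gain_error) > gain_tol
    phase_bad = abs(phase_error) > phase_tol_deg
    if gain_bad and phase_bad:
        return "noise or combined disturbance"
    if gain_bad:
        return "gain hit (radial displacement)"
    if phase_bad:
        return "phase hit (circular displacement)"
    return "clean"


ideal = 1 + 1j
print(classify(ideal, ideal * 1.3))                               # amplitude changed
print(classify(ideal, ideal * cmath.exp(1j * math.radians(20))))  # phase rotated
```

A point displaced far enough in either direction will fall nearer to a neighbouring ideal position, which is when the received bit pattern begins to contain errors.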
36.7 LINING UP DIGITAL CIRCUITS
Digital line systems and their tributary bitstreams must conform to a different
transmission plan from their analogue equivalents and are therefore lined up in a
different manner. The important parameters, as we saw in Chapter 33, are
• the bit error ratio (BER)
• the network synchronization
• the quantization or quantizing distortion
In practice, the network synchronization (i.e. the jitter and clock accuracy) and the
quantization distortion are set by the design of a circuit and can be improved only by
re-design. The lining-up process can only address the question of error rate. The accuracy
of synchronization depends on the use of highly stable clocking sources and inter-exchange
synchronization links, as described in Chapter 33. Quantization distortion is a
form of noise, which affects analogue voice and data signals when they are carried over
digital media using pulse code modulation (PCM) and other signal processing techniques
(see Chapter 5). It can be reduced only by re-designing the circuit to avoid
multiple signal conversions (e.g. analogue to digital conversion, signal compression,
etc.). In the case of digital data circuits, these should never be designed to include any
form of signal processing because digital data devices are intolerant of the high bit error
rates that arise from quantization distortion.
Standard practice for digital circuit line-up is to perform a test of digital error
performance. This may involve a lengthy stability test over several days or simply be a
quick-check (15 minute) test. A pseudo-random bit pattern generator (a digital signal
generator) is used to provide a test digital signal. During the test the proportion of bit
errors (the bit error ratio, or BER) is measured, along with the proportion of error free
seconds (EFS). Expected values are typically no more than 1 error in 10’ or 10^9 for BER
and at least 99.5% EFS. The errors, if excessive, may have arisen as the result of any
number of different impairments. Some causes can be eliminated easily (e.g. by
increasing the transmitted power to reduce the effect of noise). Other causes need more
radical circuit checks.
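The two error performance figures can be derived from a simple per-second log of errors. The sketch below is illustrative: the log format, the function names and the BER threshold passed to the check are assumptions, and only the 99.5% EFS figure is taken from the text.

```python
def error_performance(errors_per_second: list, bit_rate: int):
    """Return (bit error ratio, percentage of error free seconds)."""
    seconds = len(errors_per_second)
    total_errors = sum(errors_per_second)
    ber = total_errors / (bit_rate * seconds)
    efs = 100.0 * sum(1 for e in errors_per_second if e == 0) / seconds
    return ber, efs


def passes_line_up(ber: float, efs: float, max_ber: float,
                   min_efs_percent: float = 99.5) -> bool:
    """Compare the measured values with the chosen line-up limits."""
    return ber <= max_ber and efs >= min_efs_percent


# Hypothetical 15 minute quick-check on a 2.048 Mbit/s stream:
# 897 clean seconds and three seconds containing errors.
log = [0] * 897 + [1, 2, 1]
ber, efs = error_performance(log, 2_048_000)
print(ber, efs, passes_line_up(ber, efs, max_ber=1e-8))
```

Recording errors second by second matters because the same number of errors gives a very different EFS figure depending on whether they arrive in a burst or are spread evenly through the test.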
Like their analogue equivalents, digital bit streams are lined-up in two stages, first at
higher order (e.g. at 140 Mbit/s, if this is the line system bit rate) and then on an end-to-end
basis for each tributary stream (e.g. 2 Mbit/s, 1.5 Mbit/s). The secondary checking
of lower order tributary streams is necessary for the same reasons as in the comparable
analogue case exemplified by Figure 36.3.
36.8 PERFORMANCE OBJECTIVES
In recognition of the fact that practical networks can never match idealized performance
objectives, ITU-T recommendation G.102 sets out different operating limits for
the target performance objective, the design objective, the commissioning objective (also
known as the line-up limit) and the maintenance limit.
The performance objective is the ideal performance level for the particular
application.
The design objective transfers this objective into a realizable target range, within
which economically designed equipment can be expected to operate, given optimum
conditions of power supply, temperature, humidity, etc.
The line-up limit recognizes that the optimum conditions can rarely be achieved in
practice, but nonetheless sets a stringent practical range within which the equipment
must operate on first establishment. Any faults identified during line-up, which cause
the circuit to operate outside this range, should be cleared and the circuit re-lined,
before taking it into service.
The maintenance limit is the least stringent range of operating conditions, but still
within a ‘tolerable’ range as far as the application is concerned. The limit is chosen so
that, if exceeded, a fault is considered to exist. The fault should be cleared and the
circuit re-lined-up. As Figure 36.5 shows, each of the last three performance ranges
described is a slight relaxation of its predecessor, so giving targets which are achievable
in practice. Thus even if a slight degradation in quality occurs after circuit line-up, the
circuit still operates within the maintenance limit operating range.
[Figure 36.5: diagram of the performance ranges against the measured parameter value,
including the ‘unusable’ range]
A steady deterioration in performance of items in service can be expected as the result
of operational use, occasional overload and general ageing as well as genuine faults.
Often the degradation is slow enough to be imperceptible and not to warrant routine
checks. For this reason, it is common practice to use automatic alarms to alert
maintenance staff to the need for corrective action, so that faults can promptly be
eliminated and the equipment returned to its line-up operating range. Two levels of
alarm are possible, one at the maintenance limit level, and one at the ‘unusable’
performance level. Faults on the latter level clearly need more urgent attention than
those on the former.
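The relationship between the limits and the two alarm levels can be sketched as a simple classification. This is an illustration only: it assumes a parameter for which larger values mean worse performance (a noise level, for example), and the function name and numeric limits are invented.

```python
def assess(value: float, line_up_limit: float, maintenance_limit: float,
           unusable_limit: float) -> str:
    """Classify a measured parameter where larger values mean worse performance."""
    if not line_up_limit <= maintenance_limit <= unusable_limit:
        raise ValueError("each limit must be a relaxation of its predecessor")
    if value > unusable_limit:
        return "urgent alarm: unusable"
    if value > maintenance_limit:
        return "alarm: maintenance limit exceeded"
    if value > line_up_limit:
        return "in service, outside line-up range"
    return "within line-up range"


# Hypothetical limits of 2, 4 and 8 units for some measured parameter.
for reading in (1.0, 3.0, 5.0, 9.0):
    print(reading, assess(reading, 2.0, 4.0, 8.0))
```

A reading between the line-up limit and the maintenance limit raises no alarm, which reflects the margin deliberately left for slight degradation after line-up.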
36.9 MAINTENANCE ‘ACCESS POINTS’
For the purpose of network lining-up and subsequent maintenance, it is necessary to
provide a number of test access points for maintenance. Ideally, test access points are
provided at a number of different points in a network to enable easy localization of
faults and initial segment-by-segment circuit alignment. A very large number of test
points, however, increases the overall network and equipment costs, and may in itself be
a source of unreliability. Figure 36.6 shows possible test access arrangements for a simple
network of two exchanges (one analogue and one digital) interconnected by a digital
transmission link. Note how test access points have been provided between all the
major items of equipment. This allows the causes of any faults or line-up problems to be
quickly narrowed down.
As is typically the case, both exchanges in Figure 36.6 have built-in test access
equipment for diagnosing and correcting faults within the exchange itself. In addition,
five external test access points enable the exchange’s external conditions to be verified
and faults in other equipment along the link to be localized, diagnosed and corrected.
Figure 36.6 Test access points. TAE, test access equipment (built into the exchange); A/D,
analogue to digital conversion equipment; T, test access point
36.10 LOCALIZING NETWORK FAULTS
In an efficient maintenance organization, faults are discovered before the customer finds
them. In this way some of the faults can be corrected even before the customer is aware
of them. Means for fault detection include
• equipment alarms
• equipment routine testing (so-called routining)
• live network monitoring
• customer fault reporting
As before, a combination of techniques is usually appropriate. The four are discussed in
turn.
Equipment alarms are used to indicate an abnormal state (i.e. when the
maintenance limit has been exceeded) in an equipment or an environment of crucial
importance. (An environment fault might be too high a temperature: computer
equipment is prone to failure under such conditions.) Alarms are usually ranked in
order of importance, and are usually generated by continuous monitoring of some type
of ‘heartbeat’ signal. When the heartbeat stops, it is time for action. For example, the
pilot signal on analogue transmission systems is a low level, single tone frequency
outside the normal bandwidth range which is continuously transmitted. Absence of the
pilot signal at the receiving end is interpreted as a transmission link failure, and it is
usually notified to maintenance staff as an audible and/or visible alarm.
Equipment routine testing, on the other hand, is suited to any equipment which is
fault- or wear-prone (e.g. mechanical equipment). It can be carried out manually or by
automatic test equipment, the choice depending on the number and type of devices,
their complexity, and the periodicity of tests. For routine network testing, ITU-T has
described a number of automatic transmission, measuring and signalling test equipments
(ATME) in its O-series of recommendations.
Live network monitoring helps to pick up transient faults and then avert them;
network overload can thus be foreseen, and its adverse effects corrected, as we shall see
in Chapter 37. Useful means of monitoring live network performance include call
completion rate and congestion statistics, and quality sampling of a small number of
connections. For example, a sudden flood of calls or a rapid drop in the call completion
rate may indicate the onset of congestion caused by network failure or quality
degradation.
Not all faults, however, can be detected other than by the user, and so in the last
resort an efficient customer or user fault reporting and handling procedure is
paramount.
After detecting a fault, the next step is to localize it and diagnose its nature, all of
which may be directly apparent or correctly reported, but usually more diagnosis is
required. For example, when a major transmission system fails, a whole host of alarms
may go off in the maintenance centre, indicating failure not only of the main
transmission system, but also of each of the derived circuits or tributaries. The alarms
must be cleared in priority order. Correcting the main transmission fault first may clear
the alarms on each individual tributary, but will not do so if lesser problems remain on
particular tributaries. Thus, in our example, maintenance staff should first attempt to
localize the main transmission failure to a given section of the link. This they can do by
making use of the various test access points available to them, and by liaison with staff
in other maintenance centres through which the link passes. The example is illustrated
in Figure 36.7, where the failure in one of the channels in section A-B of the link
A-B-C has been detected by maintenance staff in station C. The absence of the main
transmission pilot signal in the receive channel at station C has raised an alarm.
Telephone liaison with the staff in station B reveals that the same alarm has been raised
there. Though this alarm actually notifies the loss of a whole analogue or digital group of
circuits, it is evident that this is the likely cause of the problem on the particular A-C
channel. If instead the fault had lain on the transmission link between stations B and C,
the alarm in station B would not have gone off. Furthermore, if the fault had been in
station C’s transmit channel, the alarms would have gone off at one or both of stations A
and B.
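The reasoning used to localize the failure from the pattern of pilot alarms can be sketched in a few lines. The example assumes, for illustration, that the pilot is monitored at every station downstream of the sender and that only one section has failed; the function name and data layout are hypothetical.

```python
def localize(stations: list, pilot_received: dict):
    """Return the first section whose downstream station has lost the pilot.

    `stations` is ordered in the direction of transmission, sender first.
    `pilot_received` maps each downstream station to True or False.
    """
    for upstream, downstream in zip(stations, stations[1:]):
        if not pilot_received[downstream]:
            return (upstream, downstream)
    return None  # pilot present everywhere: no transmission failure seen


link = ["A", "B", "C"]
# Alarms at both B and C, as in the example of Figure 36.7.
print(localize(link, {"B": False, "C": False}))
# Alarm at C only: the fault lies between B and C.
print(localize(link, {"B": True, "C": False}))
```

In practice the entries in `pilot_received` are gathered by telephone liaison between the maintenance centres along the link, as the example in the text describes.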
Some faults in switched networks can be extremely difficult to trace. A common
problem that is encountered when trying to locate faults in switched networks is trying
to trace which precise links and exchanges were traversed at the time of the fault. It is
like knowing that your friend is driving between London and Birmingham but not
knowing which route he took. You know he has broken down, but where do you look
first? Faults of this nature can persist for many months and are finally cleared, either as
the result of routine maintenance, or because routine testing of one of the individual
network components revealed the fault.
Figure 36.7 Transmission failure and alarm (diagram labels: Station A, Station B, Station C;
link failure; pilot; alarm raised in station C)
Not all switched network faults are difficult to trace. For example, a phenomenon
known to all experienced maintenance staff is the occurrence of a killer trunk. Imagine
that the first choice circuit on the main London-to-Birmingham
telephone route has become faulty. Imagine also that the fault results in any call
attempting to use the circuit being immediately failed and released. Affected callers
receive nothing at all! Not so bad, you might think: one circuit faulty in hundreds will
not make much difference. However, the fault has an incredibly wide-ranging effect,
nearly ‘killing’ all the traffic on the route. The reason is that, being the first choice
circuit, all calls will attempt to seize it before trying other circuits. Of course, any call
that attempts to do so is immediately failed and released. So nearly all calls fail! Hence
the name killer trunk. To the experienced maintenance man, however, the killer trunk
phenomenon shows up like a sore thumb, because traffic records reveal
• a huge number of short holding time calls
• a very low call success (i.e. answering) rate
• very little overall traffic in erlangs
• virtually no activity on many circuits
The condition can be cleared in the first instance simply by busying out the circuit (i.e.
making it unavailable for use). A repair can then be carried out. The worst effects of a
killer trunk can also be prevented by making sure that circuits within the route are
chosen randomly, not always scanned in the same order.
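The difference between fixed-order and random circuit selection can be demonstrated with a small simulation. The model below is deliberately crude and all its parameters are assumptions: calls arrive one at a time, a successful call holds its circuit for a fixed number of subsequent arrivals, and a call that seizes the faulty circuit fails and releases it at once.

```python
import random


def simulate(circuits: int, calls: int, faulty: int, selection: str,
             hold: int = 5, seed: int = 1) -> float:
    """Return the fraction of calls that fail on a route with one faulty circuit."""
    rng = random.Random(seed)
    release_at = {}  # circuit -> call index at which it becomes free again
    failed = 0
    for call in range(calls):
        free = [c for c in range(circuits) if release_at.get(c, 0) <= call]
        if not free:
            failed += 1  # congestion: every circuit is busy
            continue
        chosen = free[0] if selection == "sequential" else rng.choice(free)
        if chosen == faulty:
            failed += 1  # call fails and the circuit is released immediately
        else:
            release_at[chosen] = call + hold
    return failed / calls


print("sequential:", simulate(100, 10_000, faulty=0, selection="sequential"))
print("random:    ", simulate(100, 10_000, faulty=0, selection="random"))
```

In this simplified model the faulty first-choice circuit is always free, so sequential selection fails every call, whereas random selection loses only about one call in every hundred.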
Before leaving the subject of network test points and fault localization procedures we
should also mention the use of loopback techniques for detecting faults within the tail
part of a circuit between the network operator’s site and the end user’s terminal. This
part of the circuit can be the most inaccessible (say in unmanned premises), or it may be
subject to quite adverse conditions in congested ducts, strung between telegraph poles
or even draped across desks. It is just as prone to failure as any other section, and the
network operator needs to be able to localize faults here as anywhere else, ideally being
able to distinguish such faults from faults in the user’s terminal equipment, and
preferably without even visiting the user’s location. What makes this possible is a circuit
loopback or a responding equipment.
A loopback simply loops the user’s receive channel directly to the transmit channel, so
enabling the network operator to send a test signal both ways along the tail and to
confirm its correct return. Loopbacks may be either manually operated by the user
(on request from the network operator) or invoked by the network operator using a
2713 Hz tone or equivalent signal to switch a remotely controlled equipment within the
line termination socket at the user’s premises. As we see from Figure 36.8, the loopback
enables the faults to be localized beyond doubt as being either on the line or in the
user’s terminal equipment, all without a visit to the user’s premises.

Figure 36.8 Testing a connection tail using a loopback (diagram labels: normal connections; user’s receive channel; test loopback; user’s terminal)
Loopback techniques are widely used on point-to-point data connections, where
users are intolerant of any significant downtime. PTOs provide them on the tails of their
private leased circuits, and other data network providers put them on their equipment
circuits. Loopbacks for modems are described in ITU-T Recommendation V.54.
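The reasoning behind a loopback test can be written down in a few lines. This is a sketch under stated assumptions: a fault on the tail or in the terminal has already been reported, the loopback at the line termination socket is already engaged, and send_and_receive is a hypothetical function standing in for the operator's test equipment. It is not an implementation of ITU-T Recommendation V.54.

```python
from enum import Enum


class Verdict(Enum):
    LINE = "fault on the line (tail)"
    TERMINAL = "fault in the user's terminal equipment"


def loopback_ok(send_and_receive, pattern=b"\x55\xaa" * 8):
    """Send a test pattern down the tail and check that it returns unchanged.

    send_and_receive is a hypothetical callable: it takes the bytes sent and
    returns the bytes received back through the engaged loopback.
    """
    return send_and_receive(pattern) == pattern


def localize_fault(send_and_receive):
    """With the loopback engaged, the terminal is out of the test path.

    A clean return therefore clears the line and points to the terminal;
    a missing or corrupted return points to the line itself.
    """
    if loopback_ok(send_and_receive):
        return Verdict.TERMINAL
    return Verdict.LINE


# Two invented tails: one returns the pattern intact, one returns nothing.
healthy_tail = lambda data: data
broken_tail = lambda data: b""

verdict_healthy = localize_fault(healthy_tail)   # Verdict.TERMINAL
verdict_broken = localize_fault(broken_tail)     # Verdict.LINE
```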
Another way of testing lines to remote unmanned locations is to use responding
equipment which answers calls made to the location, giving standard test signals and
other responses in accordance with commands.
36.11 HARDWARE FAULTS
Historically the most common faults have come from mechanical equipment or
hardware. Some maintenance organizations are well prepared to deal with such
component failures, and general wear and tear, by preventive action or by repair, but it
must be recognized that the semiconductor age has brought with it much greater levels
of reliability and complexity and much greater risk associated with failure. All this has
forced change on the handling of software faults.
Nowadays, it is common to design electronic equipment to be tolerant of faults;
computer processors and network exchanges have their crucial parts duplicated, the
two halves sharing the traffic load and continually monitoring one another for faults.
If a fault is detected in either half, then it is shut down and the other half takes over the
full traffic load until the faulty half is repaired by maintenance staff, in response to the
alarm.
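The duplicated arrangement just described can be modelled as a small state machine. The class and method names below are hypothetical, and the sketch assumes an even split of traffic between the in-service halves; real equipment will differ in both respects.

```python
class DuplicatedUnit:
    """Toy model of a processor or exchange with its crucial parts duplicated."""

    def __init__(self):
        self.in_service = {"A": True, "B": True}
        self.alarms = []

    def load_share(self):
        """Return the fraction of traffic carried by each in-service half."""
        active = [half for half, up in self.in_service.items() if up]
        if not active:
            return {}
        return {half: 1.0 / len(active) for half in active}

    def fault_detected(self, half):
        """Shut down the faulty half and raise an alarm for maintenance staff."""
        self.in_service[half] = False
        self.alarms.append(f"half {half} out of service")

    def repaired(self, half):
        """Return a repaired half to service so that load sharing resumes."""
        self.in_service[half] = True


unit = DuplicatedUnit()
normal = unit.load_share()        # both halves share the load equally
unit.fault_detected("A")
degraded = unit.load_share()      # half B carries the full load
unit.repaired("A")
restored = unit.load_share()      # back to equal sharing
```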
Self-diagnosis software within the computer or exchange is used to localize the fault
to a particular circuit board. Correction is achieved merely by sliding the board out and
replacing it with a new one. The processor or exchange can then be restored to normal
working, and the circuit board can be repaired at leisure, sent back to the manufacturer
or thrown away.
36.12 SOFTWARE FAULTS
Errors in computer programs (or software) are becoming the most common cause of
faults in telecommunications equipment. Newly developed software is often littered
with errors or bugs, of an unpredictable nature which come to light in operation,
sometimes years later, so that it is difficult to build experience. In addition, because the
background of most telecommunications technicians is not in computer programming,
the skills for rectification of faults are extremely rare.
Software bugs may take some time to find, and they require the expert attention of
specialized maintenance support staff and the original designer, being outside the
competence of normal direct maintenance staff.
So, you may ask, how do we keep the service on the exchange running while we sort
out the problem? The answer is by initiating a manual or automatic restart procedure
which re-boots (resets) the whole exchange to allow it to carry on working. This is done
by re-loading historical data and software, overwriting any corruptions which may have
crept in as a result of the software bug. The software and data resident in the exchange
processor at the time of failure is downloaded for a later fault diagnosis, which attempts
to trace back the processing events leading up to the problem. Meanwhile the exchange
is likely to run quite normally for some time, until a similar sequence of events conspires
against it again.
Localized restarts of minor functional units are likely to be automatically invoked by
the exchange itself to minimize the off-air time, but more extensive (e.g. whole exchange) restarts
may require manual initiation, to prevent an automatic restart from disrupting a large
number of unaffected calls at an inopportune moment.
The computerization of many maintenance tasks may have greatly reduced the
human numbers required, but the extra skills and knowledge demanded of those staff
that remain make them a prized commodity!
36.13 CHANGE CONTROL PROCEDURE FOR HARDWARE AND SOFTWARE
As telecommunications equipment has become increasingly computerized over the last
few years, so effective change control procedures for the various hardware and software
releases have become increasingly important.
It is normal nowadays for equipment manufacturers to develop their hardware and
software further in one-year steps, offering new hardware and software releases
at least once a year, if not more often. Sometimes these new releases resolve previous
functional or operating problems, sometimes they bring with them new service
capabilities. Nearly always, even for those releases which are backward-compatible with
previous releases, some sort of system procedure is necessary. New releases may be
backward-compatible only with the immediately preceding release, so that any older
releases of the software or hardware may need to be updated before the new release can
be introduced. On other occasions, some form of network, equipment or topology
configuration change may be necessary before the new release is installed and activated.
The act of installing the new release must be a carefully planned and executed
procedure.
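Where each release is backward-compatible only with its immediate predecessor, a node that has fallen several releases behind must be stepped through every intermediate release in turn. The sketch below assumes exactly that rule; the release names and the function upgrade_path are hypothetical.

```python
def upgrade_path(releases, installed, target):
    """List the releases to install, in order, to move from installed to target.

    releases is ordered from oldest to newest. The assumption is that each
    release can be installed only over the release immediately before it.
    list.index raises ValueError if either release name is unknown.
    """
    start = releases.index(installed)
    end = releases.index(target)
    if end < start:
        raise ValueError("target release is older than the installed release")
    return releases[start + 1:end + 1]


releases = ["R1", "R2", "R3", "R4"]           # invented release names
steps = upgrade_path(releases, "R1", "R4")     # ["R2", "R3", "R4"]
no_steps = upgrade_path(releases, "R3", "R3")  # [] (already at the target)
```

Each entry in the returned list is a separate installation to be planned and rehearsed, which is why a node left on an old release becomes progressively more expensive to bring up to date.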
The first step in introducing a new hardware or software release to the network
should be the verification of its correct functioning and good quality. Here an offline
(i.e. non-live) test network is helpful, not only in checking the correct functioning of
new service capabilities but also in the regression testing of functions already used
extensively within the network (which were already available in a previous release, but
may not operate in quite the same way in the new release). The test network also serves
to practice the installation of the new release, refining the most appropriate order of
steps to be taken to minimize service disturbance to customers when the release is
installed in the live network.
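Regression testing on the test network amounts to running the same cases against the old and the new release and examining every difference. In the sketch below the two releases are stood in for by hypothetical digit-analysis functions, and the dialled numbers and route names are invented; on a real test network the cases would be test calls or commands.

```python
def regression_report(test_cases, old_release, new_release):
    """Return (case, old result, new result) for every case that differs."""
    differences = []
    for case in test_cases:
        old_result = old_release(case)
        new_result = new_release(case)
        if old_result != new_result:
            differences.append((case, old_result, new_result))
    return differences


def release_1(dialled):
    """Hypothetical routing decision made by the existing release."""
    return "BIRMINGHAM" if dialled.startswith("0121") else "LOCAL"


def release_2(dialled):
    """Hypothetical new release: the same routing plus an international route."""
    if dialled.startswith("0121"):
        return "BIRMINGHAM"
    if dialled.startswith("00"):
        return "INTERNATIONAL"
    return "LOCAL"


cases = ["01215550100", "5550100", "00495550100"]
report = regression_report(cases, release_1, release_2)
```

Here the report contains one entry, for the number beginning 00. Whether a reported difference is an intended new capability or an unwelcome change in existing behaviour is a judgment for the operator; the report only ensures that no difference goes unnoticed.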
Having installed a new release of hardware or software it is very important to keep an
inventory of the hardware and software level installed in each of the individual nodes
and equipments making up a network, so that further release updates can be correctly
administered and installed. Ideally, each of the nodes should be updated to run at the
most recent level of hardware and software. This much reduces the problems of
different software and hardware vintages having to interoperate with one another, and
usually also guarantees better support from the manufacturer should a problem arise.
(The specialists always tend to be most acquainted with the latest version, tending
slowly to forget the idiosyncrasies of older versions.) Unfortunately, practical reality
does not usually allow this, for various reasons:
• the equipment manufacturers’ pricing scheme may charge for software according to the
number of nodes in which it is installed; if the service benefit is only warranted in a
few nodes it may be uneconomic to install it in all the nodes
• a new software release may demand hardware upgrades which are not affordable
• services may operate in a different and (from a particular user-perspective)
unacceptable way in a new release
Experience with software demonstrates that each new release and even each new patch
(a software correction intended to eliminate a problem or software error) should be
treated with caution. You cannot treat new software as ‘only for the good’; it can also
bring problems.
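The inventory of hardware and software levels described earlier in this section need not be elaborate. A minimal sketch follows, assuming one (hardware level, software level) pair per node; the node names, level names and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical inventory: node name -> (hardware level, software level).
inventory = {
    "node-a": ("HW2", "SW5"),
    "node-b": ("HW2", "SW5"),
    "node-c": ("HW1", "SW4"),
    "node-d": ("HW2", "SW4"),
}


def release_mix(inventory):
    """Count how many nodes run each hardware and software combination."""
    return Counter(inventory.values())


def nodes_behind(inventory, latest):
    """List the nodes that are not at the latest hardware and software level."""
    return sorted(node for node, level in inventory.items() if level != latest)


mix = release_mix(inventory)
behind = nodes_behind(inventory, latest=("HW2", "SW5"))
```

The mix shows how many different vintages must interoperate, and the list of nodes behind is the starting point for planning the next round of updates.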