- Networks and Telecommunications: Design and Operation, Second Edition.
Martin P. Clark
Copyright © 1991, 1997 John Wiley & Sons Ltd
ISBNs: 0-471-97346-7 (Hardback); 0-470-84158-3 (Electronic)
37 Containing Network Overload
Network designers like to think that their traffic models are ideally stable, and that their
forecasts of future traffic will never be wrong. In real life, however, such assumptions are
unwise, because any one of a number of problems may arise, resulting in network overload, and so
congestion. The forecast may underestimate demand; there may be a short period of extraordinarily
high pressure (for example at New Year, Christmas, or any public holiday, or following a natural
disaster); or there may be a network link or switch (exchange) failure. Monitoring and controlling
the network to detect and avoid network congestion and the resulting service degradation is a
difficult task, but an important one. This chapter discusses these new ‘network management’
methods in more detail.
37.1 THE EFFECT OF CONGESTION
Congestion in a telecommunications network manifests itself to customers who are
attempting to make calls as a ‘network busy’ tone, or as a delay in computer or data
network response. It annoys the customer; even worse, the congestion can rapidly increase
as the result of customer or equipment repeat attempts. (Repeat attempts are further call
attempts, made in vain by customers hoping to make a quick connection by immediate
re-dialling.) These further call attempts only exacerbate the problem, because they
greatly increase the loading on exchange equipment (e.g. switch processors, number
stores, senders and receivers), and they add to the overall traffic volume. The congestion
can then spiral out of control. The networks most at risk from congestion are those
which have been kept to minimum circuit numbers in order to keep costs down. Also
very much at risk are those networks that employ a high proportion of multi-link
overflow routings, because congestion on one link in this type of network rapidly affects
other routes, with a consequent ‘domino effect’.
As a result of practical studies, teletraffic experts have produced models which
show that under overload conditions the effective throughput of a network can actually
fall. This makes the overload worse, further reducing throughput, as Figure 37.1
demonstrates.
Figure 37.1 The effect of congestion (horizontal axis: offered traffic load, in erlangs)
Spotting the onset of congestion and taking early appropriate action is crucial to
maintaining control of the network. Only by careful control can customer annoyance
and unnecessary call failure be minimized.
37.2 NETWORK MONITORING
The most common parameter used to measure network congestion in a circuit-switched
network (such as a telex or telephony network) is the traffic load (usually measured in
Erlangs, the average number of circuits in use) over a period of time. In a packet network,
the length of the delay queue (i.e. the number of packets awaiting transmission or the
percentage trunk loading) can be used instead, as we saw in the formulae of Chapter 30.
To serve the purposes of network management, these parameters need to be monitored
frequently, so giving a human network manager a real-time perception of network
performance. Other measurements, such as the lost call count (peg count) or machine
overload information may also be available, helping quickly to diagnose congestion.
However, in all cases, care must be taken when interpreting measurements taken over
too short a period because short duration measurements or a very small sample of calls
can be statistically deceptive. Traffic measurements are not reliable when taken over a
time period much shorter than the average call holding time.
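As a rough illustration of the traffic load measurement, the sketch below computes the traffic intensity in Erlangs as the total circuit holding time falling within a measurement period, divided by the length of that period. The `Call` record and the function name are hypothetical and chosen only for this example; they are not taken from any particular exchange or measurement system.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical call record: times are in seconds."""
    start: float
    duration: float  # holding time


def traffic_intensity_erlangs(calls, period_start, period_end):
    """Average number of circuits in use during the measurement period."""
    period = period_end - period_start
    if period <= 0:
        raise ValueError("measurement period must have positive length")
    busy_seconds = 0.0
    for call in calls:
        # Count only the part of each call that lies inside the period.
        overlap_start = max(call.start, period_start)
        overlap_end = min(call.start + call.duration, period_end)
        busy_seconds += max(0.0, overlap_end - overlap_start)
    return busy_seconds / period


# Two half-hour calls, one after the other, keep one circuit busy for an hour.
calls = [Call(start=0, duration=1800), Call(start=1800, duration=1800)]
hour_load = traffic_intensity_erlangs(calls, 0, 3600)
```

Measured over a window much shorter than the holding time, the same calls would simply report one circuit permanently in use, which is one way of seeing why very short measurement periods can mislead.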
Traffic and network performance information nowadays usually comes from com-
puters. Daily, weekly, and monthly usage records can be calculated by computer post-
processing of information. The post-processing may only be made well after the day
when the traffic itself was recorded from information stored on magnetic tape or disk.
Such information is clearly of no advantage in recommending immediate alleviation
action. The experience of many network operators is that network management mon-
itoring information can be obtained only by setting up a completely distinct computer
network management system.
The monitoring part of a network management system relies on the processing of
status information (i.e. information about the current state of the network and its
components), output in computer format by the switches or stored program control
(SPC) exchanges. Commonly this information comprises alarms and other fault messages,
but in addition may also include an itemized call record for each call made
through the exchange. Call records are merely computer messages from the exchange to
the network management system, giving all the details of the call (i.e. number dialled,
start time, duration, whether successful or not, etc.).
From the collation of information relating to all current and very recent calls, a true
picture of the current network state may be calculated by the network management
system, and is displayed in some way to a human network manager. Most importantly,
the load of traffic and the degree of congestion on each individual route into and out of
the exchange can be shown.
If the network manager, in watching this information (or having it alarmed to his or
her attention), notices that the congestion level on a particular route, link or exchange
has exceeded a pre-determined threshold or that some other problem has arisen, he or
she needs to evaluate the cause of the problem and take corrective action as fast as
possible. A quick and correct diagnosis enables the manager to adopt the best available
solution.
It is worth reflecting here that, as in road traffic jams and human diseases, the cause
of a telecommunication network problem and its symptoms may not both be centred on
the same place or region. Causes may lurk a long way from the effect. For illustration,
consider the example shown in Figure 37.2.
As a result of failure on the link that connects the tandem exchange T with exchange
C, there is growing congestion on the link between exchanges A and T. The congestion
is caused by customers at exchange A persisting in vain attempts to make new calls to
exchange C. The repeat attempts are resulting in unusually high traffic between
exchanges A and T, thereby also affecting other customers at A wishing to connect to B.
In a very sophisticated network monitoring scheme, the network manager aims to
achieve an instant indication and diagnosis of major problems. This may be easy, for
example, as the result of an alarm. In Figure 37.2, the T-C link failure may register an
alarm, which will instantly tell the manager the cause of network congestion. In practice,
however, real-time network monitoring facilities may only be available at a limited
number of nodes or within given sub-network areas corresponding to the data network
technology of a particular supplier, so that the network manager may have only an
incomplete knowledge of the network status as a whole. In this case instant diagnosis
and resolution of problems will require some guesswork. If in Figure 37.2, exchange A
is the only one with network management capabilities (perhaps because it is the only
exchange that is modern enough), then the network manager associated with exchange
A could not directly know that the T-C link had failed. In this case he or she has to rely
more heavily on ‘gut feel’ and experience. So what other tools are available to help diagnosis?
Figure 37.2 Example of symptom and cause of congestion (symptom: congestion on link A-T; cause: failure of link T-C)
The monitoring of a second parameter comes in useful as an aid to diagnosis. The
second parameter used is either the answer/bid ratio (ABR) or the answer/seizure ratio
(ASR). They are similar. ABR is the percentage ratio of answered calls to a given
destination, compared with the number of call attempts (bids) intended for that
particular destination. ASR is nearly the same, but instead is calculated using the
number of outgoing seizures from the originating exchange as the denominator. The
only reason for using one value rather than the other might be that it is this value that
the particular exchange normally measures.
For network management purposes either ABR or ASR should be measured at each
originating exchange on a destination basis. Let us assume for the remainder of this
chapter that we have chosen to measure the ABR by destination as well as each route’s
traffic intensity in Erlangs. Therefore, at exchange A in Figure 37.2 two ABR values are
monitored, corresponding to destinations B and C, and the route traffic A-T is also
measured. The ABR calculation is performed at A as follows:
\[
\text{ABR for destination C (measurement at A)} =
\frac{\text{number of answered calls to destination C originated at exchange A during time } T}
     {\text{total number of calls to destination C originated at exchange A during same time } T}
\]
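The ABR calculation can be sketched in a few lines of code. The example below is only an illustration: the record layout (a destination label and an answered flag per call attempt) and the function name are assumptions made for this sketch, and the result is expressed as a percentage, following the definition of ABR as a percentage ratio.

```python
from collections import defaultdict


def abr_by_destination(call_attempts):
    """Return {destination: ABR in per cent} for one measurement period.

    call_attempts is an iterable of (destination, answered) pairs, one per
    call attempt (bid) originated at the measuring exchange. This layout is
    hypothetical and used only for illustration.
    """
    bids = defaultdict(int)
    answered_calls = defaultdict(int)
    for destination, answered in call_attempts:
        bids[destination] += 1
        if answered:
            answered_calls[destination] += 1
    return {
        destination: 100.0 * answered_calls[destination] / bids[destination]
        for destination in bids
    }


# Attempts recorded at exchange A during time T (invented sample data).
attempts = [("C", True), ("C", False), ("C", False), ("C", True), ("B", True)]
abr = abr_by_destination(attempts)
```

An ASR figure could be produced in the same way by counting outgoing seizures rather than bids in the denominator.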
In the example of Figure 37.2 the network manager at A, on noticing that the Erlang
threshold has been exceeded on route A-T will also notice that the ABR from A to C
has also significantly deteriorated. Meanwhile, calls are still completing from A to
exchange B. The network manager at A will not know the root cause of the failure, but
this is not necessary. The manager has two courses of action to choose from:
• call personnel at exchange C to request further information on the problem, and
thus receive advice on which network controls to apply
• go ahead and implement unilateral network management controls on the basis of
the diagnosis to hand
Which course of action and the exact controls used by the manager will depend on the
individual circumstance.
We have now seen that the monitoring of ABR values is invaluable in diagnosing
congestion, but it need not be restricted to a follow-on action which is considered only
after the traffic intensity threshold has been exceeded. There are additional benefits to be
gained by monitoring destination ABR values continuously and comparing figures
against threshold, or ‘action required’, values. Figure 37.2 is again useful in illustration.
Consider the case when a failure of link T-C occurs at night or some other period when
demand is so low that, because of the low calling rate on the network as a whole, there is
no resulting traffic congestion on link A-T. The traffic threshold value in this instance
would not be exceeded; and if the network manager were not additionally monitoring
ABR on destination C (the value will have dropped to zero completion rate), the problem
in completing calls to C might pass unnoticed, and no corrective action could be taken.
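The value of watching both parameters can be shown with a small sketch. The function below compares a route's traffic intensity and each destination's ABR against thresholds and lists what would be brought to the manager's attention. The function name, threshold values and message wording are all hypothetical and serve only to illustrate the reasoning above.

```python
def route_alerts(route_erlangs, erlang_threshold, abr_by_dest, abr_threshold):
    """List the conditions that would need the network manager's attention."""
    alerts = []
    if route_erlangs > erlang_threshold:
        alerts.append("route traffic above threshold")
    for destination, abr in sorted(abr_by_dest.items()):
        if abr < abr_threshold:
            alerts.append(f"ABR to {destination} below threshold ({abr:.0f}%)")
    return alerts


# Night-time failure of link T-C: route A-T is lightly loaded, so the traffic
# threshold alone gives no warning, but the ABR to C has fallen to zero.
night = route_alerts(
    route_erlangs=2.0,
    erlang_threshold=30.0,
    abr_by_dest={"B": 70.0, "C": 0.0},
    abr_threshold=40.0,
)
```

In the busy-hour case the same function would report both the route overload on A-T and the poor ABR to C, pointing the manager towards destination C rather than the route itself.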
In data networks it is nowadays common for traffic policing algorithms to regulate
the influx of traffic (i.e. packets, frames or cells) from a given source. Where a given
source exceeds the committed information rate (CIR), a frame rate contracted to be
‘guaranteed’ by the network at the time of set-up of the connection, then excess frames
or cells may either be marked to be the first to be thrown away when congestion is
encountered or may be directly discarded. This is achieved using special signalling
techniques. Forward explicit congestion notification (FECN) and backward explicit
congestion notification (BECN) messages are tagged to actual user message frames, to
alert the user end equipments and the switches to which they are connected that
congestion has been encountered somewhere along the path. These messages allow
alleviation action to be triggered. Equivalent messages also exist in other data protocols
(e.g. ATM) but may have other names.
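The marking and discarding of excess frames can be illustrated with a very simple policing sketch. The text does not specify a particular algorithm, so the scheme below is an assumption made for illustration: the source is allowed CIR multiplied by the measurement interval in bits per interval, and any frame beyond that budget is either marked for preferential discard or dropped at once. All names are hypothetical.

```python
def police_frames(frame_sizes_bits, cir_bps, interval_s, discard_excess=False):
    """Classify the frames arriving from one source in one measurement interval.

    Frames that fit within the committed budget (CIR x interval) are forwarded
    unmarked; the rest are marked as first to be thrown away under congestion,
    or discarded immediately if discard_excess is set.
    """
    budget_bits = cir_bps * interval_s
    used_bits = 0
    decisions = []
    for size in frame_sizes_bits:
        if used_bits + size <= budget_bits:
            used_bits += size
            decisions.append("forward")
        else:
            decisions.append("discard" if discard_excess else "mark")
    return decisions


# A 64 kbit/s CIR over a 125 ms interval allows 8000 bits per interval.
marked = police_frames([4000, 4000, 4000], cir_bps=64000, interval_s=0.125)
```

Real policing functions are usually more elaborate (for example, allowing a limited excess burst before discarding), but the principle of treating traffic above the contracted rate differently is the same.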
Automatic congestion notification procedures built into data networks may avoid the
need, at least to some extent, for special congestion monitoring and alleviation
measures to be taken by human network managers of data networks. Further measures
available to human network managers include taking individual virtual connections out
of service or re-scheduling particular computer devices to run their programs at a
later (quieter) point in time.
The simplest method of detecting network problems is to monitor all available alarms.
Alarms may simply activate bells and lamps, but a sophisticated network management
system can have equipment alarm presentation built in, and it can be capable of some
automatic problem diagnosis. As with other network status information, alarm
information, along with ABRs, etc., is most effectively presented to the network
manager as a combination of graphical and tabular displays on video terminals. These
allow the manager at a glance to gauge overall network status, make a rapid diagnosis of
problems, and take corrective or fault-containing action in good time.
The collation of all alarms by a central network management system (sometimes
also called an umbrella network management system) has become the standard means of
network monitoring and control. Specialist network management platforms (suites of
computer hardware and software) have been developed specifically for this purpose, as
we saw in Chapter 21.
37.3 NETWORK MANAGEMENT CONTROLS
With the problem located, what sort of network management control will be
appropriate? All network management actions can be classified into one of two
categories:
• expansive control actions
• restrictive control actions
The correct action to be taken in any individual circumstance needs to be considered in
the light of a set of guiding principles.
• Use all available equipment to complete calls or deliver data packets, frames or
cells.
• Give priority to calls or packets most likely to complete.
• Prevent exchange (nodal) congestion and its spread.
• Give priority to calls or data packets that can be completed using only a small
number of links.
In an expansive action the network manager makes further resources or capacity
available to alleviate the congestion, whereas in a restrictive action, having decided that
there are insufficient resources within the network as a whole to cope with the demand,
the manager can simply fail any new calls coming in to the hard to reach (i.e. temporarily
congested) destination(s). It makes good sense to fail these calls close to their point of
origin, because early rejection of calls frees as many network resources as possible, which
can then be put to good use in completing calls between unaffected points of the
network.
37.4 EXPANSIVE CONTROL ACTIONS
There are many examples of expansive actions. Perhaps the two most worthy of note
are
• network restoration
• temporary alternative re-routing (TAR)
Network restoration, by no means a new technique, has been carried out by many network
operators for years; but for some of them the introduction of computerized network
management centres has provided an opportunity to apply centrally controlled network
restoration for the first time.
Network restoration is made possible by providing more plant in the network than the
normal traffic load requires. During times of failure this ‘spare’ or restoration plant is
used to ‘stand-in’ for the faulty equipment, for example, a failed cable or transmission
system, or even an entire failed exchange. By restoring service with spare equipment,
the faulty line system or exchange can be removed from service and repaired more
easily.
Network restoration techniques have historically been applied to transmission links,
where it has been usual to plan spare capacity into the network in the form of analogue
spare groups, supergroups, hypergroups or as digital bit-carrying capacity. The spare
capacity is built into new transmission systems on a 1 for N basis; i.e. 1 unit of
restoration capacity for every N traffic-carrying units.
The following example shows how 1 for N restoration works. Between two points of
a network, A and B, a number of transmission systems are required to carry the traffic.
These are to be provided in accordance with a 1-in-4 restoration scheme. One example
of how this could be met is with five systems, operated as four fully loaded transmission
lines plus a separate spare (which should be kept warm; inactive plant tends not to work
when called into action). Automatic changeover equipment can be used to effect instant
restoration of any of the other cables, should they fail. An alternative but equally valid
1-in-4 configuration is to load each of the five cables at four-fifths capacity, leaving
spare capacity on each to ‘stand-in’, should any of the other four fail (Figure 37.3(a)).
In the latter case, the failed circuits must be restored in four parts, each of the other
cables taking a quarter of the failed portion (as shown in Figure 37.3(b)).
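The arithmetic of the spread-load configuration can be checked with a short sketch. It assumes, as in the example above, cables of equal capacity whose loads are expressed as a percentage of that capacity; the function name is hypothetical.

```python
def spread_restoration(loads_percent, failed_index):
    """Share the failed cable's load equally among the surviving cables.

    loads_percent holds each cable's load as a percentage of its capacity,
    all cables being assumed to have the same capacity.
    """
    survivors = len(loads_percent) - 1
    share = loads_percent[failed_index] / survivors
    return [
        0.0 if index == failed_index else load + share
        for index, load in enumerate(loads_percent)
    ]


# Five cables, each normally 80 per cent loaded; cable 1 (index 0) fails.
after_failure = spread_restoration([80, 80, 80, 80, 80], failed_index=0)
```

Each of the four surviving cables takes a quarter of the failed cable's 80 per cent load (20 per cent each) and so ends up fully loaded, which is why four-fifths is the highest normal loading that still allows full restoration with five equal cables.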
In practice, not all cables are of the same capacity and it is not always practicable or
economic to restore cables on a one-for-one basis as shown in Figure 37.3(a). Both
methods prevail. Another common practice used for restoration is that of ‘triangulation’
(concatenating a number of restoration links via third points to enable full
restoration). Figure 37.4 illustrates the principle of triangulation. In the simple example
shown, a cable exists from exchange A to exchange B, but there is no direct restoration
path. Restoration is provided instead by plant which is made available in the triangle of
links A-C and C-B. These restoration links are also used individually to restore simpler
cable failures, i.e. on the one-link connections such as A-C or B-C.
Because of the scope for triangulation, restoration networks (also called protection
networks) are often designed on a network-wide basis. This enables overall network
costs to be minimized without seriously affecting their resilience to problems.
Figure 37.3 Two methods of 1 in 4 restoration
(a) Entire spare cable under normal condition: each working cable carries its normal load; cable 5 is normally idle and ‘stands in’ to carry the entire load of failed cable 1.
(b) Spread load under normal condition: each cable is normally only 80 per cent loaded; on failure of any one cable, four 20 per cent components are spread over the other four cables.
Figure 37.4 Restoration by triangulation
The use of idle capacity via third points is the basis of a second expansive action,
termed temporary alternative re-routing (TAR). This method is generally invoked only
from computer-controlled switches where routing data changes can be made easily. It
involves either routing calls to a particular destination temporarily via different routes,
or allowing more route-overflow options than normal. In Figure 37.5, some direct
capacity between A and B has failed, resulting in congestion. On noticing this, the
network manager has also observed that the routes A-C and C-B are currently lightly
loaded (again by chance this traffic has a different busy hour from that of A-B). A
temporary overflow route via C would help to relieve congestion from A to B. The
manager can adapt the normal routing choice list (used at exchange A) to include a
temporary overflow via C. This expedient route path is termed a temporary alternative
route, or TAR.
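The effect of a TAR on the routing choice list at exchange A can be sketched as follows. The table layout, the route labels and the function names are hypothetical; real switches hold routing data in their own formats.

```python
def add_tar(routing_table, destination, overflow_route):
    """Return a copy of the routing table with a temporary overflow choice
    appended for the destination; the direct route stays first choice."""
    updated = {dest: list(choices) for dest, choices in routing_table.items()}
    choices = updated.setdefault(destination, [])
    if overflow_route not in choices:
        choices.append(overflow_route)
    return updated


def choose_route(choices, route_is_free):
    """Pick the first route in the choice list that has free capacity."""
    for route in choices:
        if route_is_free(route):
            return route
    return None  # no route available: the call fails


normal = {"B": ["direct A-B"]}
with_tar = add_tar(normal, "B", "via C")

# With the direct route congested, only the temporary route has free capacity.
selected = choose_route(with_tar["B"], lambda route: route == "via C")
```

Because `add_tar` returns a modified copy, the normal routing data is left untouched and can be reinstated once the failure has been cleared.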
Figure 37.5 Temporary alternative re-routing (the direct route A-B remains first choice; overflow via C is temporarily permitted; transit traffic from A to B via C is not normally allowed)
Figure 37.6 Alternative paths A-B in an SDH network made up of sub-network rings (legend: sub-network ring; crossconnect)
In modern transmission technology (SDH and SONET as discussed in Chapter 13),
restoration capabilities are built-in. Thus in both SDH and SONET it is intended that
highly resilient transmission networks should be built up from inter-meshed ring
sub-networks. The ring topology alone leads to the possibility of alternative routing around
the surviving ring arc, should one side of the ring become broken due to a link failure.
Crossconnect points between ring sub-networks further ensure a multitude of
alternative paths through larger networks, as is clear from Figure 37.6. The possibilities
are limited only by the capabilities of the network planner to dimension the network
and topology appropriately and the ability of the network management system to
execute the necessary path changes at times when individual links fail or come back into
service.
In data networks (e.g. packet, frame and cell-switched networks) alternative routing
is usually automatically inherent within the normal routing algorithms of the network,
and is undertaken automatically by affected switches, sometimes in conjunction with
the network management system.
Where temporary alternative routing can be undertaken at the data or voice
switching level, it may make sense to include all available transmission capacity into the
switched network rather than keep some capacity free for transmission level restoration
(e.g. using SDH). This saves the need for additional restoration switching equipment at
the transmission level and in addition will afford much better network performance to
customers when no links are out of service, because more bandwidth will normally be
available.
37.5 RESTRICTIVE CONTROL ACTIONS
Unfortunately, there will always be some condition under which no further expansive
action is possible (in Figure 37.5, routes A-C and C-B may already be busy with their
own direct traffic, or may not be large enough for the extra demand imposed by A-B
traffic). In this state, congestion cannot be alleviated. Even worse, all calls made in vain
to the problematic destination will be a nuisance to other callers, because these fruitless
calls lead to congestion of traffic attempting to reach other destinations. In this case, the
best action is to refuse (or at least restrain) calls to the affected destination as near to
their points of origination as possible. Perhaps the most flexible of restrictive actions is
call gapping (in the case of a data network, the similar and equivalent action is called
pacing, flow control or ingress control).
Under call gapping or flow control the traffic demand is ‘diluted’ at all originating
exchanges. A restricted number of call attempts (or data frames) to the affected
destination are allowed to pass from each originating exchange into the network as a
whole. Within the wider network, this reduces the network overload and relieves congestion
of traffic to other destinations, giving a better chance of completion. There are two
principal sub-variants of call gapping, as shown in Figure 37.7. These are the 1-in-N and
1-in-T types. The 1-in-N method allows every Nth call to pass into the network. The
remaining proportion of calls, (N - 1)/N, are failed immediately at their originating
exchange (in this case, A). These callers hear network busy tone. The 1-in-T method, by
comparison, performs a similar call dilution by allowing only 1 call every T seconds to
mature.
Figure 37.7 Call gapping (call gapping is applied at the traffic source A on calls to the congested destination B, giving greater completion success to the unaffected destination C)

Call gapping type    Effect
1-in-N               Of N calls generated at a particular exchange for the affected destination, only one is allowed to mature into a network call attempt. All others are failed at A regardless.
1-in-T               Only 1 call attempt every T seconds is permitted to mature beyond the originating exchange.
In the first method, the value of N may be varied, to control the level of call dilution
(for N = 2, 50% of calls would be allowed into the network; for N = 3, 33%, etc.).
Similarly, in the 1-in-T method, the value T may be adjusted according to the level of
congestion. The greater the value at which T or N are set at any particular instant in
time, the greater the diluting effect on calls. In data flow control, any incoming frames
over and above a rate allowed by a given threshold value are either marked for
preferential discarding (i.e. will be thrown away before other frames should congestion
be encountered) or may be immediately discarded.
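The two call gapping variants can be sketched as small admission filters applied at the originating exchange. The class and method names are hypothetical, and a real exchange would apply such a filter per affected destination.

```python
class OneInNGap:
    """1-in-N call gapping: every Nth call attempt is allowed to mature."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("N must be at least 1")
        self.n = n
        self.count = 0

    def admit(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False  # failed at the originating exchange


class OneInTGap:
    """1-in-T call gapping: at most one call attempt matures every T seconds."""

    def __init__(self, t_seconds):
        self.t_seconds = t_seconds
        self.last_admitted = None

    def admit(self, now):
        if self.last_admitted is None or now - self.last_admitted >= self.t_seconds:
            self.last_admitted = now
            return True
        return False


gap_n = OneInNGap(3)
n_results = [gap_n.admit() for _ in range(6)]

gap_t = OneInTGap(10)
t_results = [gap_t.admit(now) for now in (0, 3, 9, 10, 12, 21)]
```

With N = 3 one call in three is admitted (33%), in line with the figures above. Note that the 1-in-T filter caps the admitted rate at one call per T seconds regardless of how many attempts are offered, whereas 1-in-N admits a fixed proportion of whatever is offered.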
If congestion continues to get worse, the value of N or T can be increased accordingly.
With a very large value of N or T, nearly all calls are blocked at the originating point.
The action of complete blocking is quite radical, but nonetheless it is sometimes
necessary. This measure may be appropriate following a public disaster (earthquake,
riot, major fire, etc.). Frequently in these conditions, the public are given only one
telephone number as a point of enquiry, and inevitably there is an instant flood of calls
to the number, few of which can be completed. In this instance, call gapping is a
powerful tool for diluting calls, thereby increasing the likelihood of successful call
completion for other network users.
Figure 37.8 A network management system (elements shown: exchange network, network status information, control information, network management, network manager)
Figure 37.9 Network management centre. AT&T's Network Operations Centre at Bedminster
in New Jersey, USA. It controls the AT&T worldwide intelligent network, the most advanced
telecommunications network in the world. The centre controls data and voice calls over 2.3
billion circuit miles worldwide, handling more than 75 million calls a day. It is active 24 hours a
day, 365 days a year. (Courtesy of AT&T)
37.6 NETWORK MANAGEMENT SYSTEMS
Today it is common for network management computer systems to be developed as an
integral part of the modern switches (i.e. exchanges) or sub-networks which they control.
Direct datalink connections between exchange processors or sub-networks and network
management systems allow real-time network status information to be presented to the
network management systems, and for traffic control signals to be returned. In this way,
switch data changes or other network control methods can be made quickly. An example
might be a switch routing data change to amend routing patterns by the addition of
TARs. Another example of a control signal might be an indication to the exchange to
perform call gapping on calls to an appropriate destination. Figure 37.8 illustrates
the functional architecture of a typical network management system, but we discuss the
subject in more depth in Chapter 27.
Network management systems can be procured either from computer manufacturers
or computer software companies, but they are increasingly being offered by switch
manufacturers as integral parts of new telephone or data exchanges. As the competition
between suppliers becomes ever greater, perhaps we will have to wait only a few years
before someone achieves a truly ‘self-healing’ network, one that remains congestion-
free without the continuous assistance of human network managers.