
Networks and Telecommunications: Design and Operation, Second Edition. Martin P. Clark
Copyright © 1991, 1997 John Wiley & Sons Ltd
ISBNs: 0-471-97346-7 (Hardback); 0-470-84158-3 (Electronic)

37 Containing Network Overload

Network designers like to think that their traffic models are ideally stable, and that their forecasts of future traffic will never be wrong. In real life, however, such assumptions are unwise, because any one of a number of problems may arise, resulting in network overload, and so congestion. The forecast may underestimate demand; there may be a short period of extraordinarily high pressure (for example at New Year, Christmas, or any public holiday, or following a natural disaster); or there may be a network link or switch (exchange) failure. Monitoring and controlling the network to detect and avoid network congestion and the resulting service degradation is a difficult task, but an important one. This chapter discusses these new ‘network management’ methods in more detail.

37.1 THE EFFECT OF CONGESTION

Congestion in a telecommunications network manifests itself to customers who are attempting to make calls as a ‘network busy’ tone, or as a delay in computer or data network response. It annoys the customer; even worse, the congestion can rapidly increase as the result of customer or equipment repeat attempts. (Repeat attempts are further call attempts, made in vain by customers hoping to make a quick connection by immediate re-dialling.) These further call attempts only exacerbate the problem, because they greatly increase the loading on exchange equipment (e.g. switch processors, number stores, senders and receivers), and they add to the overall traffic volume. The congestion can then spiral out of control.

The networks most at risk from congestion are those which have been kept to minimum circuit numbers in order to keep costs down. Also very much at risk are those networks that employ a high proportion of multi-link overflow routings, because congestion on one link in this type of network rapidly affects other routes, with a consequent ‘domino effect’. As a result of practical studies, teletraffic experts have produced models which show that under overload conditions the effective throughput of a network can actually fall. This makes the overload worse, further reducing throughput, as Figure 37.1 demonstrates.
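The way repeat attempts inflate the load on a congested route can be sketched numerically. The following is only an illustration under stated assumptions, not a model taken from the text: the route is treated as an Erlang B loss system, and `retry_probability` (the chance that a blocked caller immediately re-dials) and both function names are hypothetical. It shows how the attempted load grows with re-dialling; it does not reproduce the fall in throughput, which also depends on the loading of exchange equipment.

```python
def erlang_b(offered_erlangs, circuits):
    """Blocking probability of an Erlang B loss system (standard recursion)."""
    blocking = 1.0
    for n in range(1, circuits + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


def load_with_repeat_attempts(fresh_erlangs, circuits, retry_probability,
                              iterations=500):
    """Fixed-point estimate of the attempted load when blocked calls re-dial.

    Assumption: a blocked attempt is repeated with probability
    retry_probability, so the total attempted load satisfies
    total = fresh + retry_probability * blocking(total) * total.
    Returns (total attempted load, blocking probability at that load).
    """
    total = fresh_erlangs
    for _ in range(iterations):
        blocked_fraction = erlang_b(total, circuits)
        total = fresh_erlangs + retry_probability * blocked_fraction * total
    return total, erlang_b(total, circuits)


if __name__ == "__main__":
    # Hypothetical figures: 12 erlangs of fresh demand offered to 10 circuits.
    total, blocking = load_with_repeat_attempts(12.0, 10, retry_probability=0.7)
    print(f"attempted load {total:.1f} erlangs, blocking {blocking:.0%}")
```

Under these assumptions the attempted load always exceeds the fresh demand, and the blocking seen by every caller rises with it, which is the spiral described above.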
Figure 37.1 The effect of congestion (axes: offered traffic; load in erlangs)

Spotting the onset of congestion and taking early appropriate action is crucial to maintaining control of the network. Only by careful control can customer annoyance and unnecessary call failure be minimized.

37.2 NETWORK MONITORING

The most common parameter used to measure network congestion in a circuit-switched network (such as a telex or telephony network) is the traffic load (usually measured in Erlangs, the average number of circuits in use) over a period of time. In a packet network, the length of the delay queue (i.e. the number of packets awaiting transmission or the percentage trunk loading) can be used instead, as we saw in the formulae of Chapter 30. To serve the purposes of network management, these parameters need to be monitored frequently, so giving a human network manager a real-time perception of network performance. Other measurements, such as the lost call count (peg count) or machine overload information may also be available, helping quickly to diagnose congestion. However, in all cases, care must be taken when interpreting measurements taken over too short a period, because short duration measurements or a very small sample of calls can be statistically deceptive. Traffic measurements are not reliable when taken over a time period much shorter than the average call holding time.

Traffic and network performance information nowadays usually comes from computers. Daily, weekly, and monthly usage records can be calculated by computer post-processing of information. The post-processing may only be made well after the day when the traffic itself was recorded, from information stored on magnetic tape or disk. Such information is clearly of no advantage in recommending immediate alleviation action. The experience of many network operators is that network management monitoring information can be obtained only by setting up a completely distinct computer network management system.
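The traffic load measurement described above can be sketched as follows. This is a minimal illustration: the record layout (a start time and a duration in seconds for each call) and the function name are assumptions made for the example, not a format defined in the text.

```python
def traffic_intensity_erlangs(calls, window_start, window_end):
    """Average number of circuits in use during [window_start, window_end).

    calls: iterable of (start_time, duration) pairs, in seconds.
    Only the part of each call that falls inside the window is counted.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("measurement window must have positive length")
    busy_seconds = 0.0
    for start, duration in calls:
        overlap = min(start + duration, window_end) - max(start, window_start)
        if overlap > 0:
            busy_seconds += overlap
    return busy_seconds / window


if __name__ == "__main__":
    # Hypothetical records: three calls observed over a one-hour window.
    calls = [(0, 1800), (900, 1800), (3000, 1200)]
    print(traffic_intensity_erlangs(calls, 0, 3600))
```

The same function applied to a window much shorter than the average holding time returns values that swing widely from one window to the next, which is why such short measurements are treated with caution.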
The monitoring part of a network management system relies on the processing of status information (i.e. information about the current state of the network and its components), output in computer format by the switches or stored program control (SPC) exchanges. Commonly this information comprises alarms and other fault messages, but in addition may also include an itemized call record for each call made through the exchange. Call records are merely computer messages from the exchange to the network management system, giving all the details of the call (i.e. number dialled, start time, duration, whether successful or not, etc.). From the collation of information relating to all current and very recent calls, a true picture of the current network state may be calculated by the network management system, and is displayed in some way to a human network manager. Most importantly, the load of traffic and the degree of congestion on each individual route into and out of the exchange can be shown.

If the network manager, in watching this information (or having it alarmed to his or her attention), notices that the congestion level on a particular route, link or exchange has exceeded a pre-determined threshold or that some other problem has arisen, he or she needs to evaluate the cause of the problem and take corrective action as fast as possible. A quick and correct diagnosis enables the manager to adopt the best available solution. It is worth reflecting here that, as in road traffic jams and human diseases, the cause of a telecommunication network problem and its symptoms may not both be centred on the same place or region. Causes may lurk a long way from the effect. For illustration, consider the example shown in Figure 37.2. As a result of a failure on the link that connects the tandem exchange T with exchange C, there is growing congestion on the link between exchanges A and T.
The congestion is caused by customers at exchange A persisting in vain attempts to make new calls to exchange C. The repeat attempts are resulting in unusually high traffic between exchanges A and T, thereby also affecting other customers at A wishing to connect to B.

Figure 37.2 Example of symptom and cause of congestion (exchanges A, B and T; symptom: congestion; cause: link failure)

In a very sophisticated network monitoring scheme, the network manager aims to achieve an instant indication and diagnosis of major problems. This may be easy, for example, as the result of an alarm. In Figure 37.2, the T-C link failure may register an alarm, which will instantly tell the manager the cause of network congestion. In practice, however, real-time network monitoring facilities may only be available at a limited number of nodes or within given sub-network areas corresponding to the data network technology of a particular supplier, so that the network manager may have only an incomplete knowledge of the network status as a whole. In this case instant diagnosis and resolution of problems will require some guesswork. If in Figure 37.2 exchange A is the only one with network management capabilities (perhaps because it is the only exchange that is modern enough), then the network manager associated with exchange A could not directly know that the T-C link had failed. In this case he or she has to rely more heavily on ‘gut feel’ and experience. So what other tools are available to help diagnosis?

The monitoring of a second parameter comes in useful as an aid to diagnosis. The second parameter used is either the answer/bid ratio (ABR) or the answer/seizure ratio (ASR). They are similar. ABR is the percentage ratio of answered calls to a given destination, compared with the number of call attempts (bids) intended for that particular destination. ASR is nearly the same, but instead is calculated using the number of outgoing seizures from the originating exchange as the denominator. The only reason for using one value rather than the other might be that it is this value that the particular exchange normally measures.

For network management purposes either ABR or ASR should be measured at each originating exchange on a destination basis. Let us assume for the remainder of this chapter that we have chosen to measure the ABR by destination as well as each route’s traffic intensity in Erlangs. Therefore, at exchange A in Figure 37.2 two ABR values are monitored, corresponding to destinations B and C, and the route traffic A-T is also measured.
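The ABR and ASR definitions given above can be sketched as a per-destination calculation at the originating exchange. This is a minimal illustration: the call record layout (destination, whether an outgoing circuit was seized, whether the call was answered), the function names and the threshold value are assumptions made for the example.

```python
from collections import defaultdict


def answer_ratios(call_records):
    """Return {destination: (abr_percent, asr_percent)}.

    call_records: iterable of (destination, seized, answered) tuples.
    ABR uses call attempts (bids) as the denominator; ASR uses outgoing
    seizures. ASR is None when no seizure was made to that destination.
    """
    bids = defaultdict(int)
    seizures = defaultdict(int)
    answers = defaultdict(int)
    for destination, seized, answered in call_records:
        bids[destination] += 1
        if seized:
            seizures[destination] += 1
        if answered:
            answers[destination] += 1
    ratios = {}
    for destination, bid_count in bids.items():
        abr = 100.0 * answers[destination] / bid_count
        seized_count = seizures[destination]
        asr = 100.0 * answers[destination] / seized_count if seized_count else None
        ratios[destination] = (abr, asr)
    return ratios


def destinations_below_threshold(ratios, abr_threshold_percent):
    """Destinations whose ABR has fallen below the action threshold."""
    return sorted(d for d, (abr, _) in ratios.items()
                  if abr < abr_threshold_percent)


if __name__ == "__main__":
    # Hypothetical records at exchange A: calls to B complete, calls to C do not.
    records = [("B", True, True)] * 3 + [("B", True, False)] \
        + [("C", True, False)] * 2 + [("C", False, False)] * 2
    print(destinations_below_threshold(answer_ratios(records), 30.0))
```

Because the ratios are kept per destination, a check of this kind points to the affected destination even when the route traffic measurement alone shows only that A-T is busy.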
The ABR calculation is performed at A as follows:

\[
\text{ABR for destination C (measurement at A)} =
\frac{\text{number of answered calls to destination C originated at exchange A during time } T}
     {\text{total number of calls to destination C originated at exchange A during same time } T}
\]

In the example of Figure 37.2 the network manager at A, on noticing that the Erlang threshold has been exceeded on route A-T, will also notice that the ABR from A to C has also significantly deteriorated. Meanwhile, calls are still completing from A to exchange B. The network manager at A will not know the root cause of the failure, but this is not necessary. The manager has two courses of action to choose from:

• call personnel at exchange C to request further information on the problem, and thus receive advice on which network controls to apply
• go ahead and implement unilateral network management controls on the basis of the diagnosis to hand

Which course of action and the exact controls used by the manager will depend on the individual circumstance.

We have now seen that the monitoring of ABR values is invaluable in diagnosing congestion, but it need not be restricted to a follow-on action which is considered only after the traffic intensity threshold has been exceeded. There are additional benefits to be gained by monitoring destination ABR values continuously and comparing figures against threshold, or action required, values. Figure 37.2 is again useful in illustration. Consider the case when a failure of link T-C occurs at night or some other period when demand is so low that, because of the low calling rate on the network as a whole, there is no resulting traffic congestion on link A-T. The traffic threshold value in this instance would not be exceeded; and if the network manager were not additionally monitoring ABR on destination C (the value will have dropped to zero completion rate), the problem in completing calls to C might pass unnoticed, and no corrective action could be taken.

In data networks it is nowadays common for traffic policing algorithms to regulate the influx of traffic (i.e. packets, frames or cells) from a given source. Where a given source exceeds the committed information rate (CIR), a frame rate contracted to be ‘guaranteed’ by the network at the time of set-up of the connection, then excess frames or cells may either be marked to be the first to be thrown away when congestion is encountered or may be directly discarded. This is achieved using special signalling techniques. Forward explicit congestion notification (FECN) and backward explicit congestion notification (BECN) messages are tagged to actual user message frames, to alert the user end equipments and the switches to which they are connected that congestion has been encountered somewhere along the path. These messages allow alleviation action to be triggered. Equivalent messages also exist in other data protocols (e.g. ATM) but may have other names.

Automatic congestion notification procedures built into data networks may avoid the need, at least to some extent, for special congestion monitoring and alleviation measures to be taken by human network managers of data networks. Further measures available to human network managers include taking individual virtual connections out of service or re-scheduling particular computer devices to run their programs at a later (quieter) point in time.

The simplest method of detecting network problems is to monitor all available alarms. Alarms may simply activate bells and lamps, but a sophisticated network management system can have equipment alarm presentation built in, and it can be capable of some automatic problem diagnosis. As with other network status information, alarm information, along with ABRs, etc., is most effectively presented to the network manager as a combination of graphical and tabular displays on video terminals. These allow the manager at a glance to gauge overall network status, make a rapid diagnosis of problems, and take corrective or fault-containing action in good time.

The collation of all alarms by a central network management system (sometimes also called an umbrella network management system) has become the standard means of network monitoring and control. Specialist network management platforms (suites of computer hardware and software) have been developed specifically for this purpose, as we saw in Chapter 21.

37.3 NETWORK MANAGEMENT CONTROLS

With the problem located, what sort of network management control will be appropriate? All network management actions can be classified into one of two categories:

• expansive control actions
• restrictive control actions
The correct action to be taken in any individual circumstance needs to be considered in the light of a set of guiding principles.

• Use all available equipment to complete calls or deliver data packets, frames or cells.
• Give priority to calls or packets most likely to complete.
• Prevent exchange (nodal) congestion and its spread.
• Give priority to calls or data packets that can be completed using only a small number of links.

In an expansive action the network manager makes further resources or capacity available to alleviate the congestion, whereas in a restrictive action, having decided that there are insufficient resources within the network as a whole to cope with the demand, the manager can simply fail any new calls coming in to the hard to reach (i.e. temporarily congested) destination(s). It makes good sense to fail these calls close to their point of origin, because early rejection of calls frees as many network resources as possible, which can then be put to good use in completing calls between unaffected points of the network.

37.4 EXPANSIVE CONTROL ACTIONS

There are many examples of expansive actions. Perhaps the two most worthy of note are

• network restoration
• temporary alternative re-routing (TAR)

Network restoration, by no means a new technique, has been carried out by many network operators for years; but for some of them the introduction of computerized network management centres has provided an opportunity to apply centrally controlled network restoration for the first time.

Network restoration is made possible by providing more plant in the network than the normal traffic load requires. During times of failure this ‘spare’ or restoration plant is used to ‘stand-in’ for the faulty equipment, for example, a failed cable or transmission system, or even an entire failed exchange. By restoring service with spare equipment, the faulty line system or exchange can be removed from service and repaired more easily.
Network restoration techniques have historically been applied to transmission links, where it has been usual to plan spare capacity into the network in the form of analogue spare groups, supergroups, hypergroups or as digital bit-carrying capacity. The spare capacity is built into new transmission systems on a 1 for N basis; i.e. 1 unit of restoration capacity for every N traffic-carrying units. The following example shows how 1 for N restoration works.

Between two points of a network, A and B, a number of transmission systems are required to carry the traffic. These are to be provided in accordance with a 1-in-4 restoration scheme. One example of how this could be met is with five systems, operated as four fully loaded transmission lines plus a separate spare (which should be kept warm; inactive plant tends not to work when called into action). Automatic changeover equipment can be used to effect instant restoration of any of the other cables, should they fail (Figure 37.3(a)). An alternative but equally valid 1-in-4 configuration is to load each of the five cables at four-fifths of its capacity, so that each can 'stand-in' should any of the other four fail. In the latter case, the failed circuits must be restored in four parts, each of the other cables taking a quarter of the failed portion (as shown in Figure 37.3(b)). In practice, not all cables are of the same capacity and it is not always practicable or economic to restore cables on a one-for-one basis as shown in Figure 37.3(a). Both methods prevail.

Another common practice used for restoration is that of 'triangulation' (concatenating a number of restoration links via third points to enable full restoration). Figure 37.4 illustrates the principle of triangulation. In the simple example shown, a cable exists from exchange A to exchange B, but there is no direct restoration path. Restoration is provided instead by plant which is made available in the triangle of links A-C and C-B. These restoration links are also used individually to restore simpler cable failures, i.e. on the one-link connections such as A-C or B-C. Because of the scope for triangulation, restoration networks (also called protection networks) are often designed on a network-wide basis. This enables overall network costs to be minimized without seriously affecting their resilience to problems.
Figure 37.3 Two methods of 1 in 4 restoration. (a) Entire spare cable: under normal conditions each working cable carries its normal load and cable 5 is idle; on failure, cable 5 'stands in' to carry the entire load of the failed cable 1. (b) Spread load: under normal conditions each cable is only 80 per cent loaded; on failure of any one cable, four 20 per cent components are spread over the other four cables.
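The arithmetic behind the two 1-in-4 schemes can be checked with a short sketch. It is an illustration only, and it assumes that all cables have the same capacity (which, as noted above, is not always true in practice) and that a failed cable's load is shared among the survivors in proportion to their unused capacity. The function names are hypothetical.

```python
from fractions import Fraction


def dedicated_spare(n_working):
    """Scheme (a): n fully loaded working cables plus one idle spare."""
    return [Fraction(1)] * n_working + [Fraction(0)]


def spread_load(n_working):
    """Scheme (b): the same traffic spread evenly over n + 1 cables."""
    total_cables = n_working + 1
    return [Fraction(n_working, total_cables)] * total_cables


def restore(loads, failed):
    """Move the failed cable's load onto the surviving cables.

    loads are fractions of one cable's capacity. The lost load is shared
    in proportion to each survivor's unused capacity.
    Returns {cable_index: load_after_restoration}.
    """
    survivors = {i: load for i, load in enumerate(loads) if i != failed}
    lost = loads[failed]
    if lost == 0:
        return survivors  # an idle spare failed: nothing to restore
    headroom = {i: 1 - load for i, load in survivors.items()}
    total_headroom = sum(headroom.values())
    if lost > total_headroom:
        raise ValueError("not enough spare capacity to restore the failed cable")
    return {i: survivors[i] + lost * headroom[i] / total_headroom
            for i in survivors}


if __name__ == "__main__":
    print(restore(dedicated_spare(4), failed=0))
    print(restore(spread_load(4), failed=0))
```

In both schemes the four surviving cables end up fully loaded after a single failure; the difference lies in whether the restored traffic moves as one block onto the spare or as four equal parts.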
Figure 37.4 Restoration by triangulation

The use of idle capacity via third points is the basis of a second expansive action, termed temporary alternative re-routing (TAR). This method is generally invoked only from computer-controlled switches where routing data changes can be made easily. It involves either routing calls to a particular destination temporarily via different routes, or allowing more route-overflow options than normal. In Figure 37.5, some direct capacity between A and B has failed, resulting in congestion. On noticing this, the network manager has also observed that the routes A-C and C-B are currently lightly loaded (again by chance this traffic has a different busy hour from that of A-B). A temporary overflow route via C would help to relieve congestion from A to B. The manager can adapt the normal routing choice list (used at exchange A) to include a temporary overflow via C. This expedient route path is termed a temporary alternative route, or TAR.

Figure 37.5 Temporary alternative re-routing (the direct route A-B remains first choice; following the failure, overflow via C is temporarily permitted as the TAR; transit traffic from A to B via C is not normally allowed)
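The adaptation of the routing choice list at exchange A can be sketched as follows. This is a minimal illustration: representing a route as a tuple of link names, the free-circuit counts, and the function names are all assumptions made for the example rather than the data format of any real exchange.

```python
def route_is_available(route, free_circuits):
    """A route can carry a call only if every link on it has a free circuit."""
    return all(free_circuits.get(link, 0) > 0 for link in route)


def choose_route(choice_list, free_circuits):
    """Try the routes in order of preference; None means the call fails."""
    for route in choice_list:
        if route_is_available(route, free_circuits):
            return route
    return None


def with_temporary_alternative_route(choice_list, tar_route):
    """Return a new choice list with the TAR added as the last overflow."""
    return list(choice_list) + [tar_route]


if __name__ == "__main__":
    normal_choices = [("A-B",)]  # direct route only
    tar_choices = with_temporary_alternative_route(normal_choices, ("A-C", "C-B"))
    # Hypothetical state: direct circuits all busy or failed, C lightly loaded.
    free = {"A-B": 0, "A-C": 5, "C-B": 3}
    print(choose_route(normal_choices, free))
    print(choose_route(tar_choices, free))
```

Keeping the original list unchanged and building a separate list for the TAR makes it simple to withdraw the temporary route once the direct capacity is back in service.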
Figure 37.6 Alternative paths A-B in an SDH network made up of sub-network rings (legend: sub-network ring; crossconnect)

In modern transmission technology (SDH and SONET, as discussed in Chapter 13), restoration capabilities are built-in. Thus in both SDH and SONET it is intended that highly resilient transmission networks should be built up from inter-meshed ring sub-networks. The ring topology alone leads to the possibility of alternative routing around the surviving ring arc, should one side of the ring become broken due to a link failure. Crossconnect points between ring sub-networks further ensure a multitude of alternative paths through larger networks, as is clear from Figure 37.6. The possibilities are limited only by the capabilities of the network planner to dimension the network and topology appropriately and the ability of the network management system to execute the necessary path changes at times when individual links fail or come back into service.

In data networks (e.g. packet, frame and cell-switched networks) alternative routing is usually automatically inherent within the normal routing algorithms of the network, and is undertaken automatically by affected switches, sometimes in conjunction with the network management system.

Where temporary alternative routing can be undertaken at the data or voice switching level, it may make sense to include all available transmission capacity into the switched network rather than keep some capacity free for restoration at the transmission level (e.g. using SDH). This saves the need for additional restoration switching equipment at the transmission level and in addition will afford much better network performance to customers when no links are out of service, because more bandwidth will normally be available.
37.5 RESTRICTIVE CONTROL ACTIONS

Unfortunately, there will always be some condition under which no further expansive action is possible (in Figure 37.5, routes A-C and C-B may already be busy with their own direct traffic, or may not be large enough for the extra demand imposed by A-B traffic). In this state, congestion cannot be alleviated. Even worse, all calls made in vain to the problematic destination will be a nuisance to other callers, because these fruitless calls lead to congestion of traffic attempting to reach other destinations. In this case, the best action is to refuse (or at least restrain) calls to the affected destination as near to their points of origination as possible.

Perhaps the most flexible of restrictive actions is call gapping (in the case of a data network, the similar and equivalent action is called pacing, flow control or ingress control). Under call gapping or flow control the traffic demand is ‘diluted’ at all originating exchanges. A restricted number of call attempts (or data frames) to the affected destination are allowed to pass from each originating exchange into the network as a whole. Within the wider network, this reduces the network overload and relieves congestion of traffic to other destinations, giving a better chance of completion.

There are two principal sub-variants of call gapping, as shown in Figure 37.7. These are the 1-in-N and 1-in-T types. The 1-in-N method allows every Nth call to pass into the network. The remaining proportion of calls, (N - 1)/N, are failed immediately at their originating exchange (in this case, A). These callers hear network busy tone. The 1-in-T method, by comparison, performs a similar call dilution by allowing only 1 call every T seconds to mature.

Figure 37.7 Call gapping (call gapping is applied at the traffic source A on calls to the congested destination B; load on the overall network is reduced, giving greater completion success to the unaffected destination C)

Call gapping type   Effect
1-in-N              Of N calls generated at a particular exchange for the affected destination, only one is allowed to mature into a network call attempt. All others are failed at A regardless.
1-in-T              Only 1 call attempt every T seconds is permitted to mature beyond the originating exchange.
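The two call gapping variants summarized above can be sketched as a pair of small admission filters at the originating exchange. This is an illustration only: the class and method names are hypothetical, and the choice to start the 1-in-T gap at the moment a call is admitted is an assumption made for the example.

```python
class OneInNGapper:
    """Admit every Nth call attempt to the affected destination."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("N must be at least 1")
        self.n = n
        self.count = 0

    def admit(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True   # this attempt matures into a network call attempt
        return False      # failed at the originating exchange


class OneInTGapper:
    """Admit at most one call attempt every T seconds."""

    def __init__(self, t_seconds):
        if t_seconds < 0:
            raise ValueError("T must not be negative")
        self.t_seconds = t_seconds
        self.next_allowed = float("-inf")

    def admit(self, now):
        if now >= self.next_allowed:
            self.next_allowed = now + self.t_seconds
            return True
        return False


if __name__ == "__main__":
    gap_n = OneInNGapper(3)
    print([gap_n.admit() for _ in range(6)])
    gap_t = OneInTGapper(10)
    print([gap_t.admit(t) for t in (0, 1, 5, 10, 12, 25)])
```

A network management system could tighten or relax either control simply by changing `n` or `t_seconds` as the congestion level changes.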
In the first method, the value of N may be varied, to control the level of call dilution (for N = 2, 50% of calls would be allowed into the network; for N = 3, 33%, etc.). Similarly, in the 1-in-T method, the value T may be adjusted according to the level of congestion. The greater the value at which T or N are set at any particular instant in time, the greater the diluting effect on calls. In data flow control, any incoming frames over and above a rate allowed by a given threshold value are either marked for preferential discarding (i.e. will be thrown away before other frames should congestion be encountered) or may be immediately discarded.

If congestion continues to get worse, the value N or T can be increased accordingly. With a very large value of N or T, nearly all calls are blocked at the originating point. The action of complete blocking is quite radical, but nonetheless it is sometimes necessary. This measure may be appropriate following a public disaster (earthquake, riot, major fire, etc.). Frequently in these conditions, the public are given only one telephone number as a point of enquiry, and inevitably there is an instant flood of calls to the number, few of which can be completed. In this instance, call gapping is a powerful tool for diluting calls, thereby increasing the likelihood of successful call completion for other network users.

Figure 37.8 A network management system (labels: exchange; network; network status information; control information; network management system; network manager)

Figure 37.9 Network management centre. AT&T's Network Operations Centre at Bedminster in New Jersey, USA. It controls the AT&T worldwide intelligent network, the most advanced telecommunications network in the world. The centre controls data and voice calls over 2.3 billion circuit miles worldwide, handling more than 75 million calls a day. It is active 24 hours a day, 365 days a year. (Courtesy of AT&T)

37.6 NETWORK MANAGEMENT SYSTEMS

Today it is common for network management computer systems to be developed as an integral part of the modern switches (i.e. exchanges) or sub-networks which they control. Direct datalink connections between exchange processors or sub-networks and network management systems allow real-time network status information to be presented to the network management systems, and for traffic control signals to be returned. In this way, switch data changes or other network control methods can be made quickly. An example might be a switch routing data change to amend routing patterns by the addition of TARs. Another example of a control signal might be an indication to the exchange to perform call gapping on calls to an appropriate destination. Figure 37.8 illustrates the functional architecture of a typical network management system, but we discuss the subject in more depth in Chapter 27.

Network management systems can be procured either from computer manufacturers or computer software companies, but they are increasingly being offered by switch manufacturers as integral parts of new telephone or data exchanges. As the competition between suppliers becomes ever greater, perhaps we will have to wait only a few years before someone achieves a truly ‘self-healing’ network, one that remains congestion-free without the continuous assistance of human network managers.