Xem mẫu

VNU Journal of Science, Natural Sciences and Technology 24 (2008) 147-161 Learning approaches to support dynamics in communication networks Abdelhamid Mellouk1,*, Saïd Hoceïni1, Saida Ziane1, Malika Bourennane2 1LISSI/SCTIC Laboratory, IUT Creteil/Vitry University Paris XII, France. 122, rue Paul Armangot, 94400 Vitry sur Seine, France 2Department of Computer Science, University Es Senia, Algeria Received 31 October 2007 Abstract, In the context of modern high-speed communication networks, decision reactivity is often complicated by the notion of guaranteed Quality of Service (QoS), which can either be related to time, packet loss or bandwidth requirements: constraints related to various types of QoS make some algorithms not acceptable. Due to emerging real-time and multimedia applications, efficient routing of information packets in dynamically changing communication network requires that as the load levels, traffic patterns and topology of the network change, the decision policy also adapts. We focused in this paper on QoS based mechanisms by developing a neuro-dynamic programming to construct dynamic state-dependent policies. In this paper, we present an accurate description of the current state- of-the-art and give an overview of our work in the use of reinforcement learning concepts focused on communications networks. We focus our attention by developing a system based on this paradigm and study the use of reinforcement learning approaches in three different communication networking domains: wired networks, mobile ad hoc networks, and packet router’s scheduling networks. Keywords: Self-Depedent Mechanism Decision, Quality of Service based Routing, Multi Path Routing. Dynamic Networks, Reinforcement Learning, Adaptive Scheduling. 1. Introduction* Today, providing a good quality of service (QoS) in irregular traffic networks is an important challenge. Besides, the impressive emergence and the important demand of the rising generation of real-time Multi-service (such as Data, Voice VoD, Video-Conference, etc.) over communication heterogeneous networks, require scalability while considering _______ * Corresponding author. E-mail: mellouk@univ-paris12.fr a continuous QoS. This emergence of rising generation Internet required intensive studies these last years which were based on QoS routing for heterogeneous networks on the one hand and on the backbone architecture level of communication networks characterized by a high and irregular traffic on the other hand [1]. The basic function of QoS routing is to find a network path which satisfies the given constraints and optimize the resource utilization. The integration of QoS parameters increases the complexity of the used routing 147 148 Abdelhamid Mellouk et al. / VNU Journal of Science, Natural Sciences and Technology 24 (2008) 147-161 algorithms. Thus, the problem of determining a So, it makes an effective use of the graph QoS route that satisfies two or more path structure on a network, as opposed to single constraints (for example, delay and cost) is path routing which superimposes a logical known to be NP- complete [2]. A difficulty is routing tree upon the network topology. We that the time required to solve the Multi- find in literature many and various approaches Constrained Optimal path problem exactly that have been proposed to take into account the cannot be upper-bounded by a polynomial QoS requirement. The reader can refer to [8] for function. Hence the focus has been on the an overview. development of pseudo-polynomial time algorithms, heuristics and approximation algorithms for multi- constrained QoS paths [3]. Constraints imposed by QoS requirements, such as bandwidth, delay, or loss, are referred to as QoS constraints, and the associated At present, several studies have been routing is referred to as QoS routing which is a conducted on QoS routing algorithms which part of Constrained-Based Routing (CBR). integrate the QoS requirements problematic for the routing algorithm. [4] introduce heuristics to find a source-to-destination path that satisfies Interest in constrained-based routing has been steadily growing in the Networks. Based on heuristics used in all of these approaches to two or more additive constraints on edge reduce their complexity, we can classified it in weights. [5] has proposed a polynomial time approximation algorithm for k multi-constrained path which uses a shortest path algorithm such as Dijkstra’s [6,7] propose a randomized heuristic that employs two phases. In the first one, a shortest path is computed for each of the k QoS constraints as well as for a linear combination of all k constraints. The second phase performs a randomized breadth-first search for a solution of k multi-constrained problem. In [3], authors suggest that QoS routing in realistic networks could not be NP-complete regarding to a particular class of networks (topology and link weight structure). Due this complexity, QoS routing problems are divided on several classes according to some aspects. For example, we distinguish the single path routing problem and the multipath routing problem, where routers maintain multiple distinct paths of arbitrary costs between a source and a destination. The Multipath routing offers several advantages like good bandwidth, bounding delay variation, minimizing delay, and improved fault tolerance. three main categories: Label Switching/Reservation Approaches-spurred by approaches like ATM PNNI, MPLS or GMPLS. With MPLS, fixed length labels are attached to packets at an ingress router, and forwarding decisions are based on these labels in the interior routers of the label-switched path. MPLS Traffic Engineering allows overriding the default routing protocol, thus forwarding over paths not normally considered. A resource reservation protocol such as RSVP must be employed to reserve the required resources. Another Architecture proposed for providing Internet QoS is the Differentiated Services architecture. Diffserv scales well by pushing complexity to network domain boundaries. Multi-Constrained Path Approaches (MCP) - The goal of all of these approaches is to retrieve the shortest path among the set of feasible paths between two nodes. Considerable work in the literature has focused on a special case of the MCP problem known as the Restricted Shortest Path (RSP) problem. The goal is to find the least-cost path among those Abdelhamid Mellouk et al. / VNU Journal of Science, Natural Sciences and Technology 24 (2008) 147-161 149 that satisfy only one constraint. An overview of these approaches can be found in [9]. Inductive approaches- To be able to make an optimal routing decision, according to relevant performance criteria, a network node requires to have a complete knowledge of the entire network state and an accurate prediction of the evolution of the networks and its dynamics. This, however, is impossible unless the routing algorithm is capable of adapting to the network state changes in almost real time. Thus, it is necessary to design intelligent and adaptive optimizing routing algorithms which take into account the network state and its evolution. We need to talk about QoS based state dependent routing algorithm. In this contribution, we present an accurate description of the current state-of-the-art and give an overview of our work in the use of reinforcement learning concepts focused on communication networks. We focus our attention by developing a system based on this paradigm called KOCRA for K Optimal Constrained path Routing Algorithm. Basically, these inductive approaches selects routes based on flow QoS requirements and network resource availability. After developing in section 2 the concept of routing in high speed networks, we present in section 3 the family of inductive approaches. After, we present our works based on reinforcement learning approaches in three different communication networking domains: wired networks, mobile ad hoc networks, and packet router’s scheduling networks. Last section concludes and gives some perspectives of this work. between ASes is not optimized at the network level, but rather it is based on the business relationships between organizations. The fully-independent management actions in each AS are expressed in terms of a policy-based routing strategy which primarily controls the outbound traffic of an AS and can include conflicting policies. A global solution for QoS routing over all the ASes must be able to handle both the differing QoS provisioning mechanisms and service specifications. This latter solution of building models of large ISP’s is so complex to obtain [10]. For this, Routing is divided onto two classes: IGP and EGP. IGP, such as OSPF or IS-IS, compute the interior paths in one AS, while EGP, such as BGP, is responsible for the selection of the inter-domain paths. To fulfill application QoS requirements, many ISPs have deployed mechanisms to provide differentiated services in their networks. In fact, in the last decade, the development of none of QoS routing proposals has turned out to be sufficiently appealing to become deployed in practice. This is because ISPs have preferred to overprovision their networks rather than deliver and manage QoS [11]. In the IGP or EGP cases, a routing algorithm is based on the hop-by-hop shortest-path paradigm. The source of a packet specifies the address of the destination, and each router along the route forwards the packet to a neighbor located “closest” to the destination. The best optimal path is chosen according to given criteria. When the network is heavily loaded, some of the routers introduce an excessive delay while others are under-utilized. In some cases, this non-optimized usage of the network resources may introduce not only 2. Routing problem As Internet is a large collection of more than 25,000 independent domains called autonomous systems (Ases), the cooperation excessive delays but also high packet loss rate. Among routing algorithms extensively employed in the same AS routers, one can note: distance vector algorithm such as RIP and the link state algorithm such as OSPF or IS-IS [12]. 150 Abdelhamid Mellouk et al. / VNU Journal of Science, Natural Sciences and Technology 24 (2008) 147-161 3. Inductive approaches Modern communication networks is becoming a large complex distributed system composed by higher interoperating complex sub-systems based on several dynamic parameters. The drivers of this growth have included changes in technology and changes in regulation. In this context, the famous methodology approach that allows us to formulate this problem is dynamic programming which, however, is very complex to be solved exactly. The most popular formulation of the optimal distributed routing problem in a data network is based on a multi-commodity flow optimization whereby a separable objective function is minimized with respect to the types of flow subject to multi- commodity flow constraints [13], [14]. In order sufficiently explored, RL algorithms converge to solve stochastic shortest path routing problems. Finally, algorithms for RL are distributed algorithms that take into account the dynamics of the network where initially no model of the network dynamics is assumed to be given. Then, the RL algorithm has to sample, estimate and build the model of pertinent aspects of the environment. Many of works has done to investigate the use of inductive approaches based on artificial neuronal intelligence together with biologically inspired techniques such as reinforcement learning and genetic algorithms, to control network behavior in real-time so as to provide users with the QoS that they request, and to improve network provide robustness and resilience [16-18]. to design adaptive algorithms for dynamic networks routing problems, many of works are largely oriented and based on the Reinforcement Learning (RL) notion [15]. The salient feature of RL algorithms is the nature of their routing table entries which are probabilistic. In such algorithms, to improve the routing decision quality, a router tries out different links to see if they produce good routes. This mode of operation is called exploration. Information learnt during this exploration phase is used to take future decisions. This mode of operation is called exploitation. Both exploration and exploitation phases are necessary for effective routing and the choice of the outgoing interface is the action taken by the router. In RL algorithms, those learning and evaluation modes are assumed to happen continually. Note that, the RL algorithms assigns credit to actions based on reinforcement from the environment. In the case where such credit assignment is conducted systematically over large number of routing 4. KOCRA system based reinforcement learning in routing wired networks Our system, called “K Optimal Constrained path Routing Algorithm (KOCRA)”, contains three stages. The objective of the first stage is to select the K Best candidate paths according to the cost cumulative path from the source and the destination nodes (for simplicity, we consider here all link costs equal to 1). The second stage is used to integrate the dynamics of traffic. For this, a continuous end-to-end delay among the K Best selected Paths is computed using a reinforcement Q- learning function. In order to force the router to take the alternative routes regarding to the second stage, we used a third one which compute automatically a probability affected to each path based on packet delivery time obtained by the second stage and the time latency in queuing file associated for each path. decisions, so that all actions have been Abdelhamid Mellouk et al. / VNU Journal of Science, Natural Sciences and Technology 24 (2008) 147-161 151 4.1. First stage: constructing K-best paths First of all, in spite of exploring the entire network environment which needs large computational time and space memory, our approach reduces this environment to K best no loop paths in terms of cost cumulative links. Thus, each router maintains a link state database as map of the network topology. We used a label setting algorithm based on the optimality principle and being a generalization of Dijkstra`s algorithm [6]. In order to find these K best paths, a variant of Dijkstra`s algorithm proposed in [19] was used. By using a pertinent data structure, the space complexity is O(Kmn), where K is the number of paths, m (resp. n) is the number of edges (resp. the number of links). The time complexity can be kept at O(knlog(kn)+k2mn) [27]. When a network link changes its state (i.e., goes up or down, or its utilization is increased or decreased), the network is flooded with a link state advertisement (LSA) message. This message can be issued periodically or when the actual link state change exceeds a certain relative or absolute threshold. Obviously, there is tradeoff between the frequency of state updates (the accuracy of the link state database) and the cost of performing those updates. In our approach, the link state information is updated when the actual link state change. Once the link state database at each router is updated, the router computes the K optimal paths. time to transfer a packet to its destination. This value is computed by a variant of Q-Routing algorithm which is considered as an asynchronous relaxation of the Bellman-Ford algorithm used in distance vector protocols. Typically, the packet delivery time includes three variables: the packet transmission time, the packet treatment time in the router and the latency in the waiting queue. In our case, the packet transmission time is not taken into account. In fact, this parameter can be neglected in comparison to the other ones and has no effect on the routing process. In this approach, each router x maintains in a Q-table a collection of values of Q(x, y, d) for every destination d and for every interface y. This value reflects a delay of delivering a packet for destination d via interface s. Then, the router x forwards the packet to the best next router y determined from the Q-table. Just after receiving this packet, the router y provides x an estimate of its best Q value to reach the destination. This new information is then added in the Q- values of the router x. The reinforcement signal T employed in the Q-learning algorithm can be defined as the minimum of the sum of the estimated Q (x, y, d) sent by the router y neighbor of router x and the latency in waiting queue qx corresponding to router x. T min q Q(x,y,d) (1) yneighborof x Where Q(x, y, d), denote the estimated time 4.2. Second stage: Q-learning lgorithm for optimizing the end-to-end delay by the router x so that the packet p reaches its destination d through the router y. This parameter does not include the latency in the After finding our K best Optimal Paths based on link costs, the second step is to distribute the traffic on these K candidate paths. For this, we use another criteria based on the end-to-end delay. The reinforcement signal which is chosen corresponds to the estimated waiting queue of the router x. The packet is sent to the router y which determines the optimal path to send this packet. Once the choice of the next router is made, the router y puts the packet in the waiting queue, and sends back the value T as a ... - tailieumienphi.vn
nguon tai.lieu . vn