Xem mẫu

INTRODUCTION TO COMPUTER SCIENCE HANDOUT #8. REGULAR EXPRESSIONS K5 & K6, Computer Science Department, Vaên Lang University Second semester -- Feb, 2002 Instructor: Traàn Ñöùc Quang Major themes: 1. Introduction 2. Algebraic Laws for Regular Expressions Reading: Sections 10.5 and 10.7. 8.1 INTRODUCTION In the previous handout, we have studied a finite automaton which is, in a sense, a machine recognizing character strings with some patterns. In this handout, we meet regular expressions which is an algebraic way to describe a string pattern. Regular expressions are analogous to the algebra of arithmetic expressions with which we are all familiar, and the relational algebra that we met in Handout #3. Interestingly, the set of patterns that can be expressed in the regular-expression algebra is exactly the same set of patterns that can be described by automata. For example, the regular expression a | bc* can express the patterns described by the fol- lowing automaton. a B start A b D c 46 INTRODUCTION TO COMPUTER SCIENCE: HANDOUT #8. REGULAR EXPRESSIONS A set of strings denoted by a regular expression E is called a regular language and can be referred to as L(E). Given two regular languages L and M, we can construct new languages by applying the operators below one or more times to these languages: 1. The union of two languages L and M, denoted L È M, is the set of strings that are in either L or M. For example, if L = {00, 01, 11} and M = {aa, ab}, the union L È M = {00, 01, 11, aa, ab}. 2. The concatenation of languages L and M, denoted LM, is the set of strings that can be formed by taking any string in L and concatenating it with any string in M. For the above example, we have LM = {00aa, 01aa, 11aa, 00ab, 01ab, 11ab}. 3. The closure (star, or Kleene closure) of a language L is denoted L* and represents the set of strings that can be formed by taking any number of strings from L and concatenating all of them. Algebra of regular expressions, like all kinds of algebras, starts with some elementary expressions, usually constants and/or variables denoting languages. We then construct more expressions by applying the set of three operators (union, concatenation, star) to these elemetary expressions and to previously constructed expressions. In particular: 1. Empty string is denoted by the regular expression e. 2. A symbol a is denoted by the regular expression a. 3. Suppose R and S are two regular expressions denoting the languages L(R) and L(S), respectively. a) R +S (or R | S) is a regular expression for the union of the languages L(R) and L(S). b) RS is a regular expression for the concatenation of the languages L(R) and L(S). c) R* is a regular expression for the Kleen closure of the language L(R). For a simple example, suppose letter is any character in the alphabet, digit is any decimal digits, and under is the symbol _. We can define a regular expression as a pattern of legal identifiers in C as follows: identifiers = (letter|under)(letter|digit|under)* As in arithmetic expressions, we can use parentheses to group regular expressions. 8.2 ALGEBRAIC LAWS FOR REGULAR EXPRESSIONS It is possible for two regular expressions to denote the same language, just as two arithmetic expressions can denote the same function of their operands. As an example, the arithmetic expressions x + y and y + x each denote the same function of x and y, 8.3 GLOSSARY 47 because addition is commutative. So are the regular expressions R + S and S + R. We now list some common algebraic laws for regular expressions with no proofs. More useful laws can be found in the textbook. 1. Commutativity of union. (R | S) º (S | R) 2. Associativity of union. ((R | S) | T) º (R | (S | T)) 3. Associativity of concatenation. ((RS)T) º (R(ST)) 4. Left distributivity of concatenation over union. (R(S | T)) º (RS | RT) 5. Right distributivity of concatenation over union. ((S | T)R) º (SR | TR) 6. Idempotence of union. (R | R) º R 7. RR* º R*R 8.3 GLOSSARY Regular Expression: Bieåu thöùc chính quy. Arithmetic Expression: Bieåu thöùc soá hoïc. Algebra: Ñaïi soá. algebra of regular expressions: ñaïi soá bieåu thöùc chính quy. algebra of arithmetic expressions: ñaïi soá bieåu thöùc soá hoïc. set algebra: ñaïi soá taäp hôïp. relational algebra: ñaïi soá quan heä. Algebraic law: Luaät ñaïi soá (tính chaát ñaïi soá). Commutative law, commutativity: Luaät giao hoaùn, tính giao hoaùn. Associative law, associativity: Luaät keát hôïp, tính keát hôïp. Distributive law, distributivity: Luaät phaân phoái, tính phaân phoái. Idempotence: Tính luõy ñaúng. Regular language, regular set: Ngoân ngöõ chính quy, taäp chính quy. Union: Pheùp hôïp. Concatenation: Pheùp gheùp noái. Star, Kleene closure: Pheùp toaùn sao, pheùp laáy bao ñoùng Kleene. ... - tailieumienphi.vn
nguon tai.lieu . vn