Stochastic Discourse Modeling in Spoken Dialogue Systems Using Semantic Dependency Graphs

Jui-Feng Yeh, Chung-Hsien Wu and Mao-Zhu Yang
Department of Computer Science and Information Engineering
National Cheng Kung University
No. 1, Ta-Hsueh Road, Tainan, Taiwan, R.O.C.
{jfyeh, chwu, mzyang}@csie.ncku.edu.tw

Abstract

This investigation proposes an approach to modeling the discourse of spoken dialogue using semantic dependency graphs. By characterizing the discourse as a sequence of speech acts, discourse modeling becomes the identification of the speech act sequence. A statistical approach is adopted to model the relations between the words in the user's utterance using semantic dependency graphs. The dependency relation between the headword and the other words in a sentence is detected using the semantic dependency grammar. To evaluate the proposed method, a dialogue system for medical service was developed. Experimental results show that the rates for speech act detection and task completion are 95.6% and 85.24%, respectively, and the average number of turns per dialogue is 8.3. Compared with the Bayes' classifier and the Partial-Pattern Tree based approaches, we obtain 14.9% and 12.47% improvements in accuracy for speech act identification, respectively.

1 Introduction

Communicating with machines using spoken language is a compelling vision for computer technology (Huang et al., 2001; Allen et al., 2001). Understanding of spontaneous language is arguably the core technology of spoken dialogue systems, since the more accurate the information obtained by the machine (Higashinaka et al., 2004), the higher the possibility of completing the dialogue task. Practical use of speech act theories in spoken language processing (Stolcke et al., 2000; Walker and Passonneau, 2001; Wu et al., 2004) has given both insight into and a deeper understanding of verbal communication. Therefore, when considering the whole discourse, the relationship between the speech acts of the dialogue turns becomes extremely important. In the last decade, several practicable dialogue systems (McTear, 2002), such as the air travel information service system, weather forecast system, automatic banking system, automatic train timetable information system, and the Circuit Fix-It Shop system, have been developed to extract the user's semantic entities using semantic frames/slots and conceptual graphs. The dialogue management in these systems handles the dialogue flow efficaciously. However, it is not applicable to more complex applications such as "Type 5: the natural language conversational applications" defined by IBM (Rajesh and Linda, 2004). In Type 5 dialogue systems, users may switch directly from one ongoing task to another. In the traditional approaches, the absence of precise speech act identification without discourse analysis results in failure in task switching. The capability to identify the speech act and to extract the semantic objects by reasoning therefore plays an important role in such dialogue systems. This research proposes a semantic dependency-based discourse model to capture and share the semantic objects among tasks that switch during a dialogue, for semantic resolution.
Besides acoustic speech recognition, natural language understanding is one of the most important research issues, since understanding within a restricted application scope is related to the data structures that are used to capture and store the meaningful items. Wang et al. (2003) applied the object-oriented concept to provide a new semantic representation including semantic classes, together with a learning algorithm for the combination of a context-free grammar and an N-gram model. Among these approaches, there are two essential issues for dialogue management in natural language processing. The first is how to obtain the semantic objects from the user's utterances. The second is that a more effective speech act identification approach is needed for semantic understanding. Since the speech act plays an important role in the development of dialogue management for complex applications, speech act identification with semantic interpretation is the central topic with respect to the methods used to control the dialogue with the users. This paper proposes an approach that integrates the semantic dependency graph and history/discourse information to model the dialogue discourse (Kudo and Matsumoto, 2000; Hacioglu et al., 2003; Gao and Suzuki, 2003). Three major components, namely semantic relations, semantic classes, and semantic roles, are adopted in the semantic dependency graph (Gildea and Jurafsky, 2002; Hacioglu and Ward, 2003). The semantic relations constrain the word sense and provide the means for disambiguation. Semantic roles are assigned when a relation is established among semantic objects. Both semantic relations and roles are defined in many knowledge resources or ontologies, such as FrameNet (Baker et al., 2004) and HowNet (Dong and Dong, 2006). HowNet, with 65,000 Chinese concepts and close to 75,000 English equivalents, is a bilingual knowledge base describing relations between concepts and between the attributes of concepts from an ontological view. Generally speaking, a semantic class is defined as a set whose elements are words with the same semantic interpretation. Hypernyms, the superordinate concepts of the words, are usually used as the semantic classes, just like the hypernyms of synsets in WordNet (http://www.cogsci.princeton.edu/~wn/) or the definitions of words' primary features in HowNet. In addition, the proposed approach to understanding tries to find the implicit semantic dependency between concepts, and the dependency structure between the concepts in the utterance is also taken into consideration. Compared with semantic frames/slots, the semantic dependency graph can keep more information for dialogue understanding.

2 Semantic Dependency Graph

Since speech act theory was developed to extract the functional meaning of an utterance in the dialogue (Searle, 1979), the discourse or history can be defined as a sequence of speech acts, H_t = {SA_1, SA_2, ..., SA_{t-1}, SA_t}, and accordingly speech act theory can be adopted for discourse modeling. Based on this definition, discourse analysis in semantics using dependency graphs amounts to identifying the speech act sequence of the discourse. Therefore, discourse modeling by means of speech act identification considering the history is formulated in Equation (1), which introduces the hidden variable D_i representing the i-th possible dependency graph derived from the word sequence W. The dependency relation r_k between word w_k and its headword w_{kh} is extracted using HowNet and denoted as DR_k(w_k, w_{kh}) \equiv r_k. The dependency graph, composed of the set of dependency relations in the word sequence W, is defined as D_i(W) = {DR_1(w_1, w_{1h}), DR_2(w_2, w_{2h}), ..., DR_{m-1}(w_{m-1}, w_{(m-1)h})}. The probability of the hypothesis SA_t given the word sequence W and the history H_{t-1} can then be described by Equation (1). According to Bayes' rule, the speech act identification model is decomposed into two components, P(SA_t | D_i, W, H_{t-1}) and P(D_i | W, H_{t-1}), described in the following subsections.

SA^* = \arg\max_{SA_t} P(SA_t \mid W, H_{t-1})
     = \arg\max_{SA_t} \sum_{D_i} P(SA_t, D_i \mid W, H_{t-1})    (1)
     = \arg\max_{SA_t} \sum_{D_i} P(SA_t \mid D_i, W, H_{t-1}) \times P(D_i \mid W, H_{t-1})

where SA^* and SA_t are the most probable speech act and the potential speech act at the t-th dialogue turn, respectively, W = {w_1, w_2, w_3, ..., w_m} denotes the word sequence extracted from the user's utterance without the stop words, and H_{t-1} is the history representing the previous t-1 turns.
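To make the decision rule in Equation (1) concrete, the following Python sketch enumerates the candidate speech acts, marginalizes over the candidate dependency graphs derived from W, and scores each speech act with the two component models. This is a minimal illustration only; the function names and the probability models passed in as arguments are assumptions for exposition, not part of the original system.

```python
# A minimal sketch of the speech act decision rule in Equation (1).
# The two probability models (p_sa_given_graph_history and p_graph_given_words)
# are assumed to be estimated elsewhere (Sections 2.1 and 2.2); their names and
# interfaces are illustrative.

def identify_speech_act(candidate_speech_acts, candidate_graphs, history, words,
                        p_sa_given_graph_history, p_graph_given_words):
    """Return SA* maximizing sum_i P(SA | Di, W, H) * P(Di | W, H)."""
    best_act, best_score = None, float("-inf")
    for sa in candidate_speech_acts:
        # Marginalize over the hidden dependency graphs Di derived from W.
        score = sum(
            p_sa_given_graph_history(sa, graph, history)
            * p_graph_given_words(graph, words)
            for graph in candidate_graphs
        )
        if score > best_score:
            best_act, best_score = sa, score
    return best_act
```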
2.1 Speech act identification using semantic dependency with discourse analysis

In this analysis, we apply the semantic dependency, the word sequence, and discourse analysis to the identification of the speech act. Since D_i is the i-th possible dependency graph derived from the word sequence W, speech act identification with semantic dependency can be simplified as in Equation (2).

P(SA_t \mid D_i, W, H_{t-1}) \cong P(SA_t \mid D_i, H_{t-1})    (2)

According to Bayes' rule, the probability P(SA_t | D_i, H_{t-1}) can be rewritten as:

P(SA_t \mid D_i, H_{t-1}) = \frac{P(D_i, H_{t-1} \mid SA_t)\, P(SA_t)}{\sum_{SA} P(D_i, H_{t-1} \mid SA)\, P(SA)}    (3)

As the history is defined as the speech act sequence, the joint probability of D_i and H_{t-1} given the speech act SA_t can be expressed as in Equation (4). Owing to data sparseness in the training corpus, the probability P(D_i, SA_1, SA_2, ..., SA_{t-1} | SA_t) is hard to estimate, and a speech act bigram model is adopted as an approximation.

P(D_i, H_{t-1} \mid SA_t) = P(D_i, SA_1, SA_2, \ldots, SA_{t-1} \mid SA_t) \cong P(D_i, SA_{t-1} \mid SA_t)    (4)

For the combination of the semantic and syntactic structures, the relations defined in HowNet are employed as the dependency relations, and the hypernym is adopted as the semantic concept according to the primary features of the words defined in HowNet. The headwords are decided by an algorithm based on the part of speech (POS) proposed by Academia Sinica in Taiwan. The probabilities of the headwords are estimated according to a probabilistic context-free grammar (PCFG) trained on the Treebank developed by Sinica (Chen et al., 2001). That is to say, the headwords are extracted according to the syntactic structure, and the dependency graphs are constructed from the semantic relations defined in HowNet. According to the previous definition, with the independence assumption and bigram smoothing of the speech act model using the back-off procedure, Equation (4) can be rewritten as Equation (5).

P(D_i, SA_{t-1} \mid SA_t) = \alpha \prod_{k=1}^{m-1} P(DR_k(w_k, w_{kh}), SA_{t-1} \mid SA_t) + (1-\alpha) \prod_{k=1}^{m-1} P(DR_k(w_k, w_{kh}) \mid SA_t)    (5)

where \alpha is the mixture factor for normalization. According to the conceptual representation of the word, the transformation function f(\cdot) maps a word to its hypernym, defined as the semantic class, using HowNet. The dependency relation between the semantic classes of two words is thereby mapped into the conceptual space, and the semantic roles among the dependency relations are obtained.
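The interpolation in Equation (5) can be sketched as follows, assuming relative-frequency tables estimated from the training corpus and a HowNet-style hypernym lookup standing in for f(⋅). The table interfaces, the toy hypernym dictionary, and the value of α are illustrative assumptions, not the paper's actual resources.

```python
# A sketch of the back-off estimate in Equation (5). The hypernym lookup stands
# in for the HowNet-based transformation f(.); the toy table is illustrative.
from math import prod

HYPERNYM = {"喉嚨痛": "symptom", "發燒": "symptom", "醫生": "human"}  # toy f(.)

def f(word):
    # Map a word to its semantic class (hypernym); fall back to the word itself.
    return HYPERNYM.get(word, word)

def p_graph_and_prev_act_given_act(relations, prev_act, act,
                                   p_rel_prev_given_act, p_rel_given_act,
                                   alpha=0.7):
    """Interpolate the bigram term with the back-off unigram term (Eq. 5).

    relations: list of (word, headword, relation) triples forming Di.
    p_rel_prev_given_act / p_rel_given_act: estimated conditional tables.
    alpha: mixture factor for normalization (value here is arbitrary).
    """
    bigram = prod(p_rel_prev_given_act(f(w), f(h), r, prev_act, act)
                  for w, h, r in relations)
    unigram = prod(p_rel_given_act(f(w), f(h), r, act)
                   for w, h, r in relations)
    return alpha * bigram + (1 - alpha) * unigram
```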
On the condition that SA_t, SA_{t-1}, and the relations are independent, the equation becomes

P(DR_k(w_k, w_{kh}), SA_{t-1} \mid SA_t) \cong P(DR_k(f(w_k), f(w_{kh})), SA_{t-1} \mid SA_t) = P(DR_k(f(w_k), f(w_{kh})) \mid SA_t)\, P(SA_{t-1} \mid SA_t)    (6)

The conditional probabilities P(DR_k(f(w_k), f(w_{kh})) | SA_t) and P(SA_{t-1} | SA_t) are estimated according to Equations (7) and (8), respectively.

P(DR_k(f(w_k), f(w_{kh})) \mid SA_t) = \frac{C(f(w_k), f(w_{kh}), r_k, SA_t)}{C(SA_t)}    (7)

P(SA_{t-1} \mid SA_t) = \frac{C(SA_{t-1}, SA_t)}{C(SA_t)}    (8)

where C(\cdot) represents the number of occurrences of an event in the training corpus. With the definitions in Equations (7) and (8), Equation (6) becomes practicable.

2.2 Semantic dependency analysis using word sequence and discourse

Although the discourse can be expressed as the speech act sequence H_t = {SA_1, SA_2, ..., SA_{t-1}, SA_t}, the dependency graph D_i is determined mainly by W rather than by H_{t-1}. The probability defining semantic dependency analysis using the word sequence and the discourse can therefore be rewritten as follows:

P(D_i \mid W, H_{t-1}) = P(D_i \mid W, SA_{t-1}, SA_{t-2}, \ldots, SA_1) \cong P(D_i \mid W)    (9)

and

P(D_i \mid W) = \frac{P(D_i, W)}{P(W)}    (10)

Seeing that several dependency graphs can be generated from the word sequence W, by introducing the hidden factor D_i the probability P(W) can be written as the sum of the probabilities P(D_i, W), as in Equation (11).

P(W) = \sum_{D_i:\, \mathrm{yield}(D_i) = W} P(D_i, W)    (11)

Because D_i is generated from W, D_i is sufficient to represent W in semantics. We can therefore estimate the joint probability P(D_i, W) from the dependency relations in D_i alone. Further, the dependency relations are assumed to be independent of each other, which gives the simplification

P(D_i, W) = \prod_{k=1}^{m-1} P(DR_k(w_k, w_{kh}))    (12)

The probability of a dependency relation between words is defined as that between the concepts given by the hypernyms of the words, and the dependency rules are then introduced. The probability P(r_k | f(w_k), f(w_{kh})) is estimated from Equation (13).

P(DR_k(w_k, w_{kh})) \equiv P(DR_k(f(w_k), f(w_{kh}))) = P(r_k \mid f(w_k), f(w_{kh})) = \frac{C(r_k, f(w_k), f(w_{kh}))}{C(f(w_k), f(w_{kh}))}    (13)

According to Equations (11), (12), and (13), Equation (10) can be rewritten as

P(D_i \mid W) = \frac{\prod_{k=1}^{m-1} P(DR_k(w_k, w_{kh}))}{\sum_{D_j:\, \mathrm{yield}(D_j) = W} \prod_{k=1}^{m-1} P(DR_k(w_k, w_{kh}))} = \frac{\prod_{k=1}^{m-1} \frac{C(r_k, f(w_k), f(w_{kh}))}{C(f(w_k), f(w_{kh}))}}{\sum_{D_j:\, \mathrm{yield}(D_j) = W} \prod_{k=1}^{m-1} \frac{C(r_k, f(w_k), f(w_{kh}))}{C(f(w_k), f(w_{kh}))}}    (14)

where the function f(\cdot) denotes the transformation from the words to the corresponding semantic classes.
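A minimal sketch of Equations (12)-(14) is given below, assuming the relation and concept-pair counts have been collected from the annotated corpus and that each candidate dependency graph is represented as a list of (relation, word class, headword class) triples. This representation and all names are assumptions for illustration, not the paper's implementation.

```python
# A sketch of the dependency-graph probability P(Di | W) in Equations (12)-(14),
# assuming counts C(r, f(w), f(wh)) and C(f(w), f(wh)) collected from training data
# annotated with HowNet-style relations.
from collections import Counter
from math import prod

rel_counts = Counter()    # C(r, f(w), f(wh)): relation with its concept pair
pair_counts = Counter()   # C(f(w), f(wh)):    concept pair alone

def p_relation(rel, w_class, h_class):
    """Relative-frequency estimate of P(r | f(w), f(wh)) as in Equation (13)."""
    denom = pair_counts[(w_class, h_class)]
    return rel_counts[(rel, w_class, h_class)] / denom if denom else 0.0

def p_graph_given_words(graph, candidate_graphs):
    """Normalize the product of relation probabilities (Eq. 12) over all graphs
    whose yield is the word sequence W, as in Equation (14)."""
    def joint(g):
        return prod(p_relation(r, wc, hc) for (r, wc, hc) in g)
    total = sum(joint(g) for g in candidate_graphs)
    return joint(graph) / total if total else 0.0
```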
[Figure 1. Speech acts corresponding to multiple services in the medical domain]

3 Experiments

To evaluate the proposed method, a spoken dialogue system for the medical domain with multiple services was investigated. Three main services are provided: a registration information service, a clinic information service, and an FAQ information service. The system mainly provides on-line registration. To this end, health education documents are provided as the FAQ files, and an inference engine that suggests a clinic according to the patient's symptoms was constructed from a medical encyclopedia. An example dialogue is illustrated in Figure 2:

System: 午安。需要服務嗎?
(Good afternoon. May I help you?)
User: 是的,我要看醫生。
(Yes, I would like to see a doctor.)
System: 請問您要掛哪一科?
(Which department would you like to make an appointment with?)
User: 我不知道。
(I do not know.)
System: 請問你有那些症狀?
(What symptoms do you have?)
User: 我喉嚨痛又發燒。而且全身酸痛。
(I have a sore throat and a fever. I ache all over, too.)
System: 好的,這些症狀多久了?
(Well, how long have you had these symptoms?)
User: 好幾天了。
(For several days.)
System: 建議您掛內科醫師。
(My suggestion is that you visit Internal Medicine.)
User: 請問內科何時有門診?
(What is the visiting schedule of Internal Medicine?)
System: 您可以參考醫院門診表,決定您的時間,我們的時間有…。
(You can check the services schedule and decide on a convenient time. The available times are....)

Figure 2. An example dialogue

Twelve speech acts are defined and shown in Figure 1. Every service corresponds to the 12 speech acts with different probabilities. The speech recognition engine embedded in the dialogue system is based on hidden Markov models (HMMs). The feature vector is parameterized by 26 MFCC coefficients, and decoding is based on the classical Viterbi algorithm. The character error rate (CER) of the Chinese speech recognizer is 18.3%, and the vocabulary size is 25,132 words.

3.1 Analysis of corpus

The training corpus was collected by on-line recording at National Cheng Kung University Hospital in the first phase and by the Wizard-of-Oz method in the second phase. In total, the corpus contains 1,862 dialogues with 13,986 sentences. The frequencies of the speech acts used in the system are shown in Figure 3.

[Figure 3. Frequency (%) of each speech act: Time 13.96, Clinic information 13.46, Greeting 12.81, Registration 11.56, FAQ 10.71, Dr. and Clinic 9.76, Dr.'s information 9.11, Confirmation (others) 4.70, Confirmation (clinic) 4.35, Others 4.10, Cancel registration 2.75, Registration revision 2.70]

The number of dialogue turns is also important to the success of a dialogue task. From the corpus we observe that dialogues with more than 15 turns usually failed to complete the task, that is, common ground could not be achieved. These failed dialogues were filtered out of the training corpus before conducting the following experiments. The distribution of the number of turns per dialogue is shown in Figure 4.

[Figure 4. The distribution of the number of turns per dialogue]

3.2 Precision of speech act identification related to the corpus size