Title

基于深度学习的中文口语理解研究

Alternative Title
RESEARCH ON CHINESE SPOKEN LANGUAGE UNDERSTANDING BASED ON DEEP LEARNING
Name
徐娜
Name (Pinyin)
XU Na
Student ID
12132368
Degree Type
Master
Major
Electronic Science and Technology
Discipline Category / Professional Degree Category
08 Engineering
Supervisor
杨双华
Supervisor's Affiliation
Department of Computer Science and Engineering
Thesis Defense Date
2024-05-12
Thesis Submission Date
2024-06-24
Degree-Granting Institution
Southern University of Science and Technology
Degree-Granting Location
Shenzhen
Abstract

With the rapid development of deep learning, human-machine dialogue has become a convenient and increasingly popular form of human-computer interaction. As a key component of dialogue systems, spoken language understanding (SLU) has both academic and practical value. It consists of two subtasks: intent detection and slot filling. Intent detection is typically formulated as a classification task, while slot filling is treated as a sequence labeling task.
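To make the decomposition concrete, below is a minimal toy illustration of the two subtasks' outputs, assuming a character-level BIO tagging scheme; the utterance, intent label, and slot name are hypothetical examples, not drawn from the thesis datasets.

```python
# Toy example of the two SLU subtasks on one Chinese utterance.
# Intent detection assigns a single label to the whole utterance;
# slot filling assigns one BIO tag per character.
utterance = list("播放周杰伦的歌")  # "Play Jay Chou's songs"

intent = "PlayMusic"  # classification output for the utterance

# Sequence labeling output: one tag per character, aligned by index.
slot_tags = ["O", "O", "B-artist", "I-artist", "I-artist", "O", "O"]

assert len(utterance) == len(slot_tags)
for char, tag in zip(utterance, slot_tags):
    print(f"{char}\t{tag}")
```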

Current SLU research focuses mainly on English corpora. Chinese SLU is more difficult because Chinese text lacks explicit word boundaries. Previous work either discards word-level information or suffers from word segmentation errors. To avoid the adverse effects of segmentation errors while still incorporating word information effectively, this thesis proposes a Word and Local Feature Aware Network (WLAN) for Chinese SLU. The model integrates word information and local features without performing word segmentation, so as to fully exploit the semantics of an utterance. In addition, to the best of our knowledge, this is the first work to use the potential slots that may co-occur with the predicted intent as concrete intent-to-slot guidance. Experimental results on two widely used Chinese SLU datasets show that the model achieves excellent performance.
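The segmentation-free use of word information can be pictured with plain lexicon matching: every dictionary word that matches a span of the utterance is attached to each character it covers, so no single (possibly wrong) segmentation is ever committed to. The sketch below illustrates only this general idea, with a hypothetical function and toy lexicon; it is not the WLAN architecture itself.

```python
def match_lexicon_words(chars, lexicon, max_word_len=4):
    """For each character position, collect every lexicon word whose
    span covers that position. Because all matches are kept, word
    information is injected without choosing one segmentation, so
    segmentation errors cannot propagate."""
    words_per_char = [[] for _ in chars]
    n = len(chars)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = "".join(chars[start:end])
            if word in lexicon:
                for pos in range(start, end):
                    words_per_char[pos].append(word)
    return words_per_char

# Toy lexicon; a real system would use a large pretrained word list.
lexicon = {"播放", "周杰伦", "杰伦", "的歌"}
chars = list("播放周杰伦的歌")
for char, words in zip(chars, match_lexicon_words(chars, lexicon)):
    print(char, words)  # e.g. 杰 ['周杰伦', '杰伦'] -- overlapping matches kept
```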

Constrained by the task paradigm, previous SLU research has mainly targeted simple utterances and struggles with complex scenarios such as multiple intents and discontinuous, overlapping, or nested slots. To address this, this thesis shifts the task paradigm: instead of the traditional, less flexible sequence labeling formulation, the slot sequence is produced directly by sequence generation. Specifically, we reformulate the SLU task, define target sequence formats suited to different complex utterance scenarios, and propose a novel pointer-based generative model, Generative Spoken Language Understanding (GSLU). Furthermore, to provide explicit intent-to-slot guidance, the intent state is passed into the decoder state to steer slot sequence generation. Experimental results on two widely used simple Chinese SLU datasets show that GSLU outperforms baselines built on the traditional paradigm, demonstrating the effectiveness of the paradigm shift. The thesis also describes how to construct target sequences for complex utterances so that GSLU can be applied to more complex scenarios, opening a new avenue for complex SLU tasks.
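To see why generation is more flexible than tagging, consider serializing each slot as its label followed by pointer spans (character indices) into the utterance: a slot with several spans is discontinuous, and spans of different slots may overlap or nest freely, none of which one-tag-per-token BIO labeling can express. The serialization below is a hypothetical illustration of this pointer-style target sequence, not the exact format defined in the thesis.

```python
def build_target_sequence(intent, slots):
    """Serialize an utterance's intent and slots as one flat target
    sequence. Each slot is (label, [(start, end), ...]); the (start,
    end) pairs are pointers to character positions in the utterance."""
    seq = [intent]
    for label, spans in slots:
        seq.append(label)
        for start, end in spans:
            seq += [str(start), str(end)]
    return " ".join(seq)

# Two slots: the second is discontinuous and overlaps the first,
# which plain BIO tagging cannot represent.
slots = [("artist", [(2, 4)]),
         ("song", [(2, 3), (6, 6)])]
print(build_target_sequence("PlayMusic", slots))
# -> PlayMusic artist 2 4 song 2 3 6 6
```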

Keywords
Language
Chinese
Training Category
Independent Cultivation
Year of Enrollment
2021
Year Degree Conferred
2024-06

Degree Evaluation Subcommittee
Electronic Science and Technology
CLC Number
TP391.1
Source Database
Manual Submission
Output Type
Thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/779113
Collection
College of Engineering / Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
徐娜. 基于深度学习的中文口语理解研究[D]. 深圳: 南方科技大学, 2024.
Files in This Item
File Name/Size | Document Type | Version | Access | License
12132368-徐娜-计算机科学与工程 (2294KB) | -- | -- | Restricted Access | --
