Title

多任务学习的损失函数平衡策略与模型设计

Alternative Title
Loss Function Balancing Strategy and Model Design for Multi-Task Learning
Name
梁思聪
Name in Pinyin
LIANG Sicong
Student ID
11930663
Degree Type
Master's
Degree Discipline
080900 Electronic Science and Technology
Subject Category/Professional Degree Category
08 Engineering
Supervisor
张宇
Supervisor's Affiliation
Department of Computer Science and Engineering
Thesis Defense Date
2022-05-08
Thesis Submission Date
2022-06-11
Degree-Granting Institution
Southern University of Science and Technology
Degree-Granting Location
Shenzhen
Abstract

In recent years, with the growth of big data and computing power, machine learning has been applied widely across many tasks. Traditionally, machine learning problems are treated in isolation: a separate model is trained for each task. Many real-world problems, however, are inherently multi-task. An autonomous driving system, for example, must recognize lane markings and traffic signs while simultaneously controlling speed and steering, so that the vehicle can drive safely in complex traffic environments. Humans, likewise, are very good at solving many tasks at once rather than separating them and working on each in isolation. Inspired by this, researchers proposed Multi-Task Learning (MTL), a learning paradigm in machine learning that aims to improve a model's generalization performance on all tasks by exploiting the useful information contained in multiple related tasks.
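To make the paradigm concrete, the sketch below shows hard parameter sharing, the standard baseline architecture for deep MTL: a shared encoder feeds one small head per task, and training minimizes the plain sum of per-task losses. This PyTorch sketch is illustrative only; the class name `HardSharingMTL`, the layer sizes, and the two-task setup are assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Minimal hard-parameter-sharing multi-task network (illustrative)."""

    def __init__(self, in_dim, hidden, task_out_dims):
        super().__init__()
        # Parameters shared by every task.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One task-specific output head per task.
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_out_dims)

    def forward(self, x):
        z = self.encoder(x)
        return [head(z) for head in self.heads]

# Equal-weight sum of task losses: the baseline objective that
# loss-balancing strategies try to improve on.
model = HardSharingMTL(in_dim=64, hidden=128, task_out_dims=[10, 3])
criteria = [nn.CrossEntropyLoss(), nn.CrossEntropyLoss()]
x = torch.randn(32, 64)
ys = [torch.randint(0, 10, (32,)), torch.randint(0, 3, (32,))]
losses = [c(out, y) for c, out, y in zip(criteria, model(x), ys)]
total_loss = sum(losses)
total_loss.backward()
```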

In the deep learning era, the challenges of multi-task learning center on two questions: how to design a loss balancing strategy that better balances the joint optimization of multiple tasks, and how to design a model that performs well on all of them. For loss balancing, this thesis proposes a strategy based on transformation functions. After analyzing existing work and pointing out its limitations, and starting from the intuition that tasks with larger training losses should receive more attention during optimization, the thesis transforms each task's training loss with a transformation function so as to balance the tasks. Experiments show that the method combines well with other multi-task learning models and performs well in both homogeneous and heterogeneous multi-task learning scenarios. For model design, this thesis proposes a federated multi-task learning model based on a personalized attention mechanism. Inspired by the success of attention mechanisms in multi-task model design, the thesis extends them to personalized federated learning, a setting closely related to multi-task learning. Experiments show that the method improves the performance of existing federated learning algorithms on non-IID data without adding any communication overhead.
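The abstract does not spell out the transformation function itself, so the sketch below only illustrates the mechanism under an assumed choice, h(l) = exp(l / T): because the derivative of h grows with the loss, each task's gradient is automatically up-weighted when its current loss is large, matching the intuition that harder tasks deserve more attention. The temperature `T` is a hypothetical knob, not a parameter taken from the thesis.

```python
import torch

def transformed_total_loss(task_losses, temperature=1.0):
    """Sum of transformed per-task losses, with the assumed
    transform h(l) = exp(l / T)."""
    # d/dl exp(l/T) = exp(l/T) / T, so each task's gradient is scaled
    # by a factor that grows with its own current loss. In practice the
    # raw losses may need clipping to keep exp() from overflowing.
    return sum(torch.exp(l / temperature) for l in task_losses)

# Drop-in replacement for the plain sum in the previous sketch:
#   total_loss = transformed_total_loss(losses, temperature=2.0)
```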

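For the federated contribution the abstract again gives only the high-level idea, so the following is a speculative sketch of one way a personalized attention module can help on non-IID data without extra communication: a small squeeze-and-excitation-style channel-attention block stays on each client and is excluded from the parameters uploaded to the server, while the backbone is aggregated FedAvg-style, so each round's message is no larger than in the non-personalized baseline. The class names, layer sizes, and the choice of SE-style attention are assumptions, not the thesis's design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention; kept local to each client."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        # Reweight feature channels with client-specific gates.
        return x * self.gate(x).unsqueeze(-1).unsqueeze(-1)

class PersonalizedClientModel(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(         # shared: averaged by the server
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        )
        self.attention = ChannelAttention(16)  # personal: never uploaded
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        f = self.attention(self.backbone(x))
        return self.head(f.mean(dim=(2, 3)))   # global average pool + classify

def shared_state(model):
    # Only non-attention parameters are sent to the server, so the
    # per-round communication cost matches plain FedAvg.
    return {k: v for k, v in model.state_dict().items()
            if not k.startswith("attention.")}
```

In each round a client would upload only `shared_state(model)` and merge the averaged result back with `model.load_state_dict(avg, strict=False)`, leaving its local attention parameters untouched.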

Keywords
Other Keywords
Language
Chinese
Training Category
Independent Training
Year of Enrollment
2019
Year of Degree Conferral
2022-06
Degree Evaluation Subcommittee
Department of Computer Science and Engineering
Chinese Library Classification Number
TP181
Source Repository
Manual Submission
Output Type
Thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/335684
Collection
College of Engineering_Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
梁思聪. 多任务学习的损失函数平衡策略与模型设计[D]. 深圳: 南方科技大学, 2022.
Files in This Item
11930663-梁思聪-计算机科学与工(3684KB), restricted access