Title

Research on Droplet Routing Algorithms for Digital Microfluidic Biochips Based on Reinforcement Learning

Name
杨容权
Name (Pinyin)
YANG Rongquan
Student ID
11930641
Degree Type
Master's
Degree Discipline
0809 Electronic Science and Technology
Discipline Category / Professional Degree Category
08 Engineering
Supervisor
袁博
Supervisor's Affiliation
Department of Computer Science and Engineering
Thesis Defense Date
2022-05-08
Thesis Submission Date
2022-06-13
Degree-Granting Institution
Southern University of Science and Technology
Place of Degree Conferral
Shenzhen
Abstract

Digital microfluidic biochips are an innovative miniaturized laboratory platform for biochemical reactions that directly manipulates micro-droplets to execute various biochemical protocols. They have demonstrated great advantages in clinical diagnostics, for example in COVID-19 testing, routine DNA analysis, and bone density testing. To optimize chip design and the execution flow of biochemical protocols, high-quality computer-aided design tools are urgently needed for design automation. Droplet routing is one of the key steps in the design automation flow, and its result directly affects chip performance. A major challenge is the reliability issue caused by electrode defects: electrodes on the chip degrade during use, so the on-chip environment changes dynamically over time. When a droplet located on a degraded electrode is moved, the actuation force may be insufficient and the movement fails, which ultimately causes erroneous biochemical reactions and incorrect experimental results.

A promising way to solve the dynamic droplet routing problem is to build a more reliable algorithmic framework based on reinforcement learning. Under the reinforcement learning framework, an agent learns a droplet-moving policy from feedback, is able to capture the latent health status of the electrodes, and can perform reliable droplet movement operations. However, applying reinforcement learning to droplet routing poses several challenges: (1) how to build a cooperation mechanism among multiple agents, while preventing the agents from being disturbed by spurious reward signals, so as to achieve reliable and fast dynamic droplet routing; (2) the average cumulative return serves only as an observed indicator during reinforcement learning training and does not truly correspond to the optimization objective of droplet routing, so problem-specific performance metrics for dynamic droplet routing must be constructed to evaluate algorithms fairly, accurately, and comprehensively.

To address these challenges, this thesis proposes a droplet routing algorithm based on cooperative multi-agent reinforcement learning. It adopts a centralized-training, decentralized-execution framework in which the agents cooperate effectively, and it applies to both conventional digital microfluidic biochips and microelectrode-dot-array biochips. In addition, this thesis proposes two new metrics for evaluating algorithm performance: success rate and average completion steps. To facilitate the training and performance evaluation of reinforcement learning algorithms, this thesis develops a simulation platform for fluidic-level synthesis of digital microfluidic biochips. The platform supports multiple types of droplet routing tasks as well as variable chip sizes, droplet counts, and obstacle-region sizes and counts. It also supports dynamic constraint checking during droplet movement and can simulate specific electrode degradation scenarios for testing and validating algorithms. Analysis of the experimental results on the simulation platform shows that, compared with the best known related algorithms, the proposed algorithm achieves superior performance on multiple evaluation metrics.
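The two metrics above are only named in the abstract; the following minimal Python sketch shows one plausible way to compute them over a batch of evaluation episodes. The names used here (`EpisodeResult`, `evaluate_metrics`) are illustrative assumptions rather than identifiers from the thesis, and averaging completion steps over successful episodes only is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EpisodeResult:
    """Outcome of one evaluation episode (hypothetical container)."""
    reached_targets: bool  # did every droplet reach its target cell within the step limit?
    steps_taken: int       # number of control steps the episode used


def evaluate_metrics(episodes: List[EpisodeResult]) -> Dict[str, float]:
    """Compute success rate and average completion steps over evaluation episodes.

    Success rate: fraction of episodes in which all droplets reached their targets.
    Average completion steps: mean step count over successful episodes only, so that
    failed episodes (capped at the step limit) do not bias the average.
    """
    if not episodes:
        return {"success_rate": 0.0, "avg_completion_steps": float("nan")}
    successes = [e for e in episodes if e.reached_targets]
    success_rate = len(successes) / len(episodes)
    avg_steps = (sum(e.steps_taken for e in successes) / len(successes)
                 if successes else float("nan"))
    return {"success_rate": success_rate, "avg_completion_steps": avg_steps}


if __name__ == "__main__":
    demo = [EpisodeResult(True, 18), EpisodeResult(True, 22), EpisodeResult(False, 50)]
    print(evaluate_metrics(demo))  # success_rate = 2/3, avg_completion_steps = 20.0
```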

Other Abstract

As an innovative platform for miniaturizing laboratory procedures, digital microfluidic biochips demonstrate great advantages in clinical diagnostics by manipulating discrete nano/picoliter droplets to automatically execute biochemical protocols such as COVID-19 testing, DNA analysis, and bone density testing. To optimize the design of chips and the execution of biochemical reaction protocols, high-quality computer-aided design tools are urgently needed for design automation. Droplet routing, as one of the key steps in the design automation flow, directly affects the performance of the biochip. One of the major challenges is the reliability issue caused by electrode degradation, which leads to droplet transportation failures and incorrect fluidic operations.

One of the currently promising approaches to solving the dynamic droplet routing problem is reinforcement learning. Under the reinforcement learning framework, an agent learns the strategy of moving a droplet in a feedback manner, can capture the latent health status of the electrodes, and can perform reliable droplet movement operations. However, there are some challenges in using reinforcement learning for droplet routing: (1) how to build a cooperation mechanism among multiple agents, while avoiding interference from spurious reward signals between agents, so as to achieve reliable and fast dynamic droplet routing; (2) the average cumulative return, as an observed metric during the training stage, is inconsistent with the optimization goal of droplet routing, so problem-specific evaluation metrics for the dynamic droplet routing problem must be constructed to evaluate algorithm performance fairly, accurately, and comprehensively.

To meet the above challenges, this thesis proposes a droplet routing algorithm based on cooperative multi-agent reinforcement learning that utilizes a centralized-training, decentralized-execution framework, in which the agents cooperate effectively and which is suitable for both conventional digital microfluidic biochips and microelectrode-dot-array biochips. This thesis also proposes two new metrics for evaluating algorithm performance: success rate and average completion steps. To facilitate the training and performance evaluation of the reinforcement learning algorithms, a simulation platform for fluidic-level synthesis of digital microfluidic biochips is developed, which supports multiple types of droplet routing tasks as well as variable chip sizes, droplet counts, obstacle counts, and so on. In addition, the environment supports dynamic constraint checking during droplet movement and can simulate specific electrode degradation scenarios to test and verify the effectiveness of the algorithm. Analysis of the experimental results on the simulation platform shows that, compared with the state-of-the-art algorithms, the proposed algorithm achieves better performance on multiple evaluation metrics.
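As a rough illustration of the kind of environment described above, and not the thesis's actual fluidic-level simulation platform, the sketch below models a single droplet on a grid where stepping onto a degraded electrode may fail stochastically; the grid size, degraded cells, failure probability, and rewards are all placeholder assumptions.

```python
import random


class ToyDropletGrid:
    """Minimal single-droplet grid with degraded electrodes (illustrative only).

    A degraded electrode fails to actuate the droplet with some probability, so the
    droplet stays in place; this mimics the dynamic reliability issue described in
    the abstract rather than reproducing the thesis's simulator.
    """

    ACTIONS = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}  # right, left, down, up

    def __init__(self, size=8, degraded=((3, 3), (3, 4)), fail_prob=0.5, max_steps=50):
        self.size = size
        self.degraded = set(degraded)   # placeholder degraded electrode cells
        self.fail_prob = fail_prob
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        self.target = (self.size - 1, self.size - 1)
        self.steps = 0
        return self.pos

    def step(self, action):
        self.steps += 1
        dr, dc = self.ACTIONS[action]
        nxt = (min(max(self.pos[0] + dr, 0), self.size - 1),
               min(max(self.pos[1] + dc, 0), self.size - 1))
        # Moving onto a degraded electrode may fail: the droplet does not advance.
        if nxt in self.degraded and random.random() < self.fail_prob:
            nxt = self.pos
        self.pos = nxt
        done = self.pos == self.target or self.steps >= self.max_steps
        reward = 1.0 if self.pos == self.target else -0.01  # sparse goal reward, small step cost
        return self.pos, reward, done
```

A routing agent would be trained against many randomized instances of such an environment; a multi-droplet version would additionally enforce the fluidic spacing constraints mentioned above.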

Keywords
Other Keywords
Language
Chinese
Training Category
Independent Training
Year of Enrollment
2019-09
Year of Degree Conferral
2022-06

Degree Evaluation Subcommittee
Department of Computer Science and Engineering
Chinese Library Classification (CLC) Number
TN492
Source Repository
Manual Submission
Item Type
Thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/335692
Collection
College of Engineering - Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
YANG Rongquan. Research on Droplet Routing Algorithms for Digital Microfluidic Biochips Based on Reinforcement Learning[D]. Shenzhen: Southern University of Science and Technology, 2022.
Files in This Item
File Name / Size    Document Type    Version Type    Access Type    License
11930641-杨容权-计算机科学与工 (18627 KB)    --    --    Restricted Access    --