[1] MARCH J G. Exploration and exploitation in organizational learning[J]. Organization science, 1991, 2(1): 71-87.
[2] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. MIT Press, 2018.
[3] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[4] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
[5] JIN C, LIU Q, MIRYOOSEFI S. Bellman eluder dimension: New rich classes of RL problems, and sample-efficient algorithms[J]. Advances in neural information processing systems, 2021, 34: 13406-13418.
[6] BERNARDIN K, STIEFELHAGEN R. Evaluating multiple object tracking performance: The CLEAR MOT metrics[J]. EURASIP Journal on Image and Video Processing, 2008, 2008: 1-10.
[7] KAPTUROWSKI S, CAMPOS V, JIANG R, et al. Human-level Atari 200x faster[A]. 2022.
[8] YU F, XIAN W, CHEN Y, et al. BDD100K: A diverse driving video database with scalable annotation tooling[A]. 2018.
[9] GOTTESMAN O, JOHANSSON F, KOMOROWSKI M, et al. Guidelines for reinforcement learning in healthcare[J]. Nature medicine, 2019, 25(1): 16-18.
[10] CHEN T, KORNBLITH S, NOROUZI M, et al. A simple framework for contrastive learning of visual representations[C]//International conference on machine learning. PMLR, 2020: 1597-1607.
[11] GRILL J B, STRUB F, ALTCHÉ F, et al. Bootstrap your own latent: A new approach to self-supervised learning[J]. Advances in neural information processing systems, 2020, 33: 21271-21284.
[12] BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[J]. Advances in neural information processing systems, 2020, 33: 1877-1901.
[13] TAJBAKHSH N, SHIN J Y, GURUDU S R, et al. Convolutional neural networks for medical image analysis: Full training or fine tuning?[J]. IEEE transactions on medical imaging, 2016, 35(5): 1299-1312.
[14] POLI R, LANGDON W B, MCPHEE N F. A field guide to genetic programming[M]. Lulu.com, 2008.
[15] ABBEEL P, QUIGLEY M, NG A Y. Using inaccurate models in reinforcement learning[C]//Proceedings of the 23rd international conference on Machine learning. 2006: 1-8.
[16] BURNS K, YU T, FINN C, et al. Offline reinforcement learning at multiple frequencies[C]//Conference on Robot Learning. PMLR, 2023: 2041-2051.
[17] LEE S, LEE D Y, IM S, et al. Clinical decision transformer: Intended treatment recommendation through goal prompting[A]. 2023.
[18] ZHAO K, ZOU L, ZHAO X, et al. User Retention-oriented Recommendation with Decision Transformer[C]//Proceedings of the ACM Web Conference 2023. 2023: 1141-1149.
[19] FUJIMOTO S, MEGER D, PRECUP D. Off-policy deep reinforcement learning without exploration[C]//International conference on machine learning. PMLR, 2019: 2052-2062.
[20] KUMAR A, FU J, SOH M, et al. Stabilizing off-policy q-learning via bootstrapping error reduction[J]. Advances in Neural Information Processing Systems, 2019, 32.
[21] JAQUES N, GHANDEHARIOUN A, SHEN J H, et al. Way off-policy batch deep reinforcement learning of implicit human preferences in dialog[A]. 2019.
[22] WU Y, TUCKER G, NACHUM O. Behavior regularized offline reinforcement learning[A]. 2019.
[23] PENG X B, KUMAR A, ZHANG G, et al. Advantage-weighted regression: Simple and scalable off-policy reinforcement learning[A]. 2019.
[24] SIEGEL N Y, SPRINGENBERG J T, BERKENKAMP F, et al. Keep doing what worked: Behavioral modelling priors for offline reinforcement learning[A]. 2020.
[25] JANNER M, FU J, ZHANG M, et al. When to trust your model: Model-based policy optimization[J]. Advances in neural information processing systems, 2019, 32.
[26] LU C, BALL P J, PARKER-HOLDER J, et al. Revisiting design choices in model-based offline reinforcement learning[A]. 2021.
[27] LEE B J, LEE J, KIM K E. Representation balancing offline model-based reinforcement learning[C]//9th International Conference on Learning Representations, ICLR 2021. 2021.
[28] HISHINUMA T, SENDA K. Weighted model estimation for offline model-based reinforcement learning[J]. Advances in neural information processing systems, 2021, 34: 17789-17800.
[29] ARGENSON A, DULAC-ARNOLD G. Model-based offline planning[A]. 2020.
[30] MATSUSHIMA T, FURUTA H, MATSUO Y, et al. Deployment-efficient reinforcement learning via model-based offline optimization[A]. 2020.
[31] YU T, KUMAR A, RAFAILOV R, et al. COMBO: Conservative offline model-based policy optimization[J]. Advances in neural information processing systems, 2021, 34: 28954-28967.
[32] KIDAMBI R, RAJESWARAN A, NETRAPALLI P, et al. MOReL: Model-based offline reinforcement learning[J]. Advances in neural information processing systems, 2020, 33: 21810-21823.
[33] YU T, THOMAS G, YU L, et al. MOPO: Model-based offline policy optimization[J]. Advances in Neural Information Processing Systems, 2020, 33: 14129-14142.
[34] LEVINE S, KUMAR A, TUCKER G, et al. Offline reinforcement learning: Tutorial, review, and perspectives on open problems[A]. 2020.
[35] KUMAR A, ZHOU A, TUCKER G, et al. Conservative q-learning for offline reinforcement learning[J]. Advances in Neural Information Processing Systems, 2020, 33: 1179-1191.
[36] FUJIMOTO S, GU S S. A minimalist approach to offline reinforcement learning[J]. Advances in neural information processing systems, 2021, 34: 20132-20145.
[37] NAIR A, GUPTA A, DALAL M, et al. AWAC: Accelerating Online Reinforcement Learning with Offline Datasets[A]. 2020.
[38] UCHENDU I, XIAO T, LU Y, et al. Jump-start reinforcement learning[C]//International Conference on Machine Learning. PMLR, 2023: 34556-34583.
[39] LEE S, SEO Y, LEE K, et al. Offline-to-online reinforcement learning via balanced replay and pessimistic q-ensemble[C]//Conference on Robot Learning. PMLR, 2022: 1702-1712.
[40] REZAEIFAR S, DADASHI R, VIEILLARD N, et al. Offline reinforcement learning as anti-exploration[C]//Proceedings of the AAAI Conference on Artificial Intelligence: volume 36. 2022: 8106-8114.
[41] MARK M S, GHADIRZADEH A, CHEN X, et al. Fine-tuning offline policies with optimistic action selection[C]//Deep Reinforcement Learning Workshop NeurIPS 2022. 2022.
[42] ZHENG H, LUO X, WEI P, et al. Adaptive policy learning for offline-to-online reinforcement learning[A]. 2023.
[43] ZHANG H, XU W, YU H. Policy Expansion for Bridging Offline-to-Online Reinforcement Learning[A]. 2023.
[44] MAO Y, WANG C, WANG B, et al. MOORe: Model-based Offline-to-Online Reinforcement Learning[A]. 2022.
[45] RAFAILOV R, HATCH K B, KOLEV V, et al. MOTO: Offline to Online Fine-tuning for Model-Based Reinforcement Learning[C]//Workshop on Reincarnating Reinforcement Learning at ICLR 2023. 2023.
[46] GUO S, SUN Y, HU J, et al. A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning[A]. 2023.
[47] LATTIMORE T, SZEPESVÁRI C. Bandit algorithms[M]. Cambridge University Press, 2020.
[48] GARIVIER A, MOULINES E. On upper-confidence bound policies for switching bandit problems[C]//International Conference on Algorithmic Learning Theory. Springer, 2011: 174-188.
[49] KAUFMANN E, CAPPÉ O, GARIVIER A. On Bayesian upper confidence bounds for bandit problems[C]//Artificial intelligence and statistics. PMLR, 2012: 592-600.
[50] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[A]. 2016.
[51] NG A Y, HARADA D, RUSSELL S. Policy invariance under reward transformations: Theory and application to reward shaping[C]//ICML: volume 99. Citeseer, 1999: 278-287.
[52] STREHL A L, LI L, WIEWIORA E, et al. PAC model-free reinforcement learning[C]//Proceedings of the 23rd international conference on Machine learning. 2006: 881-888.
[53] RAMESH N, MOURYA S. Reinforcement Learning using the CarRacing-v0 environment from OpenAI Gym[Z]. 2020.
[54] LANGE S, GABEL T, RIEDMILLER M. Batch reinforcement learning[M]//Reinforcement learning: State-of-the-art. Springer, 2012: 45-73.
[55] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[56] PERTSCH K, LEE Y, LIM J. Accelerating reinforcement learning with learned skill priors[C]// Conference on robot learning. PMLR, 2021: 188-204.
[57] PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven exploration by self-supervised prediction[C]//International conference on machine learning. PMLR, 2017: 2778-2787.
[58] DEB K, PRATAP A, AGARWAL S, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II[J]. IEEE transactions on evolutionary computation, 2002, 6(2): 182-197.
[59] VAN MOFFAERT K, NOWÉ A. Multi-objective reinforcement learning using sets of Pareto dominating policies[J]. The Journal of Machine Learning Research, 2014, 15(1): 3483-3512.
[60] YANG R, SUN X, NARASIMHAN K. A generalized algorithm for multi-objective reinforcement learning and policy adaptation[J]. Advances in neural information processing systems, 2019, 32.
[61] FU J, KUMAR A, NACHUM O, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning[A]. 2020. arXiv: 2004.07219.
[62] GOTTESMAN O, JOHANSSON F, KOMOROWSKI M, et al. Guidelines for reinforcement learning in healthcare[J]. Nature medicine, 2019, 25(1): 16-18.
[63] SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[C]//International Conference on Learning Representations. 2016.
[64] JANNER M, MORDATCH I, LEVINE S. Generative temporal difference learning for infinite-horizon prediction[A]. 2020.
[65] ROIJERS D M, VAMPLEW P, WHITESON S, et al. A survey of multi-objective sequential decision-making[J]. Journal of Artificial Intelligence Research, 2013, 48: 67-113.
[66] RAFAILOV R, YU T, RAJESWARAN A, et al. Offline reinforcement learning from images with latent space models[C]//Learning for Dynamics and Control. PMLR, 2021: 1154-1168.
[67] YANG Y, JIANG J, ZHOU T, et al. Pareto policy pool for model-based offline reinforcement learning[C]//International Conference on Learning Representations. 2022.
[68] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//International conference on machine learning. PMLR, 2018: 1861-1870.
[69] XU J, TIAN Y, MA P, et al. Prediction-guided multi-objective reinforcement learning for continuous robot control[C]//International conference on machine learning. PMLR, 2020: 10607-10616.
[70] MA P, DU T, MATUSIK W. Efficient continuous pareto exploration in multi-task learning[C]//International Conference on Machine Learning. PMLR, 2020: 6522-6531.
[71] DÉSIDÉRI J A. Multiple-gradient descent algorithm (MGDA) for multiobjective optimization[J]. Comptes Rendus Mathematique, 2012, 350(5-6): 313-318.
[72] SENER O, KOLTUN V. Multi-task learning as multi-objective optimization[J]. Advances in neural information processing systems, 2018.
[73] HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft Actor-Critic Algorithms and Applications[A]. 2019. arXiv: 1812.05905.
[74] JAGGI M. Revisiting Frank-Wolfe: Projection-free sparse convex optimization[C]//International conference on machine learning. PMLR, 2013: 427-435.
[75] TODOROV E, EREZ T, TASSA Y. Mujoco: A physics engine for model-based control[C]//2012 IEEE/RSJ international conference on intelligent robots and systems. IEEE, 2012: 5026-5033.
[76] GARIVIER A, CAPPÉ O. The KL-UCB algorithm for bounded stochastic bandits and beyond[C]//Annual Conference on Learning Theory. 2011: 359-376.
[77] KOSTRIKOV I, FERGUS R, TOMPSON J, et al. Offline reinforcement learning with Fisher divergence critic regularization[C]//International Conference on Machine Learning. PMLR, 2021: 5774-5783.
[78] VERMOREL J, MOHRI M. Multi-armed bandit algorithms and empirical evaluation[C]//European conference on machine learning. Springer, 2005: 437-448.
[79] SHERMAN J, MORRISON W J. Adjustment of an inverse matrix corresponding to changes in the elements of a given column or a given row of the original matrix[J]. Annals of mathematical statistics, 1949, 20(4): 621.
[80] BENTKUS V. On Hoeffding’s inequalities[J]. The Annals of Probability, 2004, 32(2): 1650-1673.
[81] NAKAMOTO M, ZHAI S, SINGH A, et al. Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning[J]. Advances in Neural Information Processing Systems, 2023, 36.
[82] KOSTRIKOV I, NAIR A, LEVINE S. Offline Reinforcement Learning with Implicit Q-Learning[C]//International Conference on Learning Representations. 2022.
[83] KUMAR A, ZHOU A, TUCKER G, et al. Conservative q-learning for offline reinforcement learning[J]. Advances in neural information processing systems, 2020, 33: 1179-1191.
[84] WANG S, YANG Q, GAO J, et al. Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning[J]. Advances in neural information processing systems, 2023, 36.
[85] TARASOV D, NIKULIN A, AKIMOV D, et al. CORL: Research-oriented Deep Offline Reinforcement Learning Library[C/OL]//3rd Offline RL Workshop: Offline RL as a “Launchpad”. 2022. https://openreview.net/forum?id=SyAS49bBcv.