[1] JIANG Y, NEYSHABUR B, MOBAHI H, et al. Fantastic generalization measures and where to find them[A]. 2019.
[2] BARTLETT P L, MENDELSON S. Rademacher and Gaussian complexities: Risk bounds and structural results[J]. Journal of Machine Learning Research, 2002, 3(Nov): 463-482.
[3] GAO W, ZHOU Z H. Dropout Rademacher complexity of deep neural networks[J]. Science China Information Sciences, 2016, 59(7): 1-12.
[4] BARTLETT P L, FOSTER D J, TELGARSKY M J. Spectrally-normalized margin bounds for neural networks[J]. Advances in Neural Information Processing Systems, 2017, 30.
[5] GOLOWICH N, RAKHLIN A, SHAMIR O. Size-independent sample complexity of neural networks[C/OL]//BUBECK S, PERCHET V, RIGOLLET P. Proceedings of Machine Learning Research: volume 75 Proceedings of the 31st Conference on Learning Theory. PMLR, 2018: 297-299. https://proceedings.mlr.press/v75/golowich18a.html.
[6] VAPNIK V, LEVIN E, LE CUN Y. Measuring the VC-dimension of a learning machine[J]. Neural Computation, 1994, 6(5): 851-876.
[7] BARTLETT P, MAIOROV V, MEIR R. Almost linear VC dimension bounds for piecewise polynomial networks[J]. Advances in Neural Information Processing Systems, 1998, 11.
[8] BARTLETT P L, HARVEY N, LIAW C, et al. Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks[J/OL]. Journal of Machine Learning Research, 2019, 20(63): 1-17. http://jmlr.org/papers/v20/17-612.html.
[9] NEYSHABUR B, BHOJANAPALLI S, MCALLESTER D, et al. Exploring generalization in deep learning[J]. Advances in Neural Information Processing Systems, 2017, 30.
[10] LIANG T, POGGIO T, RAKHLIN A, et al. Fisher-Rao metric, geometry, and complexity of neural networks[C/OL]//CHAUDHURI K, SUGIYAMA M. Proceedings of Machine Learning Research: volume 89 Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics. PMLR, 2019: 888-896. https://proceedings.mlr.press/v89/liang19a.html.
[11] ZHANG C, BENGIO S, HARDT M, et al. Understanding deep learning (still) requires rethinking generalization[J]. Communications of the ACM, 2021, 64(3): 107-115.
[12] NAGARAJAN V, KOLTER J Z. Uniform convergence may be unable to explain generalization in deep learning[J]. Advances in Neural Information Processing Systems, 2019, 32.
[13] MOHRI M, ROSTAMIZADEH A, TALWALKAR A. Foundations of machine learning[M]. MIT Press, 2018.
[14] ALLEN-ZHU Z, LI Y, LIANG Y. Learning and generalization in overparameterized neural networks, going beyond two layers[J]. Advances in Neural Information Processing Systems, 2019, 32.
[15] ARORA S, GE R, NEYSHABUR B, et al. Stronger generalization bounds for deep nets via a compression approach[C]//International Conference on Machine Learning. PMLR, 2018: 254-263.
[16] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1-9.
[17] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[18] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25.
[19] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.
[20] BELKIN M, HSU D, MA S, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off[J]. Proceedings of the National Academy of Sciences, 2019, 116(32): 15849-15854.
[21] NAKKIRAN P, KAPLUN G, BANSAL Y, et al. Deep double descent: Where bigger models and more data hurt[J]. Journal of Statistical Mechanics: Theory and Experiment, 2021, 2021(12): 124003.
[22] LAFON M, THOMAS A. Understanding the double descent phenomenon[Z]. 2022.
[23] LI Z, XIE C, WANG Q. Asymptotic normality and confidence intervals for prediction risk of the min-norm least squares estimator[C]//International Conference on Machine Learning. PMLR, 2021: 6533-6542.
[24] LIAO Z, COUILLET R, MAHONEY M W. A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent[J]. Advances in Neural Information Processing Systems, 2020, 33: 13939-13950.
[25] MARTIN C H, MAHONEY M W. Traditional and heavy-tailed self regularization in neural network models[A]. 2019.
[26] MARTIN C H, MAHONEY M W. Heavy-tailed universality predicts trends in test accuracies for very large pre-trained deep neural networks[C]//Proceedings of the 2020 SIAM International Conference on Data Mining. SIAM, 2020: 505-513.
[27] MARTIN C H, MAHONEY M W. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning[J]. Journal of Machine Learning Research, 2021, 22(165): 1-73.
[28] MARTIN C H, PENG T S, MAHONEY M W. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data[J]. Nature Communications, 2021, 12(1): 1-13.
[29] MARTIN C H, MAHONEY M W. Post-mortem on a deep learning contest: a Simpson’s paradox and the complementary roles of scale metrics versus shape metrics[A]. 2021.
[30] MARČENKO V A, PASTUR L A. Distribution of eigenvalues for some sets of random matrices[J]. Mathematics of the USSR-Sbornik, 1967, 1(4): 457.
[31] DAVIS R A, PFAFFEL O, STELZER R. Limit theory for the largest eigenvalues of sample covariance matrices with heavy-tails[J]. Stochastic Processes and their Applications, 2014, 124(1): 18-50.
[32] AUFFINGER A, BEN AROUS G, PÉCHÉ S. Poisson convergence for the largest eigenvalues of heavy tailed random matrices[C]//Annales de l’IHP Probabilités et statistiques: volume 45. 2009: 589-610.
[33] SOSHNIKOV A. Poisson statistics for the largest eigenvalues of Wigner random matrices with heavy tails[J]. Electronic Communications in Probability, 2004, 9: 82-91.
[34] DAVIS R A, MIKOSCH T, PFAFFEL O. Asymptotic theory for the sample covariance matrix of a heavy-tailed multivariate time series[J]. Stochastic Processes and their Applications, 2016, 126(3): 767-799.
[35] DAVIS R A, HEINY J, MIKOSCH T, et al. Extreme value analysis for the sample autocovariance matrices of heavy-tailed multivariate time series[J]. Extremes, 2016, 19(3): 517-547.
[36] BURDA Z, JURKIEWICZ J. Heavy-tailed random matrices[A]. 2009.
[37] BELINSCHI S, DEMBO A, GUIONNET A. Spectral measure of heavy tailed band and covariance random matrices[J]. Communications in Mathematical Physics, 2009, 289(3): 1023-1055.
[38] HEINY J, YAO J. Limiting distributions for eigenvalues of sample correlation matrices from heavy-tailed populations[A]. 2020.
[39] YIN Y Q, BAI Z D, KRISHNAIAH P R. On the limit of the largest eigenvalue of the large dimensional sample covariance matrix[J]. Probability Theory and Related Fields, 1988, 78(4): 509-521.
[40] JOHNSTONE I M. On the distribution of the largest eigenvalue in principal components analysis[J]. The Annals of Statistics, 2001, 29(2): 295-327.
[41] BAIK J, SILVERSTEIN J W. Eigenvalues of large sample covariance matrices of spiked population models[J]. Journal of Multivariate Analysis, 2006, 97(6): 1382-1408.
[42] BAIK J, BEN AROUS G, PÉCHÉ S. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices[J]. The Annals of Probability, 2005, 33(5): 1643-1697.
[43] PAUL D. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model[J]. Statistica Sinica, 2007, 17(4): 1617-1642.
[44] JOHNSTONE I M, LU A Y. On consistency and sparsity for principal components analysis in high dimensions[J]. Journal of the American Statistical Association, 2009, 104(486): 682-693.
[45] BOUCHAUD J P, MÉZARD M. Universality classes for extreme-value statistics[J]. Journal of Physics A: Mathematical and General, 1997, 30(23): 7997.
[46] MENG X, YAO J. Impact of classification difficulty on the weight matrices spectra in deep learning and application to early-stopping[A]. 2021.
[47] BLUM A L, RIVEST R L. Training a 3-node neural network is NP-complete[J]. Neural Networks, 1992, 5(1): 117-127.
[48] AUER P, HERBSTER M, WARMUTH M K. Exponentially many local minima for single neurons[J]. Advances in Neural Information Processing Systems, 1995, 8.
[49] KESKAR N S, SOCHER R. Improving generalization performance by switching from Adam to SGD[A]. 2017.
[50] ZHOU P, FENG J, MA C, et al. Towards theoretically understanding why SGD generalizes better than Adam in deep learning[J]. Advances in Neural Information Processing Systems, 2020, 33: 21285-21296.
[51] HODGKINSON L, MAHONEY M. Multiplicative noise and heavy tails in stochastic optimization[C]//International Conference on Machine Learning. PMLR, 2021: 4262-4274.
[52] SIMSEKLI U, SENER O, DELIGIANNIDIS G, et al. Hausdorff dimension, heavy tails, and generalization in neural networks[J]. Advances in Neural Information Processing Systems, 2020, 33: 5138-5151.
[53] BARSBEY M, SEFIDGARAN M, ERDOGDU M A, et al. Heavy tails in SGD and compressibility of overparametrized neural networks[J]. Advances in Neural Information Processing Systems, 2021, 34: 29364-29378.
[54] NEYSHABUR B, BHOJANAPALLI S, SREBRO N. A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks[A]. 2017.
[55] MANDT S, HOFFMAN M D, BLEI D M. Stochastic gradient descent as approximate Bayesian inference[A]. 2017.
[56] WANG J, WANG C, LIN Q, et al. Adversarial attacks and defenses in deep learning for image recognition: A survey[J]. Neurocomputing, 2022.
[57] SZEGEDY C, ZAREMBA W, SUTSKEVER I, et al. Intriguing properties of neural networks[A]. 2013.
[58] MAGNUS J R, NEUDECKER H. Matrix differential calculus with applications in statistics and econometrics[M]. John Wiley & Sons, 2019.
[59] RAGHUNATHAN A, STEINHARDT J, LIANG P. Certified defenses against adversarial examples[A]. 2018.
[60] LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.
[61] THAMM M, STAATS M, ROSENOW B. Random matrix analysis of deep neural network weight matrices[J]. Physical Review E, 2022, 106(5): 054124.