Title

基于多视点视频自适应网络传输机制研究

Alternative Title
A STUDY OF MULTI-VIEW VIDEO BASED ADAPTIVE NETWORK TRANSMISSION MECHANISM
Name
王璐娜
Name (Pinyin)
WANG Luna
Student ID
12032191
Degree Type
Master
Degree Program
080902 Circuits and Systems
Discipline Category / Professional Degree Category
08 Engineering
Supervisor
周建二
Supervisor's Affiliation
Institute of Future Networks
Thesis Defense Date
2023-05-12
Thesis Submission Date
2023-06-27
Degree-Granting Institution
Southern University of Science and Technology
Place of Degree Conferral
Shenzhen
Abstract
At present, 360-degree video remains the mainstream content of immersive video applications, but it only provides viewing angles from a fixed position with three rotational degrees of freedom (DoF): pitch, yaw, and roll. The resulting user experience and sense of immersion are limited. Emerging multi-view video promises experiences with more degrees of freedom: on top of the three rotational DoF, it adds translational freedom along the forward/backward, left/right, and up/down axes, yielding six degrees of freedom (6DoF), so that users can move to arbitrary positions within a 3D virtual space and enjoy a far stronger sense of immersion. However, this immersive interaction with three-dimensional content entails more complex content generation, larger data storage, lower transmission latency, and higher rendering compute requirements than traditional two-dimensional 360-degree video.
Against this background, this thesis extensively collects, reviews, and organizes peer research on view synthesis and rendering for 6DoF video, together with the literature on network-adaptive transmission schemes for multi-view video, to explore a technical framework and roadmap for future 6DoF video adaptive network transmission. Based on this survey work and experimental validation, the thesis proposes a network-adaptive transmission scheme for 6DoF video built on a representation composed of multiple multi-sphere images (MSI), enabling efficient coding and transmission of 3D content, as well as efficient multi-view rendering and reconstruction, under arbitrary position changes in 3D space. Because the motion parallax of multi-view video is more likely to induce dizziness and nausea, we propose a Quality of Experience (QoE) model that extends conventional video QoE models and assesses the severity of VR sickness through the optical-flow loss between adjacent frames. In addition, the thesis proposes a Model Predictive Control (MPC) transmission algorithm over a cloud-to-client architecture, which adapts the 6DoF video bitrate from predictions of network throughput and the user's viewing trajectory over a historical window, while jointly optimizing the proposed QoE model.
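To make the control loop described above concrete, the following is a minimal Python sketch of how an MPC controller could trade off a conventional QoE objective against an optical-flow-driven sickness penalty. The bitrate ladder, weights, function names, and the flow/rate sickness proxy are illustrative assumptions, not the thesis's actual formulation; the real system would operate on 6DoF MSI content rather than generic chunks.

```python
# Hypothetical sketch of QoE-driven MPC bitrate selection.
# All constants and the sickness proxy are illustrative assumptions.
import itertools

BITRATES_MBPS = [5, 10, 20, 40]   # assumed per-chunk encoding ladder
CHUNK_SEC = 1.0                   # assumed chunk duration (seconds)
W_QUALITY, W_SMOOTH, W_REBUF, W_SICK = 1.0, 1.0, 4.0, 2.0  # assumed weights

def qoe(rates, flow_mags, throughput, buffer_sec):
    """Classic QoE terms (quality, smoothness, rebuffering) plus a
    VR-sickness penalty driven by inter-frame optical-flow magnitude."""
    quality = sum(rates)
    smooth = sum(abs(a - b) for a, b in zip(rates, rates[1:]))
    rebuf, buf = 0.0, buffer_sec
    for r, tput in zip(rates, throughput):
        dl = r * CHUNK_SEC / tput          # seconds to download the chunk
        rebuf += max(0.0, dl - buf)        # stall time if download outruns buffer
        buf = max(buf - dl, 0.0) + CHUNK_SEC
    # Sickness proxy: large predicted viewing motion (optical flow)
    # combined with low-rate (blurry) chunks is penalized more heavily.
    sick = sum(f / r for f, r in zip(flow_mags, rates))
    return W_QUALITY * quality - W_SMOOTH * smooth - W_REBUF * rebuf - W_SICK * sick

def mpc_select(horizon, flow_mags, throughput_pred, buffer_sec):
    """Exhaustive MPC: pick the bitrate sequence maximizing QoE over the
    prediction horizon; only the first decision is applied each step."""
    best = max(itertools.product(BITRATES_MBPS, repeat=horizon),
               key=lambda rs: qoe(rs, flow_mags, throughput_pred, buffer_sec))
    return best[0]

# Example: predicted throughput (Mbps) and optical-flow magnitudes derived
# from the predicted viewing trajectory for the next 3 chunks.
print(mpc_select(3, [0.2, 0.8, 0.5], [25.0, 18.0, 30.0], 2.0))
```

As in standard MPC streaming controllers, the search is re-run every chunk with refreshed throughput and trajectory predictions, so only the first bitrate of the optimal sequence is ever committed.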
Keywords
Language
Chinese
Training Category
Independent training
Year of Enrollment
2020
Year of Degree Conferral
2023-07
Degree Evaluation Subcommittee
Electronic Science and Technology
Chinese Library Classification (CLC) Number
TP37
Source Repository
Manually submitted
Item Type
Degree thesis
Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544141
Collection
Institute of Future Networks
Recommended Citation
GB/T 7714:
王璐娜. 基于多视点视频自适应网络传输机制研究[D]. 深圳: 南方科技大学, 2023.
Files in This Item
12032191-王璐娜-未来网络研究院 (3967 KB), restricted access