Title

基于多视点视频自适应网络传输机制研究

Alternative Title
A STUDY OF MULTI-VIEW VIDEO BASED ADAPTIVE NETWORK TRANSMISSION MECHANISM
Name
王璐娜
Name (Pinyin)
WANG Luna
Student ID
12032191
Degree Type
Master
Degree Program
080902 Circuits and Systems
Discipline Category / Professional Degree Category
08 Engineering
Supervisor
周建二
Supervisor's Affiliation
Institute of Future Networks
Thesis Defense Date
2023-05-12
Thesis Submission Date
2023-06-27
Degree-Granting Institution
Southern University of Science and Technology
Place of Degree Conferral
Shenzhen
Abstract
At present, 360-degree video remains the mainstream content of immersive video applications, but it only provides viewing angles from a fixed position with three rotational degrees of freedom (DoF): pitch, yaw, and roll. The resulting user experience and sense of immersion are limited. Emerging multi-view video promises experiences with more degrees of freedom: on top of the three rotational DoF, it adds translational freedom along the forward/backward, left/right, and up/down axes, yielding six degrees of freedom (6DoF), so that users can move to arbitrary positions within a 3D virtual space and enjoy a far stronger sense of immersion. However, this immersive interaction with three-dimensional content entails more complex content generation, larger data storage, lower transmission latency, and higher rendering compute requirements than traditional two-dimensional 360-degree video.
Against this background, this thesis extensively collects, reviews, and organizes peer research on view synthesis and rendering for 6DoF video, together with the literature on network-adaptive transmission schemes for multi-view video, to explore a technical framework and roadmap for future 6DoF video adaptive network transmission. Based on this survey work and experimental validation, the thesis proposes a network-adaptive transmission scheme for 6DoF video built on a representation composed of multiple multi-sphere images (MSI), enabling efficient coding and transmission of 3D content, as well as efficient multi-view rendering and reconstruction, under arbitrary position changes in 3D space. Because the motion parallax of multi-view video is more likely to induce dizziness and nausea, we propose a Quality of Experience (QoE) model that extends conventional video QoE models and assesses the severity of VR sickness through the optical-flow loss between adjacent frames. In addition, the thesis proposes a Model Predictive Control (MPC) transmission algorithm over a cloud-to-client architecture, which adapts the 6DoF video bitrate from predictions of network throughput and the user's viewing trajectory over a historical window, while jointly optimizing the proposed QoE model.
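To make the control loop described above concrete, the following is a minimal Python sketch of how an MPC controller could trade off a conventional QoE objective against an optical-flow-driven sickness penalty. The bitrate ladder, weights, function names, and the flow/rate sickness proxy are illustrative assumptions, not the thesis's actual formulation; the real system would operate on 6DoF MSI content rather than generic chunks.

```python
# Hypothetical sketch of QoE-driven MPC bitrate selection.
# All constants and the sickness proxy are illustrative assumptions.
import itertools

BITRATES_MBPS = [5, 10, 20, 40]   # assumed per-chunk encoding ladder
CHUNK_SEC = 1.0                   # assumed chunk duration (seconds)
W_QUALITY, W_SMOOTH, W_REBUF, W_SICK = 1.0, 1.0, 4.0, 2.0  # assumed weights

def qoe(rates, flow_mags, throughput, buffer_sec):
    """Classic QoE terms (quality, smoothness, rebuffering) plus a
    VR-sickness penalty driven by inter-frame optical-flow magnitude."""
    quality = sum(rates)
    smooth = sum(abs(a - b) for a, b in zip(rates, rates[1:]))
    rebuf, buf = 0.0, buffer_sec
    for r, tput in zip(rates, throughput):
        dl = r * CHUNK_SEC / tput          # seconds to download the chunk
        rebuf += max(0.0, dl - buf)        # stall time if download outruns buffer
        buf = max(buf - dl, 0.0) + CHUNK_SEC
    # Sickness proxy: large predicted viewing motion (optical flow)
    # combined with low-rate (blurry) chunks is penalized more heavily.
    sick = sum(f / r for f, r in zip(flow_mags, rates))
    return W_QUALITY * quality - W_SMOOTH * smooth - W_REBUF * rebuf - W_SICK * sick

def mpc_select(horizon, flow_mags, throughput_pred, buffer_sec):
    """Exhaustive MPC: pick the bitrate sequence maximizing QoE over the
    prediction horizon; only the first decision is applied each step."""
    best = max(itertools.product(BITRATES_MBPS, repeat=horizon),
               key=lambda rs: qoe(rs, flow_mags, throughput_pred, buffer_sec))
    return best[0]

# Example: predicted throughput (Mbps) and optical-flow magnitudes derived
# from the predicted viewing trajectory for the next 3 chunks.
print(mpc_select(3, [0.2, 0.8, 0.5], [25.0, 18.0, 30.0], 2.0))
```

As in standard MPC streaming controllers, the search is re-run every chunk with refreshed throughput and trajectory predictions, so only the first bitrate of the optimal sequence is ever committed.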
Keywords
Language
Chinese
Training Category
Independent training
Year of Enrollment
2020
Year of Degree Conferral
2023-07
Degree Evaluation Subcommittee
Electronic Science and Technology
Chinese Library Classification (CLC) Number
TP37
Source Repository
Manually submitted
Item Type
Degree thesis
Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544141
Collection
Institute of Future Networks
Recommended Citation
GB/T 7714:
王璐娜. 基于多视点视频自适应网络传输机制研究[D]. 深圳: 南方科技大学, 2023.
Files in This Item
12032191-王璐娜-未来网络研究院 (3967 KB), restricted access