Chang Wen Chen
Chair Professor of Visual Computing, The Hong Kong Polytechnic University

Contemporary Visual Computing for 6G Semantic Communications

Edward J. Delp
The Charles William Harrison Distinguished Professor of Electrical and Computer Engineering, Purdue University

Deepfakes: How This Technology Will Change the World Forever

Zoe Liu
Co-Founder and CTO, Visionular Inc.

AI-Driven Compression: Technologies Innovating Visual Experiences

Contemporary Visual Computing for 6G Semantic Communications

Chang Wen Chen - Chair Professor of Visual Computing - The Hong Kong Polytechnic University

Abstract

This talk focuses on contemporary visual computing research trends with critical implications for 6G semantic communications. Semantic communication was first proposed by Weaver and Shannon more than 70 years ago, around 1950, when they outlined the classical three levels of communication problems: the technical problem, the semantic problem, and the effectiveness problem. Up through 5G, most researchers and practitioners have worked on the first of these, the technical problem. For 6G, semantic communication becomes necessary to handle the overwhelming volume of visual data among all IP traffic. We firmly believe that a paradigm-shifting framework needs to be designed to transport volumetric visual data under the 6G mobile communication architecture. We show that recent technical advances in contemporary visual computing bear great potential for 6G semantic communication. A significant portion of this volumetric visual data is acquired for machine intelligence purposes. Therefore, structured extraction and representation of the semantics from these visual data are desired to facilitate 6G semantic communication. In contemporary visual computing, well-structured scene graph generation (SGG) approaches have been shown to compactly represent the logical relationships among the subjects and objects detected in the visual data. We will show that this capability of structured SGG can be applied to 6G semantic communication, pointing toward future advances in integrating visual computing with 6G.
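
As a rough illustration of why a scene graph is a compact semantic representation, the following minimal Python sketch encodes a scene as subject-predicate-object triples and compares the message size with that of one uncompressed frame. The `Triple` class, the helper functions, the example triples, and the 1080p frame size are all assumptions chosen for illustration; they are not taken from the talk and do not represent any particular SGG method or 6G protocol.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One scene-graph edge: subject --predicate--> object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def encode_scene_graph(triples):
    """Serialize the triples as compact UTF-8 JSON bytes for transmission."""
    payload = [[t.subject, t.predicate, t.obj] for t in triples]
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def decode_scene_graph(data):
    """Recover the list of triples at the receiver."""
    return [Triple(*item) for item in json.loads(data.decode("utf-8"))]


# Hypothetical detections from a single street-scene frame.
scene = [
    Triple("person", "riding", "bicycle"),
    Triple("bicycle", "on", "road"),
    Triple("car", "behind", "person"),
]

message = encode_scene_graph(scene)
raw_frame_bytes = 1920 * 1080 * 3  # one uncompressed 8-bit RGB 1080p frame

print(len(message), "bytes for the scene graph vs",
      raw_frame_bytes, "bytes for the raw frame")
```

The receiver gets the relationships among detected objects rather than the pixels, which suits machine consumers but discards appearance detail, so this trade-off only makes sense when the semantics are what the receiver needs.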

Bio

Chang Wen Chen is currently Chair Professor of Visual Computing at The Hong Kong Polytechnic University. Before his current position, he served as Dean of the School of Science and Engineering at The Chinese University of Hong Kong, Shenzhen from 2017 to 2020, and concurrently as Deputy Director at Peng Cheng Laboratory from 2018 to 2021. Previously, he was an Empire Innovation Professor at the State University of New York at Buffalo (SUNY) from 2008 to 2021 and the Allan Henry Endowed Chair Professor at the Florida Institute of Technology from 2003 to 2007. He received his BS degree from the University of Science and Technology of China in 1983, his MS degree from the University of Southern California in 1986, and his PhD degree from the University of Illinois at Urbana-Champaign (UIUC) in 1992.

He has served as Editor-in-Chief for IEEE Trans. Multimedia (2014-2016) and IEEE Trans. Circuits and Systems for Video Technology (2006-2009). He has received many professional achievement awards, including ten (10) Best Paper Awards in premier publication venues, the prestigious Alexander von Humboldt Award in 2010, the SUNY Chancellor’s Award for Excellence in Scholarship and Creative Activities in 2016, and the UIUC ECE Distinguished Alumni Award in 2019. He is an IEEE Fellow (2005), an SPIE Fellow (2007), and a Member of the Academia Europaea (2021).

Deepfakes: How This Technology Will Change the World Forever

Edward J. Delp - Professor of Electrical and Computer Engineering - Purdue University

Abstract

In this talk, I will present an overview of the current state of generated and manipulated media, such as deepfakes, and describe how these methods work, where they are being used, and how to detect them. I will also describe the history of manipulated media content and how we got to where we are today. An outline of my talk is:

  • Historical Overview of Manipulated Media
  • The disappearing Russians
  • Hollywood and CGI
  • Cheapfakes and Deepfakes
  • Seeing is not believing
  • An overview of the technology
  • The threat of manipulated media
  • Where is this all going in 5, 10, 20 years?
  • Is help coming?

Bio

Edward J. Delp was born in Cincinnati, Ohio. He received the B.S.E.E. (cum laude) and M.S. degrees from the University of Cincinnati, and the Ph.D. degree from Purdue University. In May 2002 he received an Honorary Doctor of Technology from the Tampere University of Technology in Tampere, Finland.
He is currently the Charles William Harrison Distinguished Professor of Electrical and Computer Engineering and Professor of Biomedical Engineering at Purdue University.
His research interests include machine learning, image and video compression, multimedia security, medical imaging, multimedia systems, and communication and information theory.
Dr. Delp is a Fellow of the IEEE, a Fellow of the SPIE, a Fellow of the Society for Imaging Science and Technology (IS&T), and a Fellow of the American Institute of Medical and Biological Engineering. In 2004 he received the Technical Achievement Award from the IEEE Signal Processing Society for his work in image and video compression and multimedia security. In 2008 Dr. Delp received the Society Award from the IEEE Signal Processing Society (SPS). This is the highest award given by SPS, and it cited his work in multimedia security and image and video compression.

AI-Driven Compression: Technologies Innovating Visual Experiences

Zoe Liu, Co-Founder and CTO, Visionular Inc.

Abstract

Our talk aims to provide an in-depth look at how AI is revolutionizing video compression, focusing on industry developments, practical customer needs, and the tangible improvements in end-user experience driven by these technologies. We will cover the following key areas:

  1. AI in Standards: We will review developments in video codec standards like MPEG ECM and AOM’s AVM, emphasizing machine learning advancements for BD-rate gains and challenges in complexity and hardware implementation.
  2. End-to-End Neural Network Compression: We will discuss cutting-edge research on new neural network architectures for video compression, focusing on industry responses and anticipated adoption.
  3. GenAI Video Compression: We will explore the challenges and opportunities of GenAI videos, such as digital humans, which feature very low noise, sharp edges, bright colors, and dynamic motion. These characteristics call for different encoder settings, and we will compare such videos with other computer-generated content such as screen content, animations, and gaming.
  4. Video Quality Assessment: We will highlight collaborations with industry leaders, exploring methods for evaluating grainy videos and trends like high-density, low-power VMAF calculations on Nvidia GPUs.
  5. Playback Enhancement: We will discuss player-side synthesis work, including AV1’s Film Grain synthesis, leveraging decoder-side computational power in modern mobile devices to enhance playback and reduce power consumption.
  6. Ultra-Low Latency Live: We will share insights on low-delay live streaming using AV1 encoding, its optimization for low bandwidth and low complexity, and potential in real-time 3D applications and immersive video sharing. We will demonstrate that AV1 encoding can be optimized to remain competitive in compression efficiency while achieving very low complexity comparable to OpenH264.
  7. Joint CPU+GPU Optimization: In the industry, for VOD, CPU encoders are widely used for encoding the most-viewed videos, while GPUs are utilized for long-tail content. For live streaming, CPUs are preferred for live event streaming, whereas GPUs are deployed for 24/7 live channel operations. We will explore the optimization of existing standards using a combination of GPU and CPU, highlighting how GPUs enhance machine learning processing for more effective content-adaptive encoding while maintaining high throughput.

Bio

Zoe Liu received her B.E. degree (Honors) and M.E. degree from Tsinghua University in Beijing in 1995 and 2000, respectively, and her Ph.D. degree from Purdue University, West Lafayette, Indiana, in 2004, all in electrical engineering. She was a Software Engineer with the Google WebM Team in Mountain View, California, where she was a key contributor to the royalty-free video codec standard AOM/AV1. With over 20 years of experience, Zoe has been dedicated to the design and development of innovative products in video codecs and real-time video communications. She played a crucial role as a core engineer behind products such as Apple FaceTime, Apple AirPlay, TangoMe Video Call, and Google Glass Video Call.

Currently, she is the Co-Founder and CTO of Visionular Inc., a startup based in Los Altos, California, that provides cutting-edge video encoding, enhancement, and streaming solutions to enterprise customers globally. Visionular has onboarded 100+ customers worldwide and has nearly 100 full-time employees across the US, London, Bangalore, Hangzhou, and Beijing. Zoe was a speaker at Google I/O in 2018 and has published approximately 50 international conference papers and journal articles. Her main research interests include video compression, image processing, and machine learning.