(The special sessions are sorted alphabetically)

Artificial Intelligence in Music

Technology has always played a crucial role in the evolution of music and sound, from the creation of musical instruments to sound recording, and from radio broadcasting to the Walkman. Over 90% of videos on YouTube feature music. Artificial intelligence (AI) and music were first discussed together in an article as far back as 1969, and researchers have since explored numerous technological advances to enhance music composition, recommendation, and performance. In recent years there has been renewed interest in applying AI to music, and the field has made significant strides in areas including algorithms, human-machine interfaces, recommendation, copyright detection, and sociological perspectives on creativity.

In this session, we welcome three groups of researchers at the forefront of artificial intelligence in music. These experts will share insights from their research and survey the state of the art in this dynamic area. We will also discuss the future trends and directions the field might take. This is a unique opportunity to understand the intersection of AI and music from the leaders who are shaping its future.

  • Dr. Yung-Hsiang Lu (yunglu@purdue.edu): Elmore Family School of Electrical and Computer Engineering, Purdue University, U.S.A.

Artificial Intelligence with Internet of Things (AIoT)

The Artificial Intelligence of Things (AIoT) combines artificial intelligence (AI) technology with Internet of Things (IoT) infrastructure to make IoT operations more effective, improve human-machine interaction, and strengthen data management and analytics. An essential aspect of AIoT is applying AI on the device itself, also known as edge computing, with no external connection required; AIoT is not a new kind of Internet, but an evolution of the IoT concept. The combined potential of AI and IoT promises to unlock untapped value across a wide range of business verticals, including edge analytics, autonomous vehicles, personalized fitness, remote healthcare, precision agriculture, smart retail, predictive maintenance, and industrial automation.
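
As a concrete illustration of the edge-computing idea above, the following is a minimal sketch of how an IoT device might run inference entirely locally on sensor readings, with no network connection. The model file `anomaly_detector.tflite`, the sensor-reading stub, and the decision threshold are all illustrative assumptions, not a prescribed design; only the `tflite-runtime` interpreter API is real.

```python
# Minimal on-device (edge) inference loop: all computation stays local,
# so no external/Internet connection is required at inference time.
import numpy as np
from tflite_runtime.interpreter import Interpreter

# Hypothetical pre-trained model deployed to the device.
interpreter = Interpreter(model_path="anomaly_detector.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

def read_sensor_window():
    """Placeholder for reading a window of sensor samples on the device."""
    return np.random.randn(1, 128).astype(np.float32)  # stand-in data

def infer(window):
    interpreter.set_tensor(input_detail["index"], window)
    interpreter.invoke()
    return interpreter.get_tensor(output_detail["index"])

score = infer(read_sensor_window())
if score[0, 0] > 0.5:  # assumed anomaly threshold for this sketch
    print("anomaly detected; actuate or log locally")
```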

Topics of interest include but are not limited to:

  • Artificial Intelligence for Automated Vehicles
  • Artificial Intelligence for Intelligent Traffic Monitoring
  • Artificial Intelligence for Manufacturing and Production
  • Artificial Intelligence for Smart Healthcare Systems
  • Artificial Intelligence for Smart Home Automation Systems
  • Artificial Intelligence for Smart Irrigation
  • Dr. Sujeet More (sujeetmore7@gmail.com): Department of Computer Engineering, Trinity College of Engineering and Research, Pune, India

Emerging Technologies in 3D Optical Imaging and Analysis

This special session is dedicated to exploring the forefront of 3D technology, focusing on the latest advances in optical imaging and data processing and on their applications across various domains. As 3D technology rapidly evolves, it presents unique challenges and fresh opportunities in areas such as sensor innovation, calibration techniques, and data handling.

The session aims to showcase recent developments in 3D optical imaging and surface measurement technologies, delve into the challenges and solutions associated with 3D data processing and compression, and discuss the creation of dimensional standards and calibration artifacts. Additionally, we will explore the burgeoning applications of 3D technologies in enhancing machine and robotic vision systems and the pivotal role of 3D content in the immersive environments of virtual and augmented reality.
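
To make the surface-measurement theme concrete, below is a minimal sketch of three-step phase-shifting fringe analysis, a standard technique in 3D optical imaging: three fringe images shifted by 2π/3 yield a wrapped phase map that, after unwrapping and calibration, encodes surface height. The synthetic images are a stand-in for captured data; this is an illustration, not the session's prescribed pipeline.

```python
# Three-step phase-shifting fringe analysis: recover the wrapped phase of a
# projected fringe pattern from three images with phase shifts of -2*pi/3,
# 0, and +2*pi/3.
import numpy as np

def wrapped_phase(i1, i2, i3):
    """Standard three-step formula: phi = atan2(sqrt(3)*(I1-I3), 2*I2-I1-I3)."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)

# Stand-in for three captured fringe images (synthetic data).
h, w = 480, 640
true_phase = np.tile(np.linspace(0, 8 * np.pi, w), (h, 1))
imgs = [0.5 + 0.5 * np.cos(true_phase + d)
        for d in (-2 * np.pi / 3, 0.0, 2 * np.pi / 3)]

phi = wrapped_phase(*imgs)  # values in (-pi, pi]; unwrap + calibrate for depth
```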

The main topics of interest include, but are not limited to:

  • 3D optical imaging
  • 3D data signal processing
  • 3D imaging system calibration
  • 3D data analysis / 3D data compression
  • 3D machine/robot vision methods and other 3D applications
  • 3D contents for virtual reality and augmented reality
  • Dr. Jae-Sang Hyun (hyun.jaesang@yonsei.ac.kr): School of Mechanical Engineering, Yonsei University, Seoul, Korea

Exploring Generative AI Technologies in Multimedia Signal Processing

In recent years, the emergence of Generative Artificial Intelligence (AI) technologies has revolutionized the landscape of multimedia signal processing. These technologies have attracted significant attention for their ability to capture complex patterns and generate realistic samples across various domains, including images, text, and audio. This special session delves into this transformative intersection, offering researchers an opportunity to present the latest developments, challenges, and opportunities in leveraging Generative AI for multimedia signal processing and to exchange insights on these new technologies.

With the proliferation of Generative AI techniques such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and transformer-based architectures, the possibilities for multimedia content creation, compression, enhancement, and understanding have expanded dramatically. This special session will feature presentations on cutting-edge research spanning various aspects of multimedia signal processing empowered by AI technology, especially Generative AI. As research in this field continues to advance, we expect to see even more exciting developments and applications of generative models.
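
As a small, self-contained illustration of one of the generative families named above, here is a minimal sketch of a variational autoencoder (VAE) in PyTorch. The layer sizes, latent dimension, and stand-in batch are illustrative assumptions, not a recommended configuration.

```python
# Minimal VAE sketch: the encoder maps an input to the mean and log-variance
# of a latent Gaussian; sampling uses the reparameterization trick; the loss
# combines reconstruction error with a KL-divergence regularizer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, in_dim=784, hidden=256, latent=16):
        super().__init__()
        self.enc = nn.Linear(in_dim, hidden)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec1 = nn.Linear(latent, hidden)
        self.dec2 = nn.Linear(hidden, in_dim)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)  # reparameterization trick

    def decode(self, z):
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        return self.decode(self.reparameterize(mu, logvar)), mu, logvar

def vae_loss(recon, x, mu, logvar):
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

model = VAE()
x = torch.rand(8, 784)  # stand-in batch (e.g., flattened images)
recon, mu, logvar = model(x)
vae_loss(recon, x, mu, logvar).backward()
```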

We invite researchers, academics, industry professionals, and students to contribute to this topic, sharing their expertise, insights, and innovative technologies that shape the future of multimedia signal processing in the era of Generative AI. Topics of interest include, but are not limited to:

  • Visual signal compression with generative artificial intelligence technology
  • Generative visual content compression and processing
  • Human/machine-centric applications incorporating Generative AI in multimedia experiences
  • Generative AI applications in content creation and manipulation
  • Cross-modal generation: integrating vision, audio, and text
  • Meng Wang (mwang98-c@my.cityu.edu.hk): Department of Computer Science, City University of Hong Kong
  • Junru Li (lijunru@bytedance.com): ByteDance Inc.
  • Li Zhang (lizhang.idm@bytedance.com): ByteDance Inc.

Latent Space Metrics in AI to Improve Multi-Object Detection (MOD), Tracking (MOT), Re-ID and 3D Segmentation Tasks

Multi-object detection, multi-object tracking, object Re-ID, and segmentation are common tasks in multimedia video analysis. The performance of models that find, track, and re-identify objects within a camera feed and across camera feeds is key to many applications, such as surveillance, anomaly detection, motion prediction, 3D medical image-guided surgery, 3D segmentation, 2D-to-3D view synthesis (NeRF), and video-to-speech. Current state-of-the-art AI backbone models perform very well on benchmark data but still fall short on real-world data. This session will focus on metrics developed in the latent/feature space of AI models, which then inform the selection of data used to test or train these models. The metrics and techniques discussed in the session are used to improve multiple object detection (MOD), multiple object tracking (MOT), matching (Re-ID), and 3D segmentation tasks. Generative data from diffusion models and variational autoencoders (VAEs) is explored for creating samples from the latent space to supplement and augment real-world training data, with the goal of improving performance, robustness, and resilience to adversarial attacks.
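
One simple instance of a latent-space metric of the kind this session targets is the Mahalanobis distance between a candidate sample's feature embedding and the training distribution; embeddings that score as outliers can be prioritized for testing or added to the training set. The NumPy sketch below is a hedged illustration: the random "embeddings" and the threshold are stand-ins for a real backbone's features and a tuned cutoff.

```python
# Mahalanobis-distance metric in an AI model's latent/feature space.
# Embeddings far from the training feature distribution flag candidate
# data for extra testing or for augmenting the training set.
import numpy as np

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(1000, 64))  # stand-in backbone embeddings
mean = train_feats.mean(axis=0)
cov = np.cov(train_feats, rowvar=False)
cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # regularized

def mahalanobis(feat):
    """Distance of one embedding from the training feature distribution."""
    d = feat - mean
    return float(np.sqrt(d @ cov_inv @ d))

candidate = rng.normal(loc=3.0, size=64)  # hypothetical new sample
if mahalanobis(candidate) > 12.0:         # assumed threshold
    print("out-of-distribution: prioritize for testing / augmentation")
```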

  • Dr. Lauren Christopher (lachrist@purdue.edu): Department of Electrical and Computer Engineering, IUPUI, Indianapolis, IN, U.S.A.
  • Dr. Paul Salama (psalama@purdue.edu): Department of Electrical and Computer Engineering, IUPUI, Indianapolis, IN, U.S.A.

Reproducible Neural Visual Coding


In recent years, we have witnessed exponential growth in research and development on learning-based visual coding. These learned coding approaches, whether focused on image, video, or 3D point cloud data, have demonstrated remarkable improvements in coding efficiency compared to traditional solutions refined over decades.

Although international standards organizations such as JPEG and MPEG have devoted effort to promoting learning-based visual coding techniques, these techniques are often criticized for a lack of reproducibility. Reproducibility concerns the complexity and generalization of the underlying coding model, both of which are vital for faithfully evaluating the performance of these methods and for ensuring their adoption in practical applications. Complexity here includes computational complexity and memory (space) consumption in both training and inference. Generalization ensures the applicability of the trained model across data domains, even for unseen data.
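
Since complexity reporting is central to the reproducibility concerns above, the following is a minimal sketch of the kind of figures a submission might report for a coding model, using standard PyTorch calls: parameter count, parameter memory, and wall-clock inference time. The toy convolutional stack is an assumption standing in for an actual learned codec.

```python
# Report basic complexity figures for a (toy) neural coding model.
import time
import torch
import torch.nn as nn

model = nn.Sequential(  # stand-in for a learned image codec's analysis transform
    nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(64, 64, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(64, 32, 5, stride=2, padding=2),
)

n_params = sum(p.numel() for p in model.parameters())
param_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 2**20
print(f"parameters: {n_params:,} ({param_mb:.2f} MiB)")

x = torch.randn(1, 3, 256, 256)  # hypothetical test image size
model.eval()
with torch.no_grad():
    t0 = time.perf_counter()
    for _ in range(10):
        model(x)
    print(f"mean inference time: {(time.perf_counter() - t0) / 10 * 1e3:.1f} ms")
```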

This special session seeks original contributions reporting and discussing the reproducibility of recently emerged neural visual coding solutions. It targets a mixed audience of researchers and product developers from several communities, i.e., multimedia coding, machine learning, computer vision, etc. The topics of interest include, but are not limited to:

  • Efficient neural visual coding for image, video, 3D point cloud, etc.
  • Model complexity analysis of neural visual coding
  • Model generalization studies of neural visual coding
  • Standardization activity overviews and summaries of relevant techniques
  • Technical alignment of training and testing (e.g., datasets, procedural steps) for fair comparison
  • Dr. Zhan Ma (mazhan@nju.edu.cn): School of Electronic Science and Engineering, Nanjing University, China
  • Dr. Dong Tian (Dong.tian@interdigital.com): InterDigital, New York, NY, U.S.A.